Course outline for Data Science (AI and ML) with Python
Goal:
The goal of this course is to examine large amounts of data to uncover hidden patterns, correlations and other insights. This helps in preparing machine learning models using AI fundamentals.
Audience:
This course is designed for anyone who wants to build a career in Data Analytics and Machine Learning.
Pre-requisites:
Any graduate or postgraduate with an affinity for data, information, knowledge and wisdom
Basics of Python
Duration:
60 hours
Course Structure
1. Python Data Science Overview
- What Is Data Science and Machine Learning?
- Introduction of Python Data Science Tools
- Setting up environment for this course
- Python Data Science Packages to be used
2. Fundamentals of Statistics
- Overview
- Basic Terminology of Statistics
- Variables
- Data
- Statistics
- Dispersion
- Scattering
- Observation
- Time Series Data
- Population
- Sample
- Variation
- Shape
- Data Collections
- Descriptive Statistics
- Data Distribution
- Confidence Interval
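As an illustration of the descriptive statistics and confidence interval topics in this module, here is a minimal sketch using only the Python standard library. The sample values are hypothetical, and the interval uses the normal approximation as a simplifying assumption.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of eight measurements (illustrative values only).
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]

n = len(sample)
mean = statistics.mean(sample)
median = statistics.median(sample)
std_dev = statistics.stdev(sample)       # sample standard deviation (n - 1)
data_range = max(sample) - min(sample)   # a simple measure of dispersion

# Approximate 95% confidence interval for the mean using the normal
# critical value (assumption: a t-based interval would be somewhat wider
# for a sample this small).
z = NormalDist().inv_cdf(0.975)
margin = z * std_dev / n ** 0.5
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean}, median={median}, range={data_range}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```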
3. Statistical Inferences and Relationship Between Variables
- Hypothesis Testing
- Correlation Theory
- Linear Regression Theory
- Polynomial Regression
- Logistic Regression
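The correlation and linear regression topics above can be previewed with a short NumPy sketch. The data (hours studied against exam score) is hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical data: hours studied vs. exam score (illustrative values only).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 55.0, 55.0, 59.0, 60.0])

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(hours, score)[0, 1]

# Least-squares straight line: score is approximated by slope * hours + intercept.
slope, intercept = np.polyfit(hours, score, deg=1)

# Use the fitted line to predict the score for six hours of study.
predicted = slope * 6.0 + intercept

print(f"r={r:.3f}, slope={slope:.2f}, intercept={intercept:.2f}")
```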
4. Introduction to NumPy (Numerical Python)
- NumPy: Introduction
- Create NumPy Arrays
- NumPy Operations
- Matrix Arithmetic and Linear Systems
- NumPy for Basic Matrix Arithmetic
- Broadcasting with NumPy
- Solving Equations with NumPy
- Statistical Operations
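A minimal sketch of the NumPy topics listed in this module: array creation, broadcasting, matrix arithmetic, solving a linear system and a statistical operation. The matrix and vector values are arbitrary examples.

```python
import numpy as np

# Create a 2x2 matrix and a right-hand-side vector (arbitrary example values).
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Broadcasting: the 1-D row is stretched across both rows of the matrix.
shifted = a + np.array([10.0, 20.0])

# Matrix arithmetic: matrix product of a with its transpose.
product = a @ a.T

# Solve the linear system a @ x = b.
x = np.linalg.solve(a, b)

# Statistical operation along an axis: mean of each column.
column_means = a.mean(axis=0)

print(x)
```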
5. Pandas
- Introduction to Data Structures
- Reading Data
- CSV Data
- Excel Data
- JSON Data
- HTML Data
- Data Pre-processing/Wrangling/Cleaning
- Removing NAs/Missing Values from Data
- Basic Data Handling: Starting with Conditional Data Selection
- Drop Columns/Rows
- Subset and Index Data
- Basic Data Grouping Based on Qualitative Attributes
- Cross tabulation
- Reshaping
- Pivoting
- Rank and sort Data
- Concatenate
- Merging and Joining frames
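The data wrangling steps in this module can be sketched with pandas as follows. The two frames, their column names and their values are hypothetical and serve only to show each operation once.

```python
import pandas as pd

# Hypothetical sales records; column names and values are made up.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "units": [10, None, 7, 5, 3],
})
regions = pd.DataFrame({
    "region": ["North", "South", "East"],
    "manager": ["Asha", "Ravi", "Meera"],
})

clean = sales.dropna(subset=["units"])            # remove rows with missing values
large = clean[clean["units"] > 4]                 # conditional selection
totals = clean.groupby("region")["units"].sum()   # grouping on a qualitative attribute
table = clean.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum")   # pivoting
ranked = clean.sort_values("units", ascending=False)       # sorting
joined = clean.merge(regions, on="region", how="left")     # merging two frames

print(joined)
```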
6. Data Visualization (Matplotlib)
- Introduction
- Histograms
- Box Plots
- Scatter Plots
- Bar Plot
- Pie Chart
- Line Chart
7. Machine Learning: Overview
- Machine Learning Languages, Types, and Examples
- Machine Learning vs Statistical Modelling
- Supervised vs Unsupervised Learning
- Python Libraries: scikit-learn, TensorFlow
8. Supervised Learning
- K-Nearest Neighbours
- Decision Trees
- Using Logistic Regression as Classification Model
- Random Forest Classification
- SVM Linear Classification
- KNN Regression
- Gradient Boosting
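As a small preview of the supervised learning workflow (fit on labelled data, then predict), here is a K-nearest neighbours sketch with scikit-learn. The training points and labels are hypothetical and deliberately well separated.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-feature training set with two well-separated classes.
X_train = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
           [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]]
y_train = [0, 0, 0, 1, 1, 1]

# Each new point is labelled by a majority vote of its 3 nearest neighbours.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

predictions = model.predict([[1.2, 1.4], [8.7, 8.8]])
print(predictions)
```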
9. Unsupervised Learning
- K-Means Clustering plus Advantages & Disadvantages
- Hierarchical Clustering plus Advantages & Disadvantages
- Measuring the Distances Between Clusters - Single Linkage Clustering
- Measuring the Distances Between Clusters - Algorithms for Hierarchical Clustering
- Density-Based Clustering
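A minimal K-means sketch with scikit-learn, assuming two obvious groups of hypothetical points. No labels are supplied, and the algorithm assigns each point to a cluster on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical points forming two tight groups.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# random_state fixes the initialisation so the run is repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(labels)
print(kmeans.cluster_centers_)
```

The cluster numbers themselves are arbitrary; what matters is which points share a label.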
10. Time Series Forecasting
- Time series
- Estimating and Eliminating the Deterministic Components if They Are Present in the Model
- Estimating and Eliminating Seasonality if It Is Present in the Model
- Modeling the Remainder Using Autoregressive Moving Average (ARMA) Models
- Identifying the ‘Order’ of the ARMA Model
- ‘Forecasting’ or Predicting Future Values
11. Support Vector Machine (SVM)
- Linear Classifiers
- Margin of SVMs
- SVM Optimization
- SVM for Data That Is Not Linearly Separable
- Learning non-linear patterns
- Kernel Trick
- SVM Parameter Tuning
- Linear SVM using Python
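The "Linear SVM using Python" topic can be sketched with scikit-learn's `SVC` and a linear kernel. The data points are hypothetical and linearly separable, and `C=1.0` is simply the library default rather than a tuned value.

```python
from sklearn.svm import SVC

# Hypothetical, linearly separable training data.
X = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0],
     [4.0, 4.0], [5.0, 5.0], [4.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel fits a straight-line (hyperplane) decision boundary.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

pred = clf.predict([[0.5, 0.5], [4.5, 4.5]])

# The support vectors are the training points that define the margin.
n_support = len(clf.support_vectors_)
print(pred, n_support)
```

Switching `kernel` to `"rbf"` is one way to explore the kernel trick for data that is not linearly separable.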
12. Other models
- Market Basket Analysis
- Lasso Regression
References:
* NumPy
https://www.datacamp.com/courses/intro-to-python-for-data-science
https://www.tutorialspoint.com/numpy/numpy_indexing_and_slicing.htm
* Pandas
https://pythonprogramming.net/comparison-operators-data-analysis-python-pandas-tutorial/
https://pythonprogramming.net/data-analysis-tutorials/
* Machine Learning
http://www.cs.cmu.edu/~ninamf/courses/601sp15/lectures.shtml
* SciPy (Scientific Python)
* Matplotlib (Plotting Library)
http://www.kaggle.com