Data Science is a fast-emerging field that can drive organizational growth. Administering and managing data has become one of the biggest challenges organizations face, as the volume of data generated in real time keeps exploding.
What is Data Science?
Data Science is an interdisciplinary field that combines statistics, programming, and machine learning to extract insight from data. With distributed processing frameworks, simple programming tools can process large data sets across a cluster of computers, scaling from a single server to thousands of machines.
Prerequisites and Requirements of Data Scientist
- There are no prerequisites. No prior knowledge of statistics, R, Python, or analytic techniques is required.
- This course covers statistics and machine learning techniques from basic to advanced.
Duration
- 40 to 50 Hours
Course Content:
Introduction to Data Science
• What is Data Science?
• Role of Data Science
• Scope of Data Science
1. Descriptive and Inferential Statistics
Samples and Populations
• Sample Statistics
• Estimations of Population Parameters
• Random and Non-random Sampling
• Sampling Distributions
• The Central Limit Theorem
• Degrees of Freedom
Percentiles and Quartiles
Measures of Central Tendency
• Mean
• Median
• Mode
Measures of Variability/Dispersions
• Range
• IQR
• Variance
• Standard Deviation
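As a taste of the measures listed above, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up illustrative data, not course material, and the quartiles use the module's default "exclusive" method, which is only one of several conventions.

```python
import statistics

# Hypothetical sample: exam scores for nine students (illustrative only).
scores = [52, 61, 61, 68, 70, 74, 79, 85, 98]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability / dispersion
value_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(mean, median, mode, value_range, iqr, variance, round(std_dev, 2))
```

For a full population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead of n - 1.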
Distributions
• Normal Distributions
• Binomial Distribution
Probability Distribution
• Events, Sample Space and Probabilities
• Conditional Probabilities
• Independence of Events
• Bayes’ Theorem
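Bayes' Theorem can be illustrated with a small worked sketch. The function name `posterior` and the numbers (a 1% prior, 95% sensitivity, 5% false-positive rate) are assumptions chosen for illustration only.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem.

    P(C | +) = P(+ | C) * P(C) / P(+), where
    P(+) = P(+ | C) * P(C) + P(+ | not C) * P(not C).
    """
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical screening test: rare condition, fairly accurate test.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

Even with an accurate test, the posterior stays well below one half here because the condition is rare, which is the kind of result conditional probability is meant to make visible.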
Random Variable
Confidence Intervals
Hypothesis Testing
• Null Hypothesis
• The Significance Level
• p-value
• Type I and Type II Errors
Inferential Test Metrics
• t test
• F test
• Z test
• Chi-square test
• Student's t test
The Comparison of Two Populations
Analysis of Variance
• ANOVA Computations
• Two-way ANOVA
Similarity Metrics
• Euclidean Distance
• Jaccard Distance
• Cosine Similarity
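The three similarity metrics above can be sketched in plain Python as follows. The function names are illustrative, and the sketch assumes equal-length numeric vectors with non-zero norms for the cosine case.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_distance(s, t):
    """1 - |intersection| / |union| for two collections treated as sets."""
    s, t = set(s), set(t)
    union = s | t
    if not union:  # two empty sets are treated as identical
        return 0.0
    return 1 - len(s & t) / len(union)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumes non-zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(euclidean_distance([0, 0], [3, 4]))
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))
print(cosine_similarity([1, 0], [0, 1]))
```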
Graphical Representation and Summaries
2. Data Exploration
Variable Identification
Univariate Analysis
Bivariate Analysis
Missing Values Treatment
• Imputation
• Deletion
• Prediction
Outlier Detection
• Deletion
• Binning and Transformation
Feature Engineering
• Variable transformation
• Variable / Feature creation
Dimensionality Reduction
• Missing Values
• Low Variance
• High Collinearity
• PCA
• Factor Analysis
Principal Component Analysis
Data Summaries Using Statistics and Plots
Covariance, Correlation, and Distances
Correlation vs Causation
3. Machine Learning: Introduction and Concepts
Differentiating Algorithmic and Model-Based Frameworks
Supervised Learning with Regression and Classification
• Model Validation Approaches
• Training Set
• Validation Set
• Test Set
• Cross-Validation
• Regression Algorithms
• Linear Regression
• Ordinary Least Squares
• Ridge Regression
• Lasso Regression
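To connect the training/test split with ordinary least squares, here is a minimal sketch in plain Python. The data are synthetic (generated exactly from y = 2x + 1), the helper names `fit_ols` and `mean_squared_error` are hypothetical, and the split simply holds out the last three points; a real project would usually shuffle first and use a library such as Scikit-Learn.

```python
def fit_ols(xs, ys):
    """Closed-form ordinary least squares for one predictor.

    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and observed values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: y = 2x + 1 with no noise (illustrative only).
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

# Hold out the last three points as a test set.
train_x, test_x = xs[:7], xs[7:]
train_y, test_y = ys[:7], ys[7:]

slope, intercept = fit_ols(train_x, train_y)
test_mse = mean_squared_error(test_x, test_y, slope, intercept)
print(slope, intercept, test_mse)
```

Ridge and Lasso regression add a penalty on the size of the coefficients to this same least-squares objective; they are not shown here.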
Unsupervised Learning
• Clustering
• Hierarchical (Agglomerative) Clustering
• Non-Hierarchical Clustering: The k-Means Algorithm
Recommender Engines:
• Collaborative Filtering Recommenders
• Content Based Recommenders
4. R-Analytical Tool (Data Mining / Machine Learning)
Basic Data Types
R Data Structures
• Vectors
• Matrix
• Data Frames
• List
R Functions
Predictive Modeling Project based on R
Classification Modeling Project based on R
Clustering Project based on R
Association Mining Project based on R
R Visualization Packages
Machine Learning Packages in R
5. Python Scientific Libraries for Machine Learning
Scikit-Learn
Numpy
Scipy
Pandas
Matplotlib
• RMSE
• R-Squared
• K Nearest Neighbors Regression & Classification
• Classification
• Logistic Regression
• Naive Bayes
• Classifier Threshold And Interpretation
• Confusion Matrix-Error Measurement
• ROC Curve
• Accuracy, Precision, Recall
• Measuring Sensitivity And Specificity
• Regression And Classification Trees
• Decision Trees
• Recursive Partitioning
• Impurity Measures (Entropy And Gini Index)
• Pruning The Tree
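The two impurity measures named in the decision-tree topics above can be sketched as follows. The function names are illustrative; tree libraries compute these internally when choosing splits.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of items belonging to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini index: 1 - sum(p_k ** 2). Zero for a pure node."""
    return 1 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)). Zero for a pure node."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


mixed = ["yes", "yes", "no", "no"]    # evenly split node
pure = ["yes", "yes", "yes", "yes"]   # single-class node
print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

A split is considered good when the weighted impurity of the child nodes is lower than the impurity of the parent node.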
Support Vector Machines
Ensemble Methods
• Bagging (Parallel Ensemble) – Random Forest
• Boosting (Sequential Ensemble) – Gradient Boosting
Neural Networks
• Structure Of Neural Network
• Hidden Layers And Neurons
• Weights And Transfer Function
Deep Learning
Forecasting (Time-Series Modeling)
• Trend And Seasonal Analysis
• Different Smoothing Techniques
• ARIMA Modeling
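As one example of a smoothing technique, here is a minimal sketch of simple exponential smoothing. The function name and the sample series are assumptions for illustration; the syllabus does not specify which smoothing methods are covered.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    s[0] = x[0]
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    A larger alpha gives more weight to recent observations.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly values (illustrative only).
print(exponential_smoothing([10, 20, 30], alpha=0.5))
```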
6. Spark MLlib (Scalable Machine Learning)
Spark Vs Hadoop
Spark Architecture
Distributed Computing Advantages
RDD Concept
Spark MLlib: Data Types, Algorithms, And Utilities