# Data Science with R-Programming

Free

#### Data Science – Machine learning using R:

The R programming language is an important tool for numerical analysis and machine learning. As machines generate more and more data, the language's popularity is likely to keep growing. One of R's strongest qualities is its vast package ecosystem: if a statistical technique exists, odds are there is already an R package for it.

A lot of functionality built specifically for statisticians is available out of the box. R is also extensible and offers rich functionality for developers to build their own tools and methods for analyzing data.

#### TARGET AUDIENCE

- Data Scientist
- Data Analyst
- Freshers from maths, statistics and engineering
- Statistician
- Data Science professionals

#### Course Content:

#### MODULE 1 : INTRODUCTION TO R

- R as a language for statistical programming and its various features
- Introduction to RStudio
- The statistical packages
- Familiarity with different data types and functions, and learning to deploy them in various scenarios
- Using SQL to apply the 'join' function
- Components of RStudio such as the code editor, visualization and debugging tools
- Learning about `rbind`

#### MODULE 2 : R-Packages

- R Functions
- Bundling code and data in a well-defined format called an R package
- R-Package structure
- Package metadata and testing
- CRAN (Comprehensive R Archive Network)
- Vector creation and assigning values to variables.

#### MODULE 3 : Sorting Dataframe

- R functionality
- Rep Function
- Generating Repeats
- Sorting and generating Factor Levels
- Transpose and Stack Function.
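
The topics in this module can be sketched with base R alone. The data frame `df` and its column names below are made up for illustration and are not part of the course material.

```r
# Generating repeats: the whole vector three times, then each element twice
r1 <- rep(c("a", "b"), times = 3)
r2 <- rep(c("a", "b"), each = 2)

# gl() generates factor levels: 2 levels, each repeated 3 times
g <- gl(2, 3, labels = c("low", "high"))

# Sorting a data frame by one column using order() (hypothetical data)
df <- data.frame(id = c(3, 1, 2), score = c(70, 90, 80))
sorted <- df[order(df$score, decreasing = TRUE), ]

# t() transposes a matrix; stack() reshapes columns into long (values, ind) form
m <- t(as.matrix(df))
long <- stack(df)
```

Here `sorted` lists the rows from highest to lowest score, and `long` has one row per original cell, with the `ind` column recording which column each value came from.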

#### MODULE 4 : Matrix and Vector

- Introduction to matrix and vector in R
- Understanding the various functions like Merge, Strsplit, Matrix manipulation, rowSums, rowMeans, colMeans, colSums, sequencing, repetition, indexing and other functions.
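
As a rough illustration of the functions named above, the following sketch uses only base R. The small matrix and the two data frames (`left`, `right`) are invented for this example.

```r
# A 2 x 3 matrix, filled column by column: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)
row_totals <- rowSums(m)
col_means  <- colMeans(m)

# Sequencing and logical indexing
s <- seq(2, 10, by = 2)
big <- s[s > 4]

# strsplit() returns a list; [[1]] extracts the pieces of the first string
parts <- strsplit("2024-01-15", "-")[[1]]

# merge() joins two data frames on a shared key (inner join by default)
left   <- data.frame(id = c(1, 2, 3), name = c("x", "y", "z"))
right  <- data.frame(id = c(2, 3, 4), value = c(10, 20, 30))
joined <- merge(left, right, by = "id")
```

Only the ids present in both data frames survive the default merge; passing `all = TRUE` would keep unmatched rows from both sides instead.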

#### MODULE 5 : Reading data from external files

- Understanding subscripts in plots in R
- How to obtain parts of vectors
- Using subscripts with arrays, as logical variables, and with lists
- Understanding how to read data from external files.

#### MODULE 6 : Generating plots

- Generate plot in R
- Graphs
- Bar Plots
- Line Plots
- Histogram
- Components of Pie Chart.

#### MODULE 7 : Introduction to Data Science and Statistical Analytics

- Introduction to Data Science
- Use cases
- Need of Business Analytics
- Data Science Life Cycle
- Different tools available for Data Science

#### MODULE 8 : Introduction to Statistics

- Introduction to matrix and vector in R
- Understanding the various functions like Merge, Strsplit, Matrix manipulation, rowSums, rowMeans, colMeans, colSums, sequencing, repetition, indexing and other functions.

#### MODULE 9 : Analysis of Variance (ANOVA)

- Understanding Analysis of Variance (ANOVA) statistical technique
- Working with Pie Charts
- Histograms
- Deploying ANOVA with R
- One way ANOVA
- Two way ANOVA.
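
A minimal sketch of one-way and two-way ANOVA in R follows. It assumes the built-in `PlantGrowth` and `ToothGrowth` datasets as stand-ins; the course may well use different data.

```r
# One-way ANOVA: does mean plant weight differ across treatment groups?
one_way <- aov(weight ~ group, data = PlantGrowth)
one_way_table <- summary(one_way)

# Two-way ANOVA with interaction: tooth length by supplement type and dose.
# dose is numeric in the raw data, so convert it to a factor first.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
two_way <- aov(len ~ supp * dose, data = tg)
two_way_table <- summary(two_way)

print(one_way_table)
print(two_way_table)
```

The formula `supp * dose` expands to both main effects plus their interaction, so the two-way table has one row per effect and a final row for residuals.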

#### MODULE 10 : K-means Clustering

- K-Means Clustering for Cluster & Affinity Analysis
- Cluster Algorithm
- Cohesive subset of items
- Solving clustering issues
- Working with large datasets
- Association rule mining affinity analysis for data mining and analysis
- Learning co-occurrence relationships.
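
For the clustering part of this module, a small sketch with base R's `kmeans()` is shown below. The choice of the built-in `iris` dataset, three clusters, and the seed value are assumptions made for the example.

```r
# k-means depends on random starting centers, so fix the seed for repeatability
set.seed(42)

# Scale the four numeric columns so no single measurement dominates the distance
features <- scale(iris[, 1:4])

# nstart = 25 tries 25 random starts and keeps the best solution
fit <- kmeans(features, centers = 3, nstart = 25)

# Compare the discovered clusters with the known species labels
comparison <- table(cluster = fit$cluster, species = iris$Species)
print(comparison)
```

Cluster numbers are arbitrary labels, so the table should be read by looking at which species dominates each row, not by matching cluster 1 to the first species.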

#### MODULE 11 : Association Analysis and Recommendation Engine

- Market Basket Analysis (MBA)
- Association Rules
- Apriori Algorithm for MBA
- Introduction of Recommendation Engine
- Types of Recommendation – User-Based and Item-Based
- Recommendation Use-case

#### MODULE 12 : Regression in R

- Understanding what is Simple Linear Regression
- Various equations of Line
- Slope
- Y-Intercept Regression Line
- Deploying analysis using Regression
- The least square criterion
- Interpreting the results
- Standard error to estimate and measure of variation.
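
The ideas listed above (slope, intercept, least squares, standard error) can be seen in a short sketch using `lm()`. The built-in `cars` dataset is an assumption chosen for illustration.

```r
# Fit stopping distance as a linear function of speed by least squares
fit <- lm(dist ~ speed, data = cars)

# Intercept and slope of the fitted regression line
coefs <- coef(fit)

# Model summary: residual standard error and coefficient of determination
s <- summary(fit)
resid_se  <- s$sigma
r_squared <- s$r.squared

# Predict the stopping distance for a new speed value
new_pred <- predict(fit, newdata = data.frame(speed = 15))

print(coefs)
print(c(resid_se = resid_se, r_squared = r_squared))
```

The slope is read as the expected change in `dist` for a one-unit increase in `speed`, while the residual standard error measures the typical deviation of observations from the fitted line.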

#### MODULE 13 : Supervised Learning

- Linear Regression
- Bivariate Regression
- Multiple Regression Analysis
- Correlation (positive, negative and neutral)
- Industrial Case Study
- Machine Learning Use-Cases
- Machine Learning Process Flow
- Machine Learning Categories

#### MODULE 14 : Analyzing Relationship with Regression

- Scatter Plots
- Two variable Relationship
- Simple Linear Regression analysis
- Line of best fit

#### MODULE 15 : Advanced Regression

- Deep understanding of the measure of variation
- The concept of coefficient of determination
- F-Test
- The test statistic with an F-distribution
- Advanced regression in R
- Prediction with linear regression.

#### MODULE 16 : Logistic Regression

- What logistic regression means
- Logistic Regression in R

#### MODULE 17 : Advanced Logistic Regression

- Advanced logistic regression
- Understanding how to do prediction using logistic regression
- Ensuring the model is accurate
- Understanding sensitivity and specificity
- Confusion matrix
- What is ROC
- A graphical plot illustrating binary classifier system
- ROC curve in R for determining sensitivity/specificity trade-offs for a binary classifier.
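
The following sketch ties together prediction, the confusion matrix, sensitivity, specificity and the area under the ROC curve, using base R only. The `mtcars` dataset, the single predictor `wt`, and the 0.5 cutoff are assumptions for illustration; the AUC is computed with the rank-based (Mann-Whitney) formula instead of a dedicated ROC package.

```r
# Logistic regression: probability of a manual transmission given car weight
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(fit, type = "response")

# Classify at a 0.5 cutoff; fixed factor levels keep the table 2 x 2
pred   <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual <- factor(mtcars$am, levels = c(0, 1))
cm <- table(predicted = pred, actual = actual)

# Sensitivity: true positives / all actual positives
sensitivity <- cm["1", "1"] / sum(cm[, "1"])
# Specificity: true negatives / all actual negatives
specificity <- cm["0", "0"] / sum(cm[, "0"])

# Area under the ROC curve via ranks of the predicted probabilities
r  <- rank(prob)
n1 <- sum(mtcars$am == 1)
n0 <- sum(mtcars$am == 0)
auc <- (sum(r[mtcars$am == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(cm)
print(c(sensitivity = sensitivity, specificity = specificity, auc = auc))
```

Moving the cutoff away from 0.5 trades sensitivity against specificity, which is exactly the trade-off an ROC curve traces out across all cutoffs.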

#### MODULE 18 : Decision Trees

- What is Classification and its use cases?
- What is Decision Tree?
- Algorithm for Decision Tree Induction
- Creating a Perfect Decision Tree
- Confusion Matrix

#### MODULE 19 : Random Forest

- Random Forest
- What is Naive Bayes?

#### MODULE 20 : Receiver Operating Characteristic (ROC)

- Detailed understanding of ROC
- Area under ROC Curve
- Converting the variable
- Data set partitioning
- Understanding how to check for multicollinearity
- How two or more variables are highly correlated
- Building of model
- Advanced data set partitioning
- Interpreting of the output
- Predicting the output
- Detailed confusion matrix
- Deploying the Hosmer-Lemeshow test for checking whether the observed event rates match the expected event rates.

#### MODULE 21 : Supervised Learning

- Data analysis with R
- Understanding the Wald test
- McFadden's pseudo R-squared
- The significance of the area under ROC Curve
- Kolmogorov-Smirnov chart, based on a non-parametric test for one-dimensional probability distributions.

#### MODULE 22 : Sentiment Analysis

- Introduction to Text Mining
- Introduction to Sentiment
- Setting up an API bridge between R and a Twitter account
- Extracting tweets from a Twitter account and scoring them

#### MODULE 23 : Time Series

- What is Time Series data?
- Time Series variables
- Different components of Time Series data
- Visualize the data to identify Time Series Components
- Implement ARIMA model for forecasting
- Exponential Smoothing models
- Identifying the time series scenarios in which each Exponential Smoothing model can be applied
- Implement respective ETS model for forecasting
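
A compact sketch of the forecasting workflow is given below, using only the `stats` package that ships with R. The built-in `AirPassengers` series and the ARIMA orders are assumptions for this example, and `HoltWinters()` stands in for an ETS model, which would normally come from an add-on package.

```r
# Monthly airline passenger counts, 1949-1960 (built-in dataset)
y <- AirPassengers

# Split the series into trend, seasonal and random components
parts <- decompose(y, type = "multiplicative")

# Seasonal ARIMA on the log scale; these orders are an assumed starting point
fit_arima <- arima(log(y), order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
fc <- predict(fit_arima, n.ahead = 12)
arima_forecast <- exp(fc$pred)   # back-transform to passenger counts

# Exponential smoothing with trend and seasonality (Holt-Winters)
hw <- HoltWinters(y)
hw_forecast <- predict(hw, n.ahead = 12)

print(arima_forecast)
print(hw_forecast)
```

Calling `plot(parts)` on the decomposition is a quick way to see which components are present before choosing between smoothing models.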

#### MODULE 24 : Logistic Regression

- Connecting to various databases from the R environment
- Deploying the ODBC tables for reading the data
- Visualization of the performance of the algorithm using Confusion Matrix.

### Course Features

- Lectures: 0
- Quizzes: 0
- Duration: 50 hours
- Skill level: All levels
- Language: English
- Students: 0
- Assessments: Yes
