Get Your Study

Automating Data Exploration with R

4.74

Updated on Dec 28, 2021

Course overview

Provider: Udemy
Course type: Paid course
Level: All Levels
Duration: 4 hours
Lessons: 21 lessons
Certificate: Available on completion
Course author: Manuel Amunategui

What you will learn

Build a pipeline to automate the processing of raw data for discovery and modeling
Know the main steps to prepare data for modeling
Know how to handle the different data types in R
Understand data imputation
Treat categorical data properly with binarization (making dummy columns)
Apply feature engineering to dates, integers and real numbers
Apply variable selection, correlation and significance tests
Model and measure prepared data using both supervised and unsupervised modeling

Description

Build the tools needed to quickly turn data into model-ready data sets

As data scientists and analysts, we face the same repetitive tasks whenever we approach a new data set. This class aims to automate many of these tasks so that we can get to the actual analysis as quickly as possible. There will always be exceptions, and some manual work and customization will still be required, but a large share of that work can be automated by building a smart pipeline. This is what we’ll do here. It is especially important in the era of big data, where handling variables by hand isn’t always possible.

It is also a great learning strategy to think in terms of a processing pipeline and to understand, design and build each stage as a separate, independent unit.
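
The idea of separate, independent stages can be sketched in base R. This is a minimal illustration and not the course's own code: the function names `impute_median`, `binarize` and `run_pipeline`, and the toy `raw` data frame, are all hypothetical, and median imputation is only one simple choice among many imputation methods.

```r
# Hypothetical stage 1: fill missing numeric values with the column median.
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      missing <- is.na(df[[col]])
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# Hypothetical stage 2: replace each character or factor column
# with one 0/1 dummy column per distinct value.
binarize <- function(df) {
  out <- df[, 0, drop = FALSE]  # same rows, no columns yet
  for (col in names(df)) {
    values <- df[[col]]
    if (is.character(values) || is.factor(values)) {
      for (level in sort(unique(as.character(values)))) {
        out[[paste(col, level, sep = "_")]] <- as.integer(values == level)
      }
    } else {
      out[[col]] <- values
    }
  }
  out
}

# Each stage takes a data frame and returns a data frame,
# so stages can be reordered, removed or tested on their own.
run_pipeline <- function(df, stages) {
  for (stage in stages) {
    df <- stage(df)
  }
  df
}

# Toy data (made up for this sketch).
raw <- data.frame(
  age = c(34, NA, 52),
  city = c("Lima", "Oslo", "Lima"),
  stringsAsFactors = FALSE
)

ready <- run_pipeline(raw, list(impute_median, binarize))
print(ready)
```

Because every stage has the same shape (data frame in, data frame out), adding a new step such as date feature engineering would only mean writing one more function and appending it to the list.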

Similar courses

R Programming: Advanced Analytics In R For Data Science

4.66

Udemy
6 hours
53 lessons
Certificate

R Programming A-Z™: R For Data Science With Real Exercises!

4.64

Udemy
11 hours
82 lessons
Certificate

R Programming for Statistics and Data Science 2022

4.56

Udemy
7 hours
126 lessons
Certificate

English language
Recommended provider
Certificate available