Welcome to the most comprehensive course on feature engineering for machine learning available online.
In this course, you will learn how to create end-to-end machine learning pipelines, packed with feature engineering preprocessing steps that you can easily deploy to production.
What is feature engineering?
Feature engineering consists of using domain knowledge and statistical methods to create features that make machine learning algorithms work effectively.
Feature engineering is key in applied machine learning. Raw data is almost never suitable for training machine learning models. In fact, data scientists devote a lot of effort to data analysis, data engineering and preprocessing, and feature extraction, to create the best features for training predictive models.
What will you learn in this online course?
In this course, you will learn about missing data imputation, encoding of categorical features, numerical variable transformation, discretization, and how to create new features from your dataset.
Specifically, you will learn:
- How to impute missing values
- How to encode categorical features
- How to transform and scale numerical variables
- How to perform discretization
- How to remove outliers
- How to perform feature extraction from date and time
- How to create new features from existing ones
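To give you a taste of what these steps look like in practice, here is a minimal sketch on a made-up dataset (the column names and values are invented for illustration), using pandas and scikit-learn:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: a numeric column with a missing value,
# a categorical column, and a timestamp.
df = pd.DataFrame({
    "age": [25, None, 47, 31],
    "city": ["London", "Paris", "London", "Madrid"],
    "signup": pd.to_datetime(["2023-01-05", "2023-02-11",
                              "2023-03-20", "2023-04-02"]),
})

# Impute missing values with the median.
df["age"] = df["age"].fillna(df["age"].median())

# One-hot encode the categorical feature.
df = pd.get_dummies(df, columns=["city"])

# Extract new features from the date.
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek
df = df.drop(columns="signup")

# Scale the numerical variable.
df["age"] = StandardScaler().fit_transform(df[["age"]]).ravel()
```

Each of these one-liners hides design decisions — which imputation value to use, how to handle unseen categories, which scaler suits which model — and those decisions are what the course explores in depth.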
While most online courses will teach you the very basics of feature engineering, like imputing variables with the mean or transforming categorical features using one-hot encoding, this course will teach you all of that, and much, much more.
You will first learn the most popular techniques for variable engineering, like mean and median imputation, one-hot encoding, transformation with logarithm, and discretization. Then, you will discover more advanced methods that capture information while encoding or transforming your variables, to obtain better features and improve the performance of regression and classification models.
You will learn methods described in scientific articles, used in data science competitions like those hosted by Kaggle and the KDD Cup, and commonly applied in organizations. What's more, they can all be easily implemented with Python's open-source libraries.
By the end of the course, you will be able to create end-to-end machine learning workflows that fully transform your datasets and obtain predictions from them.
Feature engineering with Python
Throughout the course, we will use Python as the main language. We will compare the feature engineering implementations of the open-source libraries Pandas, Scikit-learn, Category Encoders and Feature-engine.
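As an example of this kind of comparison, here is a minimal sketch contrasting the pandas and scikit-learn implementations of median imputation (the data is invented for illustration; Category Encoders and Feature-engine expose similar transformer-style APIs):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [20.0, np.nan, 40.0]})

# pandas: a one-liner that imputes with the median of this dataframe.
pandas_result = df["age"].fillna(df["age"].median())

# scikit-learn: a transformer that learns the median on fit() and can
# reapply it to new data, which makes it suitable for pipelines.
imputer = SimpleImputer(strategy="median")
sklearn_result = imputer.fit_transform(df[["age"]])

print(pandas_result.tolist())           # [20.0, 30.0, 40.0]
print(sklearn_result.ravel().tolist())  # [20.0, 30.0, 40.0]
```

The results are identical, but the trade-offs differ: the pandas version is quick for exploration, while the transformer stores the learned median and so prevents leakage between training and test sets.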
Throughout the tutorials, you’ll find detailed explanations of each technique and a discussion about their advantages, limitations, and underlying assumptions, followed by the best programming practices to implement them in Python.
By the end of the course, you will be able to decide which feature engineering technique you need based on the variable characteristics and the models you wish to train. And you will also be well placed to test various transformation methods and let your models decide which ones work best.
Who is this course for?
This course is for data scientists, machine learning engineers and software engineers who want to improve their skills and advance their careers.
To make the most out of this course, learners need to have basic knowledge of machine learning, data analytics, and familiarity with the most common predictive models, like linear and logistic regression, decision trees, and random forests.
This comprehensive feature engineering course contains over 100 lectures spread across 15 hours of on-demand video, more than 10 quizzes and assessments, and demonstrations based on real-world use cases. Every topic includes hands-on Python code examples in Jupyter notebooks that you can use for reference, practice, and reuse in your own projects.