Feature engineering with Python
Find out what you will learn throughout the course.
What you'll learn
👉 Multiple methods for missing data imputation.
👉 Strategies to transform categorical variables into numbers.
👉 How to handle infrequent categories.
👉 Variance stabilizing transformations.
👉 Multiple discretization techniques.
👉 How and when to handle outliers.
👉 How to create features from dates and times.
👉 How to apply transformations with Python's open-source libraries.
What you'll get
✔ 10+ hours of video lectures.
✔ Presentations, quizzes and assignments.
✔ Jupyter notebooks with code.
✔ Instructor support through Q&A.
✔ Access on PC and mobile.
✔ Lifetime access to content.
✔ Certificate of completion.
What they say...
☻ More than 5k students enrolled.
☻ More than 450 student reviews.
☻ Average course rating: 4.7 out of 5.
Instructor
Soledad Galli, PhD
Sole is a lead data scientist, instructor, and developer of open source software. She created and maintains the Python library for feature engineering, Feature-engine, which allows us to impute data, encode categorical variables, transform, create, and select features. Sole is also the author of the book "Python Feature Engineering Cookbook," published by Packt.
30 days money back guarantee
If you're disappointed for whatever reason, you'll get a full refund.
So you can buy with confidence.
Course description
Welcome to the most comprehensive course on feature engineering for machine learning available online.
What is feature engineering?
Feature engineering is the process of using domain knowledge and statistical methods to create features that make machine learning algorithms work effectively.
Feature engineering is key in applied machine learning. Raw data is almost never suitable to train machine learning models. In fact, data scientists devote a lot of effort to data analysis, data preprocessing, and feature extraction, to create better features to train predictive models.
What will you learn in this online course?
In this course, you will learn about missing data imputation, encoding of categorical features, numerical variable transformation, discretization, and how to create new features from your dataset.
Specifically, you will learn:
- How to impute missing values
- How to encode categorical features
- How to transform and scale numerical variables
- How to perform discretization
- How to remove outliers
- How to perform feature extraction from date and time
- How to create new features from existing ones
While most online courses will teach you the very basics of feature engineering, like imputing variables with the mean or transforming categorical features with one-hot encoding, this course will teach you all of that, and much, much more.
You will first learn the most popular techniques for variable engineering, like mean and median imputation, one-hot encoding, transformation with logarithm, and discretization. Then, you will discover more advanced methods that capture information while encoding or transforming your variables, to obtain better features and improve the performance of regression and classification models.
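To give you a flavor of those basics, here is a minimal sketch of mean imputation, one-hot encoding, the logarithm transformation, and equal-width discretisation using pandas and Scikit-learn. The toy dataset and column names are illustrative, not taken from the course notebooks:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer

# Toy data: a numerical variable with a missing value and a long
# right tail, plus a categorical variable (illustrative only).
df = pd.DataFrame({
    "income": [2500.0, np.nan, 4700.0, 3100.0, 52000.0],
    "city": ["London", "Paris", "London", "Madrid", "Paris"],
})

# Mean imputation: replace missing values with the variable's mean.
df[["income"]] = SimpleImputer(strategy="mean").fit_transform(df[["income"]])

# One-hot encoding: one binary column per category.
df = pd.get_dummies(df, columns=["city"])

# Logarithm transformation: compress the right tail of a skewed variable.
df["income_log"] = np.log(df["income"])

# Equal-width discretisation: cut the value range into 3 equal-width bins.
bins = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
df["income_bin"] = bins.fit_transform(df[["income"]]).ravel()

print(df)
```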
You will learn methods described in scientific articles, used in data science competitions like those hosted by Kaggle and the KDD Cup, and commonly applied in organizations. What's more, they can be easily implemented with Python's open-source libraries.
Feature engineering with Python
Throughout the course, we will use Python as the main language. We will compare how feature engineering is implemented in the open-source libraries Pandas, Scikit-learn, Category Encoders, and Feature-engine.
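As an example of the kind of comparison you will see, here is a short sketch of one step, mean imputation, carried out in pandas, Scikit-learn, and Feature-engine, with Category Encoders handling the categorical variable. The data and column names are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from feature_engine.imputation import MeanMedianImputer
import category_encoders as ce

# Illustrative toy data with one missing value.
df = pd.DataFrame({
    "income": [2500.0, np.nan, 4700.0, 3100.0],
    "city": ["London", "Paris", "London", "Madrid"],
})

# 1) pandas: fill missing values with the column mean.
pandas_out = df.assign(income=df["income"].fillna(df["income"].mean()))

# 2) Scikit-learn: the same imputation as a fit/transform estimator
#    that can live inside a Pipeline.
sklearn_out = df.copy()
sklearn_out[["income"]] = SimpleImputer(strategy="mean").fit_transform(df[["income"]])

# 3) Feature-engine: takes and returns DataFrames, and lets you name
#    the variables to impute directly.
fe_out = MeanMedianImputer(
    imputation_method="mean", variables=["income"]
).fit_transform(df)

# 4) Category Encoders: one-hot encode the categorical variable,
#    again through the familiar fit/transform interface.
ce_out = ce.OneHotEncoder(cols=["city"], use_cat_names=True).fit_transform(df)
```

The Scikit-learn, Feature-engine, and Category Encoders versions all expose the same fit/transform interface, which is what makes them easy to compare and to plug into machine learning pipelines.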
Throughout the tutorials, you'll find detailed explanations of each technique and a discussion of its advantages, limitations, and underlying assumptions, followed by best programming practices for implementing it in Python.
By the end of the course, you will be able to decide which feature engineering technique you need based on the variable characteristics and the models you wish to train. And you will also be well placed to test various transformation methods and let your models decide which ones work best.
Who is this course for?
This course is for data scientists and software engineers who want to improve their skills and advance their careers.
Course prerequisites
To make the most out of this course, learners need to have basic knowledge of machine learning and familiarity with the most common predictive models, like linear and logistic regression, decision trees, and random forests.
To wrap up
This comprehensive feature engineering course contains over 100 lectures spread across approximately 10 hours of video, and all topics include hands-on Python code examples in Jupyter notebooks that you can use for reference, practice, and reuse in your own projects.
Course Curriculum
- Variable characteristics (2:44)
- Missing data (6:43)
- Cardinality (5:04)
- Rare labels (4:54)
- Variable distribution (5:13)
- Outliers (8:27)
- Linear model assumptions (8:59)
- Linear model assumptions - additional reading resources (optional)
- Variable magnitude (3:09)
- Summary table
- Additional reading resources
- How are we doing?
- Basic imputation methods (3:52)
- Mean or median imputation (4:53)
- Arbitrary value imputation (3:16)
- Frequent category imputation (3:30)
- Missing category imputation (1:22)
- Adding a missing indicator (3:42)
- Basic methods - considerations (11:15)
- Basic imputation with pandas (6:45)
- Basic imputation with pandas - demo (12:35)
- Basic methods with Scikit-learn (9:44)
- Mean or median imputation with Scikit-learn (10:53)
- Arbitrary value imputation with Scikit-learn (3:57)
- Frequent category imputation with Scikit-learn (4:38)
- Missing category imputation with Scikit-learn (2:24)
- Adding a missing indicator with Scikit-learn (4:59)
- Imputation with GridSearch - Scikit-learn (8:24)
- Basic methods with Feature-engine (7:19)
- Mean or median imputation with Feature-engine (6:50)
- Arbitrary value imputation with Feature-engine (3:16)
- Frequent category imputation with Feature-engine (2:34)
- Arbitrary string imputation with Feature-engine (3:24)
- Adding a missing indicator with Feature-engine (4:52)
- Wrapping up (2:19)
- How are we doing?
- Alternative imputation methods (2:59)
- Complete Case Analysis (6:30)
- CCA - considerations with code demo (3:45)
- End of distribution imputation (4:14)
- Random sample imputation (14:14)
- Random imputation - considerations with code (7:56)
- Mean or median imputation per group (4:32)
- CCA with pandas (5:19)
- End of distribution imputation with pandas (5:24)
- Random sample imputation with pandas (4:46)
- Mean imputation per group with pandas (5:34)
- CCA with Feature-engine (6:47)
- End of distribution imputation with Feature-engine (5:13)
- Random sample imputation with Feature-engine (2:25)
- Imputation - Summary table
- Wrapping up (5:52)
- Categorical encoding | Introduction (4:59)
- One hot encoding (6:03)
- One hot encoding with pandas (7:29)
- One hot encoding with sklearn (11:06)
- One hot encoding with Feature-engine (2:19)
- One hot encoding with Category encoders (5:04)
- Ordinal encoding (1:50)
- Ordinal encoding with pandas (3:16)
- Ordinal encoding with sklearn (4:05)
- Ordinal encoding with Feature-engine (1:49)
- Ordinal encoding with Category encoders (1:43)
- Count or frequency encoding (3:11)
- Count encoding with pandas (2:58)
- Count encoding with Feature-engine (1:21)
- Count encoding with Category encoders (1:42)
- How are we doing?
- Discretisation | Introduction (3:01)
- Equal-width discretisation (4:06)
- Important: Feature-engine v 1.0.0
- Equal-width discretisation | Demo (11:18)
- Equal-frequency discretisation (4:13)
- Equal-frequency discretisation | Demo (7:16)
- K-means discretisation (4:13)
- K-means discretisation | Demo (2:43)
- Discretisation plus categorical encoding (2:54)
- Discretisation plus encoding | Demo (5:45)
- Discretisation with classification trees (5:05)
- Discretisation with decision trees using Scikit-learn (11:55)
- Discretisation with decision trees using Feature-engine (3:48)
- Domain knowledge discretisation (3:52)
- Additional reading resources
- Feature scaling | Introduction (3:44)
- Standardisation (5:31)
- Standardisation | Demo (4:39)
- Mean normalisation (4:02)
- Mean normalisation | Demo (5:21)
- Scaling to minimum and maximum values (3:24)
- MinMaxScaling | Demo (3:01)
- Maximum absolute scaling (3:01)
- MaxAbsScaling | Demo (3:45)
- Scaling to median and quantiles (2:46)
- Robust Scaling | Demo (2:04)
- Scaling to vector unit length (5:51)
- Scaling to vector unit length | Demo (5:18)
- Additional reading resources
Frequently Asked Questions
When does the course begin and end?
You can start taking the course from the moment you enroll. The course is self-paced, so you can watch the tutorials and apply what you learn whenever you find it most convenient.
For how long can I access the course?
The course comes with lifetime access. This means that once you enroll, you will have unlimited access to the content for as long as you like.
What if I don't like the course?
There is a 30-day money back guarantee. If you don't find the course useful, contact us within the first 30 days of purchase and you will get a full refund.
Will I get a certificate?
Yes, you'll get a certificate of completion after completing all lectures, quizzes and assignments.