Feature selection with Python

Find out what you will learn throughout the course.

What you'll learn

How to build simpler, faster and more robust machine learning models.

Why feature selection matters.

Filter, embedded and wrapper methods.

Forward and backward search.

Select features with Lasso and decision trees.

Recursive feature selection.

Apply feature selection with open-source Python libraries.

More than 12k students enrolled.

More than 1.7k student reviews.

Average course rating: 4.8 out of 5.

What you'll get

5+ hours of video lectures.

Presentations, quizzes and assignments.

Jupyter notebooks with code.

Instructor support through Q&A.

Access on PC and mobile.

Lifetime access to content.

30-day money-back guarantee

So you can buy with confidence.

Course description

Welcome to Feature Selection for Machine Learning, the most comprehensive course on feature selection available online.

In this course, you will learn multiple feature selection methods to select the best features in your data set and build simpler, faster, and more reliable machine learning models.

What is feature selection?

Feature selection is the process of identifying and selecting a subset of features from the original data set to use as inputs in a machine learning algorithm.

Data sets usually contain a large number of features. We can use multiple algorithms to quickly discard irrelevant features and identify the important ones in our data.

Feature selection algorithms fall into one of three categories: filter methods, wrapper methods, and embedded methods.

Filter methods comprise basic data preprocessing steps that remove constant and duplicated features, and statistical tests that assess feature importance. Wrapper methods wrap the search around the estimator: they use backward and forward selection to examine candidate feature subsets and identify the best one. Embedded methods combine feature selection with the fitting of the classifier or regression model.
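
As a rough illustration of the basic filter steps mentioned above, the sketch below removes constant and duplicated features with pandas. The toy data set and its column names are made up for this example, and this is only one of several ways to do it.

```python
import pandas as pd

# Toy data set, invented for illustration only.
X = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "age_copy": [25, 32, 47, 51],  # exact duplicate of "age"
    "country": [1, 1, 1, 1],       # constant feature
    "income": [30, 45, 80, 62],
})

# Constant features: columns with a single distinct value.
constant = [col for col in X.columns if X[col].nunique() == 1]

# Duplicated features: columns whose values repeat an earlier column.
duplicated = X.columns[X.T.duplicated()].tolist()

# Drop both groups and keep the rest.
X_reduced = X.drop(columns=constant + duplicated)
print(X_reduced.columns.tolist())
```

Transposing the data frame to find duplicated columns is convenient on small data but can be slow on wide data sets.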

Why do we select features?

Feature selection is key to creating models that are faster and easier to interpret, as well as to avoiding overfitting. When creating machine learning models to use in the real world, feature selection is an integral part of the machine learning pipeline.

What will you learn in this online course?

In this course, you will learn multiple feature selection techniques, gathered from scientific articles, data science competitions and my experience as a data scientist, to identify relevant features in your data sets.

You will learn the following filter methods:

  • Chi-square test for categorical variables
  • ANOVA for continuous variables and binary or multiclass target variables
  • Pearson’s correlation for continuous variables in regression
  • Information gain
  • Mutual information
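
Two of the filter methods listed above can be sketched with scikit-learn as follows. The synthetic data set, the choice of 3 features to keep, and the variable names are assumptions made for this example, not the course's own code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Synthetic data: 10 continuous features, 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# ANOVA F-test: keep the 3 features with the highest F-statistic.
anova = SelectKBest(score_func=f_classif, k=3).fit(X, y)
anova_idx = anova.get_support(indices=True)

# Mutual information: score every feature, then keep the top 3.
mi_scores = mutual_info_classif(X, y, random_state=0)
mi_idx = np.sort(np.argsort(mi_scores)[-3:])

print("ANOVA:", anova_idx)
print("Mutual information:", mi_idx)
```

The two rankings need not agree: the F-test captures differences in class means, while mutual information can also pick up non-linear dependencies.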

You will learn the following wrapper methods:

  • Forward selection of features
  • Backward selection of variables
  • Exhaustive search
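
Forward and backward selection can be sketched as below. Note that this example uses scikit-learn's `SequentialFeatureSelector` rather than MLXtend, and that the estimator, the data set and the target of 3 features are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Small regression data set bundled with scikit-learn (10 features).
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Forward selection: start with no features and add the one that
# improves the cross-validated score the most, until 3 are selected.
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=3
).fit(X, y)
forward_features = X.columns[forward.get_support()].tolist()

# Backward selection: start with all features and remove the one
# whose removal hurts the score the least, until 3 remain.
backward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="backward", cv=3
).fit(X, y)
backward_features = X.columns[backward.get_support()].tolist()

print("Forward:", forward_features)
print("Backward:", backward_features)
```

Both searches are greedy, so they may end on different subsets. An exhaustive search would evaluate every combination, which quickly becomes expensive as the number of features grows.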

You will learn the following embedded methods:

  • Lasso regularization
  • Linear model coefficients
  • Feature importance derived from decision trees and random forests
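
A minimal sketch of two of these embedded methods, using scikit-learn's `SelectFromModel`. The regularization strength `alpha=1.0`, the forest size and the data set are assumptions chosen for illustration and would normally be tuned.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# The diabetes features are already centered and scaled, which matters
# for Lasso because the penalty acts on the coefficient sizes.
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Lasso shrinks some coefficients to zero; SelectFromModel keeps
# the features whose coefficients remain non-zero.
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
lasso_features = X.columns[lasso_selector.get_support()].tolist()

# Tree-based importance: keep the features whose importance is at
# least the mean importance (the default threshold for this estimator).
forest_selector = SelectFromModel(
    RandomForestRegressor(n_estimators=50, random_state=0)
).fit(X, y)
forest_features = X.columns[forest_selector.get_support()].tolist()

print("Lasso:", lasso_features)
print("Random forest:", forest_features)
```

In both cases the selection is a by-product of fitting the model, which is what distinguishes embedded methods from wrapper methods.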

You will learn the following hybrid methods:

  • Recursive feature elimination or addition
  • How to select features based on changes in model performance after feature shuffling
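
The two hybrid ideas above can be sketched with scikit-learn's `RFE` and `permutation_importance`. This is not the course's implementation: the model, the number of features to keep and the `threshold` on the performance drop are hypothetical values picked for the example.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Recursive feature elimination: fit the model, drop the least
# important feature, and repeat until 4 features remain.
rfe = RFE(model, n_features_to_select=4).fit(X_train, y_train)
rfe_features = X.columns[rfe.support_].tolist()

# Feature shuffling: fit once, then shuffle one feature at a time on
# held-out data and measure how much the score (R^2 here) drops.
model.fit(X_train, y_train)
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)

threshold = 0.01  # hypothetical minimum drop in R^2 to keep a feature
shuffle_features = X.columns[result.importances_mean > threshold].tolist()

print("RFE:", rfe_features)
print("Shuffling:", shuffle_features)
```

Shuffling breaks the relationship between a feature and the target without retraining the model, so it is usually cheaper than eliminating features one by one.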

Throughout the tutorials, you will implement the feature selection methods in an elegant, efficient, and professional manner, using Python, scikit-learn, pandas, MLxtend and Feature-engine.

At the end of the course, you will have a variety of tools to select and compare different feature subsets and identify the ones that return the simplest, yet most predictive machine learning model. This will allow you to minimize the time it takes to put your predictive models into production.

Who is this course for?

You’ve taken your first steps into data science. You know the most commonly used machine learning models. You've probably trained a few linear regression or decision tree models. You are familiar with data preprocessing and feature engineering techniques like missing data imputation and encoding categorical variables. At this stage, you’ve probably realized that many data sets contain an enormous number of features, and some of them are identical or very similar. Some of them are not predictive at all, and for others, it is harder to say.

You wonder how you can go about finding the most predictive features. Which ones are OK to keep and which ones could you do without? You also wonder how to code the methods in a professional manner. You probably searched online and found that there is not much out there about feature selection. So you start to wonder: how are things really done in tech companies?

This course will help you! This is the most comprehensive online course on variable selection. You will learn about a wide variety of feature selection procedures used worldwide in different organizations and in data science competitions to select the most predictive features.

Course prerequisites

To get the most out of this course, you need a basic knowledge of machine learning and familiarity with the most common predictive models, like linear and logistic regression, decision trees, and random forests, and with the metrics used to evaluate model performance. You also need basic knowledge of Python and the open-source libraries NumPy, pandas, and scikit-learn.

To wrap up

This comprehensive feature selection course contains approximately 70 lectures spread across 8 hours of video, and ALL topics include hands-on Python code examples that you can use for reference, practice, and re-use in your own projects.

Soledad Galli, PhD


Sole is a lead data scientist, instructor and developer of open-source software. She created and maintains Feature-engine, the Python library for feature engineering, which allows us to impute data, encode categorical variables, and transform, create and select features. Sole is also the author of the book "Python Feature Engineering Cookbook", published by Packt.

Course Curriculum

  Feature selection
  Filter methods | Basics
  Filter methods | Correlation
  Filter methods | Statistical measures
  Filter methods | Other methods and metrics
  Wrapper methods
  Embedded methods | Linear models
  Embedded methods | Lasso regularisation
  Embedded methods | Trees
  Hybrid feature selection methods
  Final section | Next steps

Frequently Asked Questions

When does the course begin and end?

You can start taking the course from the moment you enroll. The course is self-paced, so you can watch the tutorials and apply what you learn whenever you find it most convenient.

For how long can I access the course?

The course has lifetime access. This means that once you enroll, you will have unlimited access to the course for as long as you like.

What if I don't like the course?

There is a 30-day money-back guarantee. If you don't find the course useful, contact us within the first 30 days of purchase and you will get a full refund.