Welcome to Feature Selection for Machine Learning, the most comprehensive course on feature selection available online.
In this course, you will learn multiple feature selection methods to select the best features in your data set and build simpler, faster, and more reliable machine learning models.
What is feature selection?
Feature selection is the process of identifying and selecting a subset of features from the original data set to use as inputs in a machine learning algorithm.
Data sets usually contain a large number of features. We can use multiple algorithms to quickly discard irrelevant features and identify the important ones in our data.
Feature selection algorithms fall into three categories: filter methods, wrapper methods, and embedded methods.
Filter methods comprise basic data preprocessing steps that remove constant and duplicated features, and statistical tests that assess feature importance. Wrapper methods wrap the search around the estimator: they use forward and backward selection to examine candidate feature subsets and identify the best one. Embedded methods combine feature selection with the fitting of the classifier or regression model.
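To make the most basic filter step concrete, here is a minimal sketch of removing constant and duplicated features with pandas. The toy data frame and its column names are made up for illustration, and this is only one possible way to do it, not necessarily the approach taken in the course.

```python
import pandas as pd

# Toy data (hypothetical): "const" never changes, "b_copy" duplicates "b".
X = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [0.5, 0.1, 0.9, 0.3],
    "const": [7, 7, 7, 7],
    "b_copy": [0.5, 0.1, 0.9, 0.3],
})

# Constant features: columns that hold a single unique value.
constant = [col for col in X.columns if X[col].nunique() == 1]
X_no_const = X.drop(columns=constant)

# Duplicated features: transpose so columns become rows, then flag repeats.
dup_mask = X_no_const.T.duplicated()
duplicated = dup_mask[dup_mask].index.tolist()

X_reduced = X_no_const.drop(columns=duplicated)
print(list(X_reduced.columns))
```

Transposing is convenient on small data sets but can be slow on wide ones, where comparing columns pairwise may be preferable.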
Why do we select features?
Feature selection is key to creating models that are faster and easier to interpret, as well as to avoiding overfitting. When creating machine learning models for use in the real world, feature selection is an integral part of the machine learning pipeline.
What will you learn in this online course?
In this course, you will learn multiple feature selection techniques, gathered from scientific articles, data science competitions and my experience as a data scientist, to identify relevant features in your data sets.
You will learn the following filter methods:
- Chi-square test for categorical variables
- ANOVA for continuous variables and binary or multiclass target variables
- Pearson’s correlation for continuous variables in regression
- Information gain
- Mutual information
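As a rough illustration of the statistical filter methods listed above, the sketch below ranks features with ANOVA and mutual information using scikit-learn. The synthetic data set and the choice of keeping 3 features are assumptions made for the example. The chi-square test is not shown because it expects non-negative, categorical-style inputs.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Synthetic data (an assumption for this sketch): 10 features, 3 informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3,
    n_redundant=0, random_state=0,
)

# ANOVA F-test: keep the 3 features with the highest F-statistic.
anova = SelectKBest(score_func=f_classif, k=3).fit(X, y)
anova_idx = anova.get_support(indices=True)
X_selected = anova.transform(X)

# Mutual information: score every feature, then take the top 3 by score.
mi_scores = mutual_info_classif(X, y, random_state=0)
mi_idx = mi_scores.argsort()[::-1][:3]

print("ANOVA picks:", anova_idx)
print("Mutual information picks:", sorted(mi_idx))
```

The two rankings need not agree: the F-test captures differences in class means, while mutual information can also pick up non-linear dependence.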
You will learn the following wrapper methods:
- Forward selection of features
- Backward selection of variables
- Exhaustive search
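Forward and backward selection can be sketched with scikit-learn's SequentialFeatureSelector, used here as a stand-in (MLxtend offers a similar selector). The estimator, the iris data set, and the target of 2 features are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
estimator = LogisticRegression(max_iter=1000)

# Forward selection: start with no features and add the one that most
# improves the cross-validated score, until 2 features are selected.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=2, direction="forward", cv=5
).fit(X, y)

# Backward selection: start with all features and remove the least useful
# one at each step, until 2 features remain.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=2, direction="backward", cv=5
).fit(X, y)

forward_mask = forward.get_support()
backward_mask = backward.get_support()
print("Forward keeps: ", forward_mask)
print("Backward keeps:", backward_mask)
```

Because every step retrains the model on several candidate subsets, wrapper methods cost far more than filter methods, and an exhaustive search over all subsets costs more still.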
You will learn the following embedded methods:
- Lasso regularization
- Linear model coefficients
- Feature importance derived from decision trees and random forests
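The embedded methods above can be sketched as follows with scikit-learn. The synthetic regression data, the Lasso penalty `alpha=1.0`, and the forest size are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): 10 features, 3 informative.
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# Lasso shrinks some coefficients to exactly zero while it fits the model.
# Scaling first puts all coefficients on a comparable footing.
X_scaled = StandardScaler().fit_transform(X)
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X_scaled, y)
lasso_mask = lasso_selector.get_support()

# Tree ensembles expose an importance score per feature after fitting.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

print("Lasso keeps:", lasso_mask.nonzero()[0])
print("Forest ranking:", importances.argsort()[::-1])
```

In both cases the selection is a by-product of training the model, so there is no separate search loop as there is with wrapper methods.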
You will learn the following hybrid methods:
- Recursive feature elimination or addition
- How to select features based on changes in model performance after feature shuffling
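A minimal sketch of the two hybrid ideas, using scikit-learn's RFE and permutation_importance as stand-ins for the implementations covered in the course. The synthetic data, the random forest, and the 0.01 cut-off are hypothetical choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this sketch): 8 features, 3 informative.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Recursive feature elimination: refit the model and drop the least
# important feature at each step until 3 features remain.
rfe = RFE(model, n_features_to_select=3, step=1).fit(X_train, y_train)
rfe_mask = rfe.support_

# Feature shuffling: permute one feature at a time on held-out data and
# measure how much the model score drops.
model.fit(X_train, y_train)
shuffled = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
threshold = 0.01  # hypothetical cut-off on the mean drop in accuracy
shuffle_mask = shuffled.importances_mean > threshold

print("RFE keeps:      ", rfe_mask.nonzero()[0])
print("Shuffling keeps:", shuffle_mask.nonzero()[0])
```

Measuring the drop on held-out data rather than on the training set helps avoid keeping features that the model has merely memorized.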
Throughout the tutorials, you will implement the feature selection methods in an elegant, efficient, and professional manner, using Python, scikit-learn, pandas, MLxtend and Feature-engine.
At the end of the course, you will have a variety of tools to select and compare different feature subsets and identify the ones that return the simplest, yet most predictive machine learning model. This will allow you to minimize the time it takes to put your predictive models into production.
Who is this course for?
You’ve taken your first steps into data science. You know the most commonly used machine learning models. You've probably trained a few linear regression or decision tree models. You are familiar with data preprocessing and feature engineering techniques like missing data imputation and encoding categorical variables. At this stage, you’ve probably realized that many data sets contain an enormous number of features, and some of them are identical or very similar. Some of them are not predictive at all, and for some others, it is harder to say.
You wonder how you can go about finding the most predictive features. Which ones are OK to keep and which ones could you do without? You also wonder how to code the methods in a professional manner. You probably searched online and found that there is not much out there about feature selection. So you start to wonder: how are things really done in tech companies?
This course will help you! This is the most comprehensive online course on variable selection. You will learn about a huge variety of feature selection procedures used worldwide in different organizations and in data science competitions to select the most predictive features.
To get the most out of this course, you need a basic knowledge of machine learning and familiarity with the most common predictive models, like linear and logistic regression, decision trees, and random forests, and with the metrics used to evaluate model performance. You also need basic knowledge of Python and the open-source libraries NumPy, pandas, and scikit-learn.
This comprehensive feature selection course contains approximately 70 lectures spread across 8 hours of video, and ALL topics include hands-on Python code examples that you can use for reference, practice, and re-use in your own projects.