Feature Selection for Machine Learning
Why do we select features for business machine learning models?
I am glad to announce that the course “Feature Selection for Machine Learning”, live on Udemy.com, has enrolled 1700+ students in under a year, and continues to receive good reviews from students. I hope the course can help even more of you learn and apply different techniques to select features for your machine learning models.
What is feature selection?
Feature selection is the process of identifying and selecting a subset of variables from the original dataset to use as inputs to a machine learning model. There are plenty of techniques and algorithms that can be used to select these features, each with its own advantages and trade-offs.
Why do we select features in business?
For a variety of reasons. First, simpler models are easier to interpret. It is easier for the users of a model to understand the output of a model with 10 variables than the output of one with 100 variables.
Second, simpler models require shorter training times. Reducing the number of variables used to build a machine learning model reduces the computational cost and therefore speeds up model building. More importantly, simpler models score new incoming data much faster, which is particularly important if the model will be used to make real-time decisions.
Third, simpler models generalise better because they are less prone to overfitting. Often, many of the variables add noise with little if any predictive value. Machine learning models nevertheless learn from this noise, causing overfitting and reducing generalisation. By eliminating irrelevant, noisy features, we can substantially improve the generalisation of a machine learning model.
Fourth, simpler models are easier to put into production. Simpler models require simpler production code, less error handling, and less unit testing, and are therefore easier for software developers to implement. Not only is it faster to write code for 10-50 variables than for hundreds, the code is also less prone to bugs, and therefore provides a safer environment.
Finally, simpler models reduce the risk of data-dependent errors while the model is live. Companies often rely on third-party data: a call to a third-party data provider is made to gather the variables that feed into the machine learning model. Reducing the number of variables used in the model reduces the exposure of the business to errors in the data collection and storage of third parties.
How do we select features?
There are multiple ways to select features from a dataset. A feature selection procedure or algorithm combines a search technique, which proposes new feature subsets, with an evaluation measure, which scores the different feature subsets.
Ideally, a feature selection method would search through all possible subsets of features that can be obtained from a given dataset, and find the combination that produces the best machine learning model performance. In practice, this is usually computationally intractable.
In addition, different subsets of features may produce optimal performance for different machine learning algorithms. This means there is not just one optimal subset of features, but potentially many, depending on the machine learning model we intend to use.
Therefore, over the years many different feature selection methods have been developed, each trying to accommodate as many of these caveats and limitations as possible.
Feature selection algorithms are grouped into three categories: filter methods, wrapper methods, and embedded methods.
Filter methods look at each feature individually and assess how important it is for predicting the target. The advantages of these methods are that they are very fast and model agnostic: features selected with them can be used to build any machine learning model. The disadvantages are that they often do not select the best subset of features, and that they disregard feature interactions. Good examples of filter methods are the chi-squared test, the Fisher score, and univariate ANOVA.
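As a minimal sketch of a filter method, assuming scikit-learn is available, univariate ANOVA can be run with SelectKBest: each feature is scored individually against the target, and only the top-scoring ones are kept. The iris dataset and k=2 are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each feature individually with the ANOVA F-test,
# then keep only the 2 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, X_selected.shape)  # (150, 4) (150, 2)
```

Because the selector never fits a predictive model, the reduced dataset can be fed to any downstream algorithm, which is exactly the model-agnostic property described above.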
Wrapper methods examine many, or all, possible feature combinations to identify the optimal feature set. Examples are the step-wise selection algorithms. These are greedy: features are added (or removed) one at a time, a machine learning model is built at each step, its performance is evaluated, and the locally best addition (or removal) is kept. The selection procedure ends when adding or removing features no longer improves the model. The advantage of these methods is that they often find the best feature combination. The disadvantage is that, because they build one machine learning model for each feature added or removed, they consume a great deal of time and computational resources, to the point of sometimes being impracticable.
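A minimal sketch of a wrapper method, assuming scikit-learn, is forward step-wise selection with SequentialFeatureSelector: at each step it adds the single feature that most improves cross-validated performance of the chosen model. The iris dataset, logistic regression, and the target of 2 features are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Forward selection: start with no features, and at each step add the
# feature that most improves 3-fold cross-validated accuracy,
# stopping when 2 features have been selected.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",
    cv=3,
)
sfs.fit(X, y)

print(sfs.get_support())  # boolean mask of the selected features
```

Note how the model is refit for every candidate feature at every step, which illustrates why wrapper methods become expensive on datasets with hundreds of features.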
Embedded methods perform feature selection as part of fitting the machine learning model. Good examples are Lasso regularisation and the feature importance derived from decision trees and tree ensembles. After the model is fit, we obtain either the coefficient of each feature in the regression or its importance in the trees, and we can rank the features and select a subset. Embedded methods are faster than wrapper methods and tend to find very good feature subsets. The disadvantage is that they are not model agnostic.
What is good about the course Feature Selection for ML?
In Feature Selection for Machine Learning on Udemy, I have collected a variety of techniques used worldwide for feature selection, learnt from articles by KDD competition winners, white papers, blogs and forums, and from my experience as a data scientist. My intention is to provide a reference where data scientists can learn and revisit the techniques and code needed to select variables for machine learning algorithms. I have gathered in one place multiple methods, including code, that you can apply to select features from your own datasets.
The course starts by describing simple and fast methods to quickly screen the dataset and remove redundant and irrelevant features. It then describes more complex techniques that select variables taking into account feature interactions and the feature importance derived from the machine learning algorithm. Finally, it covers specific techniques used in data competitions and in industry.
The lectures include an explanation of each feature selection technique, the rationale for using it, and the advantages and limitations of the procedure. They also include full code that you can take home and apply to your own datasets.
To learn more about the different feature selection methods check Feature Selection for Machine Learning course on Udemy.
For a visual summary of feature engineering and feature selection methods, check the slides from my latest talk at the Data Science Festival meetup.
For more resources on data science and machine learning, visit TrainInData.
To receive our latest news and be the first to find out about our latest courses, subscribe to our mailing list.