FEATURE-ENGINE

AN OPEN SOURCE PYTHON PACKAGE TO CREATE REPRODUCIBLE FEATURE ENGINEERING STEPS AND SMOOTH MODEL DEPLOYMENT

Feature-engine allows you to design and store a feature engineering pipeline with bespoke procedures for different variable groups.

Missing Data Imputation

Feature-engine includes widely used missing data imputation methods, such as mean and median imputation, frequent category imputation, random sample imputation, and adding a missing indicator. Feature-engine also includes alternative techniques, like end of tail imputation.
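As a minimal sketch, median imputation combined with a missing indicator might look like this (class and parameter names follow recent Feature-engine releases; the toy dataframe is only illustrative):

    import numpy as np
    import pandas as pd
    from feature_engine.imputation import AddMissingIndicator, MeanMedianImputer

    df = pd.DataFrame({"age": [20, 30, np.nan, 40], "fare": [7.0, np.nan, 15.0, 8.0]})

    indicator = AddMissingIndicator(variables=["age", "fare"])  # adds age_na and fare_na flags
    imputer = MeanMedianImputer(imputation_method="median", variables=["age", "fare"])

    df = indicator.fit_transform(df)  # flag the missing observations first
    df = imputer.fit_transform(df)    # then replace NaN with the learned medians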

Categorical Encoding

Feature-engine offers multiple techniques for categorical variable encoding, including one hot encoding, ordinal encoding, and count or frequency encoding, as well as more powerful techniques like target encoding and weight of evidence. Feature-engine also handles rare labels automatically.
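As a minimal sketch, grouping rare labels and then applying target (mean) encoding might look like this (class and parameter names follow recent Feature-engine releases; the toy data is only illustrative):

    import pandas as pd
    from feature_engine.encoding import MeanEncoder, RareLabelEncoder

    X = pd.DataFrame({"city": ["London", "Paris", "London", "Lima", "Paris", "London"]})
    y = pd.Series([1, 0, 1, 0, 1, 1])

    # group categories appearing in less than 20% of the rows into a single "Rare" label
    rare = RareLabelEncoder(tol=0.2, n_categories=2, variables=["city"])
    # replace each category with the mean of the target observed for that category
    target_enc = MeanEncoder(variables=["city"])

    X = rare.fit_transform(X)
    X = target_enc.fit_transform(X, y)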

Discretisation

Feature-engine includes the popular equal width and equal frequency discretisation methods, as well as arbitrary discretisation and a method developed during the 2009 KDD data science competition that uses decision trees to automatically find the buckets for each variable.
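As a minimal sketch, equal frequency discretisation and the decision tree based method might look like this (class names from recent Feature-engine releases; the toy data is only illustrative):

    import numpy as np
    import pandas as pd
    from feature_engine.discretisation import DecisionTreeDiscretiser, EqualFrequencyDiscretiser

    rng = np.random.default_rng(0)
    X = pd.DataFrame({"income": rng.gamma(2.0, 1000.0, size=200)})
    y = pd.Series((X["income"] > X["income"].median()).astype(int))

    # sort values into 5 buckets holding roughly the same number of observations
    X_ef = EqualFrequencyDiscretiser(q=5, variables=["income"]).fit_transform(X)

    # let a decision tree find the bucket boundaries, guided by the target
    dtd = DecisionTreeDiscretiser(cv=3, scoring="roc_auc", regression=False, variables=["income"])
    X_tree = dtd.fit_transform(X, y)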

Outlier Handling

Feature-engine allows you to cap variables at specific values, arbitrarily or using statistical estimates. With Feature-engine you can also remove outliers from the data.
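As a minimal sketch, capping with the Winsorizer and removing outliers with the OutlierTrimmer might look like this (the IQR-based parameters are illustrative):

    import pandas as pd
    from feature_engine.outliers import OutlierTrimmer, Winsorizer

    X = pd.DataFrame({"fare": [7.0, 8.5, 9.0, 7.5, 300.0]})

    # cap values lying beyond 1.5 inter-quartile ranges from the quartiles
    capper = Winsorizer(capping_method="iqr", tail="both", fold=1.5, variables=["fare"])
    X_capped = capper.fit_transform(X)

    # alternatively, drop the rows containing the outliers
    trimmer = OutlierTrimmer(capping_method="iqr", tail="both", fold=1.5, variables=["fare"])
    X_trimmed = trimmer.fit_transform(X)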

Variable Transformation

Feature-engine brings together the most commonly used mathematical functions to transform variables, including logarithmic, exponential, reciprocal and Box-Cox.
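As a minimal sketch, the logarithmic and Box-Cox transformations might look like this (both require strictly positive variables; the toy data is only illustrative):

    import pandas as pd
    from feature_engine.transformation import BoxCoxTransformer, LogTransformer

    X = pd.DataFrame({"income": [1200.0, 3400.0, 560.0, 8900.0]})

    # natural logarithm of the selected variable
    X_log = LogTransformer(variables=["income"]).fit_transform(X)

    # Box-Cox finds the power transformation that best approximates a normal distribution
    X_boxcox = BoxCoxTransformer(variables=["income"]).fit_transform(X)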

Engineer Individual Feature Groups

Feature-engine allows you to select the group of variables that each transformer should modify, and the transformers can be integrated into a machine learning pipeline to smooth model deployment.
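As a minimal sketch, applying different transformers to different variable groups within a Scikit-learn pipeline might look like this (column names and parameters are illustrative):

    import numpy as np
    import pandas as pd
    from sklearn.pipeline import Pipeline
    from feature_engine.encoding import OrdinalEncoder
    from feature_engine.imputation import CategoricalImputer, MeanMedianImputer

    X = pd.DataFrame({
        "age": [20, np.nan, 40, 31],
        "city": ["London", np.nan, "Paris", "London"],
    })

    pipe = Pipeline([
        # numerical group: fill missing values with the median
        ("impute_num", MeanMedianImputer(imputation_method="median", variables=["age"])),
        # categorical group: fill missing values with the most frequent category, then encode
        ("impute_cat", CategoricalImputer(imputation_method="frequent", variables=["city"])),
        ("encode_cat", OrdinalEncoder(encoding_method="arbitrary", variables=["city"])),
    ])

    X_t = pipe.fit_transform(X)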

Feature Creation

Feature-engine allows you to combine two or more variables to create new features. It currently supports mathematical combinations such as sum, mean, max, min, std and more.
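As a minimal sketch, combining two variables with sum and mean might look like this (in recent Feature-engine releases the transformer is called MathFeatures; older releases used MathematicalCombination):

    import pandas as pd
    from feature_engine.creation import MathFeatures

    X = pd.DataFrame({"math": [70, 80, 90], "physics": [65, 85, 95]})

    # add the sum and the mean of the two scores as new columns
    X_new = MathFeatures(variables=["math", "physics"], func=["sum", "mean"]).fit_transform(X)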

Variable Selection

Feature-engine includes methods to remove constant, duplicated and correlated features, as well as hybrid selection methods developed in industry and in data science competitions.
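As a minimal sketch, dropping constant, duplicated and correlated features might look like this (class names from recent Feature-engine releases; the toy data is only illustrative):

    import pandas as pd
    from feature_engine.selection import (
        DropConstantFeatures,
        DropCorrelatedFeatures,
        DropDuplicateFeatures,
    )

    X = pd.DataFrame({
        "const": [1, 1, 1, 1],       # constant
        "a": [1, 2, 3, 4],
        "a_copy": [1, 2, 3, 4],      # duplicate of "a"
        "b": [2.0, 4.0, 6.0, 8.1],   # highly correlated with "a"
    })

    X = DropConstantFeatures().fit_transform(X)
    X = DropDuplicateFeatures().fit_transform(X)
    X = DropCorrelatedFeatures(threshold=0.9).fit_transform(X)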

Wrap Scikit-learn Classes

Feature-engine can wrap most Scikit-learn transformers so that they are applied only to a selected group of features and fit directly into a Pipeline.
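As a minimal sketch, wrapping a Scikit-learn scaler so that it only touches one column might look like this (the toy data is only illustrative):

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from feature_engine.wrappers import SklearnTransformerWrapper

    X = pd.DataFrame({"age": [20, 30, 40, 50], "city": ["A", "B", "A", "B"]})

    # scale only the numerical column; "city" passes through untouched
    scaler = SklearnTransformerWrapper(transformer=StandardScaler(), variables=["age"])
    X_t = scaler.fit_transform(X)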

Why Use Feature-engine?

LEVERAGE THE POWER OF WELL-ESTABLISHED TECHNIQUES

Feature-engine includes feature engineering techniques extensively used in industry and in data science competitions. Many of the techniques were gathered from the winning solutions of the 2009 KDD data science competition.

SIMPLIFY YOUR MACHINE LEARNING PIPELINES

Feature-engine offers Scikit-learn-like functionality to create and store feature engineering steps that learn from the train data and then transform the test data. Each Feature-engine transformer learns and stores parameters from the train data through the fit() method, and transforms new data using these stored parameters with the transform() method.
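As a minimal sketch, the fit / transform workflow might look like this, learning the imputation value from the train set and applying it to new data:

    import numpy as np
    import pandas as pd
    from feature_engine.imputation import MeanMedianImputer

    X_train = pd.DataFrame({"age": [20, 30, np.nan, 40]})
    X_test = pd.DataFrame({"age": [np.nan, 25]})

    imputer = MeanMedianImputer(imputation_method="median", variables=["age"])
    imputer.fit(X_train)                 # learns the median of age (30) from the train set
    X_test = imputer.transform(X_test)   # fills NaN in the test set with the stored median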

SMOOTH MODEL DEPLOYMENT

Feature-engine transformers are compatible with the Scikit-learn pipeline, allowing you to build and deploy a single Python object with all the required feature engineering, feature scaling, and model training and scoring steps. You will only need to create, store and retrieve one pickled object in your APIs.
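As a minimal sketch, an end-to-end pipeline with feature engineering, scaling and a model, stored and retrieved as a single object, might look like this (the file name and toy data are illustrative):

    import numpy as np
    import pandas as pd
    import joblib
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from feature_engine.encoding import OneHotEncoder
    from feature_engine.imputation import MeanMedianImputer
    from feature_engine.wrappers import SklearnTransformerWrapper

    X = pd.DataFrame({
        "age": [22, np.nan, 35, 41, 29, np.nan],
        "city": ["London", "Paris", "London", "Lima", "Paris", "Lima"],
    })
    y = pd.Series([0, 1, 0, 1, 1, 0])

    pipe = Pipeline([
        ("impute", MeanMedianImputer(imputation_method="median", variables=["age"])),
        ("encode", OneHotEncoder(variables=["city"], drop_last=True)),
        ("scale", SklearnTransformerWrapper(transformer=StandardScaler(), variables=["age"])),
        ("model", LogisticRegression()),
    ])

    pipe.fit(X, y)
    joblib.dump(pipe, "pipeline.joblib")   # one object to store and load back in the API
    loaded = joblib.load("pipeline.joblib")
    loaded.predict(X)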

MORE...

Feature-engine is built on top of Scikit-learn, pandas, NumPy and SciPy. It takes in and returns pandas dataframes, which smooths the research phase of your data science project. Feature-engine also integrates well with the Scikit-learn pipeline, allowing you to build simplified machine learning pipelines and reduce the overhead of model deployment.

Feature-engine is available on PyPI and GitHub, and it can be easily installed with pip. Feature-engine’s documentation is growing, and the GitHub repository contains several Jupyter notebooks with examples of how to use it. Getting started with Feature-engine should be fairly easy.

REFERENCES

Feature-engine’s feature engineering and variable encoding functionality is inspired by the series of articles describing the winning solutions of the 2009 KDD competition.

 

The functionality, assumptions, advantages and limitations of each feature engineering step in Feature-engine are extensively covered in the course Feature Engineering for Machine Learning.

TUTORIALS

Learn how to use Feature-engine with our short video tutorials on YouTube.