AN OPEN SOURCE PYTHON PACKAGE TO CREATE REPRODUCIBLE FEATURE ENGINEERING STEPS AND SMOOTH MODEL DEPLOYMENT
Feature-engine allows you to design and store a feature engineering pipeline with bespoke procedures for different variable groups.
Missing Data Imputation
Feature-engine includes widely used missing data imputation methods, such as mean and median imputation, frequent category imputation, random sample imputation, and adding a missing indicator. Feature-engine also includes alternative techniques, like end of tail imputation.
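As a rough illustration of two of the techniques named above, the following plain-Python sketch applies median imputation together with a missing indicator. It is not Feature-engine's API: the helper names (is_missing, fit_median, impute_with_indicator) and the toy data are assumptions made for this example only.

```python
import math
from statistics import median


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_median(train_values):
    """Learn the median from the observed (non-missing) training values."""
    return median(v for v in train_values if not is_missing(v))


def impute_with_indicator(values, fill_value):
    """Return (imputed values, 0/1 flags marking originally missing entries)."""
    imputed = [fill_value if is_missing(v) else v for v in values]
    indicator = [1 if is_missing(v) else 0 for v in values]
    return imputed, indicator


# Hypothetical toy data: one numerical variable with gaps.
train_age = [22, 35, None, 58, 41]
test_age = [None, 30]

# The median is learned from the train data only, then reused on the test data.
age_median = fit_median(train_age)
train_imputed, train_flag = impute_with_indicator(train_age, age_median)
test_imputed, test_flag = impute_with_indicator(test_age, age_median)
```

The indicator keeps the information that a value was originally missing, which would otherwise be lost once the gap is filled.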
Feature-engine offers multiple techniques for categorical variable encoding, including one hot encoding, ordinal encoding, and count or frequency encoding, as well as more powerful techniques such as target encoding and weight of evidence. Feature-engine also handles rare labels automatically.
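To make count encoding and rare label handling concrete, here is a minimal plain-Python sketch. It does not use Feature-engine itself: the function names, the 0.2 frequency threshold and the "Rare" label are illustrative assumptions.

```python
from collections import Counter


def fit_frequent_labels(train_values, min_frequency):
    """Return the categories whose share of the train data is >= min_frequency."""
    counts = Counter(train_values)
    total = len(train_values)
    return {cat for cat, n in counts.items() if n / total >= min_frequency}


def group_rare(values, frequent, rare_label="Rare"):
    """Replace infrequent (or unseen) categories with a single rare label."""
    return [v if v in frequent else rare_label for v in values]


def fit_count_encoding(train_values):
    """Map each category to the number of times it appears in the train data."""
    return dict(Counter(train_values))


# Hypothetical toy data.
train_city = ["London", "London", "Paris", "London", "Paris",
              "Oslo", "Lima", "Paris", "London", "Paris"]
test_city = ["Paris", "Rome", "London"]

# Learn both mappings from the train data only.
frequent = fit_frequent_labels(train_city, min_frequency=0.2)
train_grouped = group_rare(train_city, frequent)
count_map = fit_count_encoding(train_grouped)

# "Rome" was never seen in training, so it is grouped under the rare label.
test_encoded = [count_map[v] for v in group_rare(test_city, frequent)]
```

Grouping rare labels before encoding means categories that never appeared in the train data still receive a valid code at transform time.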
Feature-engine includes the popular equal width and equal frequency discretisation methods, as well as arbitrary discretisation and a method developed during the 2009 KDD data science competition, which uses decision trees to automatically find the buckets for each variable.
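The difference between equal width and equal frequency discretisation can be sketched in plain Python as follows. This is an illustration of the two techniques, not Feature-engine's implementation; the function names and the toy values are assumptions, and the decision tree method is not covered here.

```python
from bisect import bisect_right
from statistics import quantiles


def equal_width_edges(train_values, bins):
    """Interior cut points that split [min, max] into intervals of equal width."""
    low, high = min(train_values), max(train_values)
    width = (high - low) / bins
    return [low + width * i for i in range(1, bins)]


def equal_frequency_edges(train_values, bins):
    """Interior cut points at the quantiles, so bins hold a similar share of rows."""
    return quantiles(train_values, n=bins, method="inclusive")


def assign_bins(values, edges):
    """Map each value to its interval index; out-of-range values land in the end bins."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical skewed variable: one large value stretches the range.
income = [10, 12, 15, 18, 20, 25, 30, 100]

width_edges = equal_width_edges(income, bins=4)
freq_edges = equal_frequency_edges(income, bins=4)

width_bins = assign_bins(income, width_edges)  # most values share the first bin
freq_bins = assign_bins(income, freq_edges)    # two values per bin
```

On skewed data, equal width bins tend to crowd most observations into a few intervals, while equal frequency bins spread them evenly.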
Engineer Individual Feature Groups
Feature-engine allows you to select the group of variables you want to transform, and its transformers can be integrated into a machine learning pipeline to smooth model deployment.
Feature-engine includes methods to remove constant, duplicated and correlated features, as well as other hybrid methods developed in industry or in data science competitions.
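Removing constant and duplicated features can be sketched in a few lines of plain Python. The example below is an assumption-laden illustration rather than Feature-engine's API: columns are held in a dictionary of lists, and the helper names are hypothetical. Correlated features and the hybrid methods are not covered.

```python
def find_constant(columns):
    """Names of columns that hold a single distinct value."""
    return [name for name, values in columns.items() if len(set(values)) <= 1]


def find_duplicated(columns):
    """Names of columns whose values repeat an earlier column exactly."""
    seen = set()
    duplicated = []
    for name, values in columns.items():
        key = tuple(values)
        if key in seen:
            duplicated.append(name)  # keep the first copy, flag the repeat
        else:
            seen.add(key)
    return duplicated


def drop(columns, names):
    """Return a copy of the data without the named columns."""
    return {name: values for name, values in columns.items() if name not in names}


# Hypothetical toy data.
train = {
    "age": [22, 35, 58],
    "age_copy": [22, 35, 58],
    "country": ["UK", "UK", "UK"],
    "income": [10, 20, 15],
}
test = {"age": [40], "age_copy": [40], "country": ["FR"], "income": [30]}

# Decide which features to drop using the train data, then apply it everywhere.
to_drop = find_constant(train) + find_duplicated(train)
train_selected = drop(train, to_drop)
test_selected = drop(test, to_drop)
```

The list of features to drop is learned once from the train data, so the test data ends up with the same columns even when a feature is not constant there.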
Why Use Feature-engine?
LEVERAGE THE POWER OF WELL-ESTABLISHED TECHNIQUES
Feature-engine includes feature engineering techniques extensively used in industry and in data science and machine learning competitions. Most of the techniques were gathered from the series of books released after the 2009 KDD data science competition.
SIMPLIFY YOUR MACHINE LEARNING PIPELINES
Feature-engine offers Scikit-learn-like functionality to create and store feature engineering steps that learn from train data and then transform test data. Each Feature-engine transformer learns and stores parameters from the train data through the fit() method, and transforms new data using these stored parameters with the transform() method.
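The fit() and transform() pattern can be illustrated with a small hand-written transformer. The MeanImputer class below is hypothetical and written in plain Python for this example only; it is not a Feature-engine class, and the variables argument and the means_ attribute are naming assumptions.

```python
from statistics import mean


class MeanImputer:
    """Hypothetical transformer: fills missing values (None) with the train mean."""

    def __init__(self, variables):
        self.variables = variables  # the group of variables to transform

    def fit(self, rows):
        # Learn and store one parameter per selected variable from the train data.
        self.means_ = {}
        for var in self.variables:
            observed = [row[var] for row in rows if row[var] is not None]
            self.means_[var] = mean(observed)
        return self

    def transform(self, rows):
        # Apply the stored parameters; variables outside the group are untouched.
        return [
            {
                key: (self.means_[key] if key in self.means_ and value is None else value)
                for key, value in row.items()
            }
            for row in rows
        ]


# Hypothetical toy data.
train = [
    {"age": 20, "fare": 7.5},
    {"age": None, "fare": 8.0},
    {"age": 40, "fare": None},
]
test = [{"age": None, "fare": None}]

imputer = MeanImputer(variables=["age"]).fit(train)
test_transformed = imputer.transform(test)
```

Because the mean is stored at fit time, new data is always imputed with the value learned from the train data, never with a statistic recomputed on the new data.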
SMOOTH MODEL DEPLOYMENT
Feature-engine transformers are compatible with the Scikit-learn pipeline, allowing you to build and deploy one single Python object with all the required feature engineering, feature scaling, model training and scoring steps. You will only need to create, store and retrieve one pickle object in your APIs.
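A minimal sketch of the single-object idea, using only Scikit-learn and the standard pickle module. Scikit-learn's own SimpleImputer stands in here for a Feature-engine transformer, and the data and step names are invented for illustration.

```python
import pickle

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with missing values.
X_train = np.array([
    [22.0, 7.3],
    [35.0, np.nan],
    [np.nan, 71.3],
    [58.0, 26.6],
    [41.0, 8.1],
    [29.0, 53.1],
])
y_train = np.array([0, 0, 1, 1, 0, 1])

# Feature engineering, scaling and the model live in one object.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # stand-in for a Feature-engine step
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Store and retrieve the whole fitted pipeline as a single pickle.
payload = pickle.dumps(pipeline)
restored = pickle.loads(payload)

# The restored object imputes, scales and scores raw data in one call.
X_new = np.array([[np.nan, 30.0], [50.0, np.nan]])
predictions = restored.predict(X_new)
```

In a deployed API, the pickle would typically be written to disk at training time and loaded once at start-up, so the serving code never needs to repeat the individual preprocessing steps.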
Feature-engine is built on top of Scikit-learn, pandas, NumPy and SciPy. Feature-engine is able to take in and return pandas dataframes to smooth the research phase of your data science project. Feature-engine also integrates well with the Scikit-learn pipeline, allowing you to build simplified machine learning pipelines and reduce the overhead of model deployment.
Feature-engine is available on PyPI and GitHub, and it can be easily installed with pip. Feature-engine’s documentation is growing, and the GitHub repository includes several Jupyter notebooks with examples of how to use it. Getting started with Feature-engine should be fairly easy.
Feature-engine’s feature engineering and variable encoding functionality is inspired by a series of articles with the winning solutions of the 2009 KDD competition.
The functionality, assumptions, advantages and limitations of each feature engineering step in Feature-engine are extensively covered in the course Feature Engineering for Machine Learning.