FEATURE-ENGINE

AN OPEN SOURCE PYTHON PACKAGE TO CREATE REPRODUCIBLE FEATURE ENGINEERING STEPS AND SMOOTH MODEL DEPLOYMENT

Feature-engine allows you to design and store a feature engineering pipeline with bespoke procedures for different variable groups.

Missing Data Imputation

Feature-engine includes widely used techniques for missing data imputation, such as mean and median imputation, frequent category imputation, random sample imputation, and adding a missing indicator. Feature-engine also includes alternative techniques, like end of tail imputation.
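
As a rough illustration of two of these techniques, the sketch below combines median imputation with a missing indicator. It is plain Python written for this page, not Feature-engine's own API; the function names and the sample values are assumptions made for the example.

```python
from statistics import median


def fit_median(values):
    """Learn the median from the observed (non-missing) training values."""
    observed = [v for v in values if v is not None]
    return median(observed)


def impute_with_indicator(values, fill_value):
    """Return the imputed values together with a 0/1 missing indicator."""
    indicator = [1 if v is None else 0 for v in values]
    imputed = [fill_value if v is None else v for v in values]
    return imputed, indicator


# Hypothetical data: None marks a missing value.
train_age = [22, None, 35, 41, None, 29]
test_age = [None, 50]

# The median is learned from the train data only, then reused on the test data.
age_median = fit_median(train_age)
train_imputed, train_missing = impute_with_indicator(train_age, age_median)
test_imputed, test_missing = impute_with_indicator(test_age, age_median)
```

Keeping the indicator next to the imputed variable preserves the information that a value was originally missing.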

Categorical Variable Encoding

Feature-engine offers an extensive library of categorical variable encoding techniques, including one hot encoding, ordinal numbering and count or frequency encoding, as well as more powerful techniques such as target encoding and weight of evidence. Feature-engine also handles rare labels automatically.
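
To make the idea concrete, here is a minimal sketch of rare label grouping followed by frequency encoding. It is plain Python, not Feature-engine's API: the function names, the "Rare" label, the 0.2 tolerance and the choice to treat unseen categories as rare are all assumptions made for illustration.

```python
from collections import Counter

RARE = "Rare"  # hypothetical label used for grouped infrequent categories


def fit_frequent_labels(categories, tol=0.2):
    """Return the categories whose share of the training rows is at least tol."""
    n = len(categories)
    return {cat for cat, count in Counter(categories).items() if count / n >= tol}


def group_rare(categories, frequent):
    """Replace every category outside the frequent set with the RARE label."""
    return [cat if cat in frequent else RARE for cat in categories]


def fit_frequency_map(categories):
    """Map each category to the fraction of rows in which it appears."""
    n = len(categories)
    return {cat: count / n for cat, count in Counter(categories).items()}


# Hypothetical data: "green" is rare in train, "purple" is unseen in train.
train = ["blue"] * 5 + ["red"] * 4 + ["green"]
test = ["red", "purple"]

frequent = fit_frequent_labels(train)
freq_map = fit_frequency_map(group_rare(train, frequent))

# Unseen categories fall into RARE; 0.0 is a fallback if train had no rare label.
test_encoded = [freq_map.get(cat, 0.0) for cat in group_rare(test, frequent)]
```

Both the set of frequent labels and the frequency map are learned from the train data, so new data is encoded with exactly the same mapping.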

Discretisation

Feature-engine comes with the most popular methods of variable discretisation: equal width and equal frequency discretisation. Feature-engine also includes a method developed during the 2009 KDD data science competition which uses decision trees to automatically find the buckets for each variable.
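
The difference between the two popular methods can be sketched in a few lines of plain Python. This is an illustration only, not Feature-engine's API; the function names and the sample data are assumptions, and the quantile interpolation used by statistics.quantiles may differ from what a given library uses.

```python
from bisect import bisect_right
from statistics import quantiles


def equal_width_edges(values, bins):
    """Inner bin edges that split the range [min, max] into equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def equal_frequency_edges(values, bins):
    """Inner bin edges at the quantiles, so each bin holds a similar number of rows."""
    return quantiles(values, n=bins, method="inclusive")


def assign_bins(values, edges):
    """Return the bin index of each value; a value equal to an edge goes to the upper bin."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical skewed variable with one extreme value.
train = [1, 2, 3, 4, 5, 6, 7, 8, 100]

width_edges = equal_width_edges(train, bins=4)
width_bins = assign_bins(train, width_edges)

freq_edges = equal_frequency_edges(train, bins=3)
freq_bins = assign_bins(train, freq_edges)
```

On skewed data like this, equal width bins leave most observations in a single bucket, while equal frequency bins spread them evenly.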

Outlier Handling

Feature-engine allows you to cap variables at arbitrary values that you specify, or it can determine the capping values automatically using the inter-quartile range proximity rule.
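
The proximity rule places the caps at a multiple of the inter-quartile range (IQR) below the first quartile and above the third quartile. The sketch below is plain Python rather than Feature-engine's API; the function names, the conventional factor of 1.5 and the sample data are assumptions.

```python
from statistics import quantiles


def fit_iqr_caps(values, fold=1.5):
    """Learn the lower and upper caps: Q1 - fold * IQR and Q3 + fold * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - fold * iqr, q3 + fold * iqr


def cap(values, lower, upper):
    """Clip every value to the interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical data with one outlier.
train = [10, 12, 13, 14, 15, 16, 18, 20, 95]
test = [3, 17, 40]

# Caps are learned from the train data and then applied to both sets.
lower, upper = fit_iqr_caps(train)
train_capped = cap(train, lower, upper)
test_capped = cap(test, lower, upper)
```

Values inside the caps are left unchanged; only values beyond them are pulled back to the boundary.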

Variable Transformation

Feature-engine includes the most commonly used mathematical transformations for variables, including logarithmic, exponential, reciprocal and Box-Cox transformations.
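
For reference, the Box-Cox transformation of a positive value x with parameter lambda is (x^lambda - 1) / lambda, and log(x) when lambda is 0. A minimal plain-Python sketch follows; the function names are illustrative and lambda is passed in by hand here, whereas in practice it is usually estimated from the train data.

```python
import math


def reciprocal(x):
    """Reciprocal transformation; undefined for x == 0."""
    return 1 / x


def box_cox(x, lam):
    """Box-Cox transformation of a strictly positive value for a fixed lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)  # limiting case of the formula as lambda tends to 0
    return (x ** lam - 1) / lam


# Hypothetical right-skewed variable.
values = [1.0, 4.0, 9.0, 16.0]
transformed = [box_cox(v, 0.5) for v in values]
```

With lambda equal to 0 the transformation is the logarithm, and with lambda equal to 1 it only shifts the variable by one unit, so Box-Cox covers a family of power transformations in one function.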

Engineer Individual Feature Groups

Feature-engine allows you to select a subset of variables for each engineering step. All engineering steps can be integrated into a machine learning pipeline to smooth model deployment.

Why Use Feature-engine?

LEVERAGE THE POWER OF WELL-ESTABLISHED TECHNIQUES

Feature-engine includes feature engineering techniques used extensively in industry and in data science competitions. Most of the techniques were gathered from the series of books released after the 2009 KDD data science competition.

SIMPLIFY YOUR MACHINE LEARNING PIPELINES

Feature-engine offers Scikit-learn-like functionality to create and store feature engineering steps that learn from train data and then transform test data. Each Feature-engine transformer learns and stores parameters from the train data through the fit() method, and transforms new data using these stored parameters with the transform() method.
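
The fit() and transform() pattern can be sketched with a small stand-alone class. This is not a Feature-engine transformer: the class name, the dictionary-of-lists data layout and the sample values are assumptions chosen to keep the example free of dependencies.

```python
from collections import Counter


class FrequentCategoryImputerSketch:
    """Illustrative transformer that fills missing values with the most frequent category."""

    def __init__(self, variables):
        self.variables = variables  # only these variables are modified

    def fit(self, data):
        # Learn and store the most frequent category of each selected variable.
        self.fill_values_ = {}
        for var in self.variables:
            observed = [v for v in data[var] if v is not None]
            self.fill_values_[var] = Counter(observed).most_common(1)[0][0]
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, data):
        # Apply the stored parameters to new data without learning anything new.
        out = {col: list(vals) for col, vals in data.items()}
        for var in self.variables:
            fill = self.fill_values_[var]
            out[var] = [fill if v is None else v for v in out[var]]
        return out


# Hypothetical train and test sets; None marks a missing value.
train = {"colour": ["red", "blue", "red", None], "size": ["S", None, "M", "M"]}
test = {"colour": [None, "blue"], "size": [None, "S"]}

imputer = FrequentCategoryImputerSketch(variables=["colour"]).fit(train)
test_imputed = imputer.transform(test)
```

The test set is filled with the category learned from the train set, and the variable that was not selected is returned untouched.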

SMOOTH MODEL DEPLOYMENT

Feature-engine transformers are compatible with the Scikit-learn pipeline, allowing you to build and deploy a single Python object that contains all the required feature engineering, feature scaling, model training and scoring steps. You will only need to create, store and retrieve one pickle object in your APIs.
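
The idea of a single object that holds every step can be sketched without any dependencies. The classes below are illustrative stand-ins, not Scikit-learn's Pipeline or Feature-engine transformers; their names, the dictionary-of-lists data layout and the sample values are assumptions.

```python
import math
from statistics import median


class MedianImputerStep:
    """Illustrative step: fill missing values with the median learned in fit()."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, data):
        self.medians_ = {
            var: median(v for v in data[var] if v is not None)
            for var in self.variables
        }
        return self

    def transform(self, data):
        out = dict(data)
        for var in self.variables:
            out[var] = [self.medians_[var] if v is None else v for v in data[var]]
        return out


class LogStep:
    """Illustrative step: natural logarithm of the selected variables."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, data):
        return self  # nothing to learn

    def transform(self, data):
        out = dict(data)
        for var in self.variables:
            out[var] = [math.log(v) for v in data[var]]
        return out


class PipelineSketch:
    """Illustrative container that fits and applies the steps in order."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, data):
        for step in self.steps:
            data = step.fit(data).transform(data)
        return self

    def transform(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data


# Hypothetical data; each step works on its own subset of variables.
train = {"income": [10.0, None, 1000.0], "age": [20, 40, None]}
test = {"income": [None, 1.0], "age": [None, 33]}

pipe = PipelineSketch([
    MedianImputerStep(variables=["income", "age"]),
    LogStep(variables=["income"]),
]).fit(train)
test_out = pipe.transform(test)
```

Because all learned parameters live inside the one fitted object, that object is the only thing that has to be saved and loaded at deployment time.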

MORE...

Feature-engine is built on top of Scikit-learn, pandas, NumPy and SciPy. Feature-engine is able to take in and return pandas dataframes to smooth the research phase of your data science project. Feature-engine also integrates well with the Scikit-learn pipeline, allowing you to build simplified machine learning pipelines and reduce the overhead of model deployment.

Feature-engine is available on PyPI and GitHub, and it can be easily installed with pip. Feature-engine’s documentation is growing, and the GitHub repository contains several Jupyter notebooks with examples of how to use it. Getting started with Feature-engine should be fairly easy.

REFERENCES

Feature-engine’s feature engineering and variable encoding functionality is inspired by a series of articles with the winning solutions of the 2009 KDD competition.

The functionality, assumptions, advantages and limitations of each feature engineering step in Feature-engine are extensively covered in the course Feature Engineering for Machine Learning.

TUTORIALS

Learn how to use Feature-engine with our short video tutorials on YouTube.

© 2018 - 2020 Train In Data