Automated Feature Engineering and Selection in Python
Franziska Horn
While several libraries already automate the selection of the best ML model and its hyperparameters for a prediction task, feature engineering is still mostly done by hand. I will present different options for automating the feature engineering and selection process in Python, with a focus on the open source autofeat library, which provides a scikit-learn style linear regression model with automated feature engineering and selection capabilities.
Complex non-linear machine learning models such as neural networks are in practice often difficult to train and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. Linear models are efficient and intuitive, but they generally yield lower prediction accuracy. The autofeat library provides a multi-step feature engineering and selection process: first, a large pool of non-linear features is generated; then a small and robust set of meaningful features is selected from this pool. These features improve the prediction accuracy of a linear model while retaining its interpretability.
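To make the generate-then-select idea concrete, here is a minimal sketch using only NumPy. It is not autofeat's implementation: the transformations in the pool (squares, logarithms, square roots, pairwise products), the greedy forward selection by R² gain, the helper names `generate_features`, `fit_linear`, and `select_features`, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def generate_features(X, names):
    """Step 1: expand the inputs into a pool of non-linear candidate features."""
    pool = {}
    for j, name in enumerate(names):
        x = X[:, j]
        pool[name] = x
        pool[f"{name}**2"] = x ** 2
        if np.all(x > 0):  # log and sqrt only where they are defined
            pool[f"log({name})"] = np.log(x)
            pool[f"sqrt({name})"] = np.sqrt(x)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            pool[f"{names[i]}*{names[j]}"] = X[:, i] * X[:, j]
    return pool


def fit_linear(pool, cols, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y))] + [pool[c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


def select_features(pool, y, max_features=5, min_gain=1e-3):
    """Step 2: greedily add the candidate that raises R^2 the most,
    stopping once the improvement falls below min_gain."""
    selected, best_r2 = [], 0.0
    while len(selected) < max_features:
        scores = {c: fit_linear(pool, selected + [c], y)[1]
                  for c in pool if c not in selected}
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_gain:
            break
        selected.append(best)
        best_r2 = scores[best]
    return selected


# Synthetic example (assumed for illustration): the target depends
# non-linearly on x1 and x2, and not at all on x3.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(200, 3))
y = 2.0 * X[:, 0] ** 2 + 3.0 * np.log(X[:, 1]) + rng.normal(0.0, 0.01, size=200)

pool = generate_features(X, ["x1", "x2", "x3"])
selected = select_features(pool, y)
coef, r2 = fit_linear(pool, selected, y)

print("candidate features:", len(pool))
for name, c in zip(selected, coef[1:]):
    print(f"{name}: {c:.3f}")
print(f"R^2: {r2:.5f}")
```

The final model is still an ordinary linear regression, so each selected feature comes with a single coefficient that can be read off directly. A production-grade selection step would also need to guard against overfitting and highly correlated candidates, which this sketch does not attempt.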
Franziska Horn
Affiliation: TU Berlin
Franzi has several years of experience tackling machine learning problems in both research and application contexts. She has specialised in natural language processing, representation learning, and data visualisation. She holds a BSc in cognitive science and an MSc in computer science, and is currently completing her PhD in machine learning while also working as a freelance data science consultant.