Automated Feature Engineering and Selection in Python Franziska Horn PyConDE & PyDataBerlin 2019 conference


Wednesday 14:00 in Saal 5

Type/Track: Talk / PyData

While several libraries already exist for automatically selecting the best ML model and its hyperparameters for a prediction task, feature engineering is still mostly done by hand. I will present different options for automating the feature engineering and selection process in Python, with a focus on the open-source autofeat library, which provides a scikit-learn-style linear regression model with automated feature engineering and selection capabilities.

Complex non-linear machine learning models such as neural networks are often difficult to train in practice and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. Linear models, on the other hand, are efficient and intuitive but generally achieve lower prediction accuracy. The autofeat library provides a multi-step feature engineering and selection process: first, a large pool of non-linear features is generated; then, a small and robust set of meaningful features is selected from this pool. These features improve the prediction accuracy of a linear model while retaining its interpretability.
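The generate-then-select idea can be illustrated with a small NumPy-only sketch. This is not autofeat's implementation or API: the class name `ToyAutoFeatRegressor`, its parameters `max_features` and `min_gain`, and the particular transformations are hypothetical, and greedy forward selection by R² is a simple technique swapped in here in place of autofeat's own selection procedure. The sketch assumes strictly positive inputs so that `log`, `sqrt` and `1/x` are defined, and it only imitates the scikit-learn `fit`/`predict` convention without importing scikit-learn.

```python
import itertools

import numpy as np


class ToyAutoFeatRegressor:
    """Hypothetical generate-then-select linear regressor (NOT the autofeat API).

    Follows the scikit-learn fit/predict convention without importing scikit-learn.
    Assumes strictly positive inputs so that log, sqrt and 1/x are defined.
    """

    # Unary transformations applied to every original column.
    UNARY = {
        "{}": lambda x: x,
        "{}^2": lambda x: x ** 2,
        "{}^3": lambda x: x ** 3,
        "sqrt({})": np.sqrt,
        "log({})": np.log,
        "1/{}": lambda x: 1.0 / x,
    }

    def __init__(self, max_features=5, min_gain=1e-3):
        self.max_features = max_features  # upper bound on selected features
        self.min_gain = min_gain  # stop when the best addition raises R^2 by less

    def _generate(self, X):
        """Step 1: build a pool of named non-linear candidate features."""
        pool = {}
        for j, name in enumerate(self.names_):
            for template, func in self.UNARY.items():
                pool[template.format(name)] = func(X[:, j])
        # Pairwise products of the original columns.
        for (i, a), (j, b) in itertools.combinations(enumerate(self.names_), 2):
            pool[f"{a}*{b}"] = X[:, i] * X[:, j]
        return pool

    @staticmethod
    def _design(columns, n_rows):
        """Stack the given feature columns behind an intercept column."""
        return np.column_stack([np.ones(n_rows)] + list(columns))

    def _r2(self, columns, y):
        """R^2 of an ordinary least-squares fit on the given columns."""
        A = self._design(columns, len(y))
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        return 1.0 - (resid @ resid) / (centered @ centered)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        self.names_ = [f"x{j + 1}" for j in range(X.shape[1])]
        pool = self._generate(X)
        # Step 2: greedy forward selection, a simple stand-in for a real selection step.
        self.selected_, best = [], 0.0
        while len(self.selected_) < min(self.max_features, len(pool)):
            current = [pool[s] for s in self.selected_]
            scores = {
                name: self._r2(current + [col], y)
                for name, col in pool.items()
                if name not in self.selected_
            }
            name = max(scores, key=scores.get)
            if scores[name] - best < self.min_gain:
                break
            self.selected_.append(name)
            best = scores[name]
        # Step 3: final linear model on the selected features only.
        A = self._design([pool[s] for s in self.selected_], len(y))
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        pool = self._generate(X)
        A = self._design([pool[s] for s in self.selected_], X.shape[0])
        return A @ self.coef_


# Usage on a noise-free synthetic target built from x1^2 and log(x2).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(200, 2))
y = 2.0 * X[:, 0] ** 2 + 3.0 * np.log(X[:, 1])
model = ToyAutoFeatRegressor().fit(X, y)
print(model.selected_, model.coef_.round(3))
```

Because each selected feature is a named expression such as `x1^2` or `log(x2)` and the final model is ordinary least squares, the fitted coefficients remain directly readable, which is the interpretability argument made above. A plain greedy search like this one is only meant to show the overall shape of the process; on noisy data with a large candidate pool it is prone to picking spurious features.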

Tags: Data Science, Machine Learning, Science, Data Engineering, Statistics

Level: Domain Expertise: some; Python Skill Level: basic

Franziska Horn

Affiliation: TU Berlin

Franzi has several years of experience tackling machine learning problems in both research and application contexts. She has specialised in natural language processing, representation learning, and data visualisation. She holds a BSc in cognitive science and an MSc in computer science, and is currently completing her PhD in machine learning while also working as a freelance data science consultant.
