Data Engineering Session List
10 Years of Automated Category Classification for Product Data
Johannes Knopp
Artificial Intelligence, Deep Learning, Data Science, Infrastructure, Machine Learning, Data Engineering
10 years ago we built a classifier for categorizing product data. Let's take a journey through the lessons we learned over the years about building, maintaining, and modernizing the category classifier.
Airflow: your ally for automating machine learning and data pipelines
Enrica Pasqua, Bahadir Uyarer
Big Data, Infrastructure, Machine Learning, Data Engineering
Automate your machine learning and data pipelines with Apache Airflow.
Automated Feature Engineering and Selection in Python
Franziska Horn
Data Science, Machine Learning, Science, Data Engineering, Statistics
Automated feature engineering and selection in Python with the autofeat library.
Automating feature engineering for supervised learning? Methods, open-source tools and prospects.
Thorben Jensen
Artificial Intelligence, Algorithms, Data Science, Machine Learning, Data Engineering
How can the labor-intensive task of feature engineering for machine learning be automated? This talk gives an overview of methods, presents open-source libraries for Python, and compares their performance.
Decentralized and Privacy-Preserving ML via TensorFlow Federated
Peter Kairouz, Amlan Chakraborty
Artificial Intelligence, Deep Learning, Data Science, Machine Learning, Data Engineering
Meet TensorFlow Federated: an open-source framework for machine learning and other computations on decentralized data.
Fighting fraud: finding duplicates at scale
Alexey Grigorev
Data Science, Infrastructure, Machine Learning, Data Engineering
Fight fraudsters at scale: use machine learning to find duplicates in 10 million ads daily.
Introduction to automated testing with pytest
Raphael Pierzina
DevOps, Web, Data Engineering
Learn how to get started with developing automated tests in Python with the pytest test framework!
Kartothek – Table management for cloud object stores powered by Apache Arrow and Dask
Florian Jetter
Big Data, Data Engineering
Kartothek: table management for cloud object stores powered by @ApacheArrow and @dask_dev.
Managing the end-to-end machine learning lifecycle with MLflow
Tobias Sterbak
Data Science, Infrastructure, Machine Learning, Data Engineering
How to manage the end-to-end machine learning lifecycle with MLflow.
Mock Hell
Edwin Jung
Code-Review, Web, Data Engineering
Mock Hell: How to escape and avoid it, and improve your design in the process.
Production-level data pipelines that make everyone happy using Kedro
Yetunde Dada
Data Science, DevOps, Machine Learning, Data Engineering
Learn how easy it is to apply software engineering principles to your data science and data engineering code. Expect an overview of Kedro, a library that implements best practices for data pipelines with an eye towards productionizing ML models.
Transforming a Legacy System into a Bias-Mitigating AI Solution for Debt Repayment
Avaré Stewart
Artificial Intelligence, Data Science, Natural Language Processing, Machine Learning, Data Engineering
Unleash the intelligence in your data: transform a legacy system into a bias-mitigating AI solution for debt repayment with Tesseract, spaCy, and AI Fairness 360.
What we learned from scraping 1 billion webpages every month
Samet Atdag
Business & Start-Ups, Big Data, Infrastructure, Web, Data Engineering
We broke the web via simple hacks. Instead of order, we caused chaos. How to fix that?
🌈Apache Airflow for beginners
Varya
Infrastructure, Data Engineering
Airflow can sound more complicated than it is. Learn the basics through a practical example.