Leveraging ML to obtain fine-grained (yet reliable) causal estimates from A/B tests and experiments
Maximilian Eber
Randomized experiments (including A/B tests) are a simple yet powerful way to establish the causal effects of different treatments. Pharmaceutical companies use them to test the effectiveness of medications, governments use them to evaluate social programs, and online shops use them to optimize the layout of their websites.
While it is easy to establish the average effect of a treatment, it is notoriously difficult to tease out which subgroups benefit the most. Yet this is often the most relevant question: in medical research, for example, it is important to find the groups for which a novel drug is most effective (as well as groups for which it might be harmful). Similarly, knowing which layout works best for a certain sub-population allows for effective personalization of apps, websites, and marketing programs. As experimentation becomes pervasive, it is crucial to develop robust tools for establishing fine-grained (yet reliable) treatment effects.
In this talk, I will survey a series of methods for combining experimental data with ML techniques to discover fine-grained, heterogeneous treatment effects. I will present in detail one particular approach, based on repeated sample splitting, and provide Python code for replication. The method is general enough for a wide range of applications and does not rely on a particular type of ML algorithm in the prediction stage.
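To give a flavor of what repeated sample splitting can look like, here is a minimal sketch. It is not the talk's replication code. It assumes one common variant of the idea: on each random split, a learner is fitted on one half of the data to produce a proxy score for the individual treatment effect, the other half is ranked by that score, and the treatment effect is estimated separately for the low-score and high-score groups. Taking medians over many splits reduces the dependence on any single split. The function names, the simulated data (with a true effect of 2x), and the choice of a plain least-squares line as the learner are all illustrative assumptions; any prediction algorithm could take its place.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; a stand-in for any ML learner."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def diff_in_means(rows):
    """Treatment effect estimate: mean outcome of treated minus control."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def one_split(data, rng):
    """Fit the proxy score on one half, estimate group effects on the other."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    aux, main = shuffled[:half], shuffled[half:]

    # Separate outcome models for treated and control units (auxiliary half).
    a1, b1 = fit_line([x for x, d, _ in aux if d == 1],
                      [y for _, d, y in aux if d == 1])
    a0, b0 = fit_line([x for x, d, _ in aux if d == 0],
                      [y for _, d, y in aux if d == 0])

    def score(x):
        # Proxy for the individual treatment effect at covariate value x.
        return (a1 + b1 * x) - (a0 + b0 * x)

    # Rank the held-out half by the proxy score and compare the two groups.
    ranked = sorted(main, key=lambda row: score(row[0]))
    mid = len(ranked) // 2
    return diff_in_means(ranked[:mid]), diff_in_means(ranked[mid:])


def repeated_split_effects(data, n_splits=20, seed=0):
    """Median low-group and high-group effects over repeated random splits."""
    rng = random.Random(seed)
    results = [one_split(data, rng) for _ in range(n_splits)]
    low = statistics.median([r[0] for r in results])
    high = statistics.median([r[1] for r in results])
    return low, high


def simulate(n=2000, seed=1):
    """Hypothetical experiment: random assignment, true effect equals 2*x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        d = 1 if rng.random() < 0.5 else 0
        y = 2.0 * x * d + rng.gauss(0, 1)
        data.append((x, d, y))
    return data


low, high = repeated_split_effects(simulate())
print(f"low-score group effect:  {low:.2f}")
print(f"high-score group effect: {high:.2f}")
```

Because the group effects are always estimated on data the learner has not seen, an overfitted proxy score cannot manufacture heterogeneity on its own. This is the property that makes the approach agnostic to the algorithm used in the prediction stage.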