Making the complex simple in data viz
Tania Vasilikioti
When building a data visualization in Python from scratch, we quickly run into a string of questions: "What type of plot should I use?" "How should I scale the axes?" "How many dimensions?" "Should I use colors?" In trying to answer these, we can easily get lost in exploratory trips to Stack Overflow, documentation pages, example galleries, and tutorials. The result is often code that uses three different libraries, runs to more than 50 lines, and leaves us feeling that our graphic's language lacks structure.
Part of the problem is the abundance of data visualization packages available in Python and their very different syntactic patterns. Each package has its own way of changing elements like axis features, labels, annotations, titles, or even gridlines. Even the way the input data needs to be structured can differ. As a result, the analyst is often more absorbed in figuring out how to adjust each visual detail than in thinking of the graphic as a system of logically structured elements and mapping the data to each of them separately.
Enter The Grammar of Graphics, a framework conceptualized by L. Wilkinson in 1999, which helps us better understand the underlying structure of every graphic. The talk will introduce the framework by deconstructing a simple chart into its constituent "grammatical elements": Aesthetics, Algebra, Scales, Statistics, Geometry, and Coordinates. I will discuss each element and explain how it translates directly into the decisions we make when designing a graphic.
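As a rough illustration of what such a deconstruction might look like (this is not material from the talk), the six elements of a simple scatter plot can be written down as plain data before any plotting library is involved. The ChartSpec class, the describe helper, and the column names gdp, life_exp, and continent are all hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ChartSpec:
    """Hypothetical container: one field per grammatical element."""
    aesthetics: dict   # which variable drives which visual property
    algebra: str       # how variables are combined (Wilkinson's cross is '*')
    scales: dict       # how data values are transformed per aesthetic
    statistics: str    # summary applied before drawing ('identity' = none)
    geometry: str      # the mark that gets drawn
    coordinates: str   # the coordinate system the marks live in


# A scatter plot of two made-up variables, colored by a third.
scatter = ChartSpec(
    aesthetics={"x": "gdp", "y": "life_exp", "color": "continent"},
    algebra="gdp * life_exp",
    scales={"x": "log10", "y": "linear", "color": "categorical"},
    statistics="identity",
    geometry="point",
    coordinates="cartesian",
)


def describe(spec):
    """Return one 'element: value' line per grammatical element."""
    return [f"{f.name}: {getattr(spec, f.name)}" for f in fields(spec)]


for line in describe(scatter):
    print(line)
```

Writing the chart down this way separates the design decisions (what to map, how to scale, which mark to draw) from the syntax of any particular plotting package.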
Then, I will practically demonstrate how plotting with a grammar can be highly liberating, as it makes otherwise complex plots easy to think about and then to create. Even though The Grammar of Graphics (and its sister The Layered Grammar of Graphics, Wickham 2010) is most famously implemented in R's ggplot2, that doesn't mean the framework is language-specific – it can be used with any Python visualization package. To demonstrate, I will show an application of the framework by building a chart in the Python plotnine package, and then explaining how we can use the grammar as a guide to let us build the same chart in matplotlib – one grammatical element at a time.
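A minimal sketch of the matplotlib half of that idea follows, assuming a scatter plot as the target chart. The data, the column names, and the build_chart function are invented for illustration and are not taken from the talk. Each step is labeled with the grammatical element it corresponds to.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical toy data, made up purely for this example.
records = [
    {"gdp": 1_000, "life_exp": 55, "continent": "X"},
    {"gdp": 5_000, "life_exp": 65, "continent": "X"},
    {"gdp": 20_000, "life_exp": 75, "continent": "Y"},
    {"gdp": 50_000, "life_exp": 82, "continent": "Y"},
]


def build_chart(rows):
    # Algebra: choose the variables and pair them up (gdp crossed with life_exp).
    xs = [r["gdp"] for r in rows]
    ys = [r["life_exp"] for r in rows]

    # Aesthetics: map the continent variable to color.
    palette = {"X": "tab:blue", "Y": "tab:orange"}
    colors = [palette[r["continent"]] for r in rows]

    fig, ax = plt.subplots()

    # Statistics: identity, i.e. the raw values are drawn without aggregation.
    # Geometry: points.
    ax.scatter(xs, ys, c=colors)

    # Scales: logarithmic x axis, linear y axis (the default).
    ax.set_xscale("log")

    # Coordinates: Cartesian (matplotlib's default), with explicit y limits.
    ax.set_ylim(40, 90)

    # Guides: axis labels that make the mappings readable.
    ax.set_xlabel("GDP per capita (log scale)")
    ax.set_ylabel("Life expectancy (years)")
    return fig, ax


fig, ax = build_chart(records)
```

Calling fig.savefig("chart.png") afterwards would write the figure to disk. In plotnine, the same decisions would be expressed as layers added to a ggplot object, for example ggplot(df, aes(x, y, color)) + geom_point() + scale_x_log10(), so the correspondence between grammar and code is more direct there.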
Tania Vasilikioti
Affiliation: Babbel
First exposed to programming with R in an econometrics course in university, Tania was quickly fascinated by the beautifully (and sensibly!) structured world of ggplot2. Moving to Python last year made her realize that building graphics is not at all as easy as Hadley Wickham made it seem. This spurred a quest to find the best framework to think through when designing graphics, and a way to apply it practically in Python.
Tania is a Data Scientist at Babbel in Berlin and fights daily to visualize complex data in simple ways.