In this post we will walk through the basics of Gadfly, a visualization package written in Julia. Gadfly is a Julia implementation of the layered grammar of graphics proposed by Hadley Wickham, who turned the idea into ggplot2, the main visualization library in R. One spicy note: the original inventor of the "grammar of graphics" (Leland Wilkinson, whose work inspired Wickham) was hired by Tableau Software, a leading company in data visualization.
The main motivation behind the grammar of graphics is to formalize visualization for statistics. The authors use the word "grammar" because you can think of it as a set of rules that let you build sentences that are "correct" with respect to that grammar. Here, though, the sentence is graphical, so the output takes the form of a plot.
Let's now give a declarative description of what a plot is and then use this knowledge to actually plot things.
A plot consists of:
- Aesthetics: the plot's interface to the data. Data is bound to aesthetics, and different kinds of plots expect different aesthetics. For example, to plot a set of points you can use the geometry Geom.point (geometries are explained in a moment), which requires the aesthetics x and y. In other words, the required aesthetics are always known when the plot is created: once you know which geometry you want, its specification tells you exactly which aesthetics it needs, so choosing the proper aesthetics is a craft rather than an art.
- Geometries: the geometry defines what will be drawn, i.e. the geometric form your data takes. Each geometry requires a set of aesthetics to work; the specification of Geom.point, for instance, lists x and y, as noted above. Different geometries produce different kinds of plots. The geometry is therefore the central point of your plot: geometries and aesthetics define what you want to plot, while the other components specify how to plot it.
- Statistics: a middle layer between the aesthetics you provide and the geometry. Whenever you supply aesthetics for a given geometry, a statistic sits in between and transforms the data before it is drawn; very often that statistic is simply the identity (as in the case of Geom.point).
- Scales: transform the axes of your plot. For example, to get a log-scaled x axis on a scatter plot, use Scale.x_log10.
- Guides: elements responsible for drawing axis labels, titles, etc. (for example Guide.xlabel and Guide.title).
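To tie the components together, here is a minimal sketch of a single Gadfly scatter plot that touches each of them. It assumes Gadfly is installed; the data values and label strings are made up purely for illustration. The statistic is not passed explicitly, because, as noted above, Geom.point uses the identity statistic by default.

```julia
using Gadfly

# Made-up example data (x values must be positive for the log10 scale)
xs = [1, 10, 100, 1000, 10000]
ys = [2.0, 3.5, 1.0, 4.2, 3.1]

p = plot(
    x = xs, y = ys,                # aesthetics: bind the data to x and y
    Geom.point,                    # geometry: points, which require x and y
                                   # statistic: identity, applied implicitly by Geom.point
    Scale.x_log10,                 # scale: log10-transform the x axis
    Guide.xlabel("x (log10)"),     # guides: axis labels and a title
    Guide.ylabel("y"),
    Guide.title("Aesthetics, geometry, scale and guides"),
)

# Evaluating p in the REPL or a notebook displays it; to write a file instead:
# draw(SVG("components.svg", 6inch, 4inch), p)
```

Swapping Geom.point for another geometry changes what is drawn, while the scale and guides keep controlling how it is presented.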