Basic visualization in Julia – Gadfly

In this post we will walk through the basics of Gadfly, a visualization package written in Julia. Gadfly is a Julia implementation of the layered grammar of graphics proposed by Hadley Wickham, who implemented the idea in ggplot2, the main visualization library in R. One spicy note: the original inventor of the “grammar of graphics” (the one who inspired Wickham) now works for Tableau Software, a leading company in data visualization.

The main motivation for the grammar of graphics is to formalize statistical visualization. The authors use the word “grammar”, so one can think of a set of rules that lets you build “correct” (with respect to the given grammar) sentences. In this case, though, the sentence is graphical, so the output takes the form of a plot.

Let’s now try to give a declarative description of what a plot is, and then use this knowledge to actually plot stuff.

A plot consists of:

  • Aesthetics: the plot’s interface to data. Data is bound to aesthetics, and different kinds of plots expect different aesthetics. For example, to plot a set of points one can use the geometry Geom.point (don’t worry yet, geometries are explained in a minute), which requires the aesthetics x and y. In other words, the required aesthetics are always known when the plot is created: once you know what you want to plot, the chosen geometry specifies which aesthetics it needs, so choosing proper aesthetics is not an art but rather a craft.
  • Geometries: a geometry defines what will be plotted, that is, the shape your data takes on the plot. Each geometry requires a set of aesthetics to work; the specification of Geom.point, for instance, requires the aesthetics x and y, as noted above. Different geometries produce different kinds of plots. The geometry is therefore the central point of your plot: geometries and aesthetics define what you want to plot, while the other components specify how you want to do it.
  • Statistics: a middle layer between the aesthetics you provide and the geometry. Whenever you provide aesthetics for a given geometry there is a corresponding statistic in between; very often that statistic is simply the “identity” (as in the case of Geom.point).
  • Scales: transform the axes of your plot (for example, Scale.x_log10 gives a scatterplot a log-scaled x axis).
  • Guides: elements responsible for drawing axis labels, titles, etc.
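
To make the list above concrete, here is a minimal sketch of how the components appear as arguments of a single plot call once Gadfly is installed and loaded. The data is made up purely for illustration and is not part of this post's dataset; the statistic is left implicit, since the text above notes it is simply the identity for Geom.point.

```julia
using Gadfly

# Made-up illustrative data (an assumption, not from the post).
xs = [1, 10, 100, 1000]
ys = [2.0, 3.5, 5.1, 6.8]

p = plot(
    x = xs, y = ys,                       # aesthetics: bind data to x and y
    Geom.point,                           # geometry: one point per (x, y) pair
    Scale.x_log10,                        # scale: log10-transform the x axis
    Guide.xlabel("x (log scale)"),        # guides: axis labels and title
    Guide.ylabel("y"),
    Guide.title("Components of a plot"),
)
```

Everything except the aesthetics is passed as a plain positional argument, so adding or removing a component means adding or removing one argument.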

OK, with that basic knowledge let’s play with Gadfly a bit. Installing and loading the package is as easy as:

using Pkg  # needed on Julia 0.7 and later
Pkg.add("Gadfly")
using Gadfly

It’s recommended to also install Cairo, which serves as the PDF/PS/PNG backend (in case you want to export your plots):

Pkg.add("Cairo")
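
As a rough sketch of what exporting looks like once Cairo is installed: Gadfly's draw function takes a backend (here PNG and PDF) with a file name and image size, plus the plot to render. The file names and the toy data are made up for illustration, and loading Fontconfig alongside Cairo is an assumption that applies to recent Gadfly releases; it would need to be installed the same way as Cairo.

```julia
using Gadfly
import Cairo, Fontconfig   # assumption: recent Gadfly needs both loaded for PNG/PDF output

# Toy data, only there to have something to draw.
xs = collect(1:10)
p = plot(x = xs, y = xs .^ 2, Geom.line)

# Render the plot to files; the two measures are the image width and height.
draw(PNG("example.png", 6inch, 4inch), p)
draw(PDF("example.pdf", 6inch, 4inch), p)
```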

We will need a dataset to play with. It’s probably a good time to mention the RDatasets package, which gives Julia programmers access to R datasets.

Let’s quickly install and load it:

Pkg.add("RDatasets")
using RDatasets

For the purposes of this post we will work with the sleepstudy dataset from the “lme4” package. We will look at the reaction times, on certain tasks, of people who slept less than recommended for 9 days in a row (details of the dataset here).

sleep = dataset("lme4","sleepstudy")

Let’s first list the columns of the sleep DataFrame:

names(sleep)
3-element Array{Symbol,1}:
 :Reaction
 :Days
 :Subject

Reaction is the reaction time in milliseconds, Days is the day number of sleep deprivation, and Subject is the subject ID.
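
Before plotting, it can help to take a quick look at the data itself. The sketch below uses only generic Julia functions on the DataFrame; the dot-style column access (sleep.Days) assumes a recent DataFrames release.

```julia
using RDatasets

sleep = dataset("lme4", "sleepstudy")

size(sleep)                      # (number of rows, number of columns)
first(sleep, 5)                  # the first five rows
extrema(sleep.Days)              # smallest and largest day number
length(unique(sleep.Subject))    # number of distinct subjects
```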

OK, let’s first take a look at a scatterplot of day number against reaction time, with no distinction between subjects. To draw a scatterplot we need the Geom.point geometry, with the proper columns attached to x and y:

plot(sleep, x = "Days", y = "Reaction", Geom.point)

We can pass another geometry to the plot function, Geom.smooth, which fits a smooth curve to the data provided; it does not require any additional aesthetics.

plot(sleep, x = "Days", y = "Reaction", Geom.point, Geom.smooth)

Not perfect yet: we would still like the x scale to be discrete, and Scale.x_discrete forces that. Moreover, let’s point out that Reaction is measured in ms and set a title for our plot. Guides will help us here.

plot(sleep, x = "Days", y = "Reaction", Geom.point, Geom.smooth, Scale.x_discrete, 
     Guide.ylabel("Reaction - ms"), Guide.title("Reaction time across days of sleep deprivation"))

Now let’s say we would like to investigate the general reaction ability of every subject, using a density plot of reaction time colored by subject:

plot(sleep, x = "Reaction", Geom.density, color = "Subject", Scale.x_continuous(minvalue= 0, maxvalue= 500))

This shows how the reaction time varies for each individual. Note that Subject 309 is probably a machine, or at least an immortal human.

How do these values look across different days?

plot(sleep, x = "Days", y ="Reaction", Geom.point, Geom.smooth, color = "Subject", Scale.y_continuous(minvalue = 200, maxvalue = 500))

(BTW, if you are reading this late at night, that is probably a good reason to go to sleep.)

All the plots above took a DataFrame as input. That is not mandatory, of course: one can bind data vectors directly to aesthetics.

# sleep.Reaction is the column access of recent DataFrames releases; older ones used sleep[:Reaction]
plot(x = sleep.Reaction, Geom.histogram(bincount = 30), Scale.x_continuous(minvalue = 200), color = sleep.Days)

Another convenient feature is stacking several layers on one plot. Let’s say we want to compare the reaction times of subjects 309 and 310 on one plot:

# Column access and color strings follow recent DataFrames/Gadfly releases;
# older ones used sleep[:Days] and color("red").
plot(
     layer(x = sleep.Days[sleep.Subject .== "309"], y = sleep.Reaction[sleep.Subject .== "309"], Geom.point, Geom.smooth, Theme(default_color = "red")),
     layer(x = sleep.Days[sleep.Subject .== "310"], y = sleep.Reaction[sleep.Subject .== "310"], Geom.point, Geom.smooth, Theme(default_color = "blue")),
     Guide.xlabel("Days"), Guide.ylabel("Reaction time - ms"))

Please note the Theme argument passed to each layer. Using Theme you can set plot parameters such as default_color, point size, font size of the x label, font size of the y label and many more. More about themes here.
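
As a small illustration of the parameters mentioned above, the sketch below builds one Theme and passes it to a plot. The keyword names point_size and major_label_font_size are assumptions based on recent Gadfly releases (older versions may name them differently), and the data is made up.

```julia
using Gadfly

# Assumption: these keyword names match recent Gadfly releases;
# check the Theme documentation for the version you have installed.
big_red = Theme(
    default_color = "red",           # color used when no color aesthetic is bound
    point_size = 3pt,                # size of the points drawn by Geom.point
    major_label_font_size = 14pt,    # font size of the axis labels
)

# Made-up data, only to have something to style.
p = plot(x = [1, 2, 3, 4], y = [2.1, 3.9, 6.2, 7.8], Geom.point, big_red)
```

Keeping the Theme in a variable makes it easy to reuse the same styling across several plots or layers.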

As you can see, Gadfly gives you quite a flexible environment for visualization. To summarize: plotting in Julia with Gadfly is not painful at all, and it is very friendly for R users. It gives users a lot of flexibility, and after some time playing with it I find it more and more convenient.

If you want to check out some more advanced examples of Gadfly plotting, take a look here. The Grammar of Graphics, which I would guess is a classic, is available on Amazon.

---