Reading CSV file into Julia

As someone experienced in R, I naturally look for data.frame-like structures in Julia to load a CSV file into. Luckily, one exists and seems to work pretty well. You need to install a package called DataFrames to work with R-like data frames:

Pkg.add("DataFrames")

and load it after installation:

using DataFrames;

The whole documentation is available here. For now, we will try to load a simple CSV file and play with it. You can use the iris dataset. It is a toy dataset meant for various machine learning tasks. Let’s download it and read it into a variable called iris:

iris = readtable("iris.csv")
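
As far as I know, newer releases of DataFrames no longer ship readtable, and CSV reading lives in a separate package, CSV.jl. Here is a minimal sketch of the same step under that assumption. The file name iris_sample.csv and its three rows are made up so the snippet runs without downloading anything:

```julia
using CSV, DataFrames

# Write a tiny stand-in file; its name and contents are hypothetical.
path = joinpath(mktempdir(), "iris_sample.csv")
write(path, """
Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
""")

# CSV.read parses the file and materializes it as the requested table type.
iris_sample = CSV.read(path, DataFrame)

println(size(iris_sample))    # rows and columns of the stand-in table
println(names(iris_sample))   # column names
```

With the real file you would pass "iris.csv" as the path instead of the temporary one.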

Having your variable ready, let’s see what we can do with it. First, take a look at its size:

size(iris)
(150, 5)

150 rows and 5 columns. What are the column names?

names(iris)

5-element Array{Symbol,1}:
 :Sepal_Length
 :Sepal_Width
 :Petal_Length
 :Petal_Width
 :Species

As you can see, column names are represented as Symbols. A DataFrame lets you access a column by its name (given as a Symbol):

sepal_length_column = iris[:Sepal_Length]

Let’s see the type of the resulting column:

typeof(iris[:Sepal_Length])
DataArray{Float64,1} (constructor with 1 method)

Another way to access a data frame column is by its position. In Julia, all built-in indexing starts at 1. To access the sepal length (first) column, you can use:

sepal_length_column = iris[1]
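
On recent DataFrames releases, to my knowledge, single-argument indexing such as iris[:Sepal_Length] or iris[1] is no longer accepted and a row selector is required. A small sketch of the equivalent column access under that assumption, using a hypothetical three-row stand-in table:

```julia
using DataFrames

# A made-up stand-in for the iris table.
df = DataFrame(Sepal_Length = [5.1, 4.9, 7.0],
               Species = ["setosa", "setosa", "versicolor"])

by_property = df.Sepal_Length        # the stored column itself, no copy
by_name     = df[!, :Sepal_Length]   # the same vector, selected by Symbol
by_position = df[:, 1]               # a copy of the first column

println(by_property === by_name)     # identical object?
println(by_position == by_name)      # equal values?
```

The `!` selector hands back the stored vector, while `:` returns a copy, which matters if you intend to modify the result.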

Can we select a region of the data frame as is possible in R? Julia gives you that too. Accessing the 2nd and 3rd columns of the last 10 rows is as easy as:

iris_sub = iris[end-9:end, 2:3]
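
Julia ranges include both endpoints, so the last n rows start at end-(n-1). A quick check on a plain vector (the 150 entries are just a stand-in for the row indices, no packages needed):

```julia
rows = collect(1:150)        # stand-in for the 150 row indices

eleven = rows[end-10:end]    # both endpoints count
ten    = rows[end-9:end]

println(length(eleven))
println(length(ten))
println(first(ten))
```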

What about writing to a DataFrame? Can you replace a whole column? Yes, to replace it with a randomly generated vector, try:

iris[1] = randn(nrow(iris))

What about replacing a row? Let’s try to copy the first row and write it as the last one.

iris[end, :] = iris[1, :]

Are they equal now?

iris[end, :] == iris[1, :]
true

It is also easy to convert a DataFrame to a matrix using the convert function:

iris_matrix = convert(Array, iris)

The type of iris_matrix is then a two-dimensional Array of Any, because the columns mix floats with the species labels. Julia picks the most specific element type it can, so if your input DataFrame consists of floats only, the result is a two-dimensional Array of Float64.

iris_matrix = convert(Array, iris[1:2])
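
To see the element type change, here is a small sketch on a made-up table. It assumes a recent DataFrames release, where I believe the Matrix constructor plays the role of convert(Array, ...); the column names a, b and label are hypothetical:

```julia
using DataFrames

# Two Float64 columns and one String column, invented for illustration.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], label = ["x", "y"])

mixed   = Matrix(df)            # Float64 and String columns together
numeric = Matrix(df[:, 1:2])    # Float64 columns only

println(eltype(mixed))
println(eltype(numeric))
println(size(numeric))
```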

In summary, it seems that all the basic R data.frame operations are supported in Julia too. Of course, data.frame in R is more than a data structure: it is built in, and many R functions expect it as input, so using it there feels natural. It is, in fact, R's basic structure. Having the same interface in Julia is therefore not where the power comes from; the number of functions built around data.frame in R is. I am not sure whether DataFrames are as widely supported in Julia. Either way, the DataFrames package is a good starting point for someone who has been using R and wants to jump into Julia quickly.


By int8

This post is licensed under CC BY 4.0 by the author.