Reading a CSV file into Julia
As someone experienced in R, I naturally look for data.frame-like structures in Julia to load a CSV file into. Luckily, such a structure exists and seems to work pretty well. You need to install a package called DataFrames to operate on R-like data frames:
Pkg.add("DataFrames")
and load it after installation:
using DataFrames;
The whole documentation is available here. For now, we will try to load a simple CSV file and play with it. You can use the iris dataset, a toy dataset meant for various machine learning tasks. Let’s download it and read it into a variable called iris:
iris = readtable("iris.csv")
Having your variable ready, let’s see what we can do with it. First, take a look at its size:
size(iris)
(150, 5)
150 rows and 5 columns. What are the column names?
names(iris)
5-element Array{Symbol,1}:
:Sepal_Length
:Sepal_Width
:Petal_Length
:Petal_Width
:Species
As you can see, column names are represented as Symbols. A DataFrame lets you access a column by its name (a Symbol):
sepal_length_column = iris[:Sepal_Length]
Let’s see the type of the resulting column:
typeof(iris[:Sepal_Length])
DataArray{Float64,1} (constructor with 1 method)
Another way to access a data frame column is by using an index. In Julia, all built-in indexing starts with 1. To access the sepal length (first) column, you can use:
sepal_length_column = iris[1]
Can we select a region of the data frame as is possible in R? Julia gives you that too. Accessing the 2nd and 3rd columns of the last 10 rows is as easy as:
iris_sub = iris[end-9:end, 2:3]
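Julia ranges include both endpoints, so `end-k:end` covers k+1 elements, and the last 10 rows correspond to `end-9:end`. A minimal sketch on a plain vector shows the counting; the names `v`, `last_ten`, and `last_eleven` are illustrative only and are not part of DataFrames:

```julia
# A plain vector standing in for the 150 row indices of the data set.
v = collect(1:150)

# Ranges are inclusive on both ends, so end-9:end spans 10 elements.
last_ten = v[end-9:end]

# end-10:end spans 11 elements, one more than "the last 10".
last_eleven = v[end-10:end]

println(length(last_ten))     # prints 10
println(length(last_eleven))  # prints 11
println(first(last_ten))      # prints 141
```

The same counting applies to the row range used when slicing a data frame.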
What about writing to a DataFrame? Can you replace a whole column? Yes. To replace it with a randomly generated vector, try:
iris[1] = randn(nrow(iris))
What about replacing a row? Let’s try to copy the first row and write it as the last one.
iris[end, :] = iris[1, :]
Are they equal now?
iris[end, :] == iris[1, :]
true
It is also easy to convert a DataFrame to a matrix using the convert function:
iris_matrix = convert(Array, iris)
The type of iris_matrix is then a two-dimensional Array of Any, because the columns mix floats with the Species values. Julia will make the resulting element type as specific as possible. So if your input DataFrame consists of floats only, it will be converted to a two-dimensional Array of Float64.
iris_matrix = convert(Array, iris[1:2])
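The same element-type rule can be observed with ordinary Julia arrays, without DataFrames. This is a minimal sketch using a couple of sample values; the variable names (`sepal_length`, `sepal_width`, `species`, `mixed`, `floats`) are illustrative assumptions:

```julia
# Three small "columns": two of floats, one of strings.
sepal_length = [5.1, 4.9]
sepal_width = [3.5, 3.0]
species = ["setosa", "setosa"]

# Combining a float column with a string column: the only common
# element type is Any.
mixed = hcat(sepal_length, species)

# Combining float-only columns keeps the concrete element type.
floats = hcat(sepal_length, sepal_width)

println(eltype(mixed))   # prints Any
println(eltype(floats))  # prints Float64
println(size(floats))    # prints (2, 2)
```

A matrix of Float64 is usually what numerical routines expect, which is why selecting only the numeric columns before converting is worthwhile.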
In summary, it seems like all basic R data.frame-like operations are supported in Julia too. Of course, data.frame in R is not just a data structure; it is built in, and many R functions assume it as input, so it is pretty natural to use data.frames in R. It is your basic structure, in fact. The existence of the same interface in Julia does not by itself make it equally powerful, of course; it is the number of functions built around data.frame in R that does. And I am not sure DataFrames are as widely supported in Julia. The final point, anyway, is that the DataFrames package is a good starting point for someone who has been using R and wants to jump into Julia quickly.
By int8