As someone experienced in R, I naturally looked for a data.frame-like structure in Julia to load a CSV file into. Luckily, one exists and seems to work pretty well. You need to install a package called “DataFrames” to work with R-like data frames:
Pkg.add("DataFrames")
and load it after installation:
using DataFrames;
The whole documentation is available here. For now we will try to load a simple CSV file and play with it. You can use the iris dataset, a toy dataset meant for various machine learning tasks. Let's download it and read it into a variable called iris:
iris = readtable("iris.csv")
With the variable ready, let's see what we can do with it. Let's first take a look at its size:
size(iris)
(150,5)
150 rows and 5 columns. What are the column names?
names(iris)
5-element Array{Symbol,1}:
 :Sepal_Length
 :Sepal_Width
 :Petal_Length
 :Petal_Width
 :Species
As you can see, column names are represented as Symbols. A DataFrame lets you access a column by its name (given as a Symbol):
sepal_length_column = iris[:Sepal_Length]
Let's see the type of the resulting column:
typeof(iris[:Sepal_Length])
DataArray{Float64,1} (constructor with 1 method)
Another way to access a data frame column is by index. In Julia all built-in indexing starts at 1, so to ask for the sepal length (first) column you can use:
sepal_length_column = iris[1]
Can we select a region of a data frame, as is possible in R? Julia gives you that too. Accessing the 2nd and 3rd columns of the last 10 rows is as easy as:
iris_sub = iris[end-9:end,2:3]
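Julia ranges include both endpoints, so the size of an end-based slice is easy to miscount. A minimal sketch on a plain vector rather than a data frame (the names v, last_ten and last_eleven are made up for illustration):

```julia
# A plain vector standing in for the 150 row indices
v = collect(1:150)

# Ranges are inclusive on both ends, so end-9:end covers 10 elements
last_ten = v[end-9:end]

# end-10:end covers one element more
last_eleven = v[end-10:end]

println(length(last_ten))     # 10
println(length(last_eleven))  # 11
```

The same arithmetic applies to the row selector of a data frame.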
What about writing to a DataFrame? Can you replace a whole column? Yes. To replace it with a randomly generated vector, try:
iris[1] = randn(nrow(iris))
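Note that the indexing syntax depends on the DataFrames version. As far as I know, releases from 1.0 onward no longer accept a single index such as iris[:Sepal_Length] and expect a row selector as well. A minimal sketch under that assumption, using a small hand-made table instead of the iris file (df and its values are made up for illustration):

```julia
using DataFrames

# Small stand-in table; the values are illustrative, not read from iris.csv
df = DataFrame(Sepal_Length = [5.1, 4.9, 4.7],
               Species = ["setosa", "setosa", "setosa"])

# `!` selects the stored column without copying it
by_name = df[!, :Sepal_Length]
by_index = df[!, 1]

# Replace the whole first column with random values
df[!, 1] = randn(nrow(df))
```

Here by_name still holds the original values after the replacement, because the assignment swaps in a new vector rather than overwriting the old one.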
What about replacing a row? Let's try to copy the first row and write it as the last one.
iris[end,:] = iris[1,:]
Are they equal now?
iris[end,:] == iris[1,:]
true
It is also easy to convert a DataFrame to a matrix using the convert function, with Array as the target type:
iris_matrix = convert(Array, iris)
The type of iris_matrix is then a two-dimensional Array of Any, because the Species column holds strings. Julia narrows the resulting element type as much as possible, so if your input DataFrame consists of Floats only, you get a two-dimensional Array of Float64:
iris_matrix = convert(Array, iris[1:2])
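This element-type narrowing is not specific to DataFrames; plain arrays behave the same way. A small sketch using base Julia only (the column values and variable names are made up for illustration):

```julia
lengths = [5.1, 4.9]
widths = [3.5, 3.0]
species = ["setosa", "setosa"]

# Mixing numbers and strings forces the most general element type
mixed = hcat(lengths, widths, species)
println(eltype(mixed))   # Any

# Float columns only keep a concrete element type
floats = hcat(lengths, widths)
println(eltype(floats))  # Float64
```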
In summary, it seems that all the basic R data.frame-like operations are supported in Julia too. Of course, data.frame in R is not just a data type or structure. It is built in, and many R functions assume it as input, so using data.frames in R comes naturally. It is, in fact, your basic structure. The existence of the same interface in Julia does not by itself make it equally powerful. What makes data.frame powerful in R is the number of functions built around it, and I am not sure DataFrames are supported that widely in Julia. Still, the DataFrames package is a good starting point for someone who has been using R and wants to jump into Julia quickly.