[R Course] Data Importation with R

R Courses

Import data from different files type: CSV, Gsheet and Excel.

Thierry Warin https://warin.ca/aboutme.html (HEC Montréal and CIRANO (Canada))https://www.hec.ca/en/profs/thierry.warin.html
05-08-2019

Static data

Comma Delimited Files

To be able to execute the following example, download the data called dataForGraph in a .csv format. Once you’ve done that, let’s import the file in Rstudio..

# Import csv files with comma as separator
dataset <- read_csv("dataForGraph.csv")
date country variable value
2010 australia GDP 1142250506
2011 australia GDP 1389919156
2012 australia GDP 1537477830
2013 australia GDP 1563950959
2014 australia GDP 1454675480
2015 australia GDP 1339539063
2010 belgium GDP 483577483
2011 belgium GDP 526975257
2012 belgium GDP 497815990
2013 belgium GDP 521370528
2014 belgium GDP 531234804
2015 belgium GDP 454039037
2010 canada GDP 1613406135
2011 canada GDP 1788703386
2012 canada GDP 1824288757
2013 canada GDP 1837443487
2014 canada GDP 1783775591
2015 canada GDP 1550536520

The read_csv() function contains useful arguments:

No header

What happened if we declare no header:

# Import a csv file with no column names
dataset <- read_csv("dataForGraph.csv", col_names = FALSE)
X1 X2 X3 X4
date country variable value
2010 australia GDP 1142250506
2011 australia GDP 1389919156
2012 australia GDP 1537477830
2013 australia GDP 1563950959
2014 australia GDP 1454675480
2015 australia GDP 1339539063
2010 belgium GDP 483577483
2011 belgium GDP 526975257
2012 belgium GDP 497815990
2013 belgium GDP 521370528
2014 belgium GDP 531234804
2015 belgium GDP 454039037
2010 canada GDP 1613406135
2011 canada GDP 1788703386
2012 canada GDP 1824288757
2013 canada GDP 1837443487
2014 canada GDP 1783775591
2015 canada GDP 1550536520

As you can see, the columns’ names are considered as a row and not the header.

Provide header

What if we provide a header:

# Provide the column names: "a", "x", "y", "z" to the dataframe
dataset <- read_csv("dataForGraph.csv", col_names = c("a", "x", "y", "z"))
a x y z
date country variable value
2010 australia GDP 1142250506
2011 australia GDP 1389919156
2012 australia GDP 1537477830
2013 australia GDP 1563950959
2014 australia GDP 1454675480
2015 australia GDP 1339539063
2010 belgium GDP 483577483
2011 belgium GDP 526975257
2012 belgium GDP 497815990
2013 belgium GDP 521370528
2014 belgium GDP 531234804
2015 belgium GDP 454039037
2010 canada GDP 1613406135
2011 canada GDP 1788703386
2012 canada GDP 1824288757
2013 canada GDP 1837443487
2014 canada GDP 1783775591
2015 canada GDP 1550536520

The original columns’ names are automatically put as a row and the names we provided are the header.

Skip lines

Let’s see, what happened if we skip the first row:

# Skip (delete) the first line of a csv file
dataset <- read_csv("dataForGraph.csv", skip = 1)
2010 australia GDP 1142250506
2011 australia GDP 1389919156
2012 australia GDP 1537477830
2013 australia GDP 1563950959
2014 australia GDP 1454675480
2015 australia GDP 1339539063
2010 belgium GDP 483577483
2011 belgium GDP 526975257
2012 belgium GDP 497815990
2013 belgium GDP 521370528
2014 belgium GDP 531234804
2015 belgium GDP 454039037
2010 canada GDP 1613406135
2011 canada GDP 1788703386
2012 canada GDP 1824288757
2013 canada GDP 1837443487
2014 canada GDP 1783775591
2015 canada GDP 1550536520

The names of the columns have been deleted and the first row of the data has been put as the header.

Read in a subset

Now, what if we want to read only the first row of the dataset:

dataset <- read_csv("dataForGraph.csv", n_max = 1)
date country variable value
2010 australia GDP 1142250506

The header is kept and the first row.

Finally, let’s use multiple options together:

# Read the first 6 lines
dataset <- read_csv("dataForGraph.csv", skip = 1,
                                        col_names = c("Years", "Countries", "Variables", "Values"), 
                                        n_max = 6)
Years Countries Variables Values
2010 australia GDP 1142250506
2011 australia GDP 1389919156
2012 australia GDP 1537477830
2013 australia GDP 1563950959
2014 australia GDP 1454675480
2015 australia GDP 1339539063

Semi-colon Delimited Files

To import csv files with semi column as separator:

read_csv2("file2.csv")

Files with Any Delimiter

To import txt files with any delimiter:

read_delim("file.txt", delim = "|")

Tab Delimited Files

To import tsv files (tab as separator):

read_tsv("file.tsv")

read_table("file.tsv").

Xslx

# Loading
library("readxl")
# xls files
my_data <- read_excel("my_file.xls")
# xlsx files
my_data <- read_excel("my_file.xlsx")

Semi dynamic data

Gsheet

To read a Google Sheet, load the gsheet library first and then by using the gsheet2tbl() function. You have to copy the link between parenthesis and quotation marks as following:

# Loading the gsheet library
library(gsheet)
data <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1uLaXke-KPN28-ESPPoihk8TiXVWp5xuNGHW7w7yqLCc/edit?usp=sharing")
date country GDP section
2010 australia 1142250506 a
2011 australia 1389919156 d
2012 australia 1537477830 c
2013 australia 1563950959 e
2014 australia 1454675480 e
2015 australia 1339539063 f
2010 belgium 483577483 c
2011 belgium 526975257 e
2012 belgium 497815990 a
2013 belgium 521370528 f
2014 belgium 531234804 d
2015 belgium 454039037 d
2010 canada 1613406135 c
2011 canada 1788703386 f
2012 canada 1824288757 a
2013 canada 1837443487 a
2014 canada 1783775591 b
2015 canada 1550536520 b


Citation

For attribution, please cite this work as

Warin (2019, May 8). Thierry Warin, PhD: [R Course] Data Importation with R. Retrieved from https://warin.ca/posts/rcourse-dataimportationwithr/

BibTeX citation

@misc{warin2019[r,
  author = {Warin, Thierry},
  title = {Thierry Warin, PhD: [R Course] Data Importation with R},
  url = {https://warin.ca/posts/rcourse-dataimportationwithr/},
  year = {2019}
}