Thierry Warin, PhD: [R Course] Data Importation with R

Thierry Warin

doi:10.6084/m9.figshare.8794253.v2

Static data

Comma Delimited Files

To be able to execute the following example, download the data called dataForGraph in a .csv format. Once you’ve done that, let’s import the file in Rstudio..

library(readr)

# Import csv files with comma as separator
dataset <- read_csv("dataForGraph.csv")

date	country	variable	value
2010	australia	GDP	1142250506
2011	australia	GDP	1389919156
2012	australia	GDP	1537477830
2013	australia	GDP	1563950959
2014	australia	GDP	1454675480
2015	australia	GDP	1339539063
2010	belgium	GDP	483577483
2011	belgium	GDP	526975257
2012	belgium	GDP	497815990
2013	belgium	GDP	521370528
2014	belgium	GDP	531234804
2015	belgium	GDP	454039037
2010	canada	GDP	1613406135
2011	canada	GDP	1788703386
2012	canada	GDP	1824288757
2013	canada	GDP	1837443487
2014	canada	GDP	1783775591
2015	canada	GDP	1550536520

The read_csv() function contains useful arguments:

No header

What happened if we declare no header:

# Import a csv file with no column names
dataset <- read_csv("dataForGraph.csv", col_names = FALSE)

X1	X2	X3	X4
date	country	variable	value
2010	australia	GDP	1142250506
2011	australia	GDP	1389919156
2012	australia	GDP	1537477830
2013	australia	GDP	1563950959
2014	australia	GDP	1454675480
2015	australia	GDP	1339539063
2010	belgium	GDP	483577483
2011	belgium	GDP	526975257
2012	belgium	GDP	497815990
2013	belgium	GDP	521370528
2014	belgium	GDP	531234804
2015	belgium	GDP	454039037
2010	canada	GDP	1613406135
2011	canada	GDP	1788703386
2012	canada	GDP	1824288757
2013	canada	GDP	1837443487
2014	canada	GDP	1783775591
2015	canada	GDP	1550536520

As you can see, the columns’ names are considered as a row and not the header.

Provide header

What if we provide a header:

# Provide the column names: "a", "x", "y", "z" to the dataframe
dataset <- read_csv("dataForGraph.csv", col_names = c("a", "x", "y", "z"))

a	x	y	z
date	country	variable	value
2010	australia	GDP	1142250506
2011	australia	GDP	1389919156
2012	australia	GDP	1537477830
2013	australia	GDP	1563950959
2014	australia	GDP	1454675480
2015	australia	GDP	1339539063
2010	belgium	GDP	483577483
2011	belgium	GDP	526975257
2012	belgium	GDP	497815990
2013	belgium	GDP	521370528
2014	belgium	GDP	531234804
2015	belgium	GDP	454039037
2010	canada	GDP	1613406135
2011	canada	GDP	1788703386
2012	canada	GDP	1824288757
2013	canada	GDP	1837443487
2014	canada	GDP	1783775591
2015	canada	GDP	1550536520

The original columns’ names are automatically put as a row and the names we provided are the header.

Skip lines

Let’s see, what happened if we skip the first row:

# Skip (delete) the first line of a csv file
dataset <- read_csv("dataForGraph.csv", skip = 1)

2010	australia	GDP	1142250506
2011	australia	GDP	1389919156
2012	australia	GDP	1537477830
2013	australia	GDP	1563950959
2014	australia	GDP	1454675480
2015	australia	GDP	1339539063
2010	belgium	GDP	483577483
2011	belgium	GDP	526975257
2012	belgium	GDP	497815990
2013	belgium	GDP	521370528
2014	belgium	GDP	531234804
2015	belgium	GDP	454039037
2010	canada	GDP	1613406135
2011	canada	GDP	1788703386
2012	canada	GDP	1824288757
2013	canada	GDP	1837443487
2014	canada	GDP	1783775591
2015	canada	GDP	1550536520

The names of the columns have been deleted and the first row of the data has been put as the header.

Read in a subset

Now, what if we want to read only the first row of the dataset:

dataset <- read_csv("dataForGraph.csv", n_max = 1)

date	country	variable	value
2010	australia	GDP	1142250506

The header is kept and the first row.

Finally, let’s use multiple options together:

# Read the first 6 lines
dataset <- read_csv("dataForGraph.csv", skip = 1,
                                        col_names = c("Years", "Countries", "Variables", "Values"), 
                                        n_max = 6)

Years	Countries	Variables	Values
2010	australia	GDP	1142250506
2011	australia	GDP	1389919156
2012	australia	GDP	1537477830
2013	australia	GDP	1563950959
2014	australia	GDP	1454675480
2015	australia	GDP	1339539063

Semi-colon Delimited Files

To import csv files with semi column as separator:

read_csv2("file2.csv")

Files with Any Delimiter

To import txt files with any delimiter:

read_delim("file.txt", delim = "|")

Tab Delimited Files

To import tsv files (tab as separator):

read_tsv("file.tsv")

read_table("file.tsv").

Xslx

# Loading
library("readxl")

# xls files
my_data <- read_excel("my_file.xls")

# xlsx files
my_data <- read_excel("my_file.xlsx")

Semi dynamic data

Gsheet

To read a Google Sheet, load the gsheet library first and then by using the gsheet2tbl() function. You have to copy the link between parenthesis and quotation marks as following:

# Loading the gsheet library
library(gsheet)

data <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1uLaXke-KPN28-ESPPoihk8TiXVWp5xuNGHW7w7yqLCc/edit?usp=sharing")

date	country	GDP	section
2010	australia	1142250506	a
2011	australia	1389919156	d
2012	australia	1537477830	c
2013	australia	1563950959	e
2014	australia	1454675480	e
2015	australia	1339539063	f
2010	belgium	483577483	c
2011	belgium	526975257	e
2012	belgium	497815990	a
2013	belgium	521370528	f
2014	belgium	531234804	d
2015	belgium	454039037	d
2010	canada	1613406135	c
2011	canada	1788703386	f
2012	canada	1824288757	a
2013	canada	1837443487	a
2014	canada	1783775591	b
2015	canada	1550536520	b

[R Course] Data Importation with R