Import data from different files type: CSV, Gsheet and Excel.
To be able to execute the following example, download the data called dataForGraph in a .csv format. Once you’ve done that, let’s import the file in Rstudio..
# Import csv files with comma as separator
dataset <- read_csv("dataForGraph.csv")
date | country | variable | value |
---|---|---|---|
2010 | australia | GDP | 1142250506 |
2011 | australia | GDP | 1389919156 |
2012 | australia | GDP | 1537477830 |
2013 | australia | GDP | 1563950959 |
2014 | australia | GDP | 1454675480 |
2015 | australia | GDP | 1339539063 |
2010 | belgium | GDP | 483577483 |
2011 | belgium | GDP | 526975257 |
2012 | belgium | GDP | 497815990 |
2013 | belgium | GDP | 521370528 |
2014 | belgium | GDP | 531234804 |
2015 | belgium | GDP | 454039037 |
2010 | canada | GDP | 1613406135 |
2011 | canada | GDP | 1788703386 |
2012 | canada | GDP | 1824288757 |
2013 | canada | GDP | 1837443487 |
2014 | canada | GDP | 1783775591 |
2015 | canada | GDP | 1550536520 |
The read_csv()
function contains useful arguments:
What happened if we declare no header:
# Import a csv file with no column names
dataset <- read_csv("dataForGraph.csv", col_names = FALSE)
X1 | X2 | X3 | X4 |
---|---|---|---|
date | country | variable | value |
2010 | australia | GDP | 1142250506 |
2011 | australia | GDP | 1389919156 |
2012 | australia | GDP | 1537477830 |
2013 | australia | GDP | 1563950959 |
2014 | australia | GDP | 1454675480 |
2015 | australia | GDP | 1339539063 |
2010 | belgium | GDP | 483577483 |
2011 | belgium | GDP | 526975257 |
2012 | belgium | GDP | 497815990 |
2013 | belgium | GDP | 521370528 |
2014 | belgium | GDP | 531234804 |
2015 | belgium | GDP | 454039037 |
2010 | canada | GDP | 1613406135 |
2011 | canada | GDP | 1788703386 |
2012 | canada | GDP | 1824288757 |
2013 | canada | GDP | 1837443487 |
2014 | canada | GDP | 1783775591 |
2015 | canada | GDP | 1550536520 |
As you can see, the columns’ names are considered as a row and not the header.
What if we provide a header:
a | x | y | z |
---|---|---|---|
date | country | variable | value |
2010 | australia | GDP | 1142250506 |
2011 | australia | GDP | 1389919156 |
2012 | australia | GDP | 1537477830 |
2013 | australia | GDP | 1563950959 |
2014 | australia | GDP | 1454675480 |
2015 | australia | GDP | 1339539063 |
2010 | belgium | GDP | 483577483 |
2011 | belgium | GDP | 526975257 |
2012 | belgium | GDP | 497815990 |
2013 | belgium | GDP | 521370528 |
2014 | belgium | GDP | 531234804 |
2015 | belgium | GDP | 454039037 |
2010 | canada | GDP | 1613406135 |
2011 | canada | GDP | 1788703386 |
2012 | canada | GDP | 1824288757 |
2013 | canada | GDP | 1837443487 |
2014 | canada | GDP | 1783775591 |
2015 | canada | GDP | 1550536520 |
The original columns’ names are automatically put as a row and the names we provided are the header.
Let’s see, what happened if we skip the first row:
# Skip (delete) the first line of a csv file
dataset <- read_csv("dataForGraph.csv", skip = 1)
2010 | australia | GDP | 1142250506 |
---|---|---|---|
2011 | australia | GDP | 1389919156 |
2012 | australia | GDP | 1537477830 |
2013 | australia | GDP | 1563950959 |
2014 | australia | GDP | 1454675480 |
2015 | australia | GDP | 1339539063 |
2010 | belgium | GDP | 483577483 |
2011 | belgium | GDP | 526975257 |
2012 | belgium | GDP | 497815990 |
2013 | belgium | GDP | 521370528 |
2014 | belgium | GDP | 531234804 |
2015 | belgium | GDP | 454039037 |
2010 | canada | GDP | 1613406135 |
2011 | canada | GDP | 1788703386 |
2012 | canada | GDP | 1824288757 |
2013 | canada | GDP | 1837443487 |
2014 | canada | GDP | 1783775591 |
2015 | canada | GDP | 1550536520 |
The names of the columns have been deleted and the first row of the data has been put as the header.
Now, what if we want to read only the first row of the dataset:
dataset <- read_csv("dataForGraph.csv", n_max = 1)
date | country | variable | value |
---|---|---|---|
2010 | australia | GDP | 1142250506 |
The header is kept and the first row.
Finally, let’s use multiple options together:
Years | Countries | Variables | Values |
---|---|---|---|
2010 | australia | GDP | 1142250506 |
2011 | australia | GDP | 1389919156 |
2012 | australia | GDP | 1537477830 |
2013 | australia | GDP | 1563950959 |
2014 | australia | GDP | 1454675480 |
2015 | australia | GDP | 1339539063 |
To import csv files with semi column as separator:
read_csv2("file2.csv")
To import txt files with any delimiter:
read_delim("file.txt", delim = "|")
To import tsv files (tab as separator):
# xls files
my_data <- read_excel("my_file.xls")
# xlsx files
my_data <- read_excel("my_file.xlsx")
To read a Google Sheet, load the gsheet library first and then by using the gsheet2tbl()
function. You have to copy the link between parenthesis and quotation marks as following:
data <- gsheet2tbl("https://docs.google.com/spreadsheets/d/1uLaXke-KPN28-ESPPoihk8TiXVWp5xuNGHW7w7yqLCc/edit?usp=sharing")
date | country | GDP | section |
---|---|---|---|
2010 | australia | 1142250506 | a |
2011 | australia | 1389919156 | d |
2012 | australia | 1537477830 | c |
2013 | australia | 1563950959 | e |
2014 | australia | 1454675480 | e |
2015 | australia | 1339539063 | f |
2010 | belgium | 483577483 | c |
2011 | belgium | 526975257 | e |
2012 | belgium | 497815990 | a |
2013 | belgium | 521370528 | f |
2014 | belgium | 531234804 | d |
2015 | belgium | 454039037 | d |
2010 | canada | 1613406135 | c |
2011 | canada | 1788703386 | f |
2012 | canada | 1824288757 | a |
2013 | canada | 1837443487 | a |
2014 | canada | 1783775591 | b |
2015 | canada | 1550536520 | b |
For attribution, please cite this work as
Warin (2019, May 8). Thierry Warin, PhD: [R Course] Data Importation with R. Retrieved from https://warin.ca/posts/rcourse-dataimportationwithr/
BibTeX citation
@misc{warin2019[r, author = {Warin, Thierry}, title = {Thierry Warin, PhD: [R Course] Data Importation with R}, url = {https://warin.ca/posts/rcourse-dataimportationwithr/}, year = {2019} }