R nanocourse 4: Dynamic Documents

Introduction

After the three first parts of the nanocourse, you have learned the basics of manipulating R and the Markdown language in the context of International Business. This session will consolidate all previous steps in order to create dynamic documents from an initial RMarkdown document.

Goals

At the end of the lab, you should be able to:

explore the YAML options;
understand the details of the R chunk settings;
{import | transform | visualize} your data in a reproducible process.
export your RMarkdown document in a {PDF | .html | .doc} file;
transform your RMarkdown document into a ioslide format (powerpoint-like);

The goal of this session is to create a RMarkdown document regarding the analysis of a specific industrial sector in Russia. This RMarkdown document will produce both a PDF file and an ioslide presentation.

Keywords: RMarkdown; ioslide; beamer; RStudio; reproducible research

List of all commands studied so far

Throughout the last three Nanocourses, we have seen a total of 28 different commands. With these commands, you will be able to perform your analysis.

Command	Detail	R nanocourse
	Embedding an image	Nanocourse 1
	Text in bold	Nanocourse 1
	Text in italic	Nanocourse 1
	URL link	Nanocourse 1
	First level title	Nanocourse 1
	Second level title	Nanocourse 1
	Third level title	Nanocourse 1
	Equation in LaTeX format	Nanocourse 1
library()	Loading library in the environment	Nanocourse 2
read.csv()	Reading .csv file	Nanocourse 2
readGoogleSheet()	Reading Google Sheet document	Nanocourse 2
cleanGoogleTable()	Cleaning Google Sheet document	Nanocourse 2
head()	Show first lines of a dataframe	Nanocourse 2
summary()	Description of the dataframe	Nanocourse 2
as.numeric()	Treat data as numeric	Nanocourse 2
as.factor()	Treat data as factor	Nanocourse 2
geom_bar()	Bar chart	Nanocourse 2
geom_line()	Line chart	Nanocourse 2
geom_point()	Point chart	Nanocourse 2
gsheet2tbl()	Load Google Sheet document	Nanocourse 3
dataframe$newColumn	Create new column in dataframe	Nanocourse 3
$newColumn <- NULL	Erase column	Nanocourse 3
dim()	Size of the dataframe	Nanocourse 3
filter()	Subset of a dataframe	Nanocourse 3
arrange()	Sort by ascending value	Nanocourse 3
arrange(,desc())	Sort by descending value	Nanocourse 3
dcast()	Long to wide format	Nanocourse 3
melt()	Wide to long format	Nanocourse 3
full_join()	Merge 2 dataframes based on common columns	Nanocourse 3

Syntax options

YAML

The YAML is the first lines of code telling how your document will be rendered. It lies in the top of your document between 3 dashed lines.

---
title: "R nanocourse 4: Dynamic Documents"
author: "Thierry Warin"
date: "12/02/2020"
output: 
  html_document:
    toc: yes
    toc_depth:3
  pdf_document:
    toc: yes
    toc_depth:3
---

In the output section, you have several options:

toc: yes/no, which enable/disable the table of content
toc_depth: number, which set the depth of the table of content

You can set a specific date to your document, but also change it so that it will render the actual date of rendering. For example, you are writing a document that will be compiled in three months, the date that will be shown will be the actual date. To do so, you can enter a command in R code that will seek for the actual date of your console, using the format() function.

date: `r format(Sys.time(), '%d %B, %Y')`

Task:

Create a new document (.Rmd): File > New File > R Markdown...

Set your YAML settings according to the date of knitting

R chunk settings

Every line of R code has to be confined between dashed lines for the RStudio console to interpret them.

However, it is possible to set different parameters in order to render different outputs for each R chunk. These settings have to specified in the first line of the R chunk, such as:

echo = FALSE/TRUE: if FALSE, the code will not been shown
warning = FALSE/TRUE: if FALSE, warnings will not been shown
message = FALSE/TRUE: if FALSE, messages generated by the code will not been shown
fig.align = 'center'/'left'/'right': will align the figure generated depending on the setting

Analysis of an industrial sector

With your previous nanocourses, you have in hand all the algorithms (see first section of this nanocourse: List of all command lines studied so far) needed for your analysis. Let's take the UNIDO database and focus on a particular industrial sector (sugar industry - ISIC1542) to reveal interesting insights. For the details of each line of code, please refer to the Laboratory Nanocourse 3.

Task:

Load the UNIDO database regarding the overall industrial sector (gs15x)

Subset the dataframe in order to keep only data appropriate (IsicCode = 1542)

# Loading packages
library(gsheet)
library(dplyr)

# URL of the UNIDO dataset
gs15x <- "https://docs.google.com/spreadsheets/d/1aTJFKmkH2oxYcg0aiWeMAttGWKdM1u2KS5OyIlUkI6Q/edit?usp=sharing"

# Using the gsheet2tbl function to import the UNIDO dataset into the RStudio console
dataUnido <- gsheet2tbl(gs15x)

# Transform variables into numeric values
dataUnido$Value <- as.numeric(dataUnido$Value)
dataUnido$Tablecode <- as.numeric(dataUnido$Tablecode)
dataUnido$CountryCode <- as.numeric(dataUnido$CountryCode)
dataUnido$Year <- as.numeric(dataUnido$Year)
dataUnido$IsicCode <- as.numeric(dataUnido$IsicCode)
dataUnido$Unit <- NULL

# Subset concerning only data for the IsicCode = 1542
dataUnidoSubset <- filter(dataUnido, IsicCode == 1542)

Number of employees

Task:

Select a subset of the dataset regarding only the number of employees (Tablecode == 04)

Provide for 2010 a ranking of the country with the most important number of employees

Visualize and compare the top 7 countries in terms of employees

# Data regarding the number of employees
dataEmployees <- filter(dataUnidoSubset, Tablecode == 4)

# Data regarding 2010
dataEmployees2010 <- filter(dataEmployees, Year == 2010)

# List the 10 most important countries in terms of employees in 2010
ranking <- arrange(dataEmployees2010, desc(Value))

The list of the top 7 countries in terms of employees in the Sugar industry in 2010 are :

head(ranking, n=7)

Now, let's visualize these data in a bar chart.

library(ggplot2)
library(ggthemes)
library(reshape2)

# Transform the column 'CountryCode' in a factor type
ranking$CountryCode <- as.factor(ranking$CountryCode)

# Produce a bar chart
ggplot(data = ranking[1:7,], aes(x = CountryCode, y = Value, fill = CountryCode)) + 
  geom_bar(stat = "identity", width = 0.5, position = "dodge")  +  
  ylab("Number of employees")  +
  xlab("") +
  guides(col = guide_legend(row = 1)) +
  theme_hc() +
  scale_fill_brewer(direction = -1)

So the most important countries in terms of employees in the sugar industry in 2010 are:

China
Russia
Mexico
Iran
Vietnam
Ukraine
Colombia

Number of establishments

Based on previous results, we would like to observe the evolution of the number of establishments in the top 3 countries as of 2010 in terms of employees (i.e. China, Russia, Mexico).

Task:

From the sugar industry dataset, select data corresponding to these 3 countries for all available years

Visualize the evolution of the number of establishments through time

# Subset of the dataEmployees dataframe concerning only the three selected countries
dataEmployeesCountries <- filter(dataEmployees, CountryCode == 156 | CountryCode == 643 | CountryCode == 484)

# Transform the column 'CountryCode' in a factor type
dataEmployeesCountries$CountryCode <- as.factor(dataEmployeesCountries$CountryCode)

# Produce a line chart
ggplot(data = dataEmployeesCountries, aes(x = Year, y = Value, color = CountryCode)) +
  geom_line()  + 
  ylab("")  +
  xlab("") +
  geom_smooth(span = 0.8) +
  ggtitle("") +
  theme_hc() +
  scale_color_brewer(direction = -1) +
  guides(fill=FALSE) +
  geom_point(colour = "blue", size = 2,shape = 22)

File format

PDF / HTML / doc

Now that your analysis has been completed, you can export your document in different format from the same RMarkdown file. To do so, click on the arrow on the right of the "Knit HTML" buttom and select the appropriate format (HTML, PDF, doc).

Task:

Render your RMarkdown document into a PDF file

Render your RMarkdown document into a HTML file

Ioslide / Beamer presentation

From the same RMarkdown file, it is possible to generate a "powerpoint" presentation. To do so, please consider the following instructions. First, you have to change the YAML options: in the output field, insert:

output: beamer_presentation (powerpoint)
output: ioslides_presentation (interactive presentation)

Secondly, a proper typology has to be adopted:

For a first-level slide, insert "#" before the title
For a second-level slide, insert "##" before the title

Task:

Open a new RMarkdown document

Select the code corresponding to the industrial analysis

Create an ioslide/beamer document

Showcase your results in the two different formats

References

Resources

For more on the RMarkdown syntax, please refer to:

https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf

Packages

ggplot2: H. Wickham & W. Chang
ggthemes: J. Arnold et al.
reshape2: H. Wickham
gsheet: M. Conway
dplyr: H. Wickham & R. Francois
tidyr: H. Wickham

Acknowledgments

To cite this course:

Warin, Thierry. 2020. “Nüance-R: R Nanocourses.” doi:10.6084/m9.figshare.11842416.v2.