Chapter 8 Data Wrangling 3/4
What makes R a compelling programming language is its ability to wrangle data on the fly. In this chapter, you will learn the basics of data manipulation. Building on the knowledge acquired in the previous chapters, you will transform datasets to prepare them for Chapter 10, which is all about data visualization!
We will use the United Nations Industrial Development Organization (UNIDO) dataset to illustrate this session.
At the end of the chapter, you should be able to:
- transform your dataframe from long to wide form;
You will go from a database of 655’350 points to a graphic made of 6 observations.
8.2 Long and wide form
We can observe two types of layouts in a dataset:
- wide form: each variable has its own column (longitudinal data)
- long form: a single column holds all the values, and a companion column says which variable each value belongs to (panel data)
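To make the two layouts concrete, here is a minimal sketch on a made-up dataset. The names longToy and wideToy and all the values are hypothetical and are not part of the UNIDO data; only the column names year, isicCode and value mirror the ones used in this chapter.

```r
library(tidyr)

# Hypothetical toy data in long form: one row per (year, isicCode) pair,
# with every measurement stored in the single column "value"
longToy <- data.frame(
  year     = c(2008, 2008, 2009, 2009),
  isicCode = c("15", "16", "15", "16"),
  value    = c(120, 45, 130, 50)
)

# The same information in wide form: one row per year,
# one column per isicCode
wideToy <- pivot_wider(longToy, names_from = isicCode, values_from = value)

dim(longToy)  # 4 rows, 3 columns
dim(wideToy)  # 2 rows, 3 columns
```

Both objects carry exactly the same four measurements; only the arrangement of rows and columns differs.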
8.2.1 From Long to Wide
Presently, our dataset dataSorted (obtained in the previous chapter) is in long form. It can be useful to switch its layout. To do so, you will use the pivot_wider() function of the tidyr package.
Here, we want to obtain the number of establishments and the number of employees per isicCode for each year.
The dataframe wideData is composed of 6 rows and 165 columns.
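The number of rows in the wide result is driven by the columns that are not named in names_from or values_from: pivot_wider() treats them as identifiers and returns one row per unique combination. The sketch below illustrates this on a made-up dataframe. The object names toyLong and toyWide, the tableCode labels and all the values are assumptions for illustration only and do not come from the UNIDO dataset.

```r
library(tidyr)

# Hypothetical long data: 2 years x 2 tables x 2 isicCodes = 8 rows
toyLong <- data.frame(
  year      = rep(c(2008, 2009), each = 4),
  tableCode = rep(c("establishments", "employees"), each = 2, times = 2),
  isicCode  = rep(c("15", "16"), times = 4),
  value     = c(10, 20, 300, 400, 11, 21, 310, 410)
)

# year and tableCode are not mentioned in the call, so they act as
# identifier columns: one output row per (year, tableCode) combination
toyWide <- pivot_wider(toyLong, names_from = isicCode, values_from = value)

dim(toyWide)  # 4 rows (2 years x 2 tables), 4 columns
```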
8.2.2 From Wide to Long
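We can do the opposite, i.e. convert data from a wide format to a long format, using the pivot_longer() function. Note the columns preceded by an exclamation mark in the call: the ! negates the selection, so these columns are kept as they are and every other column is pivoted.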
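As a minimal sketch of what the exclamation mark does, the example below pivots a made-up wide dataframe twice: once by excluding the identifier columns with !, and once by listing the columns to pivot explicitly. The names toyWide2, longA and longB and all the values are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: two identifier columns, two isicCode columns.
# check.names = FALSE keeps the column names "15" and "16" unchanged.
toyWide2 <- data.frame(
  year        = c(2008, 2009),
  countryCode = c("056", "056"),
  `15`        = c(120, 130),
  `16`        = c(45, 50),
  check.names = FALSE
)

# Variant A: pivot everything except year and countryCode
longA <- pivot_longer(toyWide2, !c(year, countryCode),
                      names_to = "isicCode", values_to = "value")

# Variant B: name the columns to pivot explicitly
longB <- pivot_longer(toyWide2, c(`15`, `16`),
                      names_to = "isicCode", values_to = "value")

# Both variants select the same columns, so the results can be compared
all.equal(longA, longB)
dim(longA)  # 4 rows, 4 columns
```

The negated form is convenient when, as with wideData, there are only a few identifier columns and many columns to pivot.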
In order to visualize your data in R, it is important to present your dataframe in the long format.
# Loading tidyr
library(tidyr)

# Using pivot_wider() to transform a long dataframe into a wide dataframe
wideData <- dataSorted %>%
  pivot_wider(names_from = isicCode, values_from = value)

# First 6 lines
head(wideData)

# Dimension of the dataset
dim(wideData)

# Using pivot_longer() to transform from wide to long data
# (pivot_longer() also comes from tidyr, loaded above; reshape2 is not needed)
longData <- wideData %>%
  pivot_longer(!c(year, tableCode, countryCode),
               names_to = "isicCode", values_to = "value")

# Dimension of the dataframe
dim(longData)

# First 6 lines
head(longData)
Code learned in this chapter
| head() | Returns the first rows |
| dim() | Retrieve or set the dimension of an object |
Getting your hands dirty
It’s time to practice! This exercise begins in Chapter 6 and continues through Chapter 9, so it is divided into 4 parts. For this exercise, you’ll work with a CSV file available on GitHub in the chapter6 folder.
Before starting the third part of this exercise, let’s recall the first two parts:
- Step 1 : Import via a csv
Import the csv file called
- Step 2 : Import via a gsheet
Import a dataset containing longitude and latitude from this gsheet: https://docs.google.com/spreadsheets/d/1nehKEBKTQx11LZuo5ZJFKTVS0p5y1ysMPSOSX_m8dS8/edit?usp=sharing
- Step 3 : Delete the column
Delete the column
- Step 4 : Filter the data
Filter the data to only keep the following countries: “United States”, “Canada”, “Japan”, “Belgium” and “France”.
Now, let’s begin with the third part of this exercise:
- Step 5 : Lengthen the data
You need to lengthen the dataframe “gdp2” (convert it from wide to long) to get three columns: “country”, “year” and “gdp”.
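As a hint, here is a minimal sketch of the kind of call this step asks for. It runs on a made-up stand-in called gdp2Toy, not on the real gdp2: the assumption that gdp2 has one “country” column plus one column per year, as well as the placeholder values, are hypothetical and may not match your data.

```r
library(tidyr)

# Hypothetical stand-in for gdp2: one country column, one column per year.
# The real gdp2 may have different columns; the values are placeholders.
gdp2Toy <- data.frame(
  country = c("Belgium", "France"),
  `2018`  = c(1.0, 2.0),
  `2019`  = c(1.5, 2.5),
  check.names = FALSE
)

# Pivot every column except country: the old column names go into "year",
# the cell values go into "gdp"
gdpLong <- pivot_longer(gdp2Toy, !country, names_to = "year", values_to = "gdp")

names(gdpLong)  # "country" "year" "gdp"
dim(gdpLong)    # 4 rows, 3 columns
```

Because the years come from column names, the “year” column is stored as text; convert it with as.integer() if you need numbers.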