[R Course] Data Wrangling with R: Tidyverse

A beginner’s guide.

Thierry Warin (https://warin.ca/aboutme.html), HEC Montréal and CIRANO, Canada (https://www.hec.ca/en/profs/thierry.warin.html)
03-10-2020

This course shows different ways of manipulating data, illustrated by the following dynamic animations (Aden-Buie 2020).

Tidy Animated Verbs

Tidy Data

Tidy data follows three rules:

  1. Each variable has its own column.
  2. Each observation has its own row.
  3. Each value has its own cell.
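As a small illustration of these three rules, here is a sketch of a tidy table in base R. The country, year and cases values are made up for the example and do not come from the course material.

```r
# Hypothetical data, for illustration only.
tidy <- data.frame(
  country = c("A", "A", "B", "B"),   # one variable per column
  year    = c(2019, 2020, 2019, 2020),
  cases   = c(10, 12, 20, 25),
  stringsAsFactors = FALSE
)

# Each row is one country-year observation, and each cell holds one value.
print(tidy)
```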

Many tidyverse tools expect tidy data, and the tidyr package provides functions to help you reshape your data into that form.

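To go from long to wide, or from wide to long, use the spread() and gather() functions. Both still work, although recent versions of tidyr recommend pivot_wider() and pivot_longer() in their place.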

Spread

The spread function allows you to transform your data from a long format to a wide format.
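A minimal sketch of spread(), assuming a small made-up long table in which the key column holds the future column names and the val column holds their values (all names here are illustrative):

```r
library(tidyr)

# Hypothetical long-format data: one row per (id, key) pair.
long <- data.frame(
  id  = c(1, 1, 2, 2),
  key = c("x", "y", "x", "y"),
  val = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Each distinct value of `key` becomes a column filled from `val`.
wide <- spread(long, key = key, value = val)
print(wide)  # one row per id, with columns id, x and y
```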

Gather

The gather function allows you to transform your data from a wide format to a long format.
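A minimal sketch of gather(), assuming a small made-up wide table whose x and y columns should be stacked into key/value pairs (the table and the key and val names are illustrative):

```r
library(tidyr)

# Hypothetical wide-format data: one row per id, one column per measure.
wide <- data.frame(
  id = c(1, 2),
  x  = c("a", "c"),
  y  = c("b", "d"),
  stringsAsFactors = FALSE
)

# Stack columns x and y: their names go to `key`, their contents to `val`.
long <- gather(wide, key = "key", value = "val", x, y)
print(long)  # four rows: one per (id, key) pair
```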

Mutating Joins

Mutating joins combine two tables by adding columns from one to the other: left_join(), right_join(), inner_join() and full_join().

Left Join

All rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns.
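A minimal sketch of a left join with dplyr, assuming two small illustrative tables x and y that share an id column (the contents and the val_x and val_y column names are made up for this example):

```r
library(dplyr)

# Hypothetical tables: ids 1 and 2 appear in both, 3 only in x, 4 only in y.
x <- data.frame(id = c(1L, 2L, 3L), val_x = c("x1", "x2", "x3"),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(1L, 2L, 4L), val_y = c("y1", "y2", "y4"),
                stringsAsFactors = FALSE)

# Keep every row of x; val_y is NA where y has no matching id.
result <- left_join(x, y, by = "id")
print(result)  # ids 1, 2, 3; val_y is NA for id 3
```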

Right Join

All rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns.
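The mirror case can be sketched the same way. The tables x and y below are illustrative, as are the val_x and val_y column names:

```r
library(dplyr)

# Hypothetical tables: ids 1 and 2 appear in both, 3 only in x, 4 only in y.
x <- data.frame(id = c(1L, 2L, 3L), val_x = c("x1", "x2", "x3"),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(1L, 2L, 4L), val_y = c("y1", "y2", "y4"),
                stringsAsFactors = FALSE)

# Keep every row of y; val_x is NA where x has no matching id.
result <- right_join(x, y, by = "id")
print(result)  # ids 1, 2, 4; val_x is NA for id 4
```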

Inner Join

All rows from x where there are matching values in y, and all columns from x and y.
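A short sketch of an inner join, again on two made-up tables (x, y, val_x and val_y are assumptions for the example):

```r
library(dplyr)

# Hypothetical tables: ids 1 and 2 appear in both, 3 only in x, 4 only in y.
x <- data.frame(id = c(1L, 2L, 3L), val_x = c("x1", "x2", "x3"),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(1L, 2L, 4L), val_y = c("y1", "y2", "y4"),
                stringsAsFactors = FALSE)

# Keep only the ids present in both tables, so no NA is introduced.
result <- inner_join(x, y, by = "id")
print(result)  # ids 1 and 2 only
```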

Full Join

All rows and all columns from both x and y. Where a row has no match in the other table, the columns coming from that table are filled with NA.
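A sketch of a full join on two illustrative tables (the data and the val_x and val_y names are made up):

```r
library(dplyr)

# Hypothetical tables: ids 1 and 2 appear in both, 3 only in x, 4 only in y.
x <- data.frame(id = c(1L, 2L, 3L), val_x = c("x1", "x2", "x3"),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(1L, 2L, 4L), val_y = c("y1", "y2", "y4"),
                stringsAsFactors = FALSE)

# Keep every id from either table; unmatched sides are filled with NA.
result <- full_join(x, y, by = "id")
print(result)  # ids 1, 2, 3, 4; val_y is NA for 3, val_x is NA for 4
```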

Filtering Joins

Semi Join

All rows from x where there are matching values in y, keeping just columns from x.
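A minimal sketch of a semi join, assuming the same kind of made-up tables. Unlike a mutating join, the result has no columns from y:

```r
library(dplyr)

# Hypothetical tables: ids 1 and 2 appear in both, 3 only in x, 4 only in y.
x <- data.frame(id = c(1L, 2L, 3L), val_x = c("x1", "x2", "x3"),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(1L, 2L, 4L), val_y = c("y1", "y2", "y4"),
                stringsAsFactors = FALSE)

# Use y only as a filter: keep the rows of x whose id also exists in y.
result <- semi_join(x, y, by = "id")
print(result)  # ids 1 and 2, with columns id and val_x only
```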

Anti Join

All rows from x that have no matching values in y, keeping just columns from x.
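A sketch of an anti join on illustrative data (x, y and their columns are assumptions for the example). It is a convenient way to find the rows that a join would fail to match:

```r
library(dplyr)

# Hypothetical tables: ids 1 and 2 appear in both, 3 only in x, 4 only in y.
x <- data.frame(id = c(1L, 2L, 3L), val_x = c("x1", "x2", "x3"),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(1L, 2L, 4L), val_y = c("y1", "y2", "y4"),
                stringsAsFactors = FALSE)

# Keep the rows of x whose id does NOT exist in y.
result <- anti_join(x, y, by = "id")
print(result)  # id 3 only, with columns id and val_x
```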

Set Operations

Intersect

Common rows in both x and y, keeping just unique rows.
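A minimal sketch of intersect() on data frames, assuming two made-up tables with identical columns. The table x deliberately contains a duplicated row to show that the result is de-duplicated:

```r
library(dplyr)

# Hypothetical tables with the same columns; row (1, "a") appears twice in x.
x <- data.frame(a = c(1, 1, 1, 2), b = c("a", "a", "b", "a"),
                stringsAsFactors = FALSE)
y <- data.frame(a = c(1, 2), b = c("a", "b"),
                stringsAsFactors = FALSE)

# Rows are compared across all columns; only (1, "a") is in both tables.
result <- dplyr::intersect(x, y)
print(result)  # a single row: a = 1, b = "a"
```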

Union

All unique rows from x and y.
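A short sketch of union() on two illustrative tables with identical columns (the data is made up). The row they share appears only once in the result:

```r
library(dplyr)

# Hypothetical tables with the same columns; both contain the row (1, "a").
x <- data.frame(a = c(1, 1, 2), b = c("a", "b", "a"),
                stringsAsFactors = FALSE)
y <- data.frame(a = c(1, 2), b = c("a", "b"),
                stringsAsFactors = FALSE)

# Stack both tables, then drop duplicated rows.
result <- dplyr::union(x, y)
print(result)  # four distinct rows
```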

Aden-Buie, Garrick. 2020. “Gadenbuie/Tidyexplain.” https://github.com/gadenbuie/tidyexplain.

Citation

For attribution, please cite this work as

Warin (2020, March 10). Thierry Warin: [R Course] Data Wrangling with R: Tidyverse. Retrieved from https://warin.ca/posts/datawaranglingwithr-tidyverse/

BibTeX citation

@misc{warin2020,
  author = {Warin, Thierry},
  title = {Thierry Warin: [R Course] Data Wrangling with R: Tidyverse},
  url = {https://warin.ca/posts/datawaranglingwithr-tidyverse/},
  year = {2020}
}