[R Course] Data Wrangling with R: Combining Tables

Use different techniques of combining tables such as mutating join, filtering join and biding.

Thierry Warin https://warin.ca/aboutme.html (HEC Montréal and CIRANO (Canada))https://www.hec.ca/en/profs/thierry.warin.html
05-27-2019

Table of Contents


Set up

To show you how the following functions work, let’s create two dataframes:


# Create a dataframe called "group1"
group1 <- data.frame(name=c("Laura", "Paul", "Ashley", "Sarah", "Mark", "Lori"),
                     age=c(22, 28, 19, 25, 27, 18),
                     gender=c("female","male","female","female","male", "female"))

    name age gender
1  Laura  22 female
2   Paul  28   male
3 Ashley  19 female
4  Sarah  25 female
5   Mark  27   male
6   Lori  18 female

# Create a dataframe called "group2"
group2 <- data.frame(name=c("Laura", "Paul", "Ashley", "Sarah", "Mark", "Sacha"),
                     car=c("Mazda", "Toyota", "Nissan", "Toyota", "Chevrolet", "Honda"))

    name       car
1  Laura     Mazda
2   Paul    Toyota
3 Ashley    Nissan
4  Sarah    Toyota
5   Mark Chevrolet
6  Sacha     Honda

Mutating join

Left join

To join matching rows from group2 to group1:


dplyr::left_join(group1, group2, by = "name")

    name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6   Lori  18 female      <NA>

Right join

To join matching rows from group1 to group2:


dplyr::right_join(group1, group2, by = "name")

    name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6  Sacha  NA   <NA>     Honda

Inner join

To join data by retaining only rows in both sets:


dplyr::inner_join(group1, group2, by = "name")

    name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet

Full join

To join data by retaining all values, all rows:


dplyr::full_join(group1, group2, by = "name")

    name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6   Lori  18 female      <NA>
7  Sacha  NA   <NA>     Honda

Filtering join

Semi join

To join all rows in group1 that have a match in group2.


dplyr::semi_join(group1, group2, by = "name")

    name age gender
1  Laura  22 female
2   Paul  28   male
3 Ashley  19 female
4  Sarah  25 female
5   Mark  27   male

Anti join

To join all rows in group1 that do not have a match in group2.


dplyr::anti_join(group1, group2, by = "name")

  name age gender
1 Lori  18 female

To join all rows in group2 that do not have a match in group1, reverse the names.


dplyr::anti_join(group2, group1, by = "name")

   name   car
1 Sacha Honda

Set operations

To use these functions you need to have same column names in each data frames, unless it won’t work.

Let’s create others dataframes:


# Create a dataframe called "group1"

y <- data.frame(letter=c("A", "B", "C"), 
                          number=c(1, 2, 3))
y

  letter number
1      A      1
2      B      2
3      C      3

# Create a dataframe called "group2"
z <- data.frame(letter=c("B", "C", "D"), 
                          number=c(2, 3, 3))
z

  letter number
1      B      2
2      C      3
3      D      3

Intersect

To get rows that appear in both y and z:


dplyr::intersect(y, z)

  letter number
1      B      2
2      C      3

Union

To get rows that appear in either or both y and z.


dplyr::union(y, z)

  letter number
1      A      1
2      B      2
3      C      3
4      D      3

Setdiff

To get rows that appear in y but not z.


dplyr::setdiff(y, z)

  letter number
1      A      1

Binding

Rows

To append z to y as new rows:


dplyr::bind_rows(y, z)

  letter number
1      A      1
2      B      2
3      C      3
4      B      2
5      C      3
6      D      3

Columns

To append z to y as new columns:


dplyr::bind_cols(y, z)

  letter...1 number...2 letter...3 number...4
1          A          1          B          2
2          B          2          C          3
3          C          3          D          3

Caution: matches rows by position!


Citation

For attribution, please cite this work as

Warin (2019, May 27). Thierry Warin: [R Course] Data Wrangling with R: Combining Tables. Retrieved from https://warin.ca/posts/datawranglingwithr-combining/

BibTeX citation

@misc{warin2019[r,
  author = {Warin, Thierry},
  title = {Thierry Warin: [R Course] Data Wrangling with R: Combining Tables},
  url = {https://warin.ca/posts/datawranglingwithr-combining/},
  year = {2019}
}