[R Course] Data Wrangling with R: Combining Tables

Use different techniques of combining tables such as mutating join, filtering join and biding.

05-27-2019

Set up

To show you how the following functions work, let’s create two dataframes:

``````# Create a dataframe called "group1"
group1 <- data.frame(name=c("Laura", "Paul", "Ashley", "Sarah", "Mark", "Lori"),
age=c(22, 28, 19, 25, 27, 18),
gender=c("female","male","female","female","male", "female"))
``````
``````    name age gender
1  Laura  22 female
2   Paul  28   male
3 Ashley  19 female
4  Sarah  25 female
5   Mark  27   male
6   Lori  18 female``````
``````# Create a dataframe called "group2"
group2 <- data.frame(name=c("Laura", "Paul", "Ashley", "Sarah", "Mark", "Sacha"),
car=c("Mazda", "Toyota", "Nissan", "Toyota", "Chevrolet", "Honda"))
``````
``````    name       car
1  Laura     Mazda
2   Paul    Toyota
3 Ashley    Nissan
4  Sarah    Toyota
5   Mark Chevrolet
6  Sacha     Honda``````

Mutating join

Left join

To join matching rows from group2 to group1:

``````dplyr::left_join(group1, group2, by = "name")
``````
``````    name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6   Lori  18 female      <NA>``````

Right join

To join matching rows from group1 to group2:

``````dplyr::right_join(group1, group2, by = "name")
``````
``````    name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6  Sacha  NA   <NA>     Honda``````

Inner join

To join data by retaining only rows in both sets:

``````dplyr::inner_join(group1, group2, by = "name")
``````
``````    name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet``````

Full join

To join data by retaining all values, all rows:

``````dplyr::full_join(group1, group2, by = "name")
``````
``````    name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6   Lori  18 female      <NA>
7  Sacha  NA   <NA>     Honda``````

Filtering join

Semi join

To join all rows in group1 that have a match in group2.

``````dplyr::semi_join(group1, group2, by = "name")
``````
``````    name age gender
1  Laura  22 female
2   Paul  28   male
3 Ashley  19 female
4  Sarah  25 female
5   Mark  27   male``````

Anti join

To join all rows in group1 that do not have a match in group2.

``````dplyr::anti_join(group1, group2, by = "name")
``````
``````  name age gender
1 Lori  18 female``````

To join all rows in group2 that do not have a match in group1, reverse the names.

``````dplyr::anti_join(group2, group1, by = "name")
``````
``````   name   car
1 Sacha Honda``````

Set operations

To use these functions you need to have same column names in each data frames, unless it won’t work.

Let’s create others dataframes:

``````# Create a dataframe called "group1"

y <- data.frame(letter=c("A", "B", "C"),
number=c(1, 2, 3))
y
``````
``````  letter number
1      A      1
2      B      2
3      C      3``````
``````# Create a dataframe called "group2"
z <- data.frame(letter=c("B", "C", "D"),
number=c(2, 3, 3))
z
``````
``````  letter number
1      B      2
2      C      3
3      D      3``````

Intersect

To get rows that appear in both y and z:

``````dplyr::intersect(y, z)
``````
``````  letter number
1      B      2
2      C      3``````

Union

To get rows that appear in either or both y and z.

``````dplyr::union(y, z)
``````
``````  letter number
1      A      1
2      B      2
3      C      3
4      D      3``````

Setdiff

To get rows that appear in y but not z.

``````dplyr::setdiff(y, z)
``````
``````  letter number
1      A      1``````

Binding

Rows

To append z to y as new rows:

``````dplyr::bind_rows(y, z)
``````
``````  letter number
1      A      1
2      B      2
3      C      3
4      B      2
5      C      3
6      D      3``````

Columns

To append z to y as new columns:

``````dplyr::bind_cols(y, z)
``````
``````  letter...1 number...2 letter...3 number...4
1          A          1          B          2
2          B          2          C          3
3          C          3          D          3``````

Caution: matches rows by position!

Citation

`Warin (2019, May 27). Thierry Warin, PhD: [R Course] Data Wrangling with R: Combining Tables. Retrieved from https://warin.ca/posts/rcourse-datawranglingwithr-combining/`
```@misc{warin2019[r,