# [R Course] Data Wrangling with R: Combining Tables

Use different techniques of combining tables such as mutating join, filtering join and biding.

05-27-2019

## Set up

To show you how the following functions work, let’s create two dataframes:

``````
# Create a dataframe called "group1"
group1 <- data.frame(name=c("Laura", "Paul", "Ashley", "Sarah", "Mark", "Lori"),
age=c(22, 28, 19, 25, 27, 18),
gender=c("female","male","female","female","male", "female"))``````
``````
name age gender
1  Laura  22 female
2   Paul  28   male
3 Ashley  19 female
4  Sarah  25 female
5   Mark  27   male
6   Lori  18 female``````
``````
# Create a dataframe called "group2"
group2 <- data.frame(name=c("Laura", "Paul", "Ashley", "Sarah", "Mark", "Sacha"),
car=c("Mazda", "Toyota", "Nissan", "Toyota", "Chevrolet", "Honda"))``````
``````
name       car
1  Laura     Mazda
2   Paul    Toyota
3 Ashley    Nissan
4  Sarah    Toyota
5   Mark Chevrolet
6  Sacha     Honda``````

## Mutating join

### Left join

To join matching rows from group2 to group1:

``````
dplyr::left_join(group1, group2, by = "name")``````
``````
name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6   Lori  18 female      <NA>``````

### Right join

To join matching rows from group1 to group2:

``````
dplyr::right_join(group1, group2, by = "name")``````
``````
name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6  Sacha  NA   <NA>     Honda``````

### Inner join

To join data by retaining only rows in both sets:

``````
dplyr::inner_join(group1, group2, by = "name")``````
``````
name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet``````

### Full join

To join data by retaining all values, all rows:

``````
dplyr::full_join(group1, group2, by = "name")``````
``````
name age gender       car
1  Laura  22 female     Mazda
2   Paul  28   male    Toyota
3 Ashley  19 female    Nissan
4  Sarah  25 female    Toyota
5   Mark  27   male Chevrolet
6   Lori  18 female      <NA>
7  Sacha  NA   <NA>     Honda``````

## Filtering join

### Semi join

To join all rows in group1 that have a match in group2.

``````
dplyr::semi_join(group1, group2, by = "name")``````
``````
name age gender
1  Laura  22 female
2   Paul  28   male
3 Ashley  19 female
4  Sarah  25 female
5   Mark  27   male``````

### Anti join

To join all rows in group1 that do not have a match in group2.

``````
dplyr::anti_join(group1, group2, by = "name")``````
``````
name age gender
1 Lori  18 female``````

To join all rows in group2 that do not have a match in group1, reverse the names.

``````
dplyr::anti_join(group2, group1, by = "name")``````
``````
name   car
1 Sacha Honda``````

## Set operations

To use these functions you need to have same column names in each data frames, unless it won’t work.

Let’s create others dataframes:

``````
# Create a dataframe called "group1"

y <- data.frame(letter=c("A", "B", "C"),
number=c(1, 2, 3))
y``````
``````
letter number
1      A      1
2      B      2
3      C      3``````
``````
# Create a dataframe called "group2"
z <- data.frame(letter=c("B", "C", "D"),
number=c(2, 3, 3))
z``````
``````
letter number
1      B      2
2      C      3
3      D      3``````

### Intersect

To get rows that appear in both y and z:

``````
dplyr::intersect(y, z)``````
``````
letter number
1      B      2
2      C      3``````

### Union

To get rows that appear in either or both y and z.

``````
dplyr::union(y, z)``````
``````
letter number
1      A      1
2      B      2
3      C      3
4      D      3``````

### Setdiff

To get rows that appear in y but not z.

``````
dplyr::setdiff(y, z)``````
``````
letter number
1      A      1``````

## Binding

### Rows

To append z to y as new rows:

``````
dplyr::bind_rows(y, z)``````
``````
letter number
1      A      1
2      B      2
3      C      3
4      B      2
5      C      3
6      D      3``````

### Columns

To append z to y as new columns:

``````
dplyr::bind_cols(y, z)``````
``````
letter...1 number...2 letter...3 number...4
1          A          1          B          2
2          B          2          C          3
3          C          3          D          3``````

Caution: matches rows by position!

### Citation

`Warin (2019, May 27). Thierry Warin: [R Course] Data Wrangling with R: Combining Tables. Retrieved from https://warin.ca/posts/datawranglingwithr-combining/`
```@misc{warin2019[r,