Use different techniques of combining tables such as mutating join, filtering join and biding.
To show you how the following functions work, let’s create two dataframes:
# Create a dataframe called "group1"
group1 <- data.frame(name=c("Laura", "Paul", "Ashley", "Sarah", "Mark", "Lori"),
age=c(22, 28, 19, 25, 27, 18),
gender=c("female","male","female","female","male", "female"))
name age gender
1 Laura 22 female
2 Paul 28 male
3 Ashley 19 female
4 Sarah 25 female
5 Mark 27 male
6 Lori 18 female
# Create a dataframe called "group2"
group2 <- data.frame(name=c("Laura", "Paul", "Ashley", "Sarah", "Mark", "Sacha"),
car=c("Mazda", "Toyota", "Nissan", "Toyota", "Chevrolet", "Honda"))
name car
1 Laura Mazda
2 Paul Toyota
3 Ashley Nissan
4 Sarah Toyota
5 Mark Chevrolet
6 Sacha Honda
To join matching rows from group2 to group1:
dplyr::left_join(group1, group2, by = "name")
name age gender car
1 Laura 22 female Mazda
2 Paul 28 male Toyota
3 Ashley 19 female Nissan
4 Sarah 25 female Toyota
5 Mark 27 male Chevrolet
6 Lori 18 female <NA>
To join matching rows from group1 to group2:
dplyr::right_join(group1, group2, by = "name")
name age gender car
1 Laura 22 female Mazda
2 Paul 28 male Toyota
3 Ashley 19 female Nissan
4 Sarah 25 female Toyota
5 Mark 27 male Chevrolet
6 Sacha NA <NA> Honda
To join data by retaining only rows in both sets:
dplyr::inner_join(group1, group2, by = "name")
name age gender car
1 Laura 22 female Mazda
2 Paul 28 male Toyota
3 Ashley 19 female Nissan
4 Sarah 25 female Toyota
5 Mark 27 male Chevrolet
To join data by retaining all values, all rows:
dplyr::full_join(group1, group2, by = "name")
name age gender car
1 Laura 22 female Mazda
2 Paul 28 male Toyota
3 Ashley 19 female Nissan
4 Sarah 25 female Toyota
5 Mark 27 male Chevrolet
6 Lori 18 female <NA>
7 Sacha NA <NA> Honda
To join all rows in group1 that have a match in group2.
dplyr::semi_join(group1, group2, by = "name")
name age gender
1 Laura 22 female
2 Paul 28 male
3 Ashley 19 female
4 Sarah 25 female
5 Mark 27 male
To join all rows in group1 that do not have a match in group2.
dplyr::anti_join(group1, group2, by = "name")
name age gender
1 Lori 18 female
To join all rows in group2 that do not have a match in group1, reverse the names.
dplyr::anti_join(group2, group1, by = "name")
name car
1 Sacha Honda
To use these functions you need to have same column names in each data frames, unless it won’t work.
Let’s create others dataframes:
# Create a dataframe called "group1"
y <- data.frame(letter=c("A", "B", "C"),
number=c(1, 2, 3))
y
letter number
1 A 1
2 B 2
3 C 3
# Create a dataframe called "group2"
z <- data.frame(letter=c("B", "C", "D"),
number=c(2, 3, 3))
z
letter number
1 B 2
2 C 3
3 D 3
To get rows that appear in both y and z:
dplyr::intersect(y, z)
letter number
1 B 2
2 C 3
To get rows that appear in either or both y and z.
dplyr::union(y, z)
letter number
1 A 1
2 B 2
3 C 3
4 D 3
To get rows that appear in y but not z.
dplyr::setdiff(y, z)
letter number
1 A 1
To append z to y as new rows:
dplyr::bind_rows(y, z)
letter number
1 A 1
2 B 2
3 C 3
4 B 2
5 C 3
6 D 3
To append z to y as new columns:
dplyr::bind_cols(y, z)
letter...1 number...2 letter...3 number...4
1 A 1 B 2
2 B 2 C 3
3 C 3 D 3
Caution: matches rows by position!
For attribution, please cite this work as
Warin (2019, May 27). Thierry Warin, PhD: [R Course] Data Wrangling with R: Combining Tables. Retrieved from https://warin.ca/posts/rcourse-datawranglingwithr-combining/
BibTeX citation
@misc{warin2019[r, author = {Warin, Thierry}, title = {Thierry Warin, PhD: [R Course] Data Wrangling with R: Combining Tables}, url = {https://warin.ca/posts/rcourse-datawranglingwithr-combining/}, year = {2019} }