Learn how to rename, create and delete variables. Also learn how to convert text case and how to replace or remove specific characters. Finally, learn how to handle missing values.
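R follows a set of conventions that makes one layout of tabular data much easier to work with than others. Your data will be easier to work with in R if it follows three rules: each variable is in its own column, each observation is in its own row, and each value is in its own cell.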
There are two ways to access the functions of a specific package:
The first option is to load the package by calling the function library(). You can then use any of its functions for the rest of the session without reloading the package.
library(package)
The second option is the package::function() syntax, which gives you access to one function without loading the whole package. To use another function from the same package, you must repeat this syntax or load the package with library() as in the first option.
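As a minimal sketch of the two styles, the example below uses tools, a package that ships with R but is not attached by default, so it should run without installing anything. The file names are made up for illustration.

```r
# Option 2: call a single function through its namespace,
# without attaching the package.
ext_a <- tools::file_ext("report.csv")

# Option 1: attach the package once, then call its functions by bare name.
library(tools)
ext_b <- file_ext("data.txt")
name_b <- file_path_sans_ext("data.txt")

print(c(ext_a, ext_b, name_b))
```

The namespace form is handy in short scripts because it shows where each function comes from; library() saves typing when many functions from the same package are used.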
Now, let’s load the data on which we will do some data wrangling.
dataFull <- gsheet::gsheet2tbl("https://docs.google.com/spreadsheets/d/1uLaXke-KPN28-ESPPoihk8TiXVWp5xuNGHW7w7yqLCc/edit#gid=329712902")
rank | company | country | industrial.sector | RD.usd | sales.usd | year |
---|---|---|---|---|---|---|
1 | VOLKSWAGEN | Germany | Automobiles Parts | 14712470286 | NA | 2015 |
2 | DAIMLER | Germany | Automobiles Parts | 6335781792 | 145635513797 | 2015 |
3 | ROBERT BOSCH | Germany | Automobiles Parts | 5653984389 | 54892540624 | 2015 |
4 | SANOFI | France | Pharmaceuticals Biotechnology | 5396067608 | 37868911705 | 2015 |
5 | BMW | Germany | Automobiles Parts | 5120208790 | 90159856973 | 2015 |
6 | SIEMENS | Germany | Electronic Electrical Equipment | 4908268479 | 80649456021 | 2015 |
To convert data to the tbl class (tbl's are easier to examine than data frames):
dplyr::as_tibble(dataFull)  # replaces dplyr::tbl_df(), which is deprecated since dplyr 1.0.0
To obtain an information-dense summary of tbl data:
dplyr::glimpse(dataFull)
Rows: 24,334
Columns: 7
$ rank <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13…
$ company <chr> "VOLKSWAGEN", "DAIMLER", "ROBERT BOSCH", …
$ country <chr> "Germany", "Germany", "Germany", "France"…
$ industrial.sector <chr> "Automobiles Parts", "Automobiles Parts",…
$ RD.usd <dbl> 14712470286, 6335781792, 5653984389, 5396…
$ sales.usd <dbl> NA, 145635513797, 54892540624, 3786891170…
$ year <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 2015,…
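If dplyr is not available, base R offers a similar overview. The sketch below builds a small toy data frame (an assumption for illustration, using the first three rows shown earlier) and inspects its shape and column types.

```r
# Toy data frame mirroring a few columns of the table above.
toy <- data.frame(
  rank = c(1, 2, 3),
  company = c("VOLKSWAGEN", "DAIMLER", "ROBERT BOSCH"),
  sales.usd = c(NA, 145635513797, 54892540624),
  stringsAsFactors = FALSE
)

toy_dim <- dim(toy)              # number of rows and columns
col_types <- sapply(toy, class)  # class of each column
str(toy)                         # compact per-column summary, similar in spirit to glimpse()
```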
This section shows how to rename columns, either by position with names() or by name with dplyr::rename().
names(dataFull)[2] <- "Company"
dataFull <- dplyr::rename(dataFull, Country = country, Industry = industrial.sector, Year = year)
rank | Company | Country | Industry | RD.usd | sales.usd | Year |
---|---|---|---|---|---|---|
1 | VOLKSWAGEN | Germany | Automobiles Parts | 14712470286 | NA | 2015 |
2 | DAIMLER | Germany | Automobiles Parts | 6335781792 | 145635513797 | 2015 |
3 | ROBERT BOSCH | Germany | Automobiles Parts | 5653984389 | 54892540624 | 2015 |
4 | SANOFI | France | Pharmaceuticals Biotechnology | 5396067608 | 37868911705 | 2015 |
5 | BMW | Germany | Automobiles Parts | 5120208790 | 90159856973 | 2015 |
6 | SIEMENS | Germany | Electronic Electrical Equipment | 4908268479 | 80649456021 | 2015 |
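Renaming by position breaks silently if the column order changes. As a hedged alternative in base R, the sketch below renames by matching the old name; the toy data frame and the lookup vector are assumptions made for illustration.

```r
toy <- data.frame(
  rank = 1:2,
  company = c("VOLKSWAGEN", "DAIMLER"),
  year = c(2015, 2015)
)

# Rename one column by matching its current name instead of its position.
names(toy)[names(toy) == "company"] <- "Company"

# Rename several columns at once with a named lookup vector (new = old).
lookup <- c(Rank = "rank", Year = "year")
idx <- match(lookup, names(toy))
names(toy)[idx] <- names(lookup)

print(names(toy))
```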
To add a new column to your dataset, enter the name of the new column after the $ symbol and assign a value to it. With this technique, every row receives the same value and the new column is placed at the end of your data frame.
dataFull$newColumn <- 42
rank | Company | Country | Industry | RD.usd | sales.usd | Year | newColumn |
---|---|---|---|---|---|---|---|
1 | VOLKSWAGEN | Germany | Automobiles Parts | 14712470286 | NA | 2015 | 42 |
2 | DAIMLER | Germany | Automobiles Parts | 6335781792 | 145635513797 | 2015 | 42 |
3 | ROBERT BOSCH | Germany | Automobiles Parts | 5653984389 | 54892540624 | 2015 | 42 |
4 | SANOFI | France | Pharmaceuticals Biotechnology | 5396067608 | 37868911705 | 2015 | 42 |
5 | BMW | Germany | Automobiles Parts | 5120208790 | 90159856973 | 2015 | 42 |
6 | SIEMENS | Germany | Electronic Electrical Equipment | 4908268479 | 80649456021 | 2015 | 42 |
A new column called ‘newColumn’ will be created containing the value 42.
To delete a column, assign the value NULL to it:
dataFull$newColumn <- NULL
Note: The column called ‘newColumn’ is deleted.
You can delete multiple columns at once with this code:
dataFull[,5:7] <- NULL
Note: This would remove columns 5 to 7. Do not run it here, because we keep these columns for the rest of the course.
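A new column does not have to hold a constant. The sketch below, on a toy data frame built from two rows of the table, adds a column computed from existing ones and then deletes columns by name. The column name rd.intensity is hypothetical and not part of the course dataset.

```r
toy <- data.frame(
  Company = c("DAIMLER", "ROBERT BOSCH"),
  RD.usd = c(6335781792, 5653984389),
  sales.usd = c(145635513797, 54892540624)
)

# Add a column computed row by row from existing columns.
toy$rd.intensity <- toy$RD.usd / toy$sales.usd

# Delete several columns by name rather than by position.
toy[c("RD.usd", "sales.usd")] <- NULL

print(toy)
```

Deleting by name avoids removing the wrong columns if their order changes later.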
To convert text to uppercase:
dataFull$Industry <- stringr::str_to_upper(dataFull$Industry)
rank | Company | Country | Industry | RD.usd | sales.usd | Year |
---|---|---|---|---|---|---|
1 | VOLKSWAGEN | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 |
2 | DAIMLER | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 |
3 | ROBERT BOSCH | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 |
4 | SANOFI | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 |
5 | BMW | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 |
6 | SIEMENS | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 |
To convert text to lowercase:
dataFull$Company <- stringr::str_to_lower(dataFull$Company)
rank | Company | Country | Industry | RD.usd | sales.usd | Year |
---|---|---|---|---|---|---|
1 | volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 |
2 | daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 |
3 | robert bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 |
4 | sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 |
5 | bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 |
6 | siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 |
To capitalize the first letter of every word and put the remaining letters in lowercase (title case). Acronyms are affected too, so "bmw" becomes "Bmw":
dataFull$Company <- stringr::str_to_title(dataFull$Company)
rank | Company | Country | Industry | RD.usd | sales.usd | Year |
---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 |
By default, str_to_title() applies English casing rules (locale = "en"); pass another locale code for text in other languages.
To capitalize only the first letter of the string (sentence case):
dataFull$Country <- stringr::str_to_sentence(dataFull$Country)
rank | Company | Country | Industry | RD.usd | sales.usd | Year |
---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 |
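For comparison, the case conversions above can be approximated in base R. This is a sketch under the assumption that plain ASCII text is enough; the sentence-case line is a hand-written equivalent, not the stringr implementation.

```r
x <- c("ROBERT BOSCH", "THE NETHERLANDS")

upper <- toupper("Automobiles Parts")  # "AUTOMOBILES PARTS"
lower <- tolower(x)                    # "robert bosch" "the netherlands"

# Sentence case: uppercase the first character, lowercase the rest.
sentence <- paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))

print(sentence)  # "Robert bosch" "The netherlands"
```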
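To replace all instances of a pattern. Note that gsub() is case-sensitive: the sentence-case step above turned "UK" into "Uk", so this call finds no match at this point and the country names stay unchanged: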
dataFull$Country <- gsub("UK", "United Kingdom", dataFull$Country)
rank | Company | Country | Industry | RD.usd | sales.usd | Year |
---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 |
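The effect of case sensitivity in gsub() can be checked on a small vector. In the sketch below, the vector of country names is made up for illustration, and the anchored pattern is one possible way to match "UK" regardless of case without touching longer names.

```r
countries <- c("UK", "Uk", "Germany", "Ukraine")

# Default matching is case-sensitive: only the exact "UK" is replaced.
strict <- gsub("UK", "United Kingdom", countries)

# Anchoring with ^ and $ plus ignore.case = TRUE replaces "UK" and "Uk"
# while leaving longer strings such as "Ukraine" alone.
loose <- gsub("^uk$", "United Kingdom", countries, ignore.case = TRUE)

print(strict)
print(loose)
```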
dataFull$code <- "Abc12d345efg6"
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code |
---|---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abc12d345efg6 |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abc12d345efg6 |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abc12d345efg6 |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abc12d345efg6 |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abc12d345efg6 |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abc12d345efg6 |
To remove all digits:
dataFull$code <- gsub("[0-9]+", "", dataFull$code)
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code |
---|---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abcdefg |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg |
dataFull$code2 <- "Abc12D345eF6g"
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 |
---|---|---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abcdefg | Abc12D345eF6g |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | Abc12D345eF6g |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | Abc12D345eF6g |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | Abc12D345eF6g |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | Abc12D345eF6g |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | Abc12D345eF6g |
To remove all lowercase letters:
dataFull$code2 <- gsub("[a-z]+", "", dataFull$code2)
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 |
---|---|---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abcdefg | A12D345F6 |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | A12D345F6 |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | A12D345F6 |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | A12D345F6 |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | A12D345F6 |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | A12D345F6 |
To remove all uppercase letters:
dataFull$code2 <- gsub("[A-Z]+", "", dataFull$code2)
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 |
---|---|---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abcdefg | 123456 |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | 123456 |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | 123456 |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | 123456 |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | 123456 |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | 123456 |
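The character-class patterns can be tried on single strings before applying them to a whole column. The sketch below reuses the two example codes from this section; the last line uses a negated POSIX class, which is an alternative not shown in the course.

```r
code <- "Abc12d345efg6"
code2 <- "Abc12D345eF6g"

no_digits <- gsub("[0-9]+", "", code)        # "Abcdefg"
no_lower <- gsub("[a-z]+", "", code2)        # "A12D345F6"
digits_only <- gsub("[A-Z]+", "", no_lower)  # "123456"

# Alternative: keep only digits in one step by removing everything else.
also_digits <- gsub("[^[:digit:]]+", "", code2)

print(c(no_digits, no_lower, digits_only, also_digits))
```

Ranges such as [a-z] can depend on the locale; the POSIX classes [[:lower:]], [[:upper:]] and [[:digit:]] are a more portable choice.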
dataFull$CompanyWithoutDash <- gsub("\\-"," ", dataFull$Company)
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 | CompanyWithoutDash |
---|---|---|---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abcdefg | 123456 | Volkswagen |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | 123456 | Daimler |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | 123456 | Robert Bosch |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | 123456 | Sanofi |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | 123456 | Bmw |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | 123456 | Siemens |
7 | Astrazeneca | Uk | PHARMACEUTICALS BIOTECHNOLOGY | 4669863663 | 24102098903 | 2015 | Abcdefg | 123456 | Astrazeneca |
8 | Glaxosmithkline | Uk | PHARMACEUTICALS BIOTECHNOLOGY | 4487751988 | 33165406723 | 2015 | Abcdefg | 123456 | Glaxosmithkline |
9 | Ericsson | Sweden | TECHNOLOGY HARDWARE EQUIPMENT | 4324815865 | 27217621479 | 2015 | Abcdefg | 123456 | Ericsson |
10 | Bayer | Germany | PHARMACEUTICALS BIOTECHNOLOGY | 4136760891 | 47365856130 | 2015 | Abcdefg | 123456 | Bayer |
11 | Fiat Chrysler | The netherlands | AUTOMOBILES PARTS | 4109847835 | 107753145565 | 2015 | Abcdefg | 123456 | Fiat Chrysler |
12 | Airbus | The netherlands | AEROSPACE DEFENCE | 4054900347 | 68082180525 | 2015 | Abcdefg | 123456 | Airbus |
13 | Nokia | Finland | TECHNOLOGY HARDWARE EQUIPMENT | 3047903524 | 17033721315 | 2015 | Abcdefg | 123456 | Nokia |
14 | Boehringer Ingelheim | Germany | PHARMACEUTICALS BIOTECHNOLOGY | 2976135377 | 14933381616 | 2015 | Abcdefg | 123456 | Boehringer Ingelheim |
15 | Sap | Germany | SOFTWARE COMPUTER SERVICES | 2587017450 | 19691385536 | 2015 | Abcdefg | 123456 | Sap |
Note: This replaces every dash with a space.
dataFull$CompanyFirstName <- gsub("\\-.*", "", dataFull$Company)
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 | CompanyWithoutDash | CompanyFirstName |
---|---|---|---|---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abcdefg | 123456 | Volkswagen | Volkswagen |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | 123456 | Daimler | Daimler |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | 123456 | Robert Bosch | Robert Bosch |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | 123456 | Sanofi | Sanofi |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | 123456 | Bmw | Bmw |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | 123456 | Siemens | Siemens |
7 | Astrazeneca | Uk | PHARMACEUTICALS BIOTECHNOLOGY | 4669863663 | 24102098903 | 2015 | Abcdefg | 123456 | Astrazeneca | Astrazeneca |
8 | Glaxosmithkline | Uk | PHARMACEUTICALS BIOTECHNOLOGY | 4487751988 | 33165406723 | 2015 | Abcdefg | 123456 | Glaxosmithkline | Glaxosmithkline |
9 | Ericsson | Sweden | TECHNOLOGY HARDWARE EQUIPMENT | 4324815865 | 27217621479 | 2015 | Abcdefg | 123456 | Ericsson | Ericsson |
10 | Bayer | Germany | PHARMACEUTICALS BIOTECHNOLOGY | 4136760891 | 47365856130 | 2015 | Abcdefg | 123456 | Bayer | Bayer |
11 | Fiat Chrysler | The netherlands | AUTOMOBILES PARTS | 4109847835 | 107753145565 | 2015 | Abcdefg | 123456 | Fiat Chrysler | Fiat Chrysler |
12 | Airbus | The netherlands | AEROSPACE DEFENCE | 4054900347 | 68082180525 | 2015 | Abcdefg | 123456 | Airbus | Airbus |
13 | Nokia | Finland | TECHNOLOGY HARDWARE EQUIPMENT | 3047903524 | 17033721315 | 2015 | Abcdefg | 123456 | Nokia | Nokia |
14 | Boehringer Ingelheim | Germany | PHARMACEUTICALS BIOTECHNOLOGY | 2976135377 | 14933381616 | 2015 | Abcdefg | 123456 | Boehringer Ingelheim | Boehringer Ingelheim |
15 | Sap | Germany | SOFTWARE COMPUTER SERVICES | 2587017450 | 19691385536 | 2015 | Abcdefg | 123456 | Sap | Sap |
Note: This keeps only the text before the first dash.
dataFull$CompanyLastName <- gsub(".*\\-","", dataFull$Company)
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 | CompanyWithoutDash | CompanyFirstName | CompanyLastName |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Volkswagen | Germany | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abcdefg | 123456 | Volkswagen | Volkswagen | Volkswagen |
2 | Daimler | Germany | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | 123456 | Daimler | Daimler | Daimler |
3 | Robert Bosch | Germany | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | 123456 | Robert Bosch | Robert Bosch | Robert Bosch |
4 | Sanofi | France | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | 123456 | Sanofi | Sanofi | Sanofi |
5 | Bmw | Germany | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | 123456 | Bmw | Bmw | Bmw |
6 | Siemens | Germany | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | 123456 | Siemens | Siemens | Siemens |
7 | Astrazeneca | Uk | PHARMACEUTICALS BIOTECHNOLOGY | 4669863663 | 24102098903 | 2015 | Abcdefg | 123456 | Astrazeneca | Astrazeneca | Astrazeneca |
8 | Glaxosmithkline | Uk | PHARMACEUTICALS BIOTECHNOLOGY | 4487751988 | 33165406723 | 2015 | Abcdefg | 123456 | Glaxosmithkline | Glaxosmithkline | Glaxosmithkline |
9 | Ericsson | Sweden | TECHNOLOGY HARDWARE EQUIPMENT | 4324815865 | 27217621479 | 2015 | Abcdefg | 123456 | Ericsson | Ericsson | Ericsson |
10 | Bayer | Germany | PHARMACEUTICALS BIOTECHNOLOGY | 4136760891 | 47365856130 | 2015 | Abcdefg | 123456 | Bayer | Bayer | Bayer |
11 | Fiat Chrysler | The netherlands | AUTOMOBILES PARTS | 4109847835 | 107753145565 | 2015 | Abcdefg | 123456 | Fiat Chrysler | Fiat Chrysler | Fiat Chrysler |
12 | Airbus | The netherlands | AEROSPACE DEFENCE | 4054900347 | 68082180525 | 2015 | Abcdefg | 123456 | Airbus | Airbus | Airbus |
13 | Nokia | Finland | TECHNOLOGY HARDWARE EQUIPMENT | 3047903524 | 17033721315 | 2015 | Abcdefg | 123456 | Nokia | Nokia | Nokia |
14 | Boehringer Ingelheim | Germany | PHARMACEUTICALS BIOTECHNOLOGY | 2976135377 | 14933381616 | 2015 | Abcdefg | 123456 | Boehringer Ingelheim | Boehringer Ingelheim | Boehringer Ingelheim |
15 | Sap | Germany | SOFTWARE COMPUTER SERVICES | 2587017450 | 19691385536 | 2015 | Abcdefg | 123456 | Sap | Sap | Sap |
Note: This removes everything up to and including the last dash, so only the text after it is kept.
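None of the company names shown in the tables contains a dash, so the effect of these three patterns is not visible there. The sketch below applies them to a made-up string with two dashes, which also shows that .* is greedy.

```r
x <- "Alpha-Beta-Gamma"  # hypothetical name, not from the dataset

spaces <- gsub("-", " ", x)     # every dash becomes a space
first <- sub("-.*", "", x)      # text before the first dash
last <- sub(".*-", "", x)       # text after the last dash (.* is greedy)
first_only <- sub("-", " ", x)  # sub() replaces only the first match

print(c(spaces, first, last, first_only))
# "Alpha Beta Gamma" "Alpha" "Gamma" "Alpha Beta-Gamma"
```

Outside square brackets a dash has no special meaning in a regular expression, so the escaped form "\\-" and the plain "-" behave the same.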
dataFull$Country <- substr(dataFull$Country, 1, 2)
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 | CompanyWithoutDash | CompanyFirstName | CompanyLastName |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Volkswagen | Ge | AUTOMOBILES PARTS | 14712470286 | NA | 2015 | Abcdefg | 123456 | Volkswagen | Volkswagen | Volkswagen |
2 | Daimler | Ge | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | 123456 | Daimler | Daimler | Daimler |
3 | Robert Bosch | Ge | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | 123456 | Robert Bosch | Robert Bosch | Robert Bosch |
4 | Sanofi | Fr | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | 123456 | Sanofi | Sanofi | Sanofi |
5 | Bmw | Ge | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | 123456 | Bmw | Bmw | Bmw |
6 | Siemens | Ge | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | 123456 | Siemens | Siemens | Siemens |
Note: It keeps the first two characters.
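substr() takes a start and a stop position, so it can extract any part of a string. As a small sketch, the example below keeps the first two characters and, with the help of nchar(), the last three; the vector of names is taken from the Country values shown earlier.

```r
country <- c("Germany", "France", "The netherlands")

first_two <- substr(country, 1, 2)  # "Ge" "Fr" "Th"

# Positions counted from the end need the string length.
n <- nchar(country)
last_three <- substr(country, n - 2, n)  # "any" "nce" "nds"

print(first_two)
print(last_three)
```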
To drop every row that contains an NA in any column:
dataDrop <- tidyr::drop_na(dataFull)
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 | CompanyWithoutDash | CompanyFirstName | CompanyLastName |
---|---|---|---|---|---|---|---|---|---|---|---|
2 | Daimler | Ge | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | 123456 | Daimler | Daimler | Daimler |
3 | Robert Bosch | Ge | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | 123456 | Robert Bosch | Robert Bosch | Robert Bosch |
4 | Sanofi | Fr | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | 123456 | Sanofi | Sanofi | Sanofi |
5 | Bmw | Ge | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | 123456 | Bmw | Bmw | Bmw |
6 | Siemens | Ge | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | 123456 | Siemens | Siemens | Siemens |
7 | Astrazeneca | Uk | PHARMACEUTICALS BIOTECHNOLOGY | 4669863663 | 24102098903 | 2015 | Abcdefg | 123456 | Astrazeneca | Astrazeneca | Astrazeneca |
To replace NA values with a chosen value, column by column:
dataReplace <- tidyr::replace_na(dataFull, list(sales.usd=0))
rank | Company | Country | Industry | RD.usd | sales.usd | Year | code | code2 | CompanyWithoutDash | CompanyFirstName | CompanyLastName |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Volkswagen | Ge | AUTOMOBILES PARTS | 14712470286 | 0 | 2015 | Abcdefg | 123456 | Volkswagen | Volkswagen | Volkswagen |
2 | Daimler | Ge | AUTOMOBILES PARTS | 6335781792 | 145635513797 | 2015 | Abcdefg | 123456 | Daimler | Daimler | Daimler |
3 | Robert Bosch | Ge | AUTOMOBILES PARTS | 5653984389 | 54892540624 | 2015 | Abcdefg | 123456 | Robert Bosch | Robert Bosch | Robert Bosch |
4 | Sanofi | Fr | PHARMACEUTICALS BIOTECHNOLOGY | 5396067608 | 37868911705 | 2015 | Abcdefg | 123456 | Sanofi | Sanofi | Sanofi |
5 | Bmw | Ge | AUTOMOBILES PARTS | 5120208790 | 90159856973 | 2015 | Abcdefg | 123456 | Bmw | Bmw | Bmw |
6 | Siemens | Ge | ELECTRONIC ELECTRICAL EQUIPMENT | 4908268479 | 80649456021 | 2015 | Abcdefg | 123456 | Siemens | Siemens | Siemens |
NA values in the sales.usd column are replaced with 0.
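Before dropping or replacing missing values, it helps to count them. The sketch below does this in base R on a toy data frame (three rows taken from the table, an assumption for illustration) and shows base equivalents of the two tidyr calls.

```r
toy <- data.frame(
  Company = c("Volkswagen", "Daimler", "Robert Bosch"),
  sales.usd = c(NA, 145635513797, 54892540624)
)

# Count the missing values in each column.
na_per_column <- colSums(is.na(toy))

# Keep only rows without any NA (similar to tidyr::drop_na()).
dropped <- toy[complete.cases(toy), ]

# Replace NA in one column with 0 (similar to tidyr::replace_na()).
filled <- toy
filled$sales.usd[is.na(filled$sales.usd)] <- 0

print(na_per_column)
print(nrow(dropped))
print(filled$sales.usd)
```

Replacing NA with 0 changes summary statistics such as the mean, so it is only appropriate when a missing value really means zero.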
For attribution, please cite this work as
Warin (2019, May 24). Thierry Warin, PhD: [R Course] Data Wrangling with R. Retrieved from https://warin.ca/posts/rcourse-datawranglingwithr/
BibTeX citation
@misc{warin2019[r,
  author = {Warin, Thierry},
  title = {Thierry Warin, PhD: [R Course] Data Wrangling with R},
  url = {https://warin.ca/posts/rcourse-datawranglingwithr/},
  year = {2019}
}