Chapter 12 API and Packages
12.1 Introduction
Interesting data sets are the fuel of a good data science project. While downloading CSV files from different websites works well for many projects, APIs (Application Programming Interfaces) are another very common way to access interesting, free data. In this chapter, we focus on using APIs, through dedicated R packages, to get data.
By the end of this chapter, you should be able to:
- understand what an argument is
- import data using an API
12.2 WDI
12.2.1 Database description
The World Development Indicators (WDI) database is a compilation of relevant, high-quality, and internationally comparable statistics about global development and the fight against poverty. The database contains 1,600 time series indicators for 217 economies and more than 40 country groups, with data for many indicators going back more than 50 years.
12.2.2 Functions
The WDI package gives access to all the indicators provided by the World Bank. The functions listed below let you search for and download specific data from the WDI database.
- WDIsearch()
- WDI()
Each of these functions is detailed in this section, and some examples are provided.
12.2.2.1 WDIsearch()
The function WDIsearch() takes a character string as input and returns the list of indicators whose name contains that string.
For example, suppose we want to list all the indicators in the WDI database whose name contains the term “GDP”.
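A minimal sketch of this search, assuming the WDI package is installed. The second call uses WDIsearch()'s field argument, which (in the package versions we assume) switches the search from indicator names, the default, to indicator codes; the search string is treated as a case-insensitive pattern.

```r
# Load the WDI package
library(WDI)

# Search indicator names containing "GDP" (field = "name" is the default)
gdpIndicators <- WDIsearch(string = "GDP", field = "name")
head(gdpIndicators)

# Search indicator codes instead, e.g. codes in the NY.GDP family
gdpCodes <- WDIsearch(string = "NY.GDP", field = "indicator")
head(gdpCodes)
```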
12.2.2.2 WDI()
The function WDI() takes as input the indicator’s code and the countries for which we want data. It returns the value of the indicator for the selected countries. To restrict the period, you can also give the starting year and the ending year of the data.
For example, suppose we want to compare the total value of stocks traded as a percentage of GDP (CM.MKT.TRAD.GD.ZS) for 4 countries (France - FR; Canada - CA; USA - US; China - CN) from 2000 to 2014. We can obtain these data by calling WDI() with the following arguments:
- indicator = "CM.MKT.TRAD.GD.ZS"
- country = c("FR", "CA", "US", "CN")
- start = 2000
- end = 2014
library(WDI)
# Download and store stocks traded, total value (% of GDP), from 2000 to 2014
stockTraded <- WDI(indicator = "CM.MKT.TRAD.GD.ZS", country = c("FR", "CA", "US", "CN"), start = 2000, end = 2014)
head(stockTraded)
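Because WDI() returns the data in long format (one row per country and year), base R's aggregate() is a convenient way to summarize it. The sketch below is an illustration, not part of the original example: it computes each country's average over the period and assumes WDI()'s default behavior of naming the value column after the indicator code (CM.MKT.TRAD.GD.ZS).

```r
library(WDI)

# Download stocks traded, total value (% of GDP), 2000-2014
stockTraded <- WDI(indicator = "CM.MKT.TRAD.GD.ZS",
                   country = c("FR", "CA", "US", "CN"),
                   start = 2000, end = 2014)

# Average value per country over the period; the formula interface
# drops rows with missing values (na.action = na.omit by default)
avgByCountry <- aggregate(CM.MKT.TRAD.GD.ZS ~ country,
                          data = stockTraded, FUN = mean)
avgByCountry
```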
Access a more in-depth application of the WDI API here.
12.2.3 tl;dr
# Load the WDI package
library(WDI)
# Search all indicators with the term "GDP"
listOfIndicators <- WDIsearch("GDP")
# List the first 5 indicators
listOfIndicators[1:5, ]
# Download stocks traded, total value (% of GDP), for 4 countries from 2000 to 2014
stockTraded <- WDI(indicator = "CM.MKT.TRAD.GD.ZS", country = c("FR", "CA", "US", "CN"), start = 2000, end = 2014)
head(stockTraded)
12.3 OECD
12.3.1 Database description
The Organisation for Economic Co-operation and Development (OECD) database contains almost 300 indicators in 12 categories, such as agriculture, finance, health, and education.
12.3.2 Functions
- get_datasets()
- search_dataset()
- get_data_structure()
- get_dataset()
These functions are detailed in this section, and some examples are provided.
12.3.2.1 search_dataset()
The function search_dataset() searches for OECD datasets. It takes as input a search term describing the indicator you need for your analysis and returns a table of all the available datasets whose title matches that term. Through its data argument, you can also pass it a list of datasets previously obtained with get_datasets().
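A minimal sketch, assuming the OECD package is installed and the OECD server it queries is reachable. Downloading the dataset list once with get_datasets() and passing it through the data argument avoids fetching the list again for every search; if data is omitted, we assume the package downloads the list itself. The second search term, "education", is only an illustration.

```r
library(OECD)

# Download the list of available datasets once
dataset_list <- get_datasets()

# Reuse the same list for several searches
unemployment_sets <- search_dataset("unemployment", data = dataset_list)
education_sets <- search_dataset("education", data = dataset_list)
head(unemployment_sets)
```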
12.3.2.2 get_data_structure()
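The function get_data_structure() takes as input the ID of a dataset and returns its structure: a list describing the dataset’s dimensions and the codes each dimension can take. These codes are the values you use to filter a query made with the OECD package.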
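The sketch below inspects the structure of the DUR_D dataset used in this section's tl;dr. It assumes that the returned list contains a VAR_DESC table describing the dimensions plus one table per dimension named after it, and that DUR_D has a dimension called SEX; use names(dstruc) to check the actual component names.

```r
library(OECD)

# Retrieve the structure of the DUR_D dataset
dstruc <- get_data_structure("DUR_D")

# Names of the components: a description table plus one table per dimension
names(dstruc)

# Description of each variable (dimension) of the dataset
dstruc$VAR_DESC

# Codes available for the SEX dimension (assumed dimension name)
dstruc$SEX
```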
12.3.2.3 get_dataset()
The function get_dataset() (not to be confused with get_datasets(), which lists the available datasets) takes as input a dataset ID and, optionally, a filter. Storing all the filter values in a separate list variable simplifies the function call. get_dataset() returns a data frame containing the selected data.
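A minimal sketch that reuses this section's filter and restricts the query to a range of years. It assumes that get_dataset() accepts start_time and end_time arguments and that the filter values are listed in the order of the dataset's dimensions (countries; sex; age group); the years 2008 to 2010 are arbitrary.

```r
library(OECD)

# Filter values, assumed to follow the order of the dataset's dimensions:
# countries; men and women ("MW"); age group 20-24 ("2024")
filter_list <- list(c("DEU", "FRA", "CAN", "USA"), "MW", "2024")

# Restrict the query to a range of years
unemploymentRecent <- get_dataset(dataset = "DUR_D", filter = filter_list,
                                  start_time = 2008, end_time = 2010)
head(unemploymentRecent)
```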
12.3.3 tl;dr
# Loading OECD library
library(OECD)
# List all available datasets
dataset_list <- get_datasets()
# Search all indicators with the term "unemployment"
search_dataset("unemployment", data = dataset_list)
# Structure of a query
dstruc <- get_data_structure("DUR_D")
str(dstruc, max.level = 1)
# Filter used to narrow the search (Germany, France, Canada, USA; men and women; 20-24 years old)
filter_list <- list(c("DEU", "FRA", "CAN", "USA"), "MW", "2024")
# Data frame containing the selected data
unemploymentOECD <- get_dataset(dataset = "DUR_D", filter = filter_list)
unemploymentOECD[1:6, ]
12.4 spiR
12.4.1 Database description
The Social Progress Index (SPI) is an index created to measure a country’s human development. It was developed between 2009 and 2013 and is published by the Social Progress Imperative.
The index is built on three dimensions, which together include 52 indicators:
- Basic Human Needs, based on food, health, sanitation, housing, access to electricity, security, etc.
- Foundations of Well-being, based on literacy, education, access to media, life expectancy, suicide rate, obesity, pollution, environment, etc.
- Opportunity, based on political rights, property rights, corruption, social tolerance, access to higher education, etc.
12.4.2 Functions
This package lets you recreate impactful dashboards and visualizations such as the ones found on the Social Progress Imperative website. It provides one main function, spir_data(), which extracts the data in a convenient format, and two helper functions, spir_country() and spir_indicator(), which help you find the appropriate arguments for spir_data().
- spir_country()
- spir_indicator()
- spir_data()
Some examples are provided below.
12.4.2.1 spir_country()
This function lets you search for the country code associated with a country in the Social Progress Index data. If no argument is supplied, all countries are displayed.
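A minimal sketch of both uses of spir_country(), assuming the spiR package is installed and that, as described above, calling it without an argument lists all countries. "Brazil" is only an example.

```r
# Load the spiR package
library(spiR)

# Without an argument, list all available countries and their codes
allCountries <- spir_country()
head(allCountries)

# With a country name, return the matching code
spir_country("Brazil")
```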
12.4.2.2 spir_indicator()
This function lets you search for the code of the Social Progress Index indicator you want to use. If no argument is supplied, all indicators are displayed.
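A minimal sketch, assuming the spiR package is installed and that spir_indicator() behaves as described above: without an argument it lists every indicator, and with a term it returns the matching ones. The search term "water" is only an illustration.

```r
library(spiR)

# List all indicators and their codes
allIndicators <- spir_indicator()
head(allIndicators)

# Search for indicators related to a term
spir_indicator("water")
```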
12.4.2.3 spir_data()
The function spir_data() takes three arguments. The first, country, lists the countries we are interested in, specified with their ISO codes, for example c("USA", "FRA", "BRA", "CHN", "ZAF", "CAN"). The second, years, lists the years for which we want data. The third, indicators, specifies the Social Progress indicator we would like to extract.
For example, let’s look at the overall Social Progress Index (indicator code "SPI") for the countries listed above.
# Load the spiR package
library(spiR)
# Extract the SPI for six countries, 2014-2019
myData <- spir_data(country = c("USA", "FRA", "BRA", "CHN", "ZAF", "CAN"),
years = c("2014","2015","2016", "2017", "2018", "2019"),
indicators = "SPI")
head(myData)
Access a more in-depth application of the spiR API here.
12.4.3 tl;dr
# Load the spiR package
library(spiR)
#Get the ISO code for a specific country
mycountry <- spir_country("Canada")
mycountry
#Search for an indicator
myIndicator <- spir_indicator("mortality")
myIndicator
#Extracting the data
myData <- spir_data(country = c("USA", "FRA", "BRA", "CHN", "ZAF", "CAN"),
years = c("2014","2015","2016", "2017", "2018", "2019"),
indicators = "SPI")
head(myData)
12.5 statcanR
12.5.1 Database description
The Statistics Canada database covers about 30 subjects, such as agriculture, energy, environment, and education, at 5 geographical levels (Canada, provinces, census metropolitan areas (CMAs), etc.).
12.5.2 Functions
statcanR provides R users with a consistent way to collect data from Statistics Canada’s data portal. It gives access to all of Statistics Canada’s open economic data (formerly known as CANSIM tables), now identified by product IDs (PIDs) in Statistics Canada’s new Web Data Service.
This section shows how to use the statcanR package and its function statcan_data(). Using the package takes two steps: first, search for the desired table; then, fetch its data with statcan_data().
- Search for data
- statcan_data()
Some examples are provided below.
12.5.2.1 Search for data
To find the desired information, Statistics Canada provides a search engine that gives the number of the table we are looking for. If we were interested in federal expenditures on science and technology by socio-economic objectives, we would visit https://www150.statcan.gc.ca/n1/en/type/data?MM=1 and type a description of the data in the search box.
For this example the table number is ‘27-10-0014-01’. With the table number associated with our search, we can move on to extracting data with the API.
12.5.2.2 statcan_data()
The statcan_data() function takes as input the table number obtained earlier and the language in which the data are displayed, set with the lang argument: “eng” for English or “fra” for French.
For example, we can now extract the data associated with the federal expenditures on science and technology by socio-economic objectives.
# Load the statcanR package
library(statcanR)
# Get data with statcan_data function
mydata <- statcan_data("27-10-0014-01", "eng")
head(mydata)
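The same table can be retrieved with French labels by changing only the lang argument. This sketch, which assumes the statcanR package is installed and the Statistics Canada service is reachable, downloads both versions and compares their dimensions and column names.

```r
library(statcanR)

# Same table, once in English and once in French
mydata_en <- statcan_data("27-10-0014-01", lang = "eng")
mydata_fr <- statcan_data("27-10-0014-01", lang = "fra")

# Compare the size of both versions and look at the French column names
dim(mydata_en)
dim(mydata_fr)
names(mydata_fr)
```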
Access a more in-depth application of the statcanR API here.
12.6 EpiBibR
12.6.1 Database description
EpiBibR is an R wrapper that provides easy access to bibliographic data on Covid-19 and other medical references. In this global crisis, knowledge and open data can have an impact, so our team decided to make more than 100 000 references (journal articles, letters, news) available through R.
12.6.1.1 Features
Table 1. Features accessible through the package.
Field Tags | Descriptions | Field Tags | Descriptions |
---|---|---|---|
AU | Authors | ISSN | Source Code |
TI | Document Title | VOL | Volume |
AB | Abstract | ISSUE | Issue Number |
PY | Year | LT | Language |
DT | Document Type | C1 | Author Address |
MESH | Medical Subject Headings Vocabulary | RP | Reprint Address |
TC | Times Cited | ID | PubMed ID |
SO | Publication Name (or Source) | DE | Authors’ Keywords |
J9 | Source Abbreviation | UT | Unique Article Identifier |
JI | ISO Source Abbreviation | AU_CO | Author’s Country of Origin |
DI | Digital Object Identifier (DOI) | DB | Bibliographic Database |
12.6.2 Functions
EpiBibR lets you search bibliographic references using several arguments: author, author’s country of origin, year, keywords in the title, keywords in the abstract, and source name. The function listed below retrieves this information, and examples are provided for each type of search.
- epibibr_data()
12.6.2.1 epibibr_data()
To get the entire bibliographic data frame, containing more than 80 000 references, call the epibibr_data() function without any argument.
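A minimal sketch that downloads the complete data frame and takes a first look at it. It assumes the EpiBibR package is installed and that the columns are named after the field tags listed in Table 1 (here TI, the document title).

```r
library(EpiBibR)

# Download the complete bibliographic data frame
complete_data <- epibibr_data()

# Number of references (rows) and fields (columns)
dim(complete_data)

# First few document titles (TI field tag, assumed column name)
head(complete_data$TI)
```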
It can also be helpful to search references by the name of the author. For example, let’s search for all the articles written by Philippe Colson.
You can also search by author’s name and year of publication.
Another interesting search would be by author’s country of origin.
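A minimal sketch of a search by country of origin, followed by a count of the matching references per publication year. It assumes the EpiBibR package is installed and that the publication year is stored in a column named after the PY field tag from Table 1.

```r
library(EpiBibR)

# References whose authors come from Canada
canada_articles <- epibibr_data(country = "Canada")

# Number of these references per publication year (PY field tag)
table(canada_articles$PY)
```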
It can also be useful to search by keywords in the title.
As you may have noticed, you can combine several arguments to refine your search. This time, let’s use 3 arguments by searching by author, title, and year.
We can easily use a fourth argument by adding a source.
yangcovid2020bio_articles <- epibibr_data(author = "Yang", title = "covid", year = "2020", source = "bio")
Finally, you can search for keywords in the abstract.
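A minimal sketch of an abstract search, followed by a breakdown of the results by document type. It assumes the EpiBibR package is installed and that the document type is stored in a column named after the DT field tag from Table 1.

```r
library(EpiBibR)

# References mentioning "coronavirus" in their abstract
coronavirus_articles <- epibibr_data(abstract = "coronavirus")

# Number of these references by document type (DT field tag)
table(coronavirus_articles$DT)
```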
12.6.3 tl;dr
library(EpiBibR)
# Complete bibliographic data frame
complete_data <- epibibr_data()
# Search by author
colson_articles <- epibibr_data(author = "Colson")
# Search by author and year
yang2020 <- epibibr_data(author = "Yang", year = "2020")
# Search by author's country of origin
canada_articles <- epibibr_data(country = "Canada")
# Search by keyword in the title
covid_articles <- epibibr_data(title = "covid")
# Combine author, title and year
yangcovid2020_articles <- epibibr_data(author = "Yang", title = "covid", year = "2020")
# Add a fourth argument: the source
yangcovid2020bio_articles <- epibibr_data(author = "Yang", title = "covid", year = "2020", source = "bio")
# Search by keyword in the abstract
coronavirus_articles <- epibibr_data(abstract = "coronavirus")