[API] coronavirus

Access a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic through the coronavirus API.

Thierry Warin (https://warin.ca/aboutme.html), HEC Montréal and CIRANO (Canada) (https://www.hec.ca/en/profs/thierry.warin.html)
04-02-2020


Database description

The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data are pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.

A CSV version of the package dataset is available here.

A summary dashboard is available here.

Functions

This package gives access to a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The function below loads the data into your R session.

The function is detailed in this course, and some examples are provided.

data("coronavirus")

This is a basic example which shows you how to get the data:


library(coronavirus)

data("coronavirus")

This coronavirus dataset has the following fields:


head(coronavirus) 

        date province     country lat long      type cases
1 2020-01-22          Afghanistan  33   65 confirmed     0
2 2020-01-23          Afghanistan  33   65 confirmed     0
3 2020-01-24          Afghanistan  33   65 confirmed     0
4 2020-01-25          Afghanistan  33   65 confirmed     0
5 2020-01-26          Afghanistan  33   65 confirmed     0
6 2020-01-27          Afghanistan  33   65 confirmed     0

tail(coronavirus)

            date province country     lat     long      type cases
87803 2020-05-07 Zhejiang   China 29.1832 120.0934 recovered     0
87804 2020-05-08 Zhejiang   China 29.1832 120.0934 recovered     0
87805 2020-05-09 Zhejiang   China 29.1832 120.0934 recovered     0
87806 2020-05-10 Zhejiang   China 29.1832 120.0934 recovered     0
87807 2020-05-11 Zhejiang   China 29.1832 120.0934 recovered     0
87808 2020-05-12 Zhejiang   China 29.1832 120.0934 recovered     0
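
Given the fields shown above, a single series can be pulled out by filtering on country and type. The following is a minimal base R sketch, not part of the package: `toy` is a hypothetical stand-in with the same column names and made-up values, so the example runs without installing anything. With the real data, `toy` would be replaced by `coronavirus`.

```r
# Hypothetical stand-in for the coronavirus data frame: same column names,
# made-up values, so the example runs without the package installed.
toy <- data.frame(
  date     = as.Date(c("2020-01-22", "2020-01-23", "2020-01-22", "2020-01-23")),
  province = c("", "", "Zhejiang", "Zhejiang"),
  country  = c("Afghanistan", "Afghanistan", "China", "China"),
  lat      = c(33, 33, 29.1832, 29.1832),
  long     = c(65, 65, 120.0934, 120.0934),
  type     = c("confirmed", "confirmed", "recovered", "recovered"),
  cases    = c(0L, 0L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Inspect the column types, then keep one country and one case type.
str(toy)
china_recovered <- subset(toy, country == "China" & type == "recovered")
china_recovered
```

Running `str()` first is a quick way to confirm that `date` is a Date and `cases` is numeric before filtering or summing.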

Here is an example that summarizes total cases by country and type (top 20):


library(dplyr)

# Total cases per country and type, largest first
summary_df <- coronavirus %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(desc(total_cases))

summary_df %>% head(20)

# A tibble: 20 x 3
# Groups:   country [12]
   country        type      total_cases
   <chr>          <chr>           <int>
 1 US             confirmed     1369376
 2 Russia         confirmed      232243
 3 US             recovered      230287
 4 Spain          confirmed      228030
 5 United Kingdom confirmed      227741
 6 Italy          confirmed      221216
 7 France         confirmed      178349
 8 Brazil         confirmed      178214
 9 Germany        confirmed      173171
10 Germany        recovered      147200
11 Turkey         confirmed      141475
12 Spain          recovered      138980
13 Iran           confirmed      110767
14 Italy          recovered      109039
15 Turkey         recovered       98889
16 Iran           recovered       88357
17 China          confirmed       84018
18 US             death           82356
19 China          recovered       79222
20 India          confirmed       74292
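
Because the data has one row per province, date, and type, the grouped sum collapses provinces and dates into a single total for each country and type. The following base R sketch illustrates that step on a hypothetical `toy` data frame (the names and numbers are invented for illustration); it is a rough counterpart of the dplyr pipeline, not a replacement for it.

```r
# Hypothetical toy data: two provinces of country "X" plus a country "Y".
toy <- data.frame(
  province = c("A", "A", "B", "B", "", ""),
  country  = c("X", "X", "X", "X", "Y", "Y"),
  type     = c("confirmed", "death", "confirmed", "death", "confirmed", "death"),
  cases    = c(10L, 1L, 5L, 0L, 7L, 2L),
  stringsAsFactors = FALSE
)

# Sum cases within each country/type pair
# (base R counterpart of group_by() followed by summarise()).
totals <- aggregate(cases ~ country + type, data = toy, FUN = sum)
names(totals)[names(totals) == "cases"] <- "total_cases"

# Sort from the largest to the smallest total.
totals <- totals[order(-totals$total_cases), ]
totals
```

Here the two provinces of country "X" are merged, so its confirmed total is the sum of both provinces.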

Summary of new cases on the most recent date in the dataset, by country and type (2020-05-12 in the data shown above):


library(tidyr)

# New cases on the latest date, one column per case type
coronavirus %>%
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(desc(confirmed))

# A tibble: 187 x 4
# Groups:   country [187]
   country        confirmed death recovered
   <chr>              <int> <int>     <int>
 1 US                 21495  1674     -2446
 2 Russia             10899   107      3711
 3 Brazil              8620   808      5213
 4 India               3524   121      1871
 5 United Kingdom      3409   628         8
 6 Peru                3237    96       918
 7 Pakistan            2255    31       257
 8 Mexico              1997   353      2835
 9 Saudi Arabia        1911     9      2520
10 Turkey              1704    53      3109
# … with 177 more rows
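
The negative `recovered` value for the US in the output above suggests that `cases` holds day-to-day changes rather than running totals, so a downward revision shows up as a negative count. Assuming that reading is correct, a running total is the cumulative sum of `cases` in date order. The following is a minimal base R sketch on made-up numbers; the `toy` data frame and its values are hypothetical.

```r
# Hypothetical daily counts for one country and one case type.
toy <- data.frame(
  date  = as.Date("2020-05-09") + 0:3,
  type  = "recovered",
  cases = c(120L, 80L, -30L, 50L),  # -30 mimics a downward correction
  stringsAsFactors = FALSE
)

# Order by date, then accumulate the daily changes into a running total.
toy <- toy[order(toy$date), ]
toy$cumulative <- cumsum(toy$cases)
toy
```

With several countries or types in the same data frame, the cumulative sum would need to be computed within each group rather than over the whole column.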

tl;dr


library(coronavirus)

data("coronavirus")

head(coronavirus) 
tail(coronavirus)

library(dplyr)

# Total cases per country and type, largest first
summary_df <- coronavirus %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(desc(total_cases))

summary_df %>% head(20)

library(tidyr)

# New cases on the latest date, one column per case type
coronavirus %>%
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(desc(confirmed))

Code learned this week

Command               Detail
data("coronavirus")   Load the dataset of all coronavirus cases

References

This tutorial uses the coronavirus package, created by Rami Krispin.


Citation

For attribution, please cite this work as

Warin (2020, April 2). Thierry Warin: [API] coronavirus. Retrieved from https://warin.ca/posts/api-coronavirus/

BibTeX citation

@misc{warin2020[api],
  author = {Warin, Thierry},
  title = {Thierry Warin: [API] coronavirus},
  url = {https://warin.ca/posts/api-coronavirus/},
  year = {2020}
}