Content

Plan

In section 1, Context, you'll learn about your mission. In section 2, Using Microsoft Excel, you'll use Microsoft Excel to create a chart. In section 3, Using the R language, you'll learn how to visualize data with R.

Instructions

The exercise is divided into several parts offering different modes of interaction. The interactions provide you with additional information, test your knowledge or ask you to write your own code. You don't have to solve the exercises in the order given, but as they complement each other, it's advisable to do them in order. Below you'll find the different types of interaction with their corresponding functions:

  • Information boxes provide additional information on technical terms or explanations of R functions.

  • Code boxes require you to interact with pieces of code and are marked as Tasks. Each code box offers the following buttons:

  • startover: Resets the code box so that it contains only the preset code.
  • solution: Displays the task solution.
  • run code: Executes the code without checking its correctness.
  • submit answer: Executes the code like run code, but also checks your answer for correctness.

Context

In these uncertain times, you have been hired by the World Health Organization (WHO) to gather information about Covid-19. As an analyst, you decide to look at bibliographic data, since hundreds of articles have been written on this subject since the start of the pandemic.

Using Microsoft Excel

After calculating the number of articles published for each year, you want to create a graph to better visualize the evolution of the number of articles over time.

Task: Using Microsoft Excel, produce a line graph of the number of articles published each year. You have 5 minutes.

Using the R language

After several attempts to visualize the evolution of the number of articles using Microsoft Excel, you want to use R to do so.

Task 1: Let's load the data again into a variable called mydata by running the line of code below. You can then display the result by typing mydata in the code box and running it.

mydata <- 
mydata <- EpiBib_data

Note: If you displayed the result by typing mydata in the code box, a table appears. Click the triangle at the top right of the results table to navigate through its columns.
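Besides printing the whole table, a few base R functions give a quicker overview of a data frame. The sketch below is only an illustration: EpiBib_data is available inside the tutorial only, so a small made-up data frame called toy stands in for it, and its column TI and all its values are hypothetical.

```r
# Toy stand-in for EpiBib_data (hypothetical columns and values).
toy <- data.frame(
  TI = c("Article A", "Article B", "Article C"),
  PY = c("2019", "2020", "2020"),
  stringsAsFactors = FALSE
)

mydata <- toy      # same assignment pattern as mydata <- EpiBib_data

head(mydata, 2)    # show the first two rows
str(mydata)        # show column names and their types
dim(mydata)        # number of rows, then number of columns
```

Here str() is the most useful of the three for the next task, because it shows whether PY is stored as text or as numbers.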

Task 2: Next, you need to set the PY variable, which contains the years, to numeric format, using the as.numeric() function.

mydata <-
mydata$___ <- as.numeric(mydata$___)
mydata <- EpiBib_data
mydata$PY <- as.numeric(mydata$PY)

Note: You can check whether the column is in numeric format by typing is.numeric(mydata$PY) in the code box. If the output is [1] FALSE, the column is not numeric; if it is [1] TRUE, the column is numeric. Caution: do not confuse as.numeric(), which converts a column to numeric format, with is.numeric(), which only tests whether the column is already numeric.
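The difference between the two functions can be seen on a small example. This is a minimal sketch that assumes the years arrive as text; the data frame toy and its values are made up and stand in for EpiBib_data.

```r
# Toy data frame standing in for EpiBib_data (hypothetical values).
toy <- data.frame(PY = c("2019", "2020", "2020"), stringsAsFactors = FALSE)

is.numeric(toy$PY)             # query only: the column is text, so this is FALSE

toy$PY <- as.numeric(toy$PY)   # conversion: overwrite the column with numbers

is.numeric(toy$PY)             # the column is now numeric, so this is TRUE
```

If a column is stored as a factor rather than as text, as.numeric() returns the internal level codes instead of the years; in that case as.numeric(as.character(x)) recovers the intended values.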

Task 3: Finally, complete the graph code. Use the PY variable, which now contains the years in numeric format, as the x-axis variable (x =). The data = argument takes mydata, the variable that holds our data table.

mydata <-
mydata$___ <- as.numeric(mydata$___)

ggplot(data = ___, aes(x = ___)) +
  geom_line(aes(fill=..count..), stat="bin", bins = 30, size = 0.8, color = "olivedrab") + 
  geom_point(aes(fill=..count..), stat="bin", bins = 30, size = 2.5, color = "olivedrab") + 
  xlab("Years") + ylab("Number of articles") + 
  theme_minimal() + 
  theme(legend.position = "none")
mydata <- EpiBib_data
mydata$PY <- as.numeric(mydata$PY)

ggplot(data = mydata, aes(x = PY)) +
  geom_line(aes(fill=..count..), stat="bin", bins = 30, size = 0.8, color = "olivedrab") + 
  geom_point(aes(fill=..count..), stat="bin", bins = 30, size = 2.5, color = "olivedrab") + 
  xlab("Years") + ylab("Number of articles") + 
  theme_minimal() + 
  theme(legend.position = "none")

Note: Explanation of code elements.

  • geom_line(): draws the line
  • geom_point(): draws points on the line
  • stat="bin" with fill=..count..: stat="bin" groups the years (PY) into bins and counts the articles in each one; ..count.. refers to that computed count
  • xlab() & ylab(): set the x and y axis labels
  • theme_minimal(): applies a minimal theme
  • legend.position = "none": removes the legend
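To see what the counting step produces, the number of articles per year can also be computed by hand in base R. This is a rough sketch with made-up years standing in for mydata$PY; the names years, counts and n are chosen here for illustration. It counts one value per distinct year, whereas bins = 30 in the graph code splits the range of years into 30 equal-width intervals, so the two only agree when each interval contains a single year.

```r
# Made-up publication years standing in for mydata$PY.
years <- c(2018, 2019, 2019, 2020, 2020, 2020)

# Count the articles for each distinct year and store the result as a table
# with one row per year; the count column is named n.
counts <- as.data.frame(table(PY = years),
                        responseName = "n",
                        stringsAsFactors = FALSE)

# table() keeps the years as text labels, so convert them back to numbers.
counts$PY <- as.numeric(counts$PY)

counts
```

A table like counts is also what you would build before charting in a spreadsheet: one row per year, one column with the number of articles.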

Acknowledgments

To cite this course:

Warin, Thierry. 2020. “Nüance-R: R Courses.” doi:10.6084/m9.figshare.11744013.v2.

Data visualization