[Article] Twitter Analysis on Anti and Pro Vaccination

Research Article_s NLP

In these trying times, we collected and analysed tweets expressing anti- and pro-vaccination views.

Thierry Warin https://www.warin.ca (HEC Montréal & CIRANO (Canada)) https://www.hec.ca/en/profs/thierry.warin.html

We collected almost 657,000 vaccination-related tweets posted between January 1, 2018 and November 9, 2020 using a set of predefined search terms (“vaccine” OR “vaccines” OR “vaccinate” OR “vaccination” OR “vaccineswork” OR “antivax” OR “vaccinesdontwork” OR “provax” OR “vaxwithme” OR “antivaxxers” OR “immunization”). The exact total is 656,994 tweets.
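As a minimal sketch, the search expression above can be assembled from the list of terms. The snippet is illustrative only: the helper name `build_query` is hypothetical, and the tool actually used to collect the tweets is not described in this article.

```python
# Search terms listed in the article.
SEARCH_TERMS = [
    "vaccine", "vaccines", "vaccinate", "vaccination", "vaccineswork",
    "antivax", "vaccinesdontwork", "provax", "vaxwithme", "antivaxxers",
    "immunization",
]


def build_query(terms):
    """Join the terms into one boolean OR expression (hypothetical helper)."""
    return " OR ".join(terms)


query = build_query(SEARCH_TERMS)
print(query)
```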

We kept only English-language tweets and excluded all other languages from the analysis. We identified and removed retweets, and we stripped punctuation, Twitter user mentions (@username), numbers, HTML links and pic.twitter links. We also removed the keywords used to download the tweets and stop words (for each language) such as “an” and “the” in order to better highlight the most frequent words.
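The cleaning steps described above could look roughly like the following Python sketch. It is not the author's pipeline: the stop-word list is a tiny illustrative stand-in, the "RT @" prefix test is an assumed way of spotting retweets, and the names `is_retweet` and `clean_tweet` are hypothetical.

```python
import re
import string

# Keywords used to download the tweets (from the article).
SEARCH_TERMS = {
    "vaccine", "vaccines", "vaccinate", "vaccination", "vaccineswork",
    "antivax", "vaccinesdontwork", "provax", "vaxwithme", "antivaxxers",
    "immunization",
}
# Tiny illustrative stop-word list; a real analysis would use a full lexicon.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+|pic\.twitter\.com/\S+")
MENTION_RE = re.compile(r"@\w+")
NUMBER_RE = re.compile(r"\d+")
# Map every punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def is_retweet(text):
    """Assumption: retweets are recognised by the conventional 'RT @' prefix."""
    return text.startswith("RT @")


def clean_tweet(text):
    """Return the lower-cased tokens left after the removals described above."""
    text = URL_RE.sub(" ", text)       # links, including pic.twitter.com
    text = MENTION_RE.sub(" ", text)   # @username mentions
    text = NUMBER_RE.sub(" ", text)    # digits (also strips digits inside words)
    text = text.translate(PUNCT_TABLE)
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS and t not in SEARCH_TERMS]


tweets = [
    "RT @someone: Vaccines save lives https://example.org/a",
    "@friend The vaccine trial enrolled 30000 volunteers! pic.twitter.com/abc123",
]
cleaned = [clean_tweet(t) for t in tweets if not is_retweet(t)]
print(cleaned)
```

Links and mentions are removed before punctuation so that their "/" and "@" characters are still available for matching.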

We carried out an exploratory data analysis of the tweets by posting date and time, by common, positive and negative words, and by other parameters. This analysis gives an overview of how vaccination is perceived around the world.

This article is inspired by the data pipeline presented in Warin (2020).

Tidy Text

The methodology used for this section is based on the work of Silge and Robinson (2019).

Number of tweets

Common, positive and negative words


Document-feature matrix (DFM)
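A document-feature matrix has one row per document (here, a tweet) and one column per feature (here, a word), with each cell holding the number of times the feature occurs in the document. The plain-Python sketch below illustrates the idea on already tokenised tweets. It is not the code used for this article, and the name `build_dfm` is hypothetical.

```python
from collections import Counter


def build_dfm(docs):
    """Return (features, matrix); matrix[i][j] counts feature j in document i."""
    features = sorted({tok for doc in docs for tok in doc})
    index = {feature: j for j, feature in enumerate(features)}
    matrix = []
    for doc in docs:
        row = [0] * len(features)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        matrix.append(row)
    return features, matrix


# Two toy tokenised tweets (made-up data for illustration).
docs = [["trial", "safe", "trial"], ["safe", "dose"]]
features, matrix = build_dfm(docs)
print(features)
for row in matrix:
    print(row)
```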

Top Hashtags

The keywords used to download the tweets were removed to produce this visual.
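A hashtag ranking that excludes the download keywords can be sketched as follows. This is an illustrative Python version under simple assumptions (hashtags matched with a basic regular expression, counts made case-insensitive); the function name `top_hashtags` and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Keywords used to download the tweets (from the article); excluded from the ranking.
SEARCH_TERMS = {
    "vaccine", "vaccines", "vaccinate", "vaccination", "vaccineswork",
    "antivax", "vaccinesdontwork", "provax", "vaxwithme", "antivaxxers",
    "immunization",
}
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=10):
    """Count hashtags case-insensitively, skipping the search keywords."""
    counts = Counter()
    for text in tweets:
        for tag in HASHTAG_RE.findall(text):
            tag = tag.lower()
            if tag not in SEARCH_TERMS:
                counts[tag] += 1
    return counts.most_common(n)


# Made-up tweets for illustration.
tweets = [
    "Flu season is here #flu #VaccinesWork",
    "New data on #flu shots #health",
    "#Health officials respond #antivax",
]
ranking = top_hashtags(tweets, n=2)
print(ranking)
```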

Top Users

Twitter user mentions (@username) were kept in this case to produce the visual.

Structural Topic Modeling (STM)

The methodology used for this section is based on the work of Roberts, Stewart, and Tingley (2019).

Top Topics

Topics Correlations

Roberts, Margaret E., Brandon M. Stewart, and Dustin Tingley. 2019. “stm: An R Package for Structural Topic Models.” Journal of Statistical Software 91 (2).

Silge, Julia, and David Robinson. 2019. Text Mining with R. https://www.tidytextmining.com/.

Warin, Thierry. 2020. “Global Research on Coronaviruses: An R Package.” Journal of Medical Internet Research 22 (8).



For attribution, please cite this work as

Warin, "www.warin.ca: [Article] Twitter Analysis on Anti and Pro Vaccination", 2021

BibTeX citation

  author = {Warin, Thierry},
  title = {www.warin.ca: [Article] Twitter Analysis on Anti and Pro Vaccination},
  journal = {},
  year = {2021},
  note = {https://warin.ca/posts/article-vaccination-anti-pro/},
  doi = {}