In these trying times, we collected and analysed tweets expressing anti- and pro-vaccination views.
We collected tweets mentioning vaccine names between January 10, 2020 and March 24, 2021. The predefined search terms were drawn from the list of vaccines currently “in use” provided by the London School of Hygiene & Tropical Medicine (2021): “Cormirnaty” OR “CoronaVac” OR “Covaxin” OR “Covishield” OR “SputnikV”.
Institutes | Vaccine in use |
---|---|
Beijing Institute of Biological Products/Sinopharm | BBIBP-CorV |
Bharat Biotech/ICMR/National Institute of Virology | Covaxin |
Chumakov Center/Russian Academy of Sciences | CoviVac |
Sinovac | CoronaVac |
Wuhan Institute of Biological Products/Sinopharm | WIBP vaccine |
Anhui Zhifei Longcom Biopharmaceutical/Chinese Academy of Sciences | ZF2001 |
Vector Institute (peptide) | EpiVacCorona |
BioNTech/Pfizer/Fosun Pharma | BNT162 |
Moderna/NIAID | mRNA-1273 |
CanSino Biological Inc/Beijing Institute of Biotechnology | Ad5-nCoV |
Gamaleya Research Institute | Gam-COVID-Vac/Sputnik V |
Janssen Pharmaceutical Companies | Ad26.COV2.S |
University of Oxford/AstraZeneca | ChAdOx1-S |
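As an illustration of how the five search terms quoted in the text could be combined into a single boolean query, here is a minimal Python sketch. The helper name `build_query` is hypothetical, and the exact query syntax accepted by the collection tool used for this article is not stated in the text, so the quoting style below is an assumption.

```python
# Search terms exactly as quoted in the article (spelling kept as given there).
SEARCH_TERMS = ["Cormirnaty", "CoronaVac", "Covaxin", "Covishield", "SputnikV"]


def build_query(terms):
    """Join the terms into one OR query, wrapping each term in double quotes."""
    return " OR ".join(f'"{term}"' for term in terms)


query = build_query(SEARCH_TERMS)
print(query)
```

Keeping the terms in a list makes it straightforward to run one query per vaccine name as well, which matches the per-vaccine counts reported in the article.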
A total of 50,711 tweets were collected, distributed across the search terms as follows: Cormirnaty (431), CoronaVac (21,973), Covaxin (11,068), Covishield (7,154), and SputnikV (10,085).
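The per-vaccine counts can be checked against the reported total with a few lines of Python. This is only a sanity check on the figures quoted above; the dictionary name `counts` is illustrative.

```python
# Tweet counts per search term, as reported in the article.
counts = {
    "Cormirnaty": 431,
    "CoronaVac": 21973,
    "Covaxin": 11068,
    "Covishield": 7154,
    "SputnikV": 10085,
}

# The sum should match the overall total reported in the text.
total = sum(counts.values())

# Share of the corpus contributed by each search term, in percent.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The shares make the imbalance between search terms explicit, which is worth keeping in mind when comparing topics across vaccines.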
For each vaccine, we kept tweets written in English and in the official language of the vaccine’s country of origin, and removed tweets in all other languages from the analysis. We also identified and removed retweets, as well as punctuation, Twitter user mentions (@username), numbers, HTML links, and picture links (pic.twitter.com). Finally, we removed stop words for each language, such as “an” and “the”, in order to better highlight the most frequently recurring words.
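The cleaning steps described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used for the article: the stop-word list is a tiny placeholder, the retweet test assumes retweets start with "RT @", and the field names `text` and `lang` and the set `KEPT_LANGS` are hypothetical.

```python
import re

# Tiny placeholder stop-word list; a real analysis would use a full list per language.
STOP_WORDS = {"en": {"a", "an", "the", "of", "and", "is", "in", "to"}}

# Languages kept for one vaccine (assumption: English plus one official language).
KEPT_LANGS = {"en", "hi"}


def is_retweet(text):
    """Assumption: retweets are recognisable by the 'RT @' prefix."""
    return text.startswith("RT @")


def clean_tweet(text, lang="en"):
    """Return the lowercased tokens of a tweet after the cleaning steps."""
    text = re.sub(r"https?://\S+", " ", text)           # web links
    text = re.sub(r"pic\.twitter\.com/\S+", " ", text)  # picture links
    text = re.sub(r"@\w+", " ", text)                   # user mentions
    text = re.sub(r"\d+", " ", text)                    # numbers
    text = re.sub(r"[^\w\s]|_", " ", text)              # punctuation
    tokens = text.lower().split()
    stop = STOP_WORDS.get(lang, set())
    return [tok for tok in tokens if tok not in stop]


tweets = [
    {"text": "RT @someone: Covaxin approved!", "lang": "en"},
    {"text": "@user The Covaxin trial has 25800 volunteers "
             "https://example.com/x pic.twitter.com/abc123", "lang": "en"},
    {"text": "Covaxin 2021", "lang": "fr"},
]

# Drop retweets and tweets in other languages, then clean what remains.
docs = [
    clean_tweet(t["text"], t["lang"])
    for t in tweets
    if t["lang"] in KEPT_LANGS and not is_retweet(t["text"])
]
print(docs)
```

Links and mentions are removed before punctuation so that their internal dots, slashes, and @ signs do not leave stray fragments behind.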
The Structural Topic Model R package stm, developed by Roberts, Stewart, and Tingley (2019), was used to flexibly estimate a topic model that includes document-level metadata on the tweets. We ran an stm model on the data for each vaccine name (.csv file), with the language as a covariate in the topic prevalence portion of the model. Prevalence is a function of two variables: “language”, which is coded as either “en” or the official language code of the country (e.g., “hi” for Hindi, “de” for German), and “day”, an integer measure of days running from the first to the last day of a year.
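To make the two prevalence covariates concrete, the following Python sketch builds a small metadata table with a language code and a day index per tweet. It assumes that “day” means the day of the year (1 for January 1); the text does not say how tweets from the two calendar years in the collection window are distinguished, and the field name `created` is hypothetical. In the stm R package, covariates of this kind are supplied through the `prevalence` formula argument together with a metadata data frame.

```python
from datetime import date


def day_of_year(d):
    """Return 1 for January 1, up to 365 (366 in a leap year) for December 31."""
    return d.timetuple().tm_yday


# Hypothetical tweets with a creation date and a language code.
tweets = [
    {"created": date(2021, 1, 10), "lang": "en"},
    {"created": date(2021, 3, 24), "lang": "hi"},
]

# One metadata row per tweet: the two covariates used for topic prevalence.
metadata = [
    {"language": t["lang"], "day": day_of_year(t["created"])}
    for t in tweets
]
print(metadata)
```

A table like this, exported alongside the cleaned text, is the kind of document-level metadata a structural topic model takes as input.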
This article is inspired by the data pipeline presented in Warin (2020).
[Figures omitted: visuals showing Topic 6, Topic 8, Topic 7, and Topic 9.]
London School of Hygiene & Tropical Medicine, Vaccine Centre. 2021. “COVID-19 Vaccine Tracker.” https://github.com/vac-lshtm/VaC_tracker.
Roberts, Margaret E., Brandon M. Stewart, and Dustin Tingley. 2019. “stm: An R Package for Structural Topic Models.” Journal of Statistical Software 91 (2).
Warin, Thierry. 2020. “Global Research on Coronaviruses: An R Package.” Journal of Medical Internet Research 22 (8): e19615. https://doi.org/10.2196/19615.
For attribution, please cite this work as
Warin, "Thierry Warin, PhD: [Article] Twitter Analysis on Vaccines Name", 2021
BibTeX citation
@article{warin2021[article], author = {Warin, Thierry}, title = {Thierry Warin, PhD: [Article] Twitter Analysis on Vaccines Name}, journal = {}, year = {2021}, note = {https://warin.ca/posts/article-vaccines-name/}, doi = {} }