[Article] Twitter Analysis on Covid-19 Pandemic

Research NLP

In these trying times, we collected and analysed tweets about the Covid-19 pandemic.

Nathalie de Marcellis-Warin https://ivado.ca/en/person/nathalie-de-marcellis-warin/ (Polytechnique Montreal and CIRANO, Canada), Thierry Warin https://www.warin.ca (HEC Montréal and CIRANO, Canada) https://www.hec.ca/en/profs/thierry.warin.html
07-31-2020

We collected more than 6.6 million coronavirus-related tweets posted between December 1, 2019 and June 1, 2020, using a set of predefined search terms (“COVID-19” OR Coronavirus OR “2019-nCoV”). The exact total is 6,664,956 tweets.

We selected six of the languages found in the tweets for processing: English, Spanish, Chinese, German, French and Italian. Tweets in all other languages were removed from the analysis. We also identified and removed retweets, as well as punctuation, Twitter user mentions (@username), numbers, HTML links and pic.twitter links. Finally, we removed the keywords used to download the tweets and the stop words of each language (such as “an” and “the” in English) in order to better highlight the most recurring words.
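The cleaning steps described above can be sketched in Python as follows. This is only an illustration, not the code used for the article: the regular expressions, the `clean_tweet` and `is_retweet` helpers, the "RT @" retweet heuristic and the abbreviated stop-word list are all assumptions made for the example.

```python
import re

# Hypothetical, abbreviated stop-word list; a real run would load a full
# list for each of the selected languages.
STOP_WORDS = {"en": {"a", "an", "the", "and", "of", "in", "to", "is"}}

# What remains of the download keywords once digits and punctuation are gone
# ("COVID-19" -> "covid", "2019-nCoV" -> "ncov").
SEARCH_TERMS = {"covid", "coronavirus", "ncov"}

URL_RE = re.compile(r"https?://\S+|pic\.twitter\.com/\S+")
MENTION_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_RE = re.compile(r"[^\w\s]")


def is_retweet(text):
    """Assumed heuristic: classic retweets start with 'RT @'."""
    return text.startswith("RT @")


def clean_tweet(text, lang="en"):
    """Return the lowercased tokens left after the cleaning steps."""
    text = URL_RE.sub(" ", text)      # links and pic.twitter.com references
    text = MENTION_RE.sub(" ", text)  # @username mentions
    text = DIGIT_RE.sub(" ", text)    # numbers
    text = PUNCT_RE.sub(" ", text)    # punctuation, including '#'
    tokens = text.lower().split()
    stop = STOP_WORDS.get(lang, set())
    return [t for t in tokens if t not in stop and t not in SEARCH_TERMS]


tweets = [
    "RT @user: stay home",
    "The #COVID-19 lockdown in Montreal starts @user https://t.co/abc 2020!",
]
cleaned = [clean_tweet(t) for t in tweets if not is_retweet(t)]
print(cleaned)
```

Splitting on whitespace is adequate for the European languages in the list; Chinese text would need a dedicated word segmenter, which this sketch does not include.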

We then carried out an exploratory data analysis of the tweets by date, time, country, language and other parameters. This analysis gives an overview of how the pandemic was perceived around the world.

In addition, we extracted from the global database the tweets about the province of Quebec (Canada), using the search terms (“polqc” OR “polQC” OR “quebec” OR “québec” OR “legault” OR “arruda” OR “chsld” OR “CHSLD” OR “herron”).
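As an illustration, the Quebec subset could be selected with a keyword filter like the one below. The search terms come from the text; the case-insensitive substring matching and the `is_quebec_tweet` helper are assumptions, since the article does not say how the terms were matched.

```python
import re

# Search terms from the article; case variants ("polQC", "CHSLD") are
# covered by the IGNORECASE flag.
QUEBEC_TERMS = ["polqc", "quebec", "québec", "legault", "arruda", "chsld", "herron"]
QUEBEC_RE = re.compile("|".join(re.escape(t) for t in QUEBEC_TERMS), re.IGNORECASE)


def is_quebec_tweet(text):
    """True if the tweet contains at least one of the Quebec search terms."""
    return QUEBEC_RE.search(text) is not None


sample = ["Premier Legault speaks on #polQC", "Lockdown extended in Paris"]
quebec_tweets = [t for t in sample if is_quebec_tweet(t)]
print(quebec_tweets)
```

Substring matching is permissive: a term such as “herron” would also match inside longer words, so a stricter variant might add word boundaries.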

Analysis of the Worldwide Dataset

[Worldwide Dataset]: Number of tweets

Number of tweets per month
Number of tweets per time

[Worldwide Dataset]: The most common words

English most common words
French most common words
Spanish most common words
Italian most common words
Dutch most common words
Chinese most common words

[Worldwide Dataset]: Wordclouds

English most common words
French most common words
Spanish most common words
Italian most common words
Dutch most common words
Chinese most common words

[Worldwide Dataset]: Bigram Tokenization

Bigrams Network
Bigrams Sentiment Analysis: Words preceded by “containment”
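Bigram tokenization pairs each word with the word that follows it. The sketch below, written in Python with hypothetical helper names and made-up token lists, counts the words that come directly after a target word such as “containment”. Scoring those words against a sentiment lexicon is left out, because the lexicon used in the article is not specified.

```python
from collections import Counter


def bigrams(tokens):
    """Return consecutive word pairs from a list of tokens."""
    return list(zip(tokens, tokens[1:]))


def words_following(token_lists, target):
    """Count the words that appear directly after `target`."""
    counts = Counter()
    for tokens in token_lists:
        for first, second in bigrams(tokens):
            if first == target:
                counts[second] += 1
    return counts


# Made-up, already-cleaned tweets used only to exercise the functions.
example_tweets = [
    ["containment", "measures", "work"],
    ["strict", "containment", "measures"],
    ["containment", "failed"],
]
following = words_following(example_tweets, "containment")
print(following.most_common())
```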

[Worldwide Dataset]: Latent Dirichlet Allocation

The terms that are most common within each topic
Words with the greatest difference in β between topic 2 and topic 1
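In a two-topic model, one common way to measure the difference in β (the per-topic word probability) is the log ratio log2(β2 / β1), restricted to words that are reasonably frequent in at least one topic. The article does not state which measure was used, so the sketch below is an assumption; the `beta_log_ratio` function, the threshold and the β values are all hypothetical.

```python
import math


def beta_log_ratio(beta_topic1, beta_topic2, min_beta=0.001):
    """Rank shared terms by |log2(beta2 / beta1)|, largest first.

    Terms whose beta is at most `min_beta` in both topics are skipped.
    LDA betas are strictly positive, so the division is safe.
    """
    ratios = {}
    for term in beta_topic1.keys() & beta_topic2.keys():
        b1, b2 = beta_topic1[term], beta_topic2[term]
        if b1 > min_beta or b2 > min_beta:
            ratios[term] = math.log2(b2 / b1)
    return sorted(ratios.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical beta values, not results from the article.
topic1 = {"mask": 0.02, "vaccine": 0.005, "rare": 0.0001}
topic2 = {"mask": 0.005, "vaccine": 0.02, "rare": 0.0002}
ranked = beta_log_ratio(topic1, topic2)
print(ranked)
```

A positive value means the word is more characteristic of topic 2, a negative value of topic 1.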

Analysis of the Quebec Dataset

[Quebec Dataset]: Number of tweets

Number of tweets per month
Number of tweets per time

[Quebec Dataset]: The most common words

English most common words
French most common words

[Quebec Dataset]: Wordclouds

English most common wordclouds
French most common wordclouds

[Quebec Dataset]: The most positive and negative words

English most positive and negative words
French most positive and negative words

[Quebec Dataset]: The most positive and negative wordclouds

English most positive and negative wordclouds
French most positive and negative wordclouds

Citation

For attribution, please cite this work as

Marcellis-Warin & Warin (2020, July 31). www.warin.ca: [Article] Twitter Analysis on Covid-19 Pandemic. Retrieved from https://warin.ca/posts/article-twitter-analysis-covid-19/

BibTeX citation

@misc{marcellis-warin2020[article],
  author = {Marcellis-Warin, Nathalie de and Warin, Thierry},
  title = {www.warin.ca: [Article] Twitter Analysis on Covid-19 Pandemic},
  url = {https://warin.ca/posts/article-twitter-analysis-covid-19/},
  year = {2020}
}