In these trying times, we collected and analysed tweets about the Covid-19 pandemic.
We collected more than 6.6 million coronavirus-related tweets (exactly 6,664,956) posted between December 1, 2019 and June 1, 2020, using a predefined set of search terms: (“COVID-19” OR Coronavirus OR “2019-nCoV”).
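The collection step amounts to keeping tweets that fall inside the date window and match any of the OR-ed search terms. The sketch below shows that filter on in-memory records. It assumes case-insensitive substring matching and hypothetical tweet dicts with `text` and `date` keys. It illustrates the logic only and does not reproduce the tool or API the authors used for the download.

```python
from datetime import date

# Search terms from the text, lowercased; case-insensitive matching is an assumption.
SEARCH_TERMS = ("covid-19", "coronavirus", "2019-ncov")
# Collection window from the text (both ends inclusive).
START, END = date(2019, 12, 1), date(2020, 6, 1)


def matches_query(text: str, terms=SEARCH_TERMS) -> bool:
    """Return True if the text contains any of the OR-ed search terms."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def in_window(day: date, start: date = START, end: date = END) -> bool:
    """Return True if the tweet date falls inside the collection window."""
    return start <= day <= end


def collect(tweets):
    """Keep tweets (hypothetical dicts with 'text' and 'date') passing both filters."""
    return [t for t in tweets if in_window(t["date"]) and matches_query(t["text"])]


sample = [
    {"text": "New Coronavirus cases reported", "date": date(2020, 3, 14)},
    {"text": "Lovely weather today", "date": date(2020, 3, 14)},
    {"text": "COVID-19 update", "date": date(2020, 7, 2)},  # outside the window
]
print(len(collect(sample)))  # → 1
```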
We selected six of the languages found in the tweets for processing: English, Spanish, Chinese, German, French, and Italian. All other languages were removed from the analysis. We also identified and removed retweets, as well as punctuation, Twitter user mentions (@username), numbers, HTML links, and pic.twitter.com image links. Finally, we removed the keywords used to download the tweets and the stop words of each language (such as “an” and “the” in English) to better highlight the most recurring words.
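The cleaning steps above can be sketched as a small regex pipeline. This is a minimal illustration, not the authors' code. The field names (`text`, `lang`, `is_retweet`), the ISO 639-1 language codes, and the tiny stop-word lists are hypothetical; a real run would use complete stop-word lists for every language. The keywords are removed before numbers and punctuation, because "covid-19" itself contains a hyphen and digits. Whitespace tokenization also does not segment Chinese text, which would need a dedicated word segmenter.

```python
import re

# Languages kept in the analysis (assumed ISO 639-1 codes).
KEPT_LANGUAGES = {"en", "es", "zh", "de", "fr", "it"}
# Hypothetical, deliberately tiny stop-word lists for illustration only.
STOP_WORDS = {
    "en": {"a", "an", "the", "and", "of", "in", "is", "to"},
    "fr": {"le", "la", "les", "en", "de", "et"},
}
# Search keywords used for the download, removed so they do not dominate counts.
SEARCH_KEYWORDS = ("covid-19", "coronavirus", "2019-ncov")

URL_RE = re.compile(r"https?://\S+|pic\.twitter\.com/\S+")  # links and image links
MENTION_RE = re.compile(r"@\w+")                            # @username mentions
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in SEARCH_KEYWORDS))
NUMBER_RE = re.compile(r"\d+")
PUNCT_RE = re.compile(r"[^\w\s]")  # anything that is not a word char or space


def clean_tweet(text: str, lang: str) -> list[str]:
    """Lowercase, strip links/mentions/keywords/numbers/punctuation, drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = KEYWORD_RE.sub(" ", text)  # before numbers/punctuation: keywords contain both
    text = NUMBER_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    stops = STOP_WORDS.get(lang, set())
    return [tok for tok in text.split() if tok not in stops]


def preprocess(tweets):
    """Drop retweets and unsupported languages, then clean the remaining tweets."""
    cleaned = []
    for t in tweets:
        if t["lang"] not in KEPT_LANGUAGES:
            continue
        # 'is_retweet' is a hypothetical flag; the "RT @" prefix is a common fallback.
        if t.get("is_retweet") or t["text"].startswith("RT @"):
            continue
        cleaned.append(clean_tweet(t["text"], t["lang"]))
    return cleaned


print(clean_tweet(
    "The Coronavirus update from @WHO: 120 new cases https://t.co/abc pic.twitter.com/xyz",
    "en",
))  # → ['update', 'from', 'new', 'cases']
```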
We conducted an exploratory data analysis of the tweets by posting date, time, country, language, and other parameters. This analysis provides an overview of how the pandemic was perceived around the world.
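Most of this exploratory analysis reduces to counting tweets grouped by a derived key. The sketch below shows one generic counter applied to day, hour, language, and country. The field names (`created_at` as a timezone-less ISO-8601 string, `lang`, `country`) are assumptions about the record layout. Country is often missing because few tweets are geotagged, so missing keys are skipped rather than counted.

```python
from collections import Counter
from datetime import date, datetime


def count_by(tweets, key):
    """Count tweets by a derived key, skipping tweets where the key is None."""
    counts = Counter()
    for t in tweets:
        k = key(t)
        if k is not None:
            counts[k] += 1
    return counts


def posted_at(t):
    # Assumed field: ISO-8601 timestamp without a trailing "Z".
    return datetime.fromisoformat(t["created_at"])


tweets = [
    {"created_at": "2020-03-14T09:30:00", "lang": "en", "country": "CA"},
    {"created_at": "2020-03-14T17:05:00", "lang": "fr", "country": None},
    {"created_at": "2020-03-15T09:10:00", "lang": "en", "country": "US"},
]

by_day = count_by(tweets, lambda t: posted_at(t).date())
by_hour = count_by(tweets, lambda t: posted_at(t).hour)
by_language = count_by(tweets, lambda t: t["lang"])
by_country = count_by(tweets, lambda t: t.get("country"))

print(by_language.most_common())  # → [('en', 2), ('fr', 1)]
```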
We also extracted from the global database the tweets specific to the province of Quebec (Canada), using the search terms (“polqc” OR “polQC” OR “quebec” OR “québec” OR “legault” OR “arruda” OR “chsld” OR “CHSLD” OR “herron”).
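The Quebec subset can be pulled from the global set with the same OR-matching idea. This is a minimal sketch under two assumptions: matching is case-insensitive substring matching, and text is Unicode-normalized (NFC) first so that "québec" matches whether the accent is stored precomposed or as a combining mark. Under case-insensitive matching, the separately listed variants "polQC" and "CHSLD" are already covered by their lowercase forms. Plain substring matching can also produce false positives when a term occurs inside an unrelated word.

```python
import unicodedata

# Quebec-specific terms from the text, lowercased (case-insensitivity is an assumption).
QUEBEC_TERMS = ("polqc", "quebec", "québec", "legault", "arruda", "chsld", "herron")


def is_quebec_tweet(text: str) -> bool:
    """Return True if the NFC-normalized, lowercased text contains any Quebec term."""
    normalized = unicodedata.normalize("NFC", text).lower()
    return any(term in normalized for term in QUEBEC_TERMS)


texts = [
    "Point de presse de Legault à 13h",
    "CHSLD: situation critique",
    "Quebec City skyline",
    "Ontario reports new cases",
]
quebec_subset = [t for t in texts if is_quebec_tweet(t)]
print(len(quebec_subset))  # → 3
```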
For attribution, please cite this work as
Marcellis-Warin & Warin (2020, July 31). Thierry Warin, PhD: [Article] Twitter Analysis on Covid-19 Pandemic. Retrieved from https://warin.ca/posts/article-twitter-analysis-covid-19/
BibTeX citation
@misc{marcellis-warin2020[article],
  author = {Marcellis-Warin, Nathalie de and Warin, Thierry},
  title = {Thierry Warin, PhD: [Article] Twitter Analysis on Covid-19 Pandemic},
  url = {https://warin.ca/posts/article-twitter-analysis-covid-19/},
  year = {2020}
}