Based on the methodology of the book “Text Mining with R”, you will learn to do text mining on tweets.
Silge and Robinson (2016) developed the tidytext R package to use tidy data principles to make many text mining tasks easier, more effective, and consistent with tools already in wide use. They wrote a book that serves as an introduction of text mining using the tidytext package and other tidy tools in R. To understand the methodology developed by Silge and Robinson (2016), this course will teach you how to use it to analyse a sample of tweets on AI.
Familiarize yourself with the tidy text format. Convert your tweets in tibble, break it into individual tokens with the process of tokenization and then find the most common words of the tweets.
Evaluate the opinion or emotion of the tweets by using sentiment lexicons. Find the most common positive and negative words and visualize this output in a wordcloud.
Quantify what the tweets are about by: (1) measuring the frequency of a word i.e. term frequency (tf), (2) evaluating the relationship between the frequency of a word and its rank i.e. Zipf’s law and (3) decreasing the weight for commonly used words and increasing the weight for words that are not used very much in tweets i.e. the idea of tf-idf.
Tokenize into consecutive sequences of words, called n-grams, to see how often word X is followed by word Y. Count and correlate pairs of words to find words that tend to co-occur within the tweets.
Tidy a document-term matrix and convert tweets into a matrix.
Use the function LDA() to find the mixture of words that is associated with each topic and determine the mixture of topics that describes the tweets.
Calculate the word frequencies, compare word usage and changes in word use. Explore which words are more likely to be retweeted and favorited (liked).
For attribution, please cite this work as
Warin (2019, Dec. 6). Thierry Warin, PhD: [R Course] Text Mining with R: The case of Tweets. Retrieved from https://warin.ca/posts/rcourse-text-mining-with-r/
BibTeX citation
@misc{warin2019[r, author = {Warin, Thierry}, title = {Thierry Warin, PhD: [R Course] Text Mining with R: The case of Tweets}, url = {https://warin.ca/posts/rcourse-text-mining-with-r/}, year = {2019} }