Date of Completion

Spring 5-9-2022

Thesis Advisor(s)

Joseph Johnson, Swapna Gokhale

Honors Major

Computer Science and Engineering


Artificial Intelligence and Robotics | Theory and Algorithms


My research mines public emotion toward the Covid-19 vaccine based on Twitter data collected over the past 6-12 months. This project is centered on building machine learning and deep learning models to perform natural language processing of short-form text, which in our case consists of tweets. All of the tweets are vaccine-related, and the goal of the classification task is for our models to accurately classify a tweet into one of four emotion groups: Apprehension/Anticipation, Sadness/Anger/Frustration, Joy/Humor/Sarcasm, and Gratitude/Relief. Given this data and the goal of the paper, we aim to answer the following questions: (1) Can a framework be developed for machine learning and deep learning multiclass classification models to accurately infer which of the four listed emotion groups a vaccine-related tweet represents? A follow-up to this question is: Can we improve overall model performance by clustering the emotions into a ternary classification problem? (2) Is there a significant binary distinction that can be made between tweets that express “negative” emotions (Apprehension, Anticipation, Sadness, Anger, and Frustration) and tweets that express “positive” emotions (Joy, Humor, Sarcasm, Gratitude, and Relief)? This research presents a framework that takes in the raw tweet data and passes it through a pipeline that applies data preprocessing, feature extraction, data splitting and sampling, and ultimately emotion classification. Through these questions, the aim is not only to determine the public's overall acceptance of and sentiment toward the vaccines but also to understand the steps public health officials can take to further educate hesitant and/or fearful citizens while also incentivizing vaccination.
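
The pipeline described in the abstract (preprocessing, feature extraction, classification) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the cleaning rules in `preprocess`, the bag-of-words features, the small `NaiveBayes` stand-in classifier, and the made-up tweets in `TOY_DATA`. The data splitting and sampling stage is omitted for brevity. Only the four emotion group names and the negative/positive grouping come from the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Binary grouping from question (2) of the abstract; keys are the four emotion groups.
BINARY_LABEL = {
    "Apprehension/Anticipation": "negative",
    "Sadness/Anger/Frustration": "negative",
    "Joy/Humor/Sarcasm": "positive",
    "Gratitude/Relief": "positive",
}


def preprocess(tweet):
    """Lowercase, strip URLs and @mentions, keep alphabetic tokens (assumed rules)."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", text)


def extract_features(tokens):
    """Bag-of-words counts (an assumed feature representation)."""
    return Counter(tokens)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in model)."""

    def fit(self, feature_list, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for feats, label in zip(feature_list, labels):
            self.word_counts[label].update(feats)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word, count in feats.items():
                if word in self.vocab:  # ignore words never seen in training
                    score += count * math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(model, tweet):
    """Run one raw tweet through preprocessing, feature extraction, and the model."""
    return model.predict(extract_features(preprocess(tweet)))


# Made-up toy tweets for illustration only; NOT data from the thesis.
TOY_DATA = [
    ("Nervous about my vaccine appointment tomorrow @friend", "Apprehension/Anticipation"),
    ("So angry the vaccine rollout is this slow https://example.com", "Sadness/Anger/Frustration"),
    ("Got my shot, now I have 5G lol", "Joy/Humor/Sarcasm"),
    ("So grateful and relieved to be vaccinated", "Gratitude/Relief"),
]

model = NaiveBayes().fit(
    [extract_features(preprocess(text)) for text, _ in TOY_DATA],
    [label for _, label in TOY_DATA],
)
```

Calling `classify(model, "grateful and relieved")` returns one of the four group names, and looking that name up in `BINARY_LABEL` gives the coarser negative/positive label used in question (2). A real system would train on a labeled corpus with held-out evaluation data rather than four toy examples.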