Date of Completion


Embargo Period



Twitter, Stock market, Stock options, Machine learning, Data mining

Major Advisor

Bing Wang

Associate Advisor

Mukul Bansal

Associate Advisor

Swapna Gokhale

Associate Advisor

Song Han

Associate Advisor

Mohammad Khan

Field of Study

Computer Science and Engineering


Doctor of Philosophy

Open Access



Twitter has rapidly gained popularity since its creation in March 2006. Stocks are a popular topic on Twitter: many traders, investors, financial analysts and news agencies post tweets about various stocks on a daily basis. These tweets reflect their collective wisdom and may provide important insights into the stock market. In this dissertation, we investigate using tweets concerning Standard & Poor's 500 (S&P 500) stocks to analyze the stock market and to assist stock and option trading.

The first part of the dissertation focuses on understanding the correlation between Twitter data and stock trading volume, and on predicting stock trading volume using Twitter data. We first investigate whether the daily number of tweets that mention S&P 500 stocks is correlated with stock trading volume. Our results indicate correlation at three different levels: the stock market as a whole, industry sectors, and individual company stocks. We then develop two models, one based on linear regression and the other on multinomial logistic regression, to classify individual stock trading volume into three categories, namely low, normal and high. We find that the multinomial logistic regression model outperforms the linear regression model, and that adding Twitter data to the prediction models is indeed beneficial. For the 78 individual stocks that have a significant number of daily tweets, the multinomial logistic regression model achieves 57.3% precision for predicting low trading volume and 67.2% precision for predicting high volume.
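As a rough illustration of the correlation analysis described above, the following minimal sketch computes the Pearson correlation coefficient between a daily tweet count series and a daily trading volume series. The choice of Pearson correlation, the function name `pearson`, and the sample numbers are assumptions made for illustration only; the abstract does not state which correlation measure or data the dissertation uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data for one stock over five trading days (not from the dissertation).
daily_tweets = [120, 95, 310, 150, 130]
daily_volume = [1.1e6, 0.9e6, 2.8e6, 1.4e6, 1.2e6]

r = pearson(daily_tweets, daily_volume)
print(f"correlation between tweet count and trading volume: {r:.3f}")
```

The same function could be applied to series aggregated per sector or across the whole market to mirror the three levels of analysis mentioned above.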

The number of tweets concerning a stock varies from day to day, and sometimes exhibits a significant spike. In the second part of the dissertation, we investigate Twitter volume spikes related to S&P 500 stocks, and whether they are useful for stock trading. Through correlation analysis, we provide insight into when Twitter volume spikes occur and the possible causes of these spikes. We further explore whether these spikes are surprises to market participants by comparing the implied volatility of a stock before and after a Twitter volume spike. Moreover, we develop a Bayesian classifier that uses Twitter volume spikes to assist stock trading, and show that it can provide substantial profit. We further develop an enhanced strategy that combines the Bayesian classifier with a stock bottom picking method, and demonstrate that it can achieve significant gain in a short amount of time. Simulation over half a year of stock market data indicates that it achieves on average an 8.6% gain in 27 trading days and a 15.0% gain in 55 trading days. Statistical tests show that the gain is statistically significant, and that the enhanced strategy significantly outperforms both the strategy that uses only the Bayesian classifier and a bottom picking method that uses only trading volume spikes.
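To make the notion of a Twitter volume spike concrete, here is a minimal sketch of one possible detection rule: a day counts as a spike when its tweet count is at least a fixed multiple of the average over the preceding days. This rule, the function name `find_volume_spikes`, and the `window` and `factor` parameters are hypothetical; the abstract does not give the dissertation's actual spike definition.

```python
def find_volume_spikes(counts, window=5, factor=3.0):
    """Return indices of days whose count is >= factor * trailing average.

    `window` and `factor` are illustrative assumptions, not values
    taken from the dissertation.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    spikes = []
    for i in range(window, len(counts)):
        # Baseline is the mean of the `window` days immediately before day i.
        baseline = sum(counts[i - window:i]) / window
        if baseline > 0 and counts[i] >= factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical daily tweet counts for one stock.
tweet_counts = [10, 12, 9, 11, 10, 50, 12, 11]
spike_days = find_volume_spikes(tweet_counts)
print(spike_days)
```

Note that with a trailing average the spike day itself raises the baseline for the days that follow, so a single burst is not reported repeatedly.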

In the third part of the dissertation, we investigate the relationship between Twitter volume spikes and stock options pricing. We start with the underlying assumption of the Black-Scholes model, the most widely used model for stock options pricing, and investigate when this assumption holds for stocks that have Twitter volume spikes. We find that the assumption is less likely to hold in the time period before a Twitter volume spike, and is more likely to hold afterwards. In addition, the volatility of a stock is significantly lower after a Twitter volume spike than before the spike. We also find that implied volatility increases sharply before a Twitter volume spike and decreases quickly afterwards, and that put options tend to be priced higher than call options. Finally, we find that right after a Twitter volume spike, options may still be overpriced. Based on these findings, we propose a put spread selling strategy for stock options trading. Realistic simulation of a portfolio using one year of stock market data demonstrates that, even in a conservative setting, this strategy achieves a 34.3% gain after accounting for commissions and the bid-ask spread, while the S&P 500 increases only 12.8% in the same period.
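For readers unfamiliar with selling a put spread, the sketch below computes the per-share profit at expiration of the generic position: sell a put at a higher strike, buy a put at a lower strike, and collect a net credit up front. The function name, parameters, and example numbers are assumptions for illustration; the abstract does not specify how the dissertation selects strikes, expirations, or position sizes, and commissions and the bid-ask spread are not modeled here.

```python
def put_spread_sale_profit(price_at_expiry, short_strike, long_strike, net_credit):
    """Per-share profit at expiration from selling a put spread.

    The seller is short a put at `short_strike` and long a put at the
    lower `long_strike`, having received `net_credit` per share.
    """
    if long_strike >= short_strike:
        raise ValueError("long_strike must be below short_strike")
    # Amount owed on the put that was sold, if it expires in the money.
    short_put_loss = max(short_strike - price_at_expiry, 0.0)
    # Amount received from the protective put that was bought.
    long_put_gain = max(long_strike - price_at_expiry, 0.0)
    return net_credit - short_put_loss + long_put_gain


# Hypothetical position: sell the 100 put, buy the 95 put, collect 1.50 per share.
for price in (105.0, 97.0, 80.0):
    profit = put_spread_sale_profit(price, 100.0, 95.0, 1.5)
    print(f"stock at {price:.2f}: profit per share {profit:+.2f}")
```

The maximum profit is the net credit, earned when the stock stays at or above the short strike, and the maximum loss is capped at the strike difference minus the credit. This bounded risk is why such a position can benefit when options are overpriced, as the findings above suggest they may be right after a spike.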