Real-time Twitter Sentiment Classification based on Apache Storm

Thesis Type	Master
Thesis Status	Finished
Student	Martin Illecker
Final	18.03.2015 12:00
Start	07.10.2014 12:00
Thesis Supervisor	Assoc. Prof. Dr. Eva Zangerle
Contact	eva.zangerle@uibk.ac.at
Research Field	Microblog Analyses and Recommendations

The main goal of this master’s thesis is to integrate sentiment classification techniques into a real-time processing system. To this end, it presents SentiStorm, an approach based on Apache Storm that uses several machine learning techniques to identify the sentiment of a tweet. SentiStorm uses Part-of-Speech (POS) tags, Term Frequency–Inverse Document Frequency (TF-IDF) and multiple sentiment lexica to extract a feature vector from each tweet. This feature vector is then processed by a Support Vector Machine (SVM), which predicts the sentiment using a model trained on a labeled dataset.
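The feature extraction and classification steps described above can be illustrated with a toy Python pipeline. This is a minimal sketch under stated assumptions, not SentiStorm's implementation: the lexicon (`LEXICON`), vocabulary (`VOCAB`), POS tag names (`POS_TAGS`), the smoothed IDF formula and all weights are hypothetical; POS tags are assumed to come from an external tagger; and a hand-weighted, binary linear decision function stands in for an SVM trained on labeled tweets. In a Storm topology each step could run in its own bolt, but that wiring is omitted here.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (word -> polarity). SentiStorm combines several
# real sentiment lexica, which are not reproduced here.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}

# Hypothetical fixed vocabulary defining the TF-IDF part of the vector.
VOCAB = ["good", "great", "bad", "awful", "movie"]

# Hypothetical POS tag names whose relative frequencies become features.
POS_TAGS = ["ADJ", "VERB", "EMO"]


def tf_idf(tokens, corpus):
    """Return one TF-IDF weight per VOCAB term for a tokenized tweet.

    tf  = term count / number of tokens in the tweet
    idf = log((1 + N) / (1 + df)), a common smoothed variant (assumption),
          where N is the corpus size and df the number of corpus tweets
          that contain the term.
    """
    counts = Counter(tokens)
    n_docs = len(corpus)
    weights = []
    for term in VOCAB:
        df = sum(1 for doc in corpus if term in doc)
        tf = counts[term] / len(tokens) if tokens else 0.0
        weights.append(tf * math.log((1 + n_docs) / (1 + df)))
    return weights


def extract_features(tagged_tweet, corpus):
    """Concatenate TF-IDF, lexicon and POS features for (word, tag) pairs."""
    words = [word.lower() for word, _ in tagged_tweet]
    tag_counts = Counter(tag for _, tag in tagged_tweet)
    n_tokens = max(len(tagged_tweet), 1)  # avoid division by zero
    tfidf_part = tf_idf(words, corpus)
    lexicon_part = [sum(LEXICON.get(word, 0.0) for word in words)]
    pos_part = [tag_counts[tag] / n_tokens for tag in POS_TAGS]
    return tfidf_part + lexicon_part + pos_part


def linear_svm_predict(features, weights, bias):
    """Decision function of a linear SVM: the sign of w . x + b."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "positive" if score >= 0 else "negative"


# Usage with a tiny toy corpus and hand-picked (hypothetical) weights:
# 5 TF-IDF weights, 1 lexicon weight, 3 POS weights.
corpus = [["good", "movie"], ["bad", "movie"], ["great", "day"]]
weights = [0.5, 0.8, -0.5, -0.8, 0.0, 1.0, 0.1, 0.0, 0.3]
tweet = [("Great", "ADJ"), ("movie", "NOUN"), (":)", "EMO")]
print(linear_svm_predict(extract_features(tweet, corpus), weights, bias=-0.1))
# prints: positive
```

Concatenating the TF-IDF, lexicon and POS features into one flat vector is what lets a single classifier weigh all three signals together; a real system would learn the weights and bias from training tweets rather than setting them by hand, and may distinguish more than two sentiment classes.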

Finally, this thesis presents an evaluation of SentiStorm on the Semantic Evaluation (SemEval) 2013 dataset. The quality evaluation shows that SentiStorm is comparable to state-of-the-art sentiment classification systems. In addition to its high prediction quality, the performance results demonstrate that this sentiment classification can also run in real time.
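The abstract does not name the quality metric used. The SemEval 2013 Twitter sentiment task officially ranked systems by the average of the F1 scores of the positive and negative classes, with the neutral class not averaged in. A minimal Python sketch of that metric, assuming it is the measure meant here (the function names are hypothetical):

```python
def f1_for_class(gold, predicted, label):
    """F1 score of one class, treating `label` as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def semeval2013_score(gold, predicted):
    """Average F1 of the positive and negative classes."""
    f1_pos = f1_for_class(gold, predicted, "positive")
    f1_neg = f1_for_class(gold, predicted, "negative")
    return (f1_pos + f1_neg) / 2


gold = ["positive", "negative", "neutral", "positive"]
predicted = ["positive", "neutral", "neutral", "negative"]
print(semeval2013_score(gold, predicted))
```

Because neutral tweets only enter through false positives and false negatives of the other two classes, this measure rewards a system mainly for separating positive from negative sentiment.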