Real-time Twitter Sentiment Classification based on Apache Storm
The main goal of this master’s thesis is to integrate sentiment classification techniques into a real-time processing system. To this end, it presents SentiStorm, an approach that is built on Apache Storm and uses several machine learning techniques to identify the sentiment of a tweet. SentiStorm uses Part-of-Speech (POS) tags, Term Frequency–Inverse Document Frequency (TF-IDF) and multiple sentiment lexica to extract a feature vector from a tweet. A Support Vector Machine (SVM) then processes this feature vector and predicts the sentiment using a model trained on a training dataset.
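The feature extraction step described above can be illustrated with a minimal sketch. It is not the SentiStorm implementation: the whitespace tokenizer, the four-word TOY_LEXICON, the three example tweets, and all function names are assumptions made for illustration. POS-tag features and the SVM itself are omitted; the resulting vector is the kind of input a classifier would receive.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the thesis combines multiple real sentiment lexica.
TOY_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}


def tokenize(tweet):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return tweet.lower().split()


def inverse_document_frequencies(corpus):
    """Return idf(t) = log(N / df(t)) for every term in the tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(term for tokens in corpus for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def feature_vector(tweet, vocabulary, idf):
    """Concatenate TF-IDF weights (one per vocabulary term) and a lexicon score."""
    tokens = tokenize(tweet)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty tweets
    tfidf = [counts[term] / total * idf.get(term, 0.0) for term in vocabulary]
    lexicon_score = sum(TOY_LEXICON.get(token, 0.0) for token in tokens)
    return tfidf + [lexicon_score]


# Assumed example data, not taken from the SemEval dataset.
tweets = ["good movie", "bad movie", "great good fun"]
corpus = [tokenize(t) for t in tweets]
idf = inverse_document_frequencies(corpus)
vocabulary = sorted(idf)
vec = feature_vector("good good movie", vocabulary, idf)
```

Here vec holds one TF-IDF weight per vocabulary term followed by a single lexicon score. In a full pipeline, vectors of this kind, extended with POS-tag features, would be passed to a trained SVM for prediction.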
Finally, this thesis presents an evaluation of SentiStorm on the Semantic Evaluation (SemEval) 2013 dataset. The quality evaluation shows that SentiStorm is comparable with state-of-the-art sentiment classification systems. In addition to its high prediction quality, the performance results show that this sentiment classification can also run in real time.