Predicting the Quality of Articles using Machine Learning Techniques

Thesis Type Master
Thesis Status
Student Manuel Schmidt
Thesis Supervisor
Research Field

This thesis aims to contribute a machine learning approach that assigns a quality measure to each Wikipedia article. To this end, each article is represented as a Doc2Vec vector, which is combined with existing feature metrics. The combined data is fed into several machine learning algorithms, which learn from it and predict a quality rating for each article. State-of-the-art classifiers reach around 65% accuracy when predicting the quality labels defined by Wikipedia. The goal is to find new ways to determine article quality and to improve on this accuracy as far as possible.
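
The combination of a document vector with feature metrics can be illustrated with a minimal sketch. It is not the thesis implementation: to stay dependency-free, a hashed bag-of-words vector stands in for the Doc2Vec representation, the three feature metrics (log word count, `<ref` count, `==` heading-marker count) are hypothetical examples, and a nearest-centroid rule stands in for the classifiers to be evaluated. All function names are assumptions made for illustration.

```python
import hashlib
import math
from collections import defaultdict


def embed(text, dim=16):
    """Stand-in for a Doc2Vec vector: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def handcrafted_features(text):
    """Hypothetical feature metrics computed from raw wiki markup."""
    return [
        math.log1p(len(text.split())),  # log of the word count
        float(text.count("<ref")),      # number of reference tags
        float(text.count("==")),        # number of heading markers
    ]


def featurize(text):
    """Concatenate the document vector with the feature metrics."""
    return embed(text) + handcrafted_features(text)


def train_centroids(samples):
    """Average the feature vectors of each quality label."""
    sums = {}
    counts = defaultdict(int)
    for text, label in samples:
        vec = featurize(text)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest in Euclidean distance."""
    vec = featurize(text)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the given label."""
    hits = sum(predict(centroids, text) == label for text, label in samples)
    return hits / len(samples)


# Toy data, invented for illustration only.
TOY_SAMPLES = [
    ("Berlin is a city.", "Stub"),
    ("== History ==\nBerlin was founded in the 13th century.<ref>Source A</ref>\n"
     "== Geography ==\nBerlin lies on the Spree.<ref>Source B</ref>", "FA"),
]
toy_centroids = train_centroids(TOY_SAMPLES)
```

In a real setup, `embed` would be replaced by vectors inferred from a trained Doc2Vec model, and the accuracy would be measured on held-out articles rather than on the training samples.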