Automatic Quality Assessment of Wikipedia Articles using Deep Neural Networks

Thesis Type Master
Thesis Status Currently running
Student Manuel Schmidt

To date, the quality of Wikipedia articles is assessed manually by the Wikipedia community. This manual grading varies across Wikipedia language editions and is inherently subjective. We therefore aim to automate the process by using machine learning algorithms to estimate article quality. We use Doc2Vec to compute a latent representation of each article and combine it with existing quality feature metrics. The combined data is fed into a deep neural network (DNN) that learns from the data and predicts a quality grade for each Wikipedia article. State-of-the-art classifiers achieve about 60% accuracy at predicting the quality labels defined by Wikipedia. The goal is to find new ways of determining article quality and to improve this accuracy as much as possible. Further, we aim to perform a thorough evaluation of the proposed approaches and of the impact of different techniques on prediction quality.