Collection of Negative Samples for Hit-Song Prediction

Thesis Type: Master
Student: Federica Cenzuales

Hit song prediction is the task of predicting whether a given song will become a hit, e.g., whether it will make it into the charts. One way to realize this is as a binary classification model that assigns a given song to one of two classes: hit or non-hit.

To train such a machine learning model, both positive samples (hits) and negative samples (non-hits) are required. Obtaining positive samples is relatively straightforward: we can define all songs that made it into the charts (e.g., the Billboard Hot 100) as positive samples. Negative samples, on the other hand, are trickier:

  • We have to ensure that the songs we choose as negative samples had the chance to make it into the charts we use as our source for the positive samples.

  • For positive samples, there is a natural measure for positivity -- the peak position the song reached on the charts. No such natural measure exists for negative samples.
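The two points above can be illustrated with a minimal sketch. It assumes a hypothetical song catalog in which each song has a release date and a market, and it uses a deliberately simple eligibility rule (same market as the charts, released within the period the chart data covers) as a stand-in for "had the chance to make it into the charts". The names `Song`, `label_samples`, and `chart_peaks` and the rule itself are illustrative assumptions, not a method proposed by this thesis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Song:
    song_id: str
    release_date: date
    market: str  # hypothetical field: region in which the song was released


def label_samples(catalog, chart_peaks, chart_market, chart_start, chart_end):
    """Split a catalog into positives and negative candidates.

    chart_peaks maps song_id -> peak chart position (1 is best).
    Positives keep their peak position as a measure of positivity.
    Negative candidates are songs that never charted but satisfy the
    assumed eligibility rule (same market, released within the period).
    """
    positives = {}
    negative_candidates = []
    for song in catalog:
        if song.song_id in chart_peaks:
            positives[song.song_id] = chart_peaks[song.song_id]
        elif (song.market == chart_market
              and chart_start <= song.release_date <= chart_end):
            negative_candidates.append(song.song_id)
        # Songs outside the market or period are discarded: they could
        # not have charted, so they carry no information as non-hits.
    return positives, negative_candidates


# Toy data, invented purely for illustration.
catalog = [
    Song("a", date(2015, 3, 1), "US"),  # charted
    Song("b", date(2015, 6, 1), "US"),  # eligible, never charted
    Song("c", date(2015, 6, 1), "DE"),  # different market
    Song("d", date(1990, 1, 1), "US"),  # outside the chart period
]
chart_peaks = {"a": 7}

positives, negative_candidates = label_samples(
    catalog, chart_peaks, "US", date(2015, 1, 1), date(2015, 12, 31)
)
```

In this toy example only song "b" qualifies as a negative candidate. The sketch also shows the asymmetry: each positive carries a peak position, whereas the negative candidates form a flat list with no counterpart score.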

The goals of this thesis are to devise a method for collecting true negative samples and to develop an effective measure of negativity.