Style Change Detection for Author Identification

Thesis Type: Master
Thesis Status: Currently running
Student: Pedro Marques Costa
Thesis Supervisor:
Research Field:

PAN is a renowned initiative in the field of text mining that covers authorship identification, author profiling, and plagiarism detection. Each year it proposes a set of tasks and holds competitions: participants develop algorithms that are evaluated on the same data sets, which makes their results directly comparable. In its 2018 edition, PAN proposed a Style Change Detection task with a simple description: given a previously unseen text document, decide whether it was written by one author or by multiple authors.

In this thesis, one or more algorithms should be developed for the style change detection task and evaluated on the data set provided by PAN. In addition to state-of-the-art machine learning methods, existing techniques from the related field of text segmentation should be explored and, if they work well, incorporated into the algorithms.
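
To make the task concrete, the following is a minimal sketch of one possible baseline. It is not the method of this thesis and not PAN's reference implementation. It assumes that a style change shows up as a large difference between the character trigram profiles of adjacent word windows, loosely in the spirit of window-based text segmentation. All function names, the window size of 50 words, and the threshold of 0.5 are hypothetical choices for illustration only and would need tuning on the PAN data set.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return 1.0 - dot / (norm_a * norm_b)


def split_windows(text, size=50):
    """Split text into consecutive, non-overlapping windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def has_style_change(text, size=50, threshold=0.5):
    """Guess whether a document has multiple authors.

    Assumption (illustrative only): the document is flagged if any pair of
    adjacent windows has a trigram-profile distance above `threshold`.
    """
    profiles = [char_ngrams(w) for w in split_windows(text, size)]
    distances = [cosine_distance(p, q) for p, q in zip(profiles, profiles[1:])]
    return bool(distances) and max(distances) > threshold
```

Calling `has_style_change(document_text)` returns a single yes/no answer per document, which matches the binary form of the task. A learned classifier over richer stylometric features would replace the fixed threshold in any serious submission.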