Creation of a Multi-Author Analysis Dataset

Thesis Type Bachelor
Thesis Status
Student Andreas Pittl
Thesis Supervisor
Research Field

Multi-author analysis investigates methods for analyzing and characterizing the writing styles of authors. It paves the way for tasks such as detecting the positions at which the author of a text changes, or authorship attribution (determining the author of a given text). Developing and training models for multi-author analysis requires a sufficient amount of training data: texts written by multiple authors, with labels specifying the author of each section. The goal of this thesis is to implement a dataset generator based on the social media platform Reddit. The generator should make it possible to create diverse datasets reproducibly from a set of parameters such as the number of different authors, the text length, and the number of switches between authors.
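
As an illustration of what such a parameterized, reproducible generator could look like, the following Python sketch assembles one labeled multi-author document from posts that have already been collected. It is not the implementation from the thesis. The names GeneratorConfig, generate_document, and posts_by_author, the seed parameter, and the choice to cap each segment by character count are all assumptions made for this example, and fetching the posts from Reddit is deliberately left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical parameter set; the field names are assumptions."""
    num_authors: int        # size of the author pool for one document
    num_switches: int       # number of author changes in the document
    max_segment_chars: int  # upper bound on the length of each segment
    seed: int               # fixes the random choices for reproducibility


def generate_document(posts_by_author, config):
    """Return a list of (author, text) segments.

    posts_by_author maps an author name to a list of that author's posts.
    Consecutive segments always come from different authors, so the result
    contains exactly config.num_switches author changes.
    """
    if config.num_switches > 0 and config.num_authors < 2:
        raise ValueError("author switches require at least two authors")

    rng = random.Random(config.seed)
    # Sort the author names so the outcome does not depend on dict ordering.
    pool = rng.sample(sorted(posts_by_author), config.num_authors)

    segments = []
    previous = None
    for _ in range(config.num_switches + 1):
        # Exclude the previous author to force a change at every boundary.
        candidates = [a for a in pool if a != previous] or pool
        author = rng.choice(candidates)
        text = rng.choice(posts_by_author[author])[:config.max_segment_chars]
        segments.append((author, text))
        previous = author
    return segments


# Made-up sample data standing in for collected Reddit posts.
sample_posts = {
    "alice": ["I think the new release is solid.", "Benchmarks look fine to me."],
    "bob": ["Not convinced, the API changed again.", "Docs are still missing."],
    "carol": ["Has anyone tried it on ARM?", "Works on my machine."],
}
sample_config = GeneratorConfig(num_authors=3, num_switches=4,
                                max_segment_chars=20, seed=42)
sample_document = generate_document(sample_posts, sample_config)
```

Because every random choice goes through a random.Random instance seeded from the configuration, calling generate_document twice with the same posts and the same GeneratorConfig yields the same segments. One limitation of this sketch is that the pool size is only an upper bound: not every author in the pool is guaranteed to appear in the generated document.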