Creation of a Multi-Author Analysis Dataset
Thesis Type | Bachelor |
Thesis Status | Finished |
Student | Andreas Pittl |
Init | |
Final | |
Start | |
Thesis Supervisor | |
Contact | |
Research Field |
Multi-author analysis investigates methods for analyzing and characterizing the writing style of authors. It can pave the way for tasks such as detecting the positions at which the author changes, or authorship attribution (determining the author of a given text). Developing and training models for multi-author analysis requires a sufficient amount of training data: texts written by multiple authors, with labels specifying the author of each section. The goal of this thesis is to implement a dataset generator based on the social media platform Reddit. The generator should make it possible to reproducibly create diverse datasets from a set of parameters such as the number of distinct authors, the text length, and the number of switches between authors.
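To make the idea of a parameterized, reproducible generator concrete, the following is a minimal sketch in Python and not the implementation developed in the thesis. It assumes the Reddit posts have already been collected into an in-memory mapping from author name to a list of post texts. The names `GeneratorConfig`, `generate_document`, and `min_chars` are hypothetical and chosen only for illustration. Reproducibility is obtained here by drawing every random choice from a generator seeded through the configuration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical parameter set for one generated document."""
    num_authors: int    # number of distinct authors in the document
    num_switches: int   # number of author changes between adjacent segments
    min_chars: int = 0  # minimum length of a post to be usable as a segment
    seed: int = 0       # seed that makes the output reproducible


def generate_document(posts_by_author, config):
    """Return a list of (author, text) segments in document order."""
    if config.num_authors < 1:
        raise ValueError("at least one author is required")
    if config.num_authors == 1 and config.num_switches != 0:
        raise ValueError("a single author cannot produce switches")
    if config.num_switches < config.num_authors - 1:
        raise ValueError("too few switches for every author to appear")

    # Keep only posts that satisfy the length constraint.
    usable = {
        author: [p for p in posts if len(p) >= config.min_chars]
        for author, posts in posts_by_author.items()
    }
    candidates = sorted(a for a, posts in usable.items() if posts)

    rng = random.Random(config.seed)
    # Raises ValueError if fewer candidates than num_authors are available.
    authors = rng.sample(candidates, config.num_authors)

    # Start with every author once, then extend until the requested
    # number of switches is reached; adjacent segments always differ.
    sequence = list(authors)
    while len(sequence) < config.num_switches + 1:
        others = [a for a in authors if a != sequence[-1]]
        sequence.append(rng.choice(others))

    return [(author, rng.choice(usable[author])) for author in sequence]
```

Calling `generate_document` twice with the same input mapping and the same `GeneratorConfig` yields identical segment lists, and each segment carries its author label, which is the kind of ground truth needed for detecting author changes or for authorship attribution.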