Creation of a Multi-Author Analysis Dataset
| Thesis Type | Bachelor |
| Thesis Status | Finished |
| Student | Andreas Pittl |
| Init | |
| Final | |
| Start | |
| Thesis Supervisor | |
| Contact | |
| Research Field | |
Multi-author analysis investigates methods for analyzing and characterizing the writing style of authors. It can pave the way for tasks such as detecting the positions at which the author of a text changes, or authorship attribution (determining the author of a given text). Developing and training models for multi-author analysis requires a sufficient amount of training data: texts written by multiple authors, with labels specifying the author of each section. The goal of this thesis is to implement a dataset generator based on the social media platform Reddit. The generator should make it possible to reproducibly create diverse datasets from a set of parameters, such as the number of distinct authors, the text length, and the number of switches between authors.
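
To make the idea of a parameterized, reproducible generator concrete, the following is a minimal sketch and not the implementation from the thesis. It assumes the Reddit posts have already been collected into an in-memory mapping from author name to a list of that author's texts. The names `GeneratorConfig`, `generate_document`, and `max_segment_chars` are hypothetical and chosen only for illustration. Reproducibility is obtained here by seeding a private random number generator, which is one possible approach.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical parameter set; the field names are illustrative assumptions."""
    num_authors: int        # number of distinct authors in the document
    num_switches: int       # number of author changes between consecutive segments
    max_segment_chars: int  # upper bound on the length of each text segment
    seed: int               # fixes all random choices so a run can be repeated


def generate_document(posts_by_author, config):
    """Build one labeled multi-author document as a list of (author, text) pairs.

    Assumes posts_by_author maps each author name to a non-empty list of texts.
    """
    if not 1 <= config.num_authors <= len(posts_by_author):
        raise ValueError("num_authors must be between 1 and the number of available authors")
    if config.num_authors == 1 and config.num_switches != 0:
        raise ValueError("a single author cannot switch")
    if config.num_switches < config.num_authors - 1:
        raise ValueError("need at least num_authors - 1 switches so every author appears")

    # A private, seeded generator keeps the result independent of global random state.
    rng = random.Random(config.seed)

    # Sorting first makes the selection independent of the mapping's insertion order.
    authors = rng.sample(sorted(posts_by_author), config.num_authors)

    # Start with each selected author once, then add segments until the requested
    # number of switches is reached; consecutive segments never share an author.
    sequence = list(authors)
    while len(sequence) < config.num_switches + 1:
        candidates = [a for a in authors if a != sequence[-1]]
        sequence.append(rng.choice(candidates))

    # Pick one text per segment and truncate it to the configured maximum length.
    return [
        (author, rng.choice(posts_by_author[author])[: config.max_segment_chars])
        for author in sequence
    ]
```

Calling `generate_document` twice with the same mapping and the same `GeneratorConfig` yields identical output, while changing only `seed` yields a different document with the same structural properties. A real generator would additionally have to fetch and filter the Reddit data, which this sketch leaves out.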