Authorship Obfuscated Vector Representations for Text Mining

Thesis Type	Master
Thesis Status	Finished
Student	Daniel Egger
Final	24.01.2023 12:00
Start	01.11.2020 12:00
Thesis Supervisor	Assoc. Prof. Dr. Eva Zangerle
Contact	eva.zangerle@uibk.ac.at
Research Field	Authorship Analysis and Cross-Language Grammar Features

Analyzing a text’s linguistic style can threaten the privacy of authors who wish to conceal their identity. Automated authorship attribution methods based on text mining techniques are becoming increasingly accurate. Still, research on authorship obfuscation shows that authorship attribution methods can be disrupted by altering the linguistic style of texts. Currently, research on authorship obfuscation methods focuses mainly on producing obfuscated but human-readable versions of the input texts. State-of-the-art authorship obfuscation methods struggle with a trade-off between obfuscation safety and human readability, often sacrificing safety to keep obfuscated texts human-readable. The generation of obfuscated text representations for use in text mining tasks such as topic classification or sentiment analysis is less explored. In text mining, texts are commonly represented as numerical vectors. Since human readability is not a concern in that case, obfuscation methods can focus on obfuscation safety. This thesis develops and explores methods for generating obfuscated text vector representations for use in utility text mining tasks, that is, tasks other than authorship attribution and author profiling. The discussed methods are evaluated with regard to their safety against authorship attribution attacks as well as their accuracy in utility text mining tasks.
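
The two-sided evaluation described in the abstract (safety against authorship attribution versus accuracy in a utility task) can be illustrated with a small sketch. This is not the method developed in the thesis. It assumes a plain bag-of-words vector, a hypothetical toy list of function words (`FUNCTION_WORDS`) standing in for style-bearing features, a naive obfuscation that zeroes those dimensions, and a leave-one-out nearest-neighbour classifier acting as both the attribution attack and the topic classifier. The four example texts and their labels are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy list of style-bearing function words (an assumption,
# not taken from the thesis).
FUNCTION_WORDS = {"the", "a", "and", "however", "quite", "i"}

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]

def vectorize(text, vocabulary):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

def obfuscate(vector, vocabulary):
    """Naive stand-in obfuscation: zero out function-word dimensions."""
    return [0 if word in FUNCTION_WORDS else count
            for word, count in zip(vocabulary, vector)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def loo_accuracy(vectors, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (cosine)."""
    correct = 0
    for i, v in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        nearest = max(others, key=lambda j: cosine(v, vectors[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(vectors)

# Invented toy corpus: two authors, two topics.
texts = [
    "However, the match was quite a thriller.",
    "However, the budget was quite a disaster.",
    "I think the match ended and the team won.",
    "I think the budget passed and the vote ended.",
]
authors = ["A", "A", "B", "B"]
topics = ["sports", "politics", "sports", "politics"]

vocabulary = sorted({tok for text in texts for tok in tokenize(text)})
plain = [vectorize(text, vocabulary) for text in texts]
hidden = [obfuscate(vec, vocabulary) for vec in plain]

# Safety: lower attribution accuracy on the obfuscated vectors is better.
# Utility: topic accuracy on the obfuscated vectors should stay high.
attribution_plain = loo_accuracy(plain, authors)
attribution_hidden = loo_accuracy(hidden, authors)
utility_hidden = loo_accuracy(hidden, topics)
print(attribution_plain, attribution_hidden, utility_hidden)
```

On a corpus this small the printed numbers carry no evidential weight; the sketch only shows the shape of the comparison, in which the same obfuscated vectors are scored once by an attacker and once by a utility task.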