Authorship Analysis and Cross-Language Grammar Features

Intrinsic Plagiarism Detection and Authorship Analysis

Capturing the essence of the writing style of authors is an important research area in natural language processing. It makes it possible to identify the author of a previously unseen document, perform so-called style change detection (finding the positions at which the author changes within a document), detect plagiarism intrinsically, develop new technology for writing support, or perform forensic analyses.

To date, detecting variations in writing style remains among the most difficult and most interesting challenges in authorship analysis. Authorship attribution is particularly challenging in scenarios where ground-truth textual data is only available in different languages (for instance, for bilingual authors). Moreover, style change detection is the only means of detecting plagiarism in a document if no comparison texts are available.

In our research, we focus on utilizing grammar features for several of the above-mentioned tasks. In doing so, we have pioneered work in cross-language scenarios, where authors have written documents in multiple languages. Current research in this field also covers the detection of social media bots, which have become a more pressing matter in recent years.

At DBIS, we are part of PAN, an international group of scientists focusing on the writing styles and habits of authors. The PAN initiative organizes shared tasks in which researchers from across the world compete to find the best strategies for tackling problems in Authorship Attribution, Author Profiling, and Multi-Author-Decomposition. In particular, we are co-organizers of the Style Change Detection task at PAN.

Publications

2024

Eva Zangerle, Maximilian Mayerl, Martin Potthast and Benno Stein: Overview of the Multi-Author Writing Style Analysis Task at PAN 2024. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024, vol. 3740, pages 2424-2431. CEUR-WS.org, 2024

Janek Bevendorff, Xavier Bonet Casals, Berta Chulvi, Daryna Dementieva, Ashaf Elnagar, Dayne Freitag, Maik Fröbe, Damir Korencic, Maximilian Mayerl, Animesh Mukherjee, Alexander Panchenko, Martin Potthast, Francisco Rangel, Paolo Rosso, Alisa Smirnova, Efstathios Stamatatos, Benno Stein, Mariona Taule, Dmitry Ustalov, Matti Wiegmann and Eva Zangerle: Overview of PAN 2024: Multi-author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification - Extended Abstract. In Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part VI, vol. 14613, pages 3-10. Springer, 2024

2023

Eva Zangerle, Maximilian Mayerl, Martin Potthast and Benno Stein: Overview of the Multi-Author Writing Style Analysis Task at PAN 2023. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023, vol. 3497, pages 2513-2522. CEUR-WS.org, 2023

Janek Bevendorff, Ian Borrego-Obrador, Mara Chinea-Rios, Marc Franco-Salvador, Maik Fröbe, Annina Heini, Krzysztof Kredens, Maximilian Mayerl, Piotr Pęzik, Martin Potthast, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, Benno Stein, Matti Wiegmann, Magdalena Wolska and Eva Zangerle: Overview of PAN 2023: Authorship Verification, Multi-Author Writing Style Analysis, Profiling Cryptocurrency Influencers, and Trigger Detection. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, pages 459-481. Springer Nature Switzerland, 2023

Janek Bevendorff, Mara Chinea-Rios, Marc Franco-Salvador, Annina Heini, Erik Körner, Krzysztof Kredens, Maximilian Mayerl, Piotr Pęzik, Martin Potthast, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, Benno Stein, Matti Wiegmann, Magdalena Wolska and Eva Zangerle: Overview of PAN 2023: Authorship Verification, Multi-author Writing Style Analysis, Profiling Cryptocurrency Influencers, and Trigger Detection. In Advances in Information Retrieval (ECIR 2023), pages 518-526. Springer Nature Switzerland, 2023

2022

Benjamin Murauer: Universal Grammar Features for Cross-Language Authorship Attribution. PhD thesis, University of Innsbruck, Department of Computer Science, 2022.

Janek Bevendorff et al.: Overview of PAN 2022: Authorship Verification, Profiling Irony and Stereotype Spreaders, and Style Change Detection. In Experimental IR Meets Multilinguality, Multimodality, and Interaction - 13th International Conference of the CLEF Association, CLEF 2022, Bologna, Italy, September 5-8, 2022, Proceedings, vol. 13390, pages 382-394. Springer, 2022

Eva Zangerle, Maximilian Mayerl, Martin Potthast and Benno Stein: Overview of the Style Change Detection Task at PAN 2022. In Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, vol. 3180, pages 2344-2356. CEUR-WS.org, 2022

Janek Bevendorff, Berta Chulvi, Elisabetta Fersini, Annina Heini, Mike Kestemont, Krzysztof Kredens, Maximilian Mayerl, Reyner Ortega-Bueno, Piotr Pęzik, Martin Potthast, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, Benno Stein, Matti Wiegmann, Magdalena Wolska and Eva Zangerle: Overview of PAN 2022: Authorship Verification, Profiling Irony and Stereotype Spreaders, Style Change Detection, and Trigger Detection. In Advances in Information Retrieval (ECIR 2022), pages 331-338. Springer International Publishing, 2022

Benjamin Murauer and Günther Specht: DT-grams: Structured Dependency Grammar Stylometry for Cross-Language Authorship Attribution. In Proceedings of the 32nd GI-Workshop Grundlagen von Datenbanksysteme (GvDB'21). 2022

2021

Benjamin Murauer and Günther Specht: Developing a Benchmark for Reducing Data Bias in Authorship Attribution. In Proceedings of the Second Workshop on Evaluation and Comparison of NLP Systems (Eval4NLP'21). 2021

Eva Zangerle, Maximilian Mayerl, Martin Potthast and Benno Stein: Overview of the Style Change Detection Task at PAN 2021. In CLEF 2021 Labs and Workshops, Notebook Papers, pages 1760-1771. CEUR-WS.org, 2021

Janek Bevendorff, Berta Chulvi, Gretel Liz De la Pena Sarracen, Mike Kestemont, Enrique Manjavacas, Ilia Markov, Maximilian Mayerl, Martin Potthast, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, Benno Stein, Matti Wiegmann, Magdalena Wolska and Eva Zangerle: Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection. In Experimental IR Meets Multilinguality, Multimodality, and Interaction - 12th International Conference of the CLEF Association, CLEF 2021, Proceedings, vol. 12880, pages 419-431. Springer, 2021

Benjamin Murauer and Günther Specht: Small-Scale Cross-Language Authorship Attribution on Social Media Comments. In Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), pages 11-19. 2021

Janek Bevendorff, Berta Chulvi, Gretel Liz De La Peña Sarracén, Mike Kestemont, Enrique Manjavacas, Ilia Markov, Maximilian Mayerl, Martin Potthast, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, Benno Stein, Matti Wiegmann, Magdalena Wolska and Eva Zangerle: Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection. In Advances in Information Retrieval (ECIR 2021), pages 567-573. Springer International Publishing, 2021

2020

Janek Bevendorff, Bilal Ghanem, Anastasia Giachanou, Mike Kestemont, Enrique Manjavacas, Ilia Markov, Maximilian Mayerl, Martin Potthast, Francisco Rangel, Paolo Rosso, Günther Specht, Efstathios Stamatatos, Benno Stein, Matti Wiegmann and Eva Zangerle: Overview of PAN 2020: Authorship Verification, Celebrity Profiling, Profiling Fake News Spreaders on Twitter, and Style Change Detection. In 11th International Conference of the CLEF Association (CLEF 2020). Springer, 2020

Janek Bevendorff, Bilal Ghanem, Anastasia Giachanou, Mike Kestemont, Enrique Manjavacas, Martin Potthast, Francisco Rangel, Paolo Rosso, Günther Specht, Efstathios Stamatatos, Benno Stein, Matti Wiegmann and Eva Zangerle: Shared Tasks on Authorship Analysis at PAN 2020. In Advances in Information Retrieval (ECIR 2020), pages 508-516. Springer International Publishing, 2020

Manfred Moosleitner, Benjamin Murauer and Günther Specht: Detecting Conspiracy Tweets using Support Vector Machines. In Working Notes Proceedings of the MediaEval 2020 Workshop. CEUR-WS.org, 2020

Eva Zangerle, Maximilian Mayerl, Günther Specht, Martin Potthast and Benno Stein: Overview of the Style Change Detection Task at PAN 2020. In CLEF 2020 Working Notes, CEUR Workshop Proceedings 2696, Paper 256, 9 pages. 2020

2019

Michael Tschuggnall, Benjamin Murauer and Günther Specht: Reduce & Attribute: Two-Step Authorship Attribution for Large-Scale Problems. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 951-960. Association for Computational Linguistics, 2019