Authorship Analysis and Cross-Language Grammar Features

Intrinsic Plagiarism Detection and Authorship Analysis

Capturing the essence of an author's writing style is an important research area in natural language processing. It makes it possible to attribute a previously unseen document to its author, perform so-called style change detection (finding the positions at which the author changes within a document), detect plagiarism intrinsically, develop new technology for writing support, or perform forensic analyses.

To date, detecting variations in writing style remains among the most difficult and most interesting challenges in authorship analysis. The task of authorship attribution is particularly challenging in scenarios where ground truth textual data is only available in different languages (for instance, for bilingual authors). Moreover, style change detection is the only means to detect plagiarism in a document if no comparison texts are available.
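
To make the task concrete, the following is a minimal sketch of style change detection without any comparison texts. It is not the method used in our publications: it swaps in character trigram profiles as a simple stand-in for a style representation, and the function names and the threshold of 0.3 are assumptions chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def style_changes(paragraphs, threshold=0.3):
    """Return each index i where paragraph i looks unlike paragraph i - 1.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    profiles = [char_ngrams(p) for p in paragraphs]
    return [
        i for i in range(1, len(profiles))
        if cosine(profiles[i - 1], profiles[i]) < threshold
    ]
```

Calling style_changes on a list of paragraphs returns the positions at which adjacent paragraphs fall below the similarity threshold. A realistic system would replace the trigram profile with richer features and would calibrate the threshold on labelled data.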

In our research, we focus on utilizing grammar features for several of the above-mentioned tasks. In doing so, we have pioneered work in cross-language scenarios, where authors have written documents in multiple languages. Current research in this field also covers the detection of social media bots, which have become a more pressing matter in recent years.

At DBIS, we are part of PAN, an international group of scientists focusing on the writing styles and habits of authors. The PAN initiative organizes shared tasks in which researchers from across the world compete to find the best strategies for problems in Authorship Attribution, Author Profiling, and Multi-Author-Decomposition. In particular, we are co-organizers of the Style Change Detection task at PAN.

 

Publications

2021

Benjamin Murauer and Günther Specht: Small-Scale Cross-Language Authorship Attribution on Social Media Comments. In Proceedings of the 18th Machine Translation Summit: 4th Workshop on Technologies for MT of Low Resource Languages, pages 11-19. 2021

Janek Bevendorff, Berta Chulvi, Gretel Liz De La Peña Sarracén, Mike Kestemont, Enrique Manjavacas, Ilia Markov, Maximilian Mayerl, Martin Potthast, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, Benno Stein, Matti Wiegmann, Magdalena Wolska and Eva Zangerle: Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection. In Advances in Information Retrieval, pages 567-573. Springer International Publishing, 2021

2020

Janek Bevendorff, Bilal Ghanem, Anastasia Giachanou, Mike Kestemont, Enrique Manjavacas, Martin Potthast, Francisco Rangel, Paolo Rosso, Günther Specht, Efstathios Stamatatos, Benno Stein, Matti Wiegmann and Eva Zangerle: Shared Tasks on Authorship Analysis at PAN 2020. In Advances in Information Retrieval (ECIR 2020), pages 508-516. Springer International Publishing, 2020

Manfred Moosleitner, Benjamin Murauer and Günther Specht: Detecting Conspiracy Tweets using Support Vector Machines. In Working Notes Proceedings of the MediaEval 2020 Workshop. ceur-ws.org, 2020.

Eva Zangerle, Maximilian Mayerl, Günther Specht, Martin Potthast and Benno Stein: Overview of the Style Change Detection Task at PAN 2020. In CLEF 2020 Working Notes, CEUR Workshop Proceedings 2696, Paper 256, 9 pages. 2020

2019

Michael Tschuggnall, Benjamin Murauer and Günther Specht: Reduce & Attribute: Two-Step Authorship Attribution for Large-Scale Problems. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 951-960. Association for Computational Linguistics, 2019

Walter Daelemans, Mike Kestemont, Enrique Manjavacas, Martin Potthast, Francisco M. Rangel Pardo, Paolo Rosso, Günther Specht, Efstathios Stamatatos, Benno Stein, Michael Tschuggnall, Matti Wiegmann and Eva Zangerle: Overview of PAN 2019: Bots and Gender Profiling, Celebrity Profiling, Cross-Domain Authorship Attribution and Style Change Detection. In Experimental IR Meets Multilinguality, Multimodality, and Interaction - 10th International Conference of the CLEF Association, CLEF 2019, Lugano, Switzerland, September 9-12, 2019, Proceedings, vol. 11696, pages 402-416.

Eva Zangerle, Michael Tschuggnall, Günther Specht, Martin Potthast and Benno Stein: Overview of the Style Change Detection Task at PAN 2019. In CLEF 2019 Labs and Workshops, Notebook Papers. CEUR-WS.org, 2019

Benjamin Murauer and Günther Specht: Generating Cross-Domain Text Classification Corpora from Social Media Comments. In 20th Conference and Labs of the Evaluation Forum (CLEF'2019), pages 114-125. Springer International Publishing, 2019

Michael Tschuggnall, Thibault Gerrier and Günther Specht: StyleExplorer: A Toolkit for Textual Writing Style Visualization. In Proceedings of the 41st European Conference on Information Retrieval (ECIR 2019): Advances in Information Retrieval, pages 220-224. Springer International Publishing, 2019

2018

Benjamin Murauer, Michael Tschuggnall and Günther Specht: Dynamic Parameter Search for Cross-Domain Authorship Attribution. In Working Notes of CLEF. 2018

M Kestemont, M Tschuggnall, E Stamatatos, W Daelemans, G Specht, B Stein and M Potthast: Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection. In Working Notes Papers of the CLEF. 2018

Efstathios Stamatatos, Francisco Rangel, Michael Tschuggnall, Mike Kestemont, Paolo Rosso, Benno Stein and Martin Potthast: Overview of PAN-2018: Author Identification, Author Profiling, and Author Obfuscation. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. 9th International Conference of the CLEF Initiative (CLEF 18). Springer, Berlin Heidelberg New York (Sep 2018). 2018

Eva Zangerle, Michael Tschuggnall, Stefan Wurzinger and Günther Specht: ALF-200k: Towards Extensive Multimodal Analyses of Music Tracks and Playlists. In Advances in Information Retrieval - 40th European Conference on IR Research (ECIR 2018), pages 584-590. Springer, 2018

Benjamin Murauer, Michael Tschuggnall and Günther Specht: On the Influence of Machine Translation on Language Origin Obfuscation. In Proceedings of the 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2018). 2018 (to be published)

2017

Michael Tschuggnall: Automatisierte Plagiatserkennung in Textdokumenten: Was der Schreibstil eines Autors über die Echtheit verrät. In S. Mauler, H. Ortner, U. Pfeiffenberger (Eds.): Medien und Glaubwürdigkeit, pages 131-140. Innsbruck University Press, 2017

Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso and Benno Stein: Overview of PAN’17: Author Identification, Author Profiling, and Author Obfuscation. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Initiative (CLEF 17). Springer, Berlin Heidelberg New York (Sep 2017). 2017

Michael Tschuggnall, Efstathios Stamatatos, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast: Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering. In CEUR Workshop Proceedings, CLEF 2017 Working Notes, Dublin, Ireland, September 11-14, 2017.

2016

Efstathios Stamatatos, Michael Tschuggnall, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast: Clustering by Authorship Within and Across Documents. In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings, September 2016. Pages 691-715. CLEF and CEUR-WS.org. ISSN 1613-0073.

Paolo Rosso, Francisco Rangel, Martin Potthast, Efstathios Stamatatos, Michael Tschuggnall and Benno Stein: Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-genre Profiling, Clustering, Diarization, and Obfuscation. In Norbert Fuhr et al., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 7th International Conference of the CLEF Initiative (CLEF 16), Berlin Heidelberg New York, September 2016. Springer. ISBN 978-3-319-44564-9.