Intrinsic Plagiarism Detection and Author Analysis

More and more text documents are made publicly available through large text collections and literary databases, and, as recent events show, detecting plagiarism in such collections has become considerably more important. To address this problem, we propose several variants of the Plag-Inn algorithm, which attempt to expose plagiarism in a text document by analyzing the grammar of its authors and finding significant stylistic differences within that single document.
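The intrinsic idea can be illustrated with a minimal sketch. It is not the Plag-Inn algorithm itself: it assumes each sentence has already been reduced to a sequence of part-of-speech tags by some external tagger, uses tag bigrams as a crude stand-in for grammar structure, and flags sentences whose profile deviates strongly from the rest of the document. The function names, the distance measure, and the threshold factor k are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def tag_bigrams(tags):
    """Count adjacent tag pairs (a simple, assumed proxy for grammar structure)."""
    return Counter(zip(tags, tags[1:]))


def distance(profile_a, profile_b):
    """L1 distance between the relative-frequency versions of two profiles."""
    total_a = sum(profile_a.values()) or 1
    total_b = sum(profile_b.values()) or 1
    keys = set(profile_a) | set(profile_b)
    return sum(abs(profile_a[k] / total_a - profile_b[k] / total_b) for k in keys)


def suspicious_sentences(tagged_sentences, k=1.5):
    """Return indices of sentences whose grammar profile is an outlier.

    Each sentence is compared with the profile of all other sentences
    (leave-one-out); a sentence is flagged when its distance exceeds
    mean + k * standard deviation of all distances.
    """
    profiles = [tag_bigrams(tags) for tags in tagged_sentences]
    if not profiles:
        return []
    document = sum(profiles, Counter())
    scores = [distance(p, document - p) for p in profiles]
    threshold = mean(scores) + k * pstdev(scores)
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical input: six sentences sharing one tag pattern, one that differs.
usual = ["DT", "NN", "VBZ", "JJ"]
unusual = ["IN", "DT", "NN", "VBD", "VBN", "CC", "RB"]
sample = [usual, usual, usual, unusual, usual, usual, usual]
flagged = suspicious_sentences(sample)
print(flagged)
```

A real system would work on full parse trees rather than tag bigrams and would need a more careful outlier test; the sketch only shows the overall shape of comparing parts of a document against the document's own style.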

The algorithms have also been adapted to the fields of Authorship Attribution, Author Profiling, and Multi-Author Decomposition. Given a previously unseen text document, Authorship Attribution aims to predict the correct author, whereas Author Profiling aims to extract meta information such as the gender or age of the writer. Finally, collaboratively written documents can be automatically decomposed and clustered by distinct writers. All approaches reuse the idea of inspecting the grammatical syntax of sentences.

Publications

2017

Michael Tschuggnall: Automatisierte Plagiatserkennung in Textdokumenten: Was der Schreibstil eines Autors über die Echtheit verrät. In S. Mauler, H. Ortner, U. Pfeiffenberger (eds.): Medien und Glaubwürdigkeit, pages 131-140, Innsbruck University Press, 2017.

Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso and Benno Stein: Overview of PAN’17: Author Identification, Author Profiling, and Author Obfuscation. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Initiative (CLEF 17). Springer, Berlin Heidelberg New York, September 2017.

Michael Tschuggnall, Efstathios Stamatatos, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast: Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering. In CEUR Workshop Proceedings, CLEF 2017 Working Notes, Dublin, Ireland, September 11-14, 2017.

2016

Efstathios Stamatatos, Michael Tschuggnall, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast: Clustering by Authorship Within and Across Documents. In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings, September 2016. CLEF and CEUR-WS.org. ISSN 1613-0073.

Paolo Rosso, Francisco Rangel, Martin Potthast, Efstathios Stamatatos, Michael Tschuggnall and Benno Stein: Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-genre Profiling, Clustering, Diarization, and Obfuscation. In Norbert Fuhr et al., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 7th International Conference of the CLEF Initiative (CLEF 16), Berlin Heidelberg New York, September 2016. Springer. ISBN 978-3-319-44564-9.

Michael Tschuggnall, Günther Specht and Christian Riepl: Algorithmisch unterstützte Literarkritik: Eine grammatikalische Analyse zur Bestimmung von Schreibstilen. In H. Rechenmacher (ed.): In Memoriam Wolfgang Richter, pages 415-428. EOS-Verlag, 2016.

Michael Tschuggnall and Günther Specht: From Plagiarism Detection to Bible Analysis: The Potential of Machine Learning for Grammar-Based Text Analysis. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 245-248, 2016.

2015

Michael Tschuggnall and Günther Specht: On the Potential of Grammar Features for Automated Author Profiling. In International Journal On Advances in Intelligent Systems, Volume 8, Number 3&4, pages 255-265, 2015.

Michael Tschuggnall: Intrinsische Plagiatserkennung und Autorenerkennung mittels Grammatikanalyse. In Ausgezeichnete Informatikdissertationen 2014, Volume D-15, pages 279-288. Bonner Köllen Druck+Verlag, 2015.

2014

Michael Tschuggnall: Intrinsic Plagiarism Detection and Author Analysis By Utilizing Grammar. PhD thesis, University of Innsbruck, Department of Computer Science, 2014.

Michael Tschuggnall and Günther Specht: Automatic Decomposition of Multi-Author Documents Using Grammar Analysis. In Proceedings of the 26th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), October 2014, Bozen, Italy. CEUR-WS.org, Volume 1313, pages 17-22, 2014.

Michael Tschuggnall and Günther Specht: What Grammar Tells About Gender and Age of Authors. In Proceedings of the 4th International Conference on Advances in Information Mining and Management (IMMM), July 2014, Paris, France, pages 30-35, 2014.

Michael Tschuggnall and Günther Specht: Enhancing Authorship Attribution By Utilizing Syntax Tree Profiles. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, April 2014, ACL, Gothenburg, Sweden, pages 195-199, 2014.

2013

Michael Tschuggnall and Günther Specht: Countering Plagiarism by Exposing Irregularities in Authors Grammars. In EISIC 2013, European Intelligence and Security Informatics Conference, 12-14 August 2013, Uppsala, Sweden, IEEE, pages 15-22, 2013.

Michael Tschuggnall and Günther Specht: Using Grammar-Profiles to Intrinsically Expose Plagiarism in Text Documents. In 18th Int. Conference of Natural Language Processing and Information Systems (NLDB 2013), Manchester, UK, June 2013, Springer, LNCS Volume 7934, pages 297-302, 2013.

Michael Tschuggnall and Günther Specht: Detecting Plagiarism in Text Documents through Grammar-Analysis of Authors. In BTW 2013, 15. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web, 11-15 March 2013, Magdeburg, LNI, pages 241-259, 2013.

Michael Tschuggnall and Günther Specht: Plag-Inn: Uncovering Plagiarism by Examining Author’s Grammar Syntax. In M. Barden, A. Ostermann (eds.): Scientific Computing @ uibk, Innsbruck University Press, pages 151-152, 2013.

2012

Michael Tschuggnall and Günther Specht: Plag-Inn: Intrinsic Plagiarism Detection Using Grammar Trees. In 17th Int. Conference of Natural Language Processing and Information Systems (NLDB 2012), Groningen, The Netherlands, June 2012, Springer, LNCS Volume 7337, pages 284-289, 2012.