Intrinsic Plagiarism Detection and Author Analysis
More and more text documents are being made publicly available through large text collections and literary databases. As recent events show, detecting plagiarism in such collections is becoming considerably more important. To address this problem, we propose several variants of the Plag-Inn algorithm, which attempts to expose plagiarism in a text document by analyzing the grammar its authors use and finding significant stylistic differences within that single document.
The algorithms are also adapted to the fields of Authorship Attribution, Author Profiling, and Multi-Author-Decomposition. Given a previously unseen text document, Authorship Attribution aims to predict the correct author, whereas Author Profiling aims to extract meta-information such as the writer's gender or age. Finally, collaboratively written documents can be automatically decomposed and their parts clustered by distinct writer. All approaches reuse the idea of inspecting the grammatical structure of sentences.
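The intrinsic idea of finding significant stylistic differences within a single document can be illustrated with a minimal sketch. It is not the Plag-Inn algorithm itself: it assumes each sentence has already been reduced to a numeric feature vector (for example, one derived from its parse tree), and it uses a simple distance-to-centroid rule with a threshold of mean plus k standard deviations. The function name `flag_outlier_sentences`, the feature representation, and the parameter `k` are all hypothetical choices made for illustration.

```python
from math import dist
from statistics import mean, pstdev


def flag_outlier_sentences(features, k=2.0):
    """Return indices of sentences that deviate strongly from the rest.

    `features` is a list of equal-length numeric vectors, one per sentence.
    How these vectors are derived from sentence grammar is left open here
    (assumption: some parser-based feature extraction happens upstream).
    """
    if not features:
        return []

    dims = len(features[0])
    # Centroid: the "average style" of the whole document.
    centroid = [mean(vec[d] for vec in features) for d in range(dims)]

    # Euclidean distance of every sentence to that average style.
    distances = [dist(vec, centroid) for vec in features]

    # Flag sentences lying more than k standard deviations above the mean distance.
    threshold = mean(distances) + k * pstdev(distances)
    return [i for i, d in enumerate(distances) if d > threshold]
```

Called on a list of per-sentence vectors, the function returns the positions of sentences whose features differ markedly from the document's overall profile. Such sentences are only candidates for closer inspection, not proof of plagiarism.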