A cluster-based external plagiarism and parallel corpora detection method

Karbeyaz, Ceyhun Efe

Files

0006004.pdf (400.64 KB)

Date

2011

Authors

Karbeyaz, Ceyhun Efe

Advisor

Can, Fazlı

Abstract

Different editions and translations of the same literary text are now widely available. Intuitively, translations based on the same literary text are expected to have significantly similar structure. In the same way, a text suspected of plagiarism may share structural similarities with the text believed to be its source. Textual plagiarism is the use of an author’s text, work, or idea in another textual work without citing it or without obtaining the original author’s permission. Existing intrinsic and external plagiarism detection methods tend to search for plagiarism cases within a given dataset so that the algorithms run in a reasonable amount of time; a reference document set is therefore built for these algorithms to search. This thesis introduces a method for detecting and quantifying external plagiarism and parallel corpora. The method uses structural similarities to analyze the plagiarism detection problem and to quantify the similarity between given texts. Suspicious and source texts are partitioned into corresponding blocks. Each block is represented as a group of documents, where a document consists of a fixed number of words. The blocks are then indexed and clustered using the cover coefficient clustering algorithm. The cluster formations of both texts are analyzed and their similarities are measured. Results on the PAN’09 plagiarism dataset and on different versions of the classic literary text Leylâ and Mecnun show that the proposed method successfully detects and quantifies structurally similar plagiarism cases and succeeds in detecting parallel corpora.
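
The partitioning and indexing steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract gives no formulas, so the cover coefficient definition used here (c_ij = alpha_i * sum_k d_ik * beta_k * d_jk, with alpha and beta the reciprocals of the row and column sums, and the cluster count estimated as the sum of the diagonal entries) follows the commonly published form of the cover coefficient concept and should be read as an assumption. The function names, the whitespace tokenizer, and the use of raw term frequencies are hypothetical choices made for illustration. Seed selection, document-to-seed assignment, and the comparison of cluster formations are omitted.

```python
from collections import Counter


def split_into_documents(text, words_per_doc):
    """Split a text into consecutive documents of a fixed number of words.

    The last document may be shorter if the word count is not a multiple
    of words_per_doc. Tokenization is plain whitespace splitting (assumption).
    """
    words = text.lower().split()
    return [words[i:i + words_per_doc] for i in range(0, len(words), words_per_doc)]


def cover_coefficients(docs):
    """Return the m x m cover coefficient matrix for tokenized documents.

    c[i][j] = alpha_i * sum_k d[i][k] * beta_k * d[j][k], where d is the
    document-by-term frequency matrix, alpha_i = 1 / (row sum of i) and
    beta_k = 1 / (column sum of k). Each row of the result sums to 1.
    """
    counts = [Counter(doc) for doc in docs]
    col_sum = Counter()
    for c in counts:
        col_sum.update(c)

    m = len(docs)
    matrix = [[0.0] * m for _ in range(m)]
    for i, ci in enumerate(counts):
        row_sum = sum(ci.values())
        for j, cj in enumerate(counts):
            # Counter returns 0 for terms that document j does not contain.
            total = sum(freq * cj[term] / col_sum[term] for term, freq in ci.items())
            matrix[i][j] = total / row_sum
    return matrix


def estimated_cluster_count(matrix):
    """Estimate the number of clusters as the sum of the diagonal entries
    (the decoupling coefficients), rounded and kept at least 1."""
    return max(1, round(sum(matrix[i][i] for i in range(len(matrix)))))


if __name__ == "__main__":
    block = "rose garden rose garden desert night desert night"
    docs = split_into_documents(block, words_per_doc=2)
    c = cover_coefficients(docs)
    print(len(docs), "documents,", estimated_cluster_count(c), "estimated clusters")
```

In this toy block the first two documents share all their terms, as do the last two, and the two pairs share none, so the diagonal entries sum to 2. Running the same steps on a suspicious block and its corresponding source block would give two cluster structures to compare; how the thesis measures that similarity is not stated in the abstract and is left out here.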

Keywords

Plagiarism detection, Parallel corpora detection, Clustering

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/15249

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
