A cluster-based external plagiarism and parallel corpora detection method

Karbeyaz, Ceyhun Efe

buir.advisor	Can, Fazlı
dc.contributor.author	Karbeyaz, Ceyhun Efe
dc.date.accessioned	2016-01-08T18:15:36Z
dc.date.available	2016-01-08T18:15:36Z
dc.date.issued	2011
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 60-64).	en_US
dc.description.abstract	Today, different editions and translations of the same literary text are available. Intuitively, translations based on the same literary text are expected to have significantly similar structure. Likewise, a text suspected of plagiarism may share structural similarities with the text believed to be the source of the plagiarism. Textual plagiarism is the use of an author’s text, work, or ideas in another textual work without citing the source or obtaining the original author’s permission. Existing intrinsic and external plagiarism detection methods tend to search for plagiarism cases within a given dataset so that the algorithms run in a reasonable amount of time; hence, a reference document set is built in which these algorithms search for plagiarism cases. In this thesis, a method for detecting and quantifying external plagiarism and parallel corpora is introduced. The method uses structural similarities to analyze the plagiarism detection problem and to quantify the similarity between given texts. Suspicious and source texts are partitioned into corresponding blocks, and each block is represented as a group of documents, where a document consists of a fixed number of words. The blocks are then indexed and clustered using the cover coefficient clustering algorithm. Finally, the cluster formations of the two texts are analyzed and their similarity is measured. Results on the PAN’09 plagiarism dataset and on different versions of the famous classic literary text Leylâ and Mecnun show that the proposed method successfully detects and quantifies structurally similar plagiarism cases and succeeds in detecting parallel corpora.	en_US
dc.description.statementofresponsibility	Karbeyaz, Ceyhun Efe	en_US
dc.format.extent	xiv, 78 leaves, illustrations, graphs	en_US
dc.identifier.itemid	B123603
dc.identifier.uri	http://hdl.handle.net/11693/15249
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Plagiarism detection	en_US
dc.subject	Parallel corpora detection	en_US
dc.subject	Clustering	en_US
dc.subject.lcc	PN167 .K37 2011	en_US
dc.subject.lcsh	Plagiarism--Prevention.	en_US
dc.subject.lcsh	Plagiarism--Computer programs.	en_US
dc.title	A cluster-based external plagiarism and parallel corpora detection method	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)
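The abstract describes a pipeline of partitioning, indexing, and cover coefficient clustering. The Python sketch below illustrates that pipeline under stated assumptions; it is not the thesis's implementation. The block-splitting rule (`split_into_blocks`, equal-size contiguous blocks), the document size parameter (`words_per_doc`), and all function names are hypothetical. `cover_coefficients` and `c3m` follow the commonly published cover coefficient (C3M) definitions: c_ij = α_i Σ_k d_ik β_k d_jk, where α_i and β_k are the inverse row and column sums of the document-term matrix; decoupling δ_i = c_ii; an estimated cluster count of Σ δ_i; and seed power δ_i(1 − δ_i)Σ_k d_ik. The rule that skips a seed candidate whose coverage row matches an existing seed is an assumed, simplified way of handling identical documents. The final step, comparing the cluster formations of the two texts, is omitted because the abstract does not specify the similarity measure.

```python
# Hypothetical sketch of block partitioning + cover coefficient (C3M-style) clustering.
from collections import Counter


def split_into_blocks(words, n_blocks):
    """Split a word list into n_blocks contiguous blocks of near-equal size.
    Assumption: the equal-size rule is illustrative, not the thesis's scheme."""
    q, r = divmod(len(words), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + q + (1 if b < r else 0)
        blocks.append(words[start:end])
        start = end
    return blocks


def make_documents(words, words_per_doc):
    """Cut a block into consecutive 'documents' of words_per_doc words each."""
    return [words[i:i + words_per_doc] for i in range(0, len(words), words_per_doc)]


def doc_term_matrix(docs):
    """Rows are documents, columns are sorted terms, entries are raw term counts."""
    vocab = sorted({w for doc in docs for w in doc})
    col = {w: k for k, w in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for w, count in Counter(doc).items():
            row[col[w]] = count
        matrix.append(row)
    return matrix


def cover_coefficients(D):
    """C[i][j] = alpha_i * sum_k D[i][k] * beta_k * D[j][k]; each row of C sums to 1."""
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]                            # inverse row sums
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]  # inverse column sums
    return [[alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
             for j in range(m)] for i in range(m)]


def c3m(D, eps=1e-9):
    """Single-pass cover coefficient clustering; returns {seed_doc: [member_docs]}."""
    C = cover_coefficients(D)
    m = len(D)
    delta = [C[i][i] for i in range(m)]          # decoupling coefficients
    n_clusters = max(1, round(sum(delta)))       # estimated number of clusters
    power = [delta[i] * (1 - delta[i]) * sum(D[i]) for i in range(m)]  # seed power
    seeds = []
    for i in sorted(range(m), key=lambda d: power[d], reverse=True):
        # Assumed simplification: skip a candidate whose coverage row equals a chosen seed's.
        if any(all(abs(a - b) < eps for a, b in zip(C[i], C[s])) for s in seeds):
            continue
        seeds.append(i)
        if len(seeds) == n_clusters:
            break
    clusters = {s: [] for s in seeds}
    for i in range(m):
        # A seed keeps itself; any other document joins the seed that covers it most.
        target = i if i in clusters else max(seeds, key=lambda s: C[i][s])
        clusters[target].append(i)
    return clusters
```

For example, each block of a suspicious or source text could be clustered with `c3m(doc_term_matrix(make_documents(block, words_per_doc)))`. The sketch assumes every block is non-empty, since an empty block yields an empty matrix.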

Files

Original bundle

Name: 0006004.pdf
Size: 400.64 KB
Format: Adobe Portable Document Format

Collections

Graduate School of Engineering and Science