A cluster-based external plagiarism and parallel corpora detection method

buir.advisor: Can, Fazlı
dc.contributor.author: Karbeyaz, Ceyhun Efe
dc.date.accessioned: 2016-01-08T18:15:36Z
dc.date.available: 2016-01-08T18:15:36Z
dc.date.issued: 2011
dc.department: Department of Computer Engineering [en_US]
dc.description: Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent Univ., 2011. [en_US]
dc.description: Thesis (Master's) -- Bilkent University, 2011. [en_US]
dc.description: Includes bibliographical references (leaves 60-64). [en_US]
dc.description.abstract: Today, different editions and translations of the same literary text can be found. Intuitively, translations based on the same literary text are expected to have significantly similar structure. Likewise, a text suspected of plagiarism may share structural similarities with the text believed to be its source. Textual plagiarism is the use of an author’s text, work, or ideas in another textual work without giving a reference or without the original author’s permission. To run in a reasonable amount of time, existing intrinsic and external plagiarism detection methods tend to look for plagiarism cases within a given dataset; hence, a reference document set is built for these algorithms to search. In this thesis, a method for detecting and quantifying external plagiarism and parallel corpora is introduced. For this purpose, we use structural similarities to analyze the plagiarism detection problem and to quantify the similarity between given texts. In this method, suspicious and source texts are partitioned into corresponding blocks. Each block is represented as a group of documents, where a document consists of a fixed number of words. The blocks are then indexed and clustered using the cover coefficient clustering algorithm. The cluster formations of both texts are then analyzed and their similarities are measured. The results on the PAN’09 plagiarism dataset and on different versions of the famous classic literary text Leylâ and Mecnun show that the proposed method successfully detects and quantifies structurally similar plagiarism cases and succeeds in detecting parallel corpora. [en_US]
dc.description.degree: M.S. [en_US]
dc.description.statementofresponsibility: Karbeyaz, Ceyhun Efe [en_US]
dc.format.extent: xiv, 78 leaves, illustrations, graphs [en_US]
dc.identifier.itemid: B123603
dc.identifier.uri: http://hdl.handle.net/11693/15249
dc.language.iso: English [en_US]
dc.publisher: Bilkent University [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Plagiarism detection [en_US]
dc.subject: Parallel corpora detection [en_US]
dc.subject: Clustering [en_US]
dc.subject.lcc: PN167 .K37 2011 [en_US]
dc.subject.lcsh: Plagiarism--Prevention. [en_US]
dc.subject.lcsh: Plagiarism--Computer programs. [en_US]
dc.title: A cluster-based external plagiarism and parallel corpora detection method [en_US]
dc.type: Thesis [en_US]
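
The abstract describes splitting a text into "documents" of a fixed number of words, indexing them, and clustering them with the cover coefficient clustering algorithm (C3M). The Python sketch below illustrates those steps under stated assumptions; it is not the thesis implementation. Lowercasing, whitespace tokenization, raw term-frequency weights, clustering the documents of a single block, and the helper names `split_into_documents`, `term_matrix`, `cover_coefficients`, and `c3m_clusters` are all hypothetical choices made for illustration. The cover coefficient follows the standard definition c_ij = α_i Σ_k d_ik β_k d_jk, where α_i and β_k are the reciprocals of the row and column sums of the document-term matrix D. The diagonal entries (decoupling coefficients) are summed to estimate the number of clusters, and the documents with the highest seed power are chosen as seeds. This is a simplified C3M: the original algorithm's extra seed-selection checks are omitted, and documents sharing no terms with any seed go to a ragbag cluster (labelled -1 here).

```python
from collections import Counter


def split_into_documents(text, words_per_doc):
    """Split a text (e.g., one block) into consecutive 'documents' of a fixed
    number of words. Lowercasing and whitespace tokenization are assumptions."""
    words = text.lower().split()
    return [words[i:i + words_per_doc] for i in range(0, len(words), words_per_doc)]


def term_matrix(docs):
    """Build a document-by-term frequency matrix D (rows = documents)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {term: k for k, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, freq in Counter(doc).items():
            row[index[term]] = freq
        matrix.append(row)
    return matrix, vocab


def cover_coefficients(D):
    """C[i][j] = alpha_i * sum_k d_ik * beta_k * d_jk, where alpha_i is the
    reciprocal of row i's sum and beta_k the reciprocal of column k's sum.
    Each row of C sums to 1; C[i][i] is the decoupling coefficient of doc i.
    Assumes a non-empty D with no all-zero rows."""
    m, n = len(D), len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][k] for i in range(m)) for k in range(n)]
    C = [[0.0] * m for _ in range(m)]
    for i in range(m):
        alpha = 1.0 / row_sums[i]
        for j in range(m):
            C[i][j] = alpha * sum(
                D[i][k] * D[j][k] / col_sums[k]
                for k in range(n)
                if D[i][k] and D[j][k]
            )
    return C


def c3m_clusters(D):
    """Simplified C3M: estimate n_c from the decoupling coefficients, take the
    n_c documents with the highest seed power as seeds, and assign every other
    document to the seed that covers it most (-1 = ragbag if none covers it)."""
    C = cover_coefficients(D)
    m = len(D)
    delta = [C[i][i] for i in range(m)]   # decoupling coefficients
    n_c = max(1, round(sum(delta)))        # estimated number of clusters
    # Seed power: delta_i * (1 - delta_i) * (total term frequency of doc i)
    power = [delta[i] * (1 - delta[i]) * sum(D[i]) for i in range(m)]
    seeds = sorted(range(m), key=lambda i: power[i], reverse=True)[:n_c]
    assignment = {}
    for i in range(m):
        if i in seeds:
            assignment[i] = i
            continue
        best = max(seeds, key=lambda s: C[i][s])
        assignment[i] = best if C[i][best] > 0 else -1
    return n_c, seeds, assignment
```

On the toy text "a b c a b d e f g e f h" with `words_per_doc=3`, the sketch estimates three clusters, picks documents 0, 1, and 2 as seeds, and assigns document 3 (which shares "e" and "f" with document 2) to seed 2. Real inputs would use far longer documents.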

Files

Original bundle
Name: 0006004.pdf
Size: 400.64 KB
Format: Adobe Portable Document Format