CoDet: Sentence-based containment detection in news corpora

dc.citation.epage: 2052
dc.citation.spage: 2049
dc.contributor.author: Varol, Emre
dc.contributor.author: Can, Fazlı
dc.contributor.author: Aykanat, Cevdet
dc.contributor.author: Kaya, Oğuz
dc.contributor.bilkentauthor: Aykanat, Cevdet
dc.coverage.spatial: Glasgow, Scotland
dc.date.accessioned: 2016-02-08T12:15:23Z
dc.date.available: 2016-02-08T12:15:23Z
dc.date.issued: 2011
dc.department: Department of Computer Engineering
dc.description: Date of Conference: October 24 - 28, 2011
dc.description.abstract: We study a generalized version of the near-duplicate detection problem which concerns whether a document is a subset of another document. In text-based applications, document containment can be observed in exact-duplicates, near-duplicates, or containments, where the first two are special cases of the third. We introduce a novel method, called CoDet, which focuses particularly on this problem, and compare its performance with four well-known near-duplicate detection methods (DSC, full fingerprinting, I-Match, and SimHash) that are adapted to containment detection. Our method is expandable to different domains, and especially suitable for streaming news. Experimental results show that CoDet effectively and efficiently produces remarkable results in detecting containments. © 2011 ACM.
dc.identifier.doi: 10.1145/2063576.2063887
dc.identifier.uri: http://hdl.handle.net/11693/28250
dc.language.iso: English
dc.publisher: ACM
dc.relation.isversionof: http://dx.doi.org/10.1145/2063576.2063887
dc.source.title: CIKM '11 Proceedings of the 20th ACM international conference on Information and knowledge management
dc.subject: Corpus tree
dc.subject: Document containment
dc.subject: Duplicate detection
dc.subject: Similarity
dc.subject: Test Collection
dc.subject: Knowledge management
dc.subject: Software agents
dc.title: CoDet: Sentence-based containment detection in news corpora
dc.type: Conference Paper
Files
Original bundle
Name: CoDet Sentence-based containment detection in news corpora.pdf
Size: 1.53 MB
Format: Adobe Portable Document Format
Description: Full printable version