CoDet: Sentence-based containment detection in news corpora
buir.contributor.author | Aykanat, Cevdet | |
dc.citation.epage | 2052 | en_US |
dc.citation.spage | 2049 | en_US |
dc.contributor.author | Varol, Emre | en_US |
dc.contributor.author | Can, Fazlı | en_US |
dc.contributor.author | Aykanat, Cevdet | en_US |
dc.contributor.author | Kaya, Oğuz | en_US |
dc.coverage.spatial | Glasgow, Scotland | en_US |
dc.date.accessioned | 2016-02-08T12:15:23Z | |
dc.date.available | 2016-02-08T12:15:23Z | |
dc.date.issued | 2011 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description | Date of Conference: October 24 - 28, 2011 | en_US |
dc.description.abstract | We study a generalized version of the near-duplicate detection problem which concerns whether a document is a subset of another document. In text-based applications, document containment can be observed in exact-duplicates, near-duplicates, or containments, where the first two are special cases of the third. We introduce a novel method, called CoDet, which focuses particularly on this problem, and compare its performance with four well-known near-duplicate detection methods (DSC, full fingerprinting, I-Match, and SimHash) that are adapted to containment detection. Our method is expandable to different domains, and especially suitable for streaming news. Experimental results show that CoDet effectively and efficiently produces remarkable results in detecting containments. © 2011 ACM. | en_US |
dc.description.provenance | Made available in DSpace on 2016-02-08T12:15:23Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2011 | en |
dc.identifier.doi | 10.1145/2063576.2063887 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/28250 | en_US |
dc.language.iso | English | en_US |
dc.publisher | ACM | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1145/2063576.2063887 | en_US |
dc.source.title | CIKM '11 Proceedings of the 20th ACM international conference on Information and knowledge management | en_US |
dc.subject | Corpus tree | en_US |
dc.subject | Document containment | en_US |
dc.subject | Duplicate detection | en_US |
dc.subject | Similarity | en_US |
dc.subject | Test Collection | en_US |
dc.subject | Knowledge management | en_US |
dc.subject | Software agents | en_US |
dc.title | CoDet: Sentence-based containment detection in news corpora | en_US |
dc.type | Conference Paper | en_US |
Files
Original bundle
- Name: CoDet Sentence-based containment detection in news corpora.pdf
- Size: 1.53 MB
- Format: Adobe Portable Document Format
- Description: Full printable version