Browsing by Subject "Similarity"
Now showing 1 - 3 of 3
Item Open Access
CoDet : a new algorithm for containment and near duplicate detection in text corpora (2012) Varol, Emre
In this thesis, we investigate containment detection, which is a generalized version of the well-known near-duplicate detection problem concerning whether a document is a subset of another document. In text-based applications, there are three ways of observing document containment: exact-duplicates, near-duplicates, or containments, where the first two are special cases of containment. To detect containments, we introduce CoDet, a novel algorithm that focuses particularly on the containment problem. We also construct a test collection using a novel pooling technique, which enables us to make reliable judgments about the relative effectiveness of algorithms using limited human assessments. We compare CoDet's performance with four well-known near-duplicate detection methods (DSC, full fingerprinting, I-Match, and SimHash) that are adapted to containment detection. Our algorithm is especially suitable for streaming news. It is also expandable to different domains. Experimental results show that CoDet mostly outperforms the other algorithms and produces remarkable results in detection of containments in text corpora.

Item Open Access
CoDet: Sentence-based containment detection in news corpora (ACM, 2011) Varol, Emre; Can, Fazlı; Aykanat, Cevdet; Kaya, Oğuz
We study a generalized version of the near-duplicate detection problem which concerns whether a document is a subset of another document. In text-based applications, document containment can be observed in exact-duplicates, near-duplicates, or containments, where the first two are special cases of the third. We introduce a novel method, called CoDet, which focuses particularly on this problem, and compare its performance with four well-known near-duplicate detection methods (DSC, full fingerprinting, I-Match, and SimHash) that are adapted to containment detection. Our method is expandable to different domains, and especially suitable for streaming news. Experimental results show that CoDet effectively and efficiently produces remarkable results in detecting containments. © 2011 ACM.

Item Open Access
A note on Radon-Nikodym derivatives and similarity for completely bounded maps (Wydawnictwo A G H, 2009) Gheondea, A.; Kavruk, A. Ş.
We point out a relation between Arveson's Radon-Nikodým derivative and known similarity results for completely bounded maps. We also consider Jordan type decompositions arising from Wittstock's Decomposition Theorem and illustrate, by an example, the nonuniqueness of these decompositions.