On-line new event detection and clustering using the concepts of the cover coefficient-based clustering methodology
Author(s)
Advisor
Can, Fazlı
Date
2002
Publisher
Bilkent University
Language
English
Type
Thesis
Item Usage Stats
133
views
32
downloads
Abstract
In this study, we use the concepts of the cover coefficient-based clustering
methodology (C³M) for on-line new event detection and event clustering. The
main idea of the study is to use the seed selection process of the C³M algorithm
to detect new events. Since C³M works in a retrospective
manner, we modify the algorithm to work in an on-line environment.
Furthermore, to avoid producing oversized event clusters and to give
all documents an equal chance of becoming the seed of a new event, we employ a
window size concept. To control the number of seed documents,
we introduce a threshold concept into the event clustering algorithm. We also use
the threshold concept, with a minor modification, in on-line event detection. In
the experiments we use the TDT1 corpus, which was also used in the original topic
detection and tracking study. For both event clustering and event detection, we use
binary and weighted versions of the TDT1 corpus. The binary implementation
yields better results. Compared with
the UMASS approach, our on-line event detection achieves better performance in terms of false
alarm rates.
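
The abstract does not give formulas, so the following is only a minimal sketch of how seed selection over a window of binary documents might look. It uses the decoupling coefficient, coupling coefficient, and binary seed power as defined in the published C³M methodology. The function names, the set-of-terms document representation, and the way the window and threshold are applied are illustrative assumptions, not the thesis's actual procedure.

```python
from collections import Counter, deque


def seed_powers(docs):
    """Return (decoupling, seed_power) for each binary document.

    docs: sequence of non-empty sets of terms, i.e. the rows of a
    binary document-term matrix (representation assumed for this sketch).
    """
    # Number of documents in `docs` that contain each term.
    df = Counter(term for doc in docs for term in doc)
    results = []
    for doc in docs:
        # Decoupling coefficient: (1/|d_i|) * sum over its terms of 1/df(term).
        delta = sum(1.0 / df[t] for t in doc) / len(doc)
        psi = 1.0 - delta  # coupling coefficient
        # Binary seed power: delta * psi * number of terms in the document.
        results.append((delta, delta * psi * len(doc)))
    return results


def process_stream(stream, window_size, threshold):
    """Yield (index, seed_power, is_seed_candidate) per incoming document.

    Hypothetical on-line loop: only the most recent `window_size`
    documents are kept, and the threshold test is an assumed rule.
    """
    window = deque(maxlen=window_size)  # oldest documents drop out automatically
    for i, doc in enumerate(stream):
        window.append(doc)
        _, power = seed_powers(list(window))[-1]
        yield i, power, power >= threshold
```

Note that a document sharing no terms with the rest of the window has a decoupling coefficient of 1 and therefore a seed power of 0, so a detector built on these quantities needs an explicit rule for that case. The abstract does not say how the thesis handles it.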