Categorization in a hierarchically structured text database

Kutlu, Ferhat

buir.supervisor	Güvenir, H. Altay
dc.contributor.author	Kutlu, Ferhat
dc.date.accessioned	2016-01-08T18:06:48Z
dc.date.available	2016-01-08T18:06:48Z
dc.date.issued	2001
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 63-66).	en_US
dc.description.abstract	Over the past two decades, the amount of data stored in databases and the on-line flow of data have grown enormously, driven by improvements in the Internet. This growth has created a need for intelligent tools to manage data of that size and its flow. A hierarchical approach is the best way to satisfy this need, and it is widespread among people who work with databases and the Internet. The Usenet newsgroup system is one on-line database with a built-in hierarchical structure. Our point of departure is this hierarchical structure, which makes categorization tasks easier and faster. In fact, most Internet search engines also exploit the inherent hierarchy of the Internet. The growing size of data makes most traditional categorization algorithms obsolete. We therefore developed a new categorization learning algorithm that constructs an index tree from the Usenet news database and then determines the newsgroups related to a new posting by categorizing it over the index tree. In the learning phase, the algorithm takes an agglomerative, bottom-up hierarchical approach. In the categorization phase, it performs overlapping, supervised categorization. The k Nearest Neighbor categorization algorithm is used as a baseline for comparing the complexity and accuracy of our algorithm. This comparison sets not only two algorithms against each other, but also the hierarchical approach against the flat approach, similarity measures against distance measures, and the importance of accuracy against the importance of speed. Our algorithm uses the hierarchical approach and a similarity measure, and it greatly outperforms the k Nearest Neighbor categorization algorithm in speed with minimal loss of accuracy.
dc.description.statementofresponsibility	by Ferhat Kutlu	en_US
dc.format.extent	66 leaves ; 30 cm.	en_US
dc.identifier.itemid	B056068
dc.identifier.uri	http://hdl.handle.net/11693/14741
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Learning
dc.subject	Categorization
dc.subject	Clustering
dc.subject	Usenet
dc.subject	Newsgroup
dc.subject	Top-level
dc.subject	Header-line
dc.subject	Posting
dc.subject	Frequency
dc.subject	Norm-scaling
dc.subject	Similarity measure
dc.subject	Distance measure
dc.subject	Agglomerative
dc.subject	Bottom-up
dc.subject	Stemming
dc.subject	Stopword
dc.subject	Index
dc.title	Categorization in a hierarchically structured text database	en_US
dc.title.alternative	Hiyerarşik yapıda olan bir veritabanının kategorizasyonu
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)
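
Illustrative sketch (not taken from the thesis): the abstract describes building an index tree from Usenet newsgroups bottom-up and then categorizing a new posting over that tree, allowing more than one newsgroup in the result. The Python below is one minimal way to picture that idea. Everything specific in it is an assumption made for illustration: the names Node, tokenize, cosine, aggregate, build_index and categorize, the use of raw term frequencies with cosine similarity, the ratio threshold for following several branches, and the simplification that only leaf nodes are newsgroups. The thesis's actual weighting, stemming, stopword handling and selection rules may differ.

```python
from collections import Counter
from math import sqrt


class Node:
    """One node of the index tree: a hierarchy prefix or a leaf newsgroup."""

    def __init__(self, path=""):
        self.path = path          # e.g. "comp.lang" (empty for the root)
        self.children = {}        # next name component -> Node
        self.terms = Counter()    # term frequencies of this subtree


def tokenize(text):
    # Assumption: a stand-in for real stopword removal and stemming.
    return [w for w in text.lower().split() if w.isalpha()]


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters (norm-scaled dot product)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def aggregate(node):
    """Bottom-up pass: merge each child's term counts into its parent."""
    for child in node.children.values():
        aggregate(child)
        node.terms.update(child.terms)


def build_index(postings):
    """postings: iterable of (newsgroup_name, text). Returns the tree root."""
    root = Node()
    for group, text in postings:
        node = root
        for part in group.split("."):
            child_path = f"{node.path}.{part}" if node.path else part
            if part not in node.children:
                node.children[part] = Node(child_path)
            node = node.children[part]
        node.terms.update(tokenize(text))   # leaves hold their own postings
    aggregate(root)                         # inner nodes summarize their subtrees
    return root


def categorize(root, text, ratio=0.8):
    """Descend the tree, following every child scoring near the best sibling."""
    doc = Counter(tokenize(text))
    results, frontier = [], [root]
    while frontier:
        node = frontier.pop()
        if not node.children:               # leaf: a candidate newsgroup
            results.append(node.path)
            continue
        scored = [(cosine(doc, c.terms), c) for c in node.children.values()]
        best = max(s for s, _ in scored)
        if best == 0.0:                     # nothing in this subtree matches
            continue
        frontier.extend(c for s, c in scored if s >= ratio * best)
    return sorted(results)


# Hypothetical toy data, invented for this sketch.
sample = [
    ("comp.lang.python", "python list comprehension generator syntax"),
    ("comp.lang.c", "pointer malloc segfault header compile"),
    ("rec.sport.soccer", "goal match striker league penalty"),
    ("rec.sport.tennis", "serve racket match set volley"),
]
root = build_index(sample)
print(categorize(root, "my python generator has odd syntax"))
print(categorize(root, "great match"))
```

In this sketch a posting is compared only with the children of the nodes it descends through, rather than with every stored posting as a flat k Nearest Neighbor search would require; that is the speed trade-off the abstract refers to. The second call shows the overlapping case, where a posting that fits two sibling newsgroups equally well is assigned to both.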

Files

Original bundle

Name: 0001610.pdf
Size: 844.6 KB
Format: Adobe Portable Document Format

Collections

Graduate School of Engineering and Science