Categorization in a hierarchically structured text database
| buir.supervisor | Güvenir, H. Altay | |
| dc.contributor.author | Kutlu, Ferhat | |
| dc.date.accessioned | 2016-01-08T18:06:48Z | |
| dc.date.available | 2016-01-08T18:06:48Z | |
| dc.date.issued | 2001 | |
| dc.description | Cataloged from PDF version of article. | en_US | 
| dc.description | Includes bibliographical references (leaves 63-66). | en_US | 
| dc.description.abstract | Over the past two decades, improvements in the Internet have brought a huge increase in the amount of data stored in databases and in the on-line flow of data. This increase has created a need for intelligent tools to manage data of that size and its flow. A hierarchical approach is the best way to meet this need, and it is widespread among people who work with databases and the Internet. The Usenet newsgroup system is one of the on-line databases that have a built-in hierarchical structure. Our point of departure is this hierarchical structure, which makes categorization tasks easier and faster. In fact, most Internet search engines also exploit the inherent hierarchy of the Internet. The growing size of data makes most traditional categorization algorithms obsolete. We therefore developed a new categorization learning algorithm that constructs an index tree from a Usenet news database and then determines the newsgroups related to a new news posting by categorizing it over the index tree. In the learning phase, the algorithm takes an agglomerative, bottom-up hierarchical approach. In the categorization phase, it performs overlapping, supervised categorization. The k Nearest Neighbor categorization algorithm is used as a point of comparison for the complexity and accuracy of our algorithm. This comparison is not only between two algorithms; it also compares the hierarchical approach with the flat approach, a similarity measure with a distance measure, and the importance of accuracy with the importance of speed. Our algorithm uses the hierarchical approach and a similarity measure, and it greatly outperforms the k Nearest Neighbor categorization algorithm in speed with minimal loss of accuracy. | |
| dc.description.statementofresponsibility | by Ferhat Kutlu | en_US | 
| dc.format.extent | 66 leaves ; 30 cm. | en_US | 
| dc.identifier.itemid | B056068 | |
| dc.identifier.uri | http://hdl.handle.net/11693/14741 | |
| dc.language.iso | English | en_US | 
| dc.rights | info:eu-repo/semantics/openAccess | en_US | 
| dc.subject | Learning | |
| dc.subject | Categorization | |
| dc.subject | Clustering | |
| dc.subject | Usenet | |
| dc.subject | Newsgroup | |
| dc.subject | Top-level | |
| dc.subject | Header-line | |
| dc.subject | Posting | |
| dc.subject | Frequency | |
| dc.subject | Norm-scaling | |
| dc.subject | Similarity measure | |
| dc.subject | Distance measure | |
| dc.subject | Agglomerative | |
| dc.subject | Bottom-up | |
| dc.subject | Stemming | |
| dc.subject | Stopword | |
| dc.subject | Index | |
| dc.title | Categorization in a hierarchically structured text database | en_US | 
| dc.title.alternative | Hiyerarşik yapıda olan bir veritabanının kategorizasyonu | |
| dc.type | Thesis | en_US | 
| thesis.degree.discipline | Computer Engineering | |
| thesis.degree.grantor | Bilkent University | |
| thesis.degree.level | Master's | |
| thesis.degree.name | MS (Master of Science) | 