Developing a text categorization template for Turkish news portals
dc.citation.epage | 383 | en_US |
dc.citation.spage | 379 | en_US |
dc.contributor.author | Toraman, Çağrı | en_US |
dc.contributor.author | Can, Fazlı | en_US |
dc.contributor.author | Koçberber, Seyit | en_US |
dc.coverage.spatial | Istanbul, Turkey | en_US |
dc.date.accessioned | 2016-02-08T12:18:59Z | |
dc.date.available | 2016-02-08T12:18:59Z | |
dc.date.issued | 2011 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description | Date of Conference: 15-18 June 2011 | en_US |
dc.description.abstract | In news portals, text category information is needed for news presentation. However, for many news stories the category information is unavailable, incorrectly assigned, or too generic. This makes text categorization a necessary tool for news portals. Automated text categorization (ATC) is a difficult, multifaceted process that involves decisions regarding the tuning of several parameters, term weighting, word stemming, word stopping, and feature selection. In this study we aim to find a categorization setup that will provide highly accurate results in ATC for Turkish news portals. We also examine other aspects, such as the effects of training dataset size and robustness issues. Two Turkish test collections with different characteristics are created using Bilkent News Portal. Experiments are conducted with four classification methods: C4.5, KNN, Naive Bayes, and SVM (using polynomial and RBF kernels). Our results recommend a text categorization template for Turkish news portals and provide some pointers for future research. © 2011 IEEE. | en_US |
dc.description.provenance | Made available in DSpace on 2016-02-08T12:18:59Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2011 | en |
dc.identifier.doi | 10.1109/INISTA.2011.5946096 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/28376 | en_US |
dc.language.iso | English | en_US |
dc.publisher | IEEE | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/INISTA.2011.5946096 | en_US |
dc.source.title | 2011 International Symposium on Innovations in Intelligent Systems and Applications | en_US |
dc.subject | Turkish news | en_US |
dc.subject | Automated text categorization | en_US |
dc.subject | Classification methods | en_US |
dc.subject | Naive Bayes | en_US |
dc.subject | news portals | en_US |
dc.subject | RBF kernels | en_US |
dc.subject | Robustness issues | en_US |
dc.subject | Term weighting | en_US |
dc.subject | Test Collection | en_US |
dc.subject | Text categorization | en_US |
dc.subject | Training dataset | en_US |
dc.subject | Turkish | en_US |
dc.subject | Word-stemming | en_US |
dc.subject | Feature extraction | en_US |
dc.subject | Intelligent systems | en_US |
dc.subject | Text processing | en_US |
dc.title | Developing a text categorization template for Turkish news portals | en_US |
dc.type | Conference Paper | en_US |
Files
Original bundle
- Name: Developing a text categorization template for Turkish news portals.pdf
- Size: 1.26 MB
- Format: Adobe Portable Document Format
- Description: Full printable version