Developing a text categorization template for Turkish news portals

dc.citation.epage: 383 [en_US]
dc.citation.spage: 379 [en_US]
dc.contributor.author: Toraman, Çağrı [en_US]
dc.contributor.author: Can, Fazlı [en_US]
dc.contributor.author: Koçberber, Seyit [en_US]
dc.coverage.spatial: Istanbul, Turkey [en_US]
dc.date.accessioned: 2016-02-08T12:18:59Z
dc.date.available: 2016-02-08T12:18:59Z
dc.date.issued: 2011 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description: Date of Conference: 15-18 June 2011 [en_US]
dc.description.abstract: In news portals, text category information is needed for news presentation. However, for many news stories the category information is unavailable, incorrectly assigned, or too generic. This makes text categorization a necessary tool for news portals. Automated text categorization (ATC) is a multifaceted, difficult process that involves decisions regarding the tuning of several parameters, term weighting, word stemming, word stopping, and feature selection. In this study we aim to find a categorization setup that will provide highly accurate results in ATC for Turkish news portals. We also examine some other aspects, such as the effects of training dataset size and robustness issues. Two Turkish test collections with different characteristics are created using the Bilkent News Portal. Experiments are conducted with four classification methods: C4.5, KNN, Naive Bayes, and SVM (using polynomial and RBF kernels). Our results recommend a text categorization template for Turkish news portals and provide some future research pointers. © 2011 IEEE. [en_US]
dc.description.provenance: Made available in DSpace on 2016-02-08T12:18:59Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2011 [en]
dc.identifier.doi: 10.1109/INISTA.2011.5946096 [en_US]
dc.identifier.uri: http://hdl.handle.net/11693/28376 [en_US]
dc.language.iso: English [en_US]
dc.publisher: IEEE [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/INISTA.2011.5946096 [en_US]
dc.source.title: 2011 International Symposium on Innovations in Intelligent Systems and Applications [en_US]
dc.subject: Turkish news [en_US]
dc.subject: Automated text categorization [en_US]
dc.subject: Classification methods [en_US]
dc.subject: Naive Bayes [en_US]
dc.subject: news portals [en_US]
dc.subject: RBF kernels [en_US]
dc.subject: Robustness issues [en_US]
dc.subject: Term weighting [en_US]
dc.subject: Test Collection [en_US]
dc.subject: Text categorization [en_US]
dc.subject: Training dataset [en_US]
dc.subject: Turkishs [en_US]
dc.subject: Word-stemming [en_US]
dc.subject: Feature extraction [en_US]
dc.subject: Intelligent systems [en_US]
dc.subject: Text processing [en_US]
dc.title: Developing a text categorization template for Turkish news portals [en_US]
dc.type: Conference Paper [en_US]

Files

Original bundle

Name: Developing a text categorization template for Turkish news portals.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format
Description: Full printable version