Online text classification for real life tweet analysis
dc.citation.epage | 1612 | en_US |
dc.citation.spage | 1609 | en_US |
dc.contributor.author | Yar, Ersin | en_US |
dc.contributor.author | Delibalta, İ. | en_US |
dc.contributor.author | Baruh, L. | en_US |
dc.contributor.author | Kozat, Süleyman Serdar | en_US |
dc.coverage.spatial | Zonguldak, Turkey | en_US |
dc.date.accessioned | 2018-04-12T11:48:24Z | |
dc.date.available | 2018-04-12T11:48:24Z | |
dc.date.issued | 2016 | en_US |
dc.department | Department of Electrical and Electronics Engineering | en_US |
dc.description | Date of Conference: 16-19 May 2016 | en_US |
dc.description | Conference Name: IEEE 24th Signal Processing and Communications Applications Conference, SIU 2016 | en_US |
dc.description.abstract | In this paper, we study multi-class classification of tweets and introduce highly efficient dimensionality reduction techniques suitable for online processing of high-dimensional feature vectors generated from freely worded text. As a real-life case study, we work on tweets in the Turkish language; however, our methods are generic and can be used for other languages, as explained in the paper. Since we work on a real-life application and the tweets are freely worded, we introduce text correction, normalization and root-finding algorithms. Although text processing and classification are highly important for applications such as emotion recognition and advertisement selection, online classification and regression algorithms over text are limited by the need for high-dimensional vectors to represent natural text inputs. We overcome such limitations by showing that randomized projections and piecewise linear models can be efficiently leveraged to significantly reduce the computational cost of feature vector extraction from the tweets. Hence, we can perform multi-class tweet classification and regression in real time. We demonstrate our results over tweets collected from a real-life case study where the tweets are unstructured and freely worded, e.g., with emoticons, shortened words and special characters. We implement several well-known machine learning algorithms as well as novel regression methods and demonstrate that we can significantly reduce the computational complexity with an insignificant change in the classification and regression performance. | en_US |
dc.description.provenance | Made available in DSpace on 2018-04-12T11:48:24Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 179475 bytes, checksum: ea0bedeb05ac9ccfb983c327e155f0c2 (MD5) Previous issue date: 2016 | en |
dc.identifier.doi | 10.1109/SIU.2016.7496063 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/37699 | |
dc.language.iso | Turkish | en_US |
dc.publisher | IEEE | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/SIU.2016.7496063 | en_US |
dc.source.title | Proceedings of the IEEE 24th Signal Processing and Communications Applications Conference, SIU 2016 | en_US |
dc.subject | Big data | en_US |
dc.subject | Computationally efficient | en_US |
dc.subject | Natural language processing | en_US |
dc.subject | Regression | en_US |
dc.subject | Text classification | en_US |
dc.subject | Tweet analysis | en_US |
dc.title | Online text classification for real life tweet analysis | en_US |
dc.title.alternative | Gerçek hayat tweet analizi için çevrimiçi metin sınıflandırması | en_US |
dc.type | Conference Paper | en_US |
Files
Original bundle
- Name: Online text classification for real life tweet analysis [Gerçek Hayat Tweet Analizi için Çevrimiçi Metin Siniflandirmasi].pdf
- Size: 296.5 KB
- Format: Adobe Portable Document Format
- Description: Full printable version