TMD-NER: Turkish multi-domain named entity recognition for informal texts

buir.contributor.author: Mutlu, Furkan Burak
buir.contributor.author: Kozat, Süleyman Serdar
buir.contributor.orcid: Mutlu, Furkan Burak|0000-0002-0486-7731
buir.contributor.orcid: Kozat, Süleyman Serdar|0000-0002-6488-3848
dc.citation.epage: 2263
dc.citation.issueNumber: 3
dc.citation.spage: 2255
dc.citation.volumeNumber: 18
dc.contributor.author: Yılmaz, Selim F.
dc.contributor.author: Mutlu, Furkan Burak
dc.contributor.author: Balaban, Ismail
dc.contributor.author: Kozat, Süleyman Serdar
dc.date.accessioned: 2025-02-24T13:15:12Z
dc.date.available: 2025-02-24T13:15:12Z
dc.date.issued: 2023-12-19
dc.department: Department of Electrical and Electronics Engineering
dc.description.abstract: We examine named entity recognition (NER), an essential and commonly used first step in many natural language processing tasks, including chatbots and language translation. We focus on the application of NER to noisy texts, such as tweets, which is difficult due to the casual and unstructured language often used in such media. In this study, we make use of the largest available labeled datasets for Turkish NER, specifically targeting three informal platforms, namely Twitter, Facebook, and Donanimhaber. We choose Turkish as a representative agglutinative language, whose structure differs significantly from that of other well-known languages such as English, French, and German. We emphasize that the methodologies and insights gained from this study can be extended to other agglutinative languages, like Finnish, Hungarian, Japanese, and Korean. We apply NER to these datasets using 16 different named entity tags through a framework that employs bidirectional long short-term memory (BiLSTM) networks followed by conditional random fields (CRF), known together as the BiLSTM-CRF model. Our experiments show an F1 score of 84% on a combined dataset, which indicates that deep learning models can also be effectively used for business applications in informal settings in agglutinative languages such as Turkish.
dc.identifier.doi: 10.1007/s11760-023-02898-0
dc.identifier.issn: 1863-1703
dc.identifier.uri: https://hdl.handle.net/11693/116768
dc.language.iso: English
dc.publisher: Springer UK
dc.relation.isversionof: https://dx.doi.org/10.1007/s11760-023-02898-0
dc.rights: CC BY 4.0 (Attribution 4.0 International)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.title: Signal, Image and Video Processing
dc.subject: Named entity recognition
dc.subject: Turkish language
dc.subject: Bidirectional long short-term memory
dc.subject: Conditional random fields
dc.title: TMD-NER: Turkish multi-domain named entity recognition for informal texts
dc.type: Article

Files

Original bundle

Name:
TMD-NER_Turkish_multi-domain_named_entity_recognition_for_informal_texts.pdf
Size:
460.41 KB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission