Semantic similarity between Turkish and European languages using word embeddings

Date
2017
Source Title
Proceedings of the IEEE 25th Signal Processing and Communications Applications Conference, SIU 2017
Publisher
IEEE
Language
Turkish
Type
Conference Paper
Abstract

Word embeddings represent the words in a language's vocabulary as real-valued vectors in a high-dimensional space. They have proven successful at modelling semantic relations between words and in numerous natural language processing applications. Although developed mainly for English, word embeddings also perform well for many other languages. In this study, the semantic similarity between Turkish (using two different corpora) and five major European languages (English, German, French, Spanish, Italian) is calculated using word embeddings over a fixed vocabulary, and the results are verified using statistical testing. The study also examines how the choice of corpus and additional preprocessing steps affect the performance of word embeddings on similarity and analogy test sets prepared for Turkish.
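The abstract does not say how the cross-language similarity score or the statistical test is computed, so the following is only an illustrative sketch of one possible approach, not the authors' method. It assumes that the fixed vocabulary is given as two index-aligned lists of vectors (entry i in each list is the same concept in the two languages), that language similarity is measured as the Pearson correlation between the within-language pairwise cosine similarities, and that significance is checked with a permutation test. All function names and the toy vectors are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors):
    """Cosine similarity for every unordered word pair (i < j)."""
    n = len(vectors)
    return [cosine(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def language_similarity(vecs_a, vecs_b):
    """Assumed score: correlation of the two within-language
    similarity structures over the shared, index-aligned vocabulary."""
    return pearson(pairwise_similarities(vecs_a),
                   pairwise_similarities(vecs_b))


def permutation_p_value(vecs_a, vecs_b, n_permutations=1000, seed=0):
    """Assumed test: shuffle the word alignment of the second language
    and count how often the shuffled score reaches the observed one."""
    rng = random.Random(seed)
    observed = language_similarity(vecs_a, vecs_b)
    shuffled = list(vecs_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if language_similarity(vecs_a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy vectors for four aligned concepts in two languages.
toy_tr = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_en = [[0.0, 1.0], [-0.1, 0.9], [-1.0, 0.0], [-0.9, 0.1]]
score, p_value = permutation_p_value(toy_tr, toy_en, n_permutations=200)
print(score, p_value)
```

Comparing similarity structures rather than raw vectors avoids having to map the two embedding spaces onto each other, because cosine similarities are unchanged by rotations of either space. With real embeddings, the vectors would be looked up for each word of the fixed vocabulary and its translation before calling these functions.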

Keywords
Natural language processing, Semantic similarity between languages, Word embeddings, Linguistics, Modeling languages, Semantics, European languages, High dimensional spaces, Semantic relations