Semantic similarity between Turkish and European languages using word embeddings
Şenel, L. K.
2017 25th Signal Processing and Communications Applications Conference, SIU 2017
Institute of Electrical and Electronics Engineers Inc.
The representation of the words in a language's vocabulary as real-valued vectors in a high-dimensional space is called word embedding. Word embeddings have proven successful at modelling semantic relations between words and in numerous natural language processing applications. Although developed mainly for English, word embeddings perform well for many other languages. In this study, the semantic similarity between Turkish (using two different corpora) and five main European languages (English, German, French, Spanish, Italian) is calculated using word embeddings over a fixed vocabulary, and the results are verified using statistical testing. The effect of using different corpora and additional preprocessing steps on the performance of word embeddings is also studied on similarity and analogy test sets prepared for Turkish. © 2017 IEEE.
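As an illustration only: the abstract does not say how similarity between languages is measured, so the sketch below assumes one plausible approach. For a fixed vocabulary whose entries are aligned across two languages (row i in each matrix is the same concept), it computes the pairwise cosine similarities between words within each language and then correlates the two similarity structures. The function names `cosine_similarity_matrix` and `language_similarity` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarities between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms  # assumes no all-zero embedding rows
    return unit @ unit.T


def language_similarity(emb_a, emb_b):
    """Correlate the word-to-word similarity structure of two languages.

    Assumption (not from the paper): row i of `emb_a` and row i of `emb_b`
    are the embeddings of the same vocabulary entry in the two languages.
    The embedding dimensions of the two languages may differ.
    """
    if emb_a.shape[0] != emb_b.shape[0]:
        raise ValueError("both languages must cover the same fixed vocabulary")
    sim_a = cosine_similarity_matrix(emb_a)
    sim_b = cosine_similarity_matrix(emb_b)
    # Use each unordered word pair once and skip the trivial diagonal.
    upper = np.triu_indices(emb_a.shape[0], k=1)
    return float(np.corrcoef(sim_a[upper], sim_b[upper])[0, 1])
```

Because the score depends only on within-language similarities, it is unaffected by rotations of either embedding space, which matters when the embeddings of each language are trained independently. Whether the resulting score differs from chance would still need a separate statistical test, such as a permutation test over the vocabulary alignment.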
Keywords
Natural language processing
Semantic similarity between languages
High dimensional spaces
Natural language processing systems
Published Version (Please cite this version): http://dx.doi.org/10.1109/SIU.2017.7960365
Showing items related by title, author, creator and subject.
Temizsoy, Murat; Çiçekli, İlyas (Springer, 1998-10) The main problem with natural language analysis is the ambiguity found in various levels of linguistic information. Syntactic analysis with word senses is frequently not enough to resolve all ambiguities found in a sentence. ...
Şenel, L. K.; Yücesoy, V.; Koç, A.; Çukur, T. (Institute of Electrical and Electronics Engineers Inc., 2017) This paper studies cross-lingual semantic similarity (CLSS) between five European languages (i.e. English, French, German, Spanish and Italian) via unsupervised word embeddings from a cross-lingual lexicon. The vocabulary ...
Soulé, R.; Hirzel, M.; Gedik, B.; Grimm, R. (John Wiley & Sons Ltd., 2016) This paper presents both a calculus for stream processing, named Brooklet, and its realization as an intermediate language, named River. Because River is based on Brooklet, it has a formal semantics that enables ...