Browsing by Subject "Word embedding"

Open Access
Generating semantic similarity atlas for natural languages
(IEEE, 2018-12) Şenel, Lütfi Kerem; Utlu, İhsan; Yücesoy, V.; Koç, A.; Çukur, Tolga
Cross-lingual studies are attracting growing interest in natural language processing (NLP) research, and several studies have shown that, when transferring knowledge, similar languages are more advantageous to work with than fundamentally different ones. Researchers from different domains have proposed various language similarity measures. However, a similarity measure that focuses on the semantic structures of languages can be useful for selecting pairs or groups of languages to work with, especially for tasks that require semantic knowledge, such as sentiment analysis or word sense disambiguation. To this end, we leverage a recently proposed word-embedding-based method to generate a language similarity atlas for 76 languages from around the world. This atlas can help researchers select similar language pairs or groups in cross-lingual applications. Our findings suggest that semantic similarity between two languages is strongly correlated with the geographic proximity of the countries in which they are used.
Open Access
Measuring cross-lingual semantic similarity across European languages
(IEEE, 2017) Şenel, Lütfi Kerem; Yücesoy, V.; Koç, A.; Çukur, Tolga
This paper studies cross-lingual semantic similarity (CLSS) among five European languages (i.e., English, French, German, Spanish and Italian) via unsupervised word embeddings from a cross-lingual lexicon. The vocabulary of each language is projected onto a separate high-dimensional vector space, and these vector spaces are then compared using several distance measures (e.g., correlation and cosine) to measure the pairwise semantic similarities between the languages. A substantial degree of similarity is observed between the vector spaces learned from corpora of the European languages. Null hypothesis testing and bootstrap methods (by resampling without replacement) are utilized to verify the results.
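
The abstract above does not spell out how the vector spaces are compared, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that row i of each embedding matrix corresponds to the same lexicon entry in both languages, builds a cosine similarity matrix within each language, and then correlates the two matrices. The function names (`cosine_similarity_matrix`, `space_similarity`) and the random toy data are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of one embedding matrix."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T


def space_similarity(vectors_a, vectors_b):
    """Pearson correlation between the within-language similarity structures.

    Assumption: row i of both matrices embeds the same lexicon entry
    (a word and its translation), so the two matrices have equal row counts.
    """
    sim_a = cosine_similarity_matrix(vectors_a)
    sim_b = cosine_similarity_matrix(vectors_b)
    # Use only the strict upper triangle: the matrices are symmetric and the
    # diagonal is trivially 1.
    upper = np.triu_indices(sim_a.shape[0], k=1)
    return float(np.corrcoef(sim_a[upper], sim_b[upper])[0, 1])


# Toy data standing in for real embeddings (hypothetical, for illustration).
rng = np.random.default_rng(0)
english = rng.normal(size=(50, 20))

# A rotated copy has identical geometry, so it should score close to 1.
rotation, _ = np.linalg.qr(rng.normal(size=(20, 20)))
french = english @ rotation

# An independent random space should score close to 0.
unrelated = rng.normal(size=(50, 20))

score_rotated = space_similarity(english, french)
score_unrelated = space_similarity(english, unrelated)
print(score_rotated, score_unrelated)
```

Comparing similarity matrices rather than raw vectors is a deliberate choice in this sketch: separately trained spaces have no shared coordinate system, so only the relative geometry of the words is comparable across languages.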