Semantic structure and interpretability of word embeddings

buir.contributor.authorŞenel, Lütfi Kerem
buir.contributor.authorUtlu, İhsan
buir.contributor.authorYücesoy, Veysel
buir.contributor.authorKoç, Aykut
buir.contributor.authorÇukur, Tolga
dc.citation.epage1779en_US
dc.citation.issueNumber10en_US
dc.citation.spage1769en_US
dc.citation.volumeNumber26en_US
dc.contributor.authorŞenel, Lütfi Keremen_US
dc.contributor.authorUtlu, İhsanen_US
dc.contributor.authorYücesoy, Veyselen_US
dc.contributor.authorKoç, Aykuten_US
dc.contributor.authorÇukur, Tolgaen_US
dc.date.accessioned2019-02-21T16:05:19Z
dc.date.available2019-02-21T16:05:19Z
dc.date.issued2018en_US
dc.departmentDepartment of Electrical and Electronics Engineeringen_US
dc.departmentNational Magnetic Resonance Research Center (UMRAM)en_US
dc.departmentInterdisciplinary Program in Neuroscience (NEUROSCIENCE)en_US
dc.departmentAysel Sabuncu Brain Research Center (BAM)en_US
dc.description.abstractDense word embeddings, which encode meanings of words to low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performances in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensions making interpretation of dimensions a big challenge. In this study, we propose a statistical method to uncover the underlying latent semantic structure in the dense word embeddings. To perform our analysis, we introduce a new dataset (SEMCAT) that contains more than 6500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of the word embeddings. The proposed method is a practical alternative to the classical word intrusion test that requires human intervention.
dc.description.provenanceMade available in DSpace on 2019-02-21T16:05:19Z (GMT). No. of bitstreams: 1 Bilkent-research-paper.pdf: 222869 bytes, checksum: 842af2b9bd649e7f548593affdbafbb3 (MD5) Previous issue date: 2018en
dc.description.sponsorshipManuscript received November 22, 2017; revised April 12, 2018; accepted May 10, 2018. Date of publication May 24, 2018; date of current version June 21, 2018. This work was supported in part by the European Molecular Biology Organization Installation under Grant IG 3028, in part by the TUBA GEBIP fellowship, and in part by the BAGEP 2017 award of the Science Academy. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Imed Zitouni. T. Çukur and A. Koç mutually supervised this work under a joint industry-university coadvising program. (Corresponding author: Lütfi Kerem Şenel.) L. K. Şenel is with the ASELSAN Research Center, Ankara 06370, Turkey, with the Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey, and also with the UMRAM, Bilkent University, Ankara 06800, Turkey (e-mail: lksenel@aselsan.com.tr). İ. Utlu and V. Yücesoy are with the ASELSAN Research Center, Ankara 06370, Turkey, and also with the Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey (e-mail: utlu@ee.bilkent.edu.tr; vyucesoy@aselsan.com.tr).
dc.identifier.doi10.1109/TASLP.2018.2837384
dc.identifier.issn2329-9290
dc.identifier.urihttp://hdl.handle.net/11693/50245
dc.language.isoEnglish
dc.publisherInstitute of Electrical and Electronics Engineers
dc.relation.isversionofhttps://doi.org/10.1109/TASLP.2018.2837384
dc.relation.projectEuropean Molecular Biology Organization, EMBO: IG 3028 - Bilkent Üniversitesi - Bilim Akademisi
dc.source.titleIEEE/ACM Transactions on Audio Speech and Language Processingen_US
dc.subjectInterpretabilityen_US
dc.subjectSemantic structureen_US
dc.subjectWord embeddingsen_US
dc.titleSemantic structure and interpretability of word embeddingsen_US
dc.typeArticleen_US

Files

Original bundle
Name: Semantic_structure_and_interpretability_of_word_embeddings.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format
Description: Full printable version