Semantic structure and interpretability of word embeddings
buir.contributor.author | Şenel, Lütfi Kerem | |
buir.contributor.author | Utlu, İhsan | |
buir.contributor.author | Yücesoy, Veysel | |
buir.contributor.author | Koç, Aykut | |
buir.contributor.author | Çukur, Tolga | |
dc.citation.epage | 1779 | en_US |
dc.citation.issueNumber | 10 | en_US |
dc.citation.spage | 1769 | en_US |
dc.citation.volumeNumber | 26 | en_US |
dc.contributor.author | Şenel, Lütfi Kerem | en_US |
dc.contributor.author | Utlu, İhsan | en_US |
dc.contributor.author | Yücesoy, Veysel | en_US |
dc.contributor.author | Koç, Aykut | en_US |
dc.contributor.author | Çukur, Tolga | en_US |
dc.date.accessioned | 2019-02-21T16:05:19Z | |
dc.date.available | 2019-02-21T16:05:19Z | |
dc.date.issued | 2018 | en_US |
dc.department | Department of Electrical and Electronics Engineering | en_US |
dc.department | National Magnetic Resonance Research Center (UMRAM) | en_US |
dc.department | Interdisciplinary Program in Neuroscience (NEUROSCIENCE) | en_US |
dc.department | Aysel Sabuncu Brain Research Center (BAM) | en_US |
dc.description.abstract | Dense word embeddings, which encode meanings of words to low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performance in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, making interpretation of the dimensions a major challenge. In this study, we propose a statistical method to uncover the underlying latent semantic structure in dense word embeddings. To perform our analysis, we introduce a new dataset (SEMCAT) that contains more than 6500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of word embeddings. The proposed method is a practical alternative to the classical word intrusion test, which requires human intervention. | |
dc.description.provenance | Made available in DSpace on 2019-02-21T16:05:19Z (GMT). No. of bitstreams: 1 Bilkent-research-paper.pdf: 222869 bytes, checksum: 842af2b9bd649e7f548593affdbafbb3 (MD5) Previous issue date: 2018 | en |
dc.description.sponsorship | Manuscript received November 22, 2017; revised April 12, 2018; accepted May 10, 2018. Date of publication May 24, 2018; date of current version June 21, 2018. This work was supported in part by the European Molecular Biology Organization Installation under Grant IG 3028, in part by the TUBA GEBIP fellowship, and in part by the BAGEP 2017 award of the Science Academy. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Imed Zitouni. T. Çukur and A. Koç mutually supervised this work under a joint industry-university coadvising program. (Corresponding author: Lütfi Kerem Şenel.) L. K. Şenel is with the ASELSAN Research Center, Ankara 06370, Turkey, with the Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey, and also with the UMRAM, Bilkent University, Ankara 06800, Turkey (e-mail: lksenel@aselsan.com.tr). İ. Utlu and V. Yücesoy are with the ASELSAN Research Center, Ankara 06370, Turkey, and also with the Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey (e-mail: utlu@ee.bilkent.edu.tr; vyucesoy@aselsan.com.tr). | |
dc.identifier.doi | 10.1109/TASLP.2018.2837384 | |
dc.identifier.issn | 2329-9290 | |
dc.identifier.uri | http://hdl.handle.net/11693/50245 | |
dc.language.iso | English | |
dc.publisher | Institute of Electrical and Electronics Engineers | |
dc.relation.isversionof | https://doi.org/10.1109/TASLP.2018.2837384 | |
dc.relation.project | European Molecular Biology Organization, EMBO: IG 3028 - Bilkent Üniversitesi - Bilim Akademisi | |
dc.source.title | IEEE/ACM Transactions on Audio Speech and Language Processing | en_US |
dc.subject | Interpretability | en_US |
dc.subject | Semantic structure | en_US |
dc.subject | Word embeddings | en_US |
dc.title | Semantic structure and interpretability of word embeddings | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: Semantic_structure_and_interpretability_of_word_embeddings.pdf
- Size: 1.24 MB
- Format: Adobe Portable Document Format
- Description: Full printable version