Learning interpretable word embeddings via bidirectional alignment of dimensions with semantic concepts

Authors: Şenel, L. K.; Şahinuç, Furkan; Yücesoy, V.; Schütze, H.; Çukur, Tolga; Koç, Aykut
Date issued: 2022-03-22
Date available: 2023-02-17
Type: Article
Language: English
ISSN: 0306-4573
eISSN: 1873-5371
DOI: 10.1016/j.ipm.2022.102925
URI: http://hdl.handle.net/11693/111493
Keywords: Word embeddings; Interpretability; Word semantics

Abstract:
We propose bidirectional imparting (BiImp), a generalized method for aligning embedding dimensions with concepts during the embedding learning phase. BiImp makes dimensions interpretable while preserving the semantic structure of the embedding space; such interpretability is critical for deciphering the black-box behavior of word embeddings. BiImp uses the two directions of each vector space dimension separately, so each direction can be assigned to a different concept. This increases the number of concepts that can be represented in the embedding space. Our experimental results demonstrate the interpretability of BiImp embeddings without compromising semantic task performance. We also use BiImp to reduce gender bias in word embeddings by encoding gender-opposite concepts (e.g., male–female) in a single embedding dimension. These results highlight the potential of BiImp for reducing the biases and stereotypes present in word embeddings. Furthermore, task- or domain-specific interpretable word embeddings can be obtained by adjusting the word groups assigned to embedding dimensions according to the task or domain. BiImp therefore offers wide flexibility in studying word embeddings without any further effort.
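The central idea above is that one embedding dimension has two usable directions, each tied to a different concept. The sketch below illustrates only that idea, in isolation, and is not the paper's training objective. It assumes a simple squared-hinge penalty (an assumption chosen for illustration) that pushes the words of one concept toward the positive end of a dimension and the words of an opposite concept toward the negative end. It also includes a helper that reads off the words at each pole. The function names (`bidirectional_penalty`, `impart_step`, `pole_words`), the margin, the learning rate, and the toy word groups are all hypothetical. In a real setting, a term of this kind would be added to an embedding-learning objective rather than optimized alone.

```python
import numpy as np


def bidirectional_penalty(W, dim, pos_ids, neg_ids, margin=1.0):
    """Hypothetical squared-hinge penalty for one dimension.

    It is zero once every positive-concept word has W[i, dim] >= margin
    and every negative-concept word has W[j, dim] <= -margin.
    """
    pos_gap = np.maximum(0.0, margin - W[pos_ids, dim])
    neg_gap = np.maximum(0.0, margin + W[neg_ids, dim])
    return float((pos_gap ** 2).sum() + (neg_gap ** 2).sum())


def impart_step(W, dim, pos_ids, neg_ids, margin=1.0, lr=0.25):
    """One gradient-descent step on the hypothetical penalty (returns a copy)."""
    W = W.copy()
    pos_gap = np.maximum(0.0, margin - W[pos_ids, dim])
    neg_gap = np.maximum(0.0, margin + W[neg_ids, dim])
    # The derivative of gap**2 is -2*gap for positive words and +2*gap for
    # negative words, so descending moves them toward opposite ends.
    W[pos_ids, dim] += lr * 2.0 * pos_gap
    W[neg_ids, dim] -= lr * 2.0 * neg_gap
    return W


def pole_words(W, vocab, dim, k=3):
    """Return the k words at the positive and negative ends of a dimension."""
    order = np.argsort(W[:, dim], kind="stable")
    negative = [vocab[i] for i in order[:k]]
    positive = [vocab[i] for i in order[::-1][:k]]
    return positive, negative


if __name__ == "__main__":
    # Toy vocabulary: dimension 0 carries "male" (+) and "female" (-).
    vocab = ["he", "man", "king", "she", "woman", "queen", "table"]
    W = np.zeros((len(vocab), 2))
    pos_ids, neg_ids = [0, 1, 2], [3, 4, 5]
    for _ in range(20):
        W = impart_step(W, 0, pos_ids, neg_ids)
    print(bidirectional_penalty(W, 0, pos_ids, neg_ids))
    print(pole_words(W, vocab, 0))
```

Encoding both concepts on one axis means a single coordinate separates them: its sign indicates which concept a word leans toward, and a word unrelated to either (here "table") is left near zero.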