Semantic change detection with Gaussian word embeddings

Date

2021-10-20

Source Title

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Print ISSN

2329-9290

Electronic ISSN

2329-9304

Publisher

IEEE

Volume

29

Pages

3349 - 3361

Language

English

Abstract

The diachronic study of language evolution is important in natural language processing (NLP). Recent years have seen a surge of computational approaches to detecting and characterizing lexical semantic change (LSC), driven by the availability of diachronic corpora and advances in word representation techniques. We propose a method based on Gaussian word embeddings (w2g) and present a comprehensive study of LSC detection. W2g is a probabilistic, distribution-based word embedding model that represents words as Gaussian mixture models, using covariance information along with the existing mean (word vector). We extensively study several aspects of w2g-based LSC detection under the SemEval-2020 Task 1 evaluation framework as well as on the Google N-gram corpus. In Sub-task 1 (LSC binary classification) of SemEval-2020 Task 1, we report the highest overall ranking as well as the highest ranks for two (German and Swedish) of the four languages (English, Swedish, German, and Latin). We also report the highest Spearman correlation in Sub-task 2 (LSC ranking) for Swedish. Our overall rankings in the LSC classification and ranking sub-tasks are 1st and 7th, respectively. We also present a qualitative analysis.
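
As a rough illustration of how a distribution-based embedding can expose semantic change, the sketch below compares a word's Gaussian representation in two time periods. It is not the paper's method: the abstract does not say which change measure is used, so the symmetric KL divergence, the single diagonal-covariance Gaussian per word (rather than a mixture), the toy vectors, and the threshold are all assumptions made for this example. It also assumes the two periods' embeddings already live in a comparable space.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance matrices."""
    if not (len(mu_p) == len(var_p) == len(mu_q) == len(var_q)):
        raise ValueError("all means and variances must have the same length")
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension term of the closed-form Gaussian KL divergence.
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrised divergence, so the order of the two periods does not matter."""
    return (kl_diag_gaussians(mu_p, var_p, mu_q, var_q)
            + kl_diag_gaussians(mu_q, var_q, mu_p, var_p))


def binary_change_labels(scores, threshold):
    """Label a word 1 (changed) if its score exceeds the threshold, else 0."""
    return {word: int(score > threshold) for word, score in scores.items()}


# Hypothetical toy embeddings: word -> ((mean, variance) in period 1,
#                                       (mean, variance) in period 2).
toy_embeddings = {
    "stable_word": (((0.1, 0.2), (0.5, 0.5)), ((0.1, 0.25), (0.5, 0.55))),
    "changed_word": (((0.1, 0.2), (0.5, 0.5)), ((0.9, -0.6), (1.5, 0.2))),
}

scores = {
    word: symmetric_kl(mu1, var1, mu2, var2)
    for word, ((mu1, var1), (mu2, var2)) in toy_embeddings.items()
}
labels = binary_change_labels(scores, threshold=1.0)  # threshold is arbitrary
ranking = sorted(scores, key=scores.get, reverse=True)  # most changed first
```

Here `labels` corresponds loosely to a binary classification output and `ranking` to a ranking output. Because the divergence accounts for variances as well as means, a word whose distribution broadens or narrows can receive a high score even when its mean vector barely moves.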
