Zipfian regularities in “non-point” word representations

Şahinuç, Furkan; Koç, Aykut

buir.contributor.author	Şahinuç, Furkan
dc.citation.epage	102493-18	en_US
dc.citation.issueNumber	3	en_US
dc.citation.spage	102493-1	en_US
dc.citation.volumeNumber	58	en_US
dc.contributor.author	Şahinuç, Furkan
dc.contributor.author	Koç, Aykut
dc.date.accessioned	2022-02-15T06:36:39Z
dc.date.available	2022-02-15T06:36:39Z
dc.date.issued	2021-05
dc.department	Department of Electrical and Electronics Engineering	en_US
dc.description.abstract	Zipf’s law for word frequencies, one of the most common empirical regularities, is a power-law relation between the frequencies of words and their frequency ranks. We quantitatively study the semantic uncertainty of words through non-point, distribution-based word embeddings and reveal Zipfian regularities. The uncertainty of a word can increase due to polysemy, due to the word having a “broad” meaning (as in the relation between the broader emotion and the narrower exasperation), or due to a combination of both. Variances of Gaussian embeddings are used to quantify the extent to which a word can be used in different senses or contexts. Using the variance information in the non-point Gaussian embeddings, we quantitatively show that the semantic breadth of words also exhibits Zipfian patterns when polysemy is controlled. This outcome complements Zipf’s law of meaning distribution and the related meaning-frequency law by indicating the existence of Zipfian patterns: more frequent words tend to be generic, while less frequent ones tend to be specific. Results are provided for two languages from different language families, English and Turkish. Such regularities provide valuable information for extracting and understanding relationships between semantic properties of words and word frequencies. In various applications, performance improvements can be obtained by employing these regularities. We also propose a method that leverages the Zipfian regularity to improve the performance of baseline textual entailment detection algorithms. To the best of our knowledge, this is the first quantitative study that uses Gaussian embeddings to examine the relationships between word frequencies and semantic breadth.	en_US
dc.embargo.release	2023-05-31
dc.identifier.doi	10.1016/j.ipm.2021.102493	en_US
dc.identifier.issn	0306-4573
dc.identifier.uri	http://hdl.handle.net/11693/77347
dc.language.iso	English	en_US
dc.publisher	Elsevier Ltd	en_US
dc.relation.isversionof	https://doi.org/10.1016/j.ipm.2021.102493	en_US
dc.source.title	Information Processing & Management	en_US
dc.subject	Word variances	en_US
dc.subject	Word frequencies	en_US
dc.subject	Zipf’s law	en_US
dc.subject	Meaning-frequency relation	en_US
dc.subject	Zipfian regularities	en_US
dc.subject	Word entailment	en_US
dc.subject	Semantic breadth	en_US
dc.title	Zipfian regularities in “non-point” word representations	en_US
dc.type	Article	en_US
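
The abstract describes Zipf’s law as a power-law relation between word frequency and frequency rank. As a minimal sketch of what that relation looks like numerically, the exponent can be estimated with an ordinary least-squares fit of log-frequency against log-rank. This is not the authors' code: the function name `zipf_exponent` and the synthetic toy corpus are assumptions made purely for illustration, and the paper's Gaussian-embedding variance analysis is not reproduced here.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate s in f(r) ~ C * r**(-s) by least squares on log-rank vs log-frequency.

    Hypothetical helper for illustration; needs at least two distinct tokens.
    """
    # Frequencies sorted from most to least frequent; index + 1 is the rank.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying power law; return its magnitude.
    return -sxy / sxx


# Synthetic toy corpus (an assumption, not data from the paper):
# the word at rank r occurs round(1200 / r) times, i.e. roughly f(r) = 1200 / r.
toy = [f"w{r}" for r in range(1, 51) for _ in range(round(1200 / r))]
exponent = zipf_exponent(toy)
print(f"estimated exponent: {exponent:.3f}")
```

On a real corpus the fit is usually restricted to a range of ranks, since the highest and lowest ranks often deviate from a pure power law.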

Files

Original bundle

Name: Zipfian_regularities_in_“non-point”_word_representations.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format

Collections

Scholarly Publications - Electrical and Electronics Engineering
Scholarly Publications - UMRAM