Zipfian regularities in “non-point” word representations

Field | Value | Language
buir.contributor.author | Şahinuç, Furkan |
dc.citation.epage | 102493-18 | en_US
dc.citation.issueNumber | 3 | en_US
dc.citation.spage | 102493-1 | en_US
dc.citation.volumeNumber | 58 | en_US
dc.contributor.author | Şahinuç, Furkan |
dc.contributor.author | Koç, Aykut |
dc.date.accessioned | 2022-02-15T06:36:39Z |
dc.date.available | 2022-02-15T06:36:39Z |
dc.date.issued | 2021-05 |
dc.department | Department of Electrical and Electronics Engineering | en_US
dc.description.abstract | Zipf’s law for word frequencies, one of the most common empirical regularities, is a power-law relation between word frequencies and their frequency ranks. We quantitatively study the semantic uncertainty of words through non-point, distribution-based word embeddings and reveal Zipfian regularities. The uncertainty of a word can increase due to polysemy, to the word having a “broad” meaning (as in the relation between the broader “emotion” and the narrower “exasperation”), or to a combination of both. The variances of Gaussian embeddings are used to quantify the extent to which a word can be used in different senses or contexts. Using the variance information in these non-point Gaussian embeddings, we quantitatively show that the semantic breadth of words also exhibits Zipfian patterns when polysemy is controlled. This outcome complements Zipf’s law of meaning distribution and the related meaning-frequency law by indicating the existence of Zipfian patterns: more frequent words tend to be generic, while less frequent ones tend to be specific. Results are also provided for two languages from different language families, English and Turkish. Such regularities provide valuable information for extracting and understanding the relationships between the semantic properties of words and their frequencies, and employing them can improve performance in various applications. We also propose a method that leverages the Zipfian regularity to improve the performance of baseline textual entailment detection algorithms. To the best of our knowledge, this is the first quantitative study that uses Gaussian embeddings to examine the relationships between word frequencies and semantic breadth. | en_US
dc.description.provenance | Submitted by Samet Emre (samet.emre@bilkent.edu.tr) on 2022-02-15T06:36:39Z No. of bitstreams: 1 Zipfian_regularities_in_“non-point”_word_representations.pdf: 1150271 bytes, checksum: 0eecdb583fd691b20df8aaf9c963da75 (MD5) | en
dc.description.provenance | Made available in DSpace on 2022-02-15T06:36:39Z (GMT). No. of bitstreams: 1 Zipfian_regularities_in_“non-point”_word_representations.pdf: 1150271 bytes, checksum: 0eecdb583fd691b20df8aaf9c963da75 (MD5) Previous issue date: 2021-05 | en
dc.embargo.release | 2023-05-31 |
dc.identifier.doi | 10.1016/j.ipm.2021.102493 | en_US
dc.identifier.issn | 0306-4573 |
dc.identifier.uri | http://hdl.handle.net/11693/77347 |
dc.language.iso | English | en_US
dc.publisher | Elsevier Ltd | en_US
dc.relation.isversionof | https://doi.org/10.1016/j.ipm.2021.102493 | en_US
dc.source.title | Information Processing & Management | en_US
dc.subject | Word variances | en_US
dc.subject | Word frequencies | en_US
dc.subject | Zipf’s law | en_US
dc.subject | Meaning-frequency relation | en_US
dc.subject | Zipfian regularities | en_US
dc.subject | Word entailment | en_US
dc.subject | Semantic breadth | en_US
dc.title | Zipfian regularities in “non-point” word representations | en_US
dc.type | Article | en_US
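The abstract describes Zipf’s law as a power law, f(r) ∝ r^(−α), between a word’s frequency f and its frequency rank r, and reports a similar Zipfian pattern for semantic breadth measured through Gaussian-embedding variances. As a minimal sketch of how such a relation can be checked, the Python below ranks words by frequency and fits power laws to both frequency and a variance-based breadth score using ordinary least squares in log-log space. The function names (fit_power_law, breadth_score, zipfian_exponents), the input dictionaries, and the use of the mean diagonal variance as the breadth score are illustrative assumptions, not the procedure used in the article.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fit_power_law(ranks: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Fit values ≈ C * rank**(-alpha) by least squares on log(value) vs. log(rank).

    Returns (alpha, C); a positive alpha means values decrease as rank grows.
    """
    if len(ranks) != len(values) or len(ranks) < 2:
        raise ValueError("need at least two (rank, value) pairs of equal length")
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # slope of the log-log regression line
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


def breadth_score(variance_diag: Sequence[float]) -> float:
    """Hypothetical breadth score: mean of a diagonal Gaussian covariance."""
    return sum(variance_diag) / len(variance_diag)


def zipfian_exponents(
    counts: Dict[str, float], variances: Dict[str, List[float]]
) -> Tuple[float, float]:
    """Rank words by frequency (rank 1 = most frequent), then fit power laws
    for frequency vs. rank and breadth vs. rank. Returns (alpha, beta)."""
    words = sorted((w for w in counts if w in variances), key=lambda w: counts[w], reverse=True)
    ranks = list(range(1, len(words) + 1))
    alpha, _ = fit_power_law(ranks, [counts[w] for w in words])
    beta, _ = fit_power_law(ranks, [breadth_score(variances[w]) for w in words])
    return alpha, beta


if __name__ == "__main__":
    # Synthetic data: frequency ∝ 1/r and breadth ∝ r**-0.5 for ranks 1..50.
    counts = {f"w{r}": 1000.0 / r for r in range(1, 51)}
    variances = {f"w{r}": [2.0 * r ** -0.5] * 4 for r in range(1, 51)}
    alpha, beta = zipfian_exponents(counts, variances)
    print(round(alpha, 3), round(beta, 3))  # prints: 1.0 0.5
```

On the synthetic data, frequency falls as 1/r and breadth as r^(−0.5), so the fitted exponents recover 1.0 and 0.5. With real embeddings, a plain log-log least-squares fit is only a rough diagnostic and does not control for polysemy in the way the abstract describes.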

Files

Original bundle
Name: Zipfian_regularities_in_“non-point”_word_representations.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission