Automatic construction of sememe knowledge bases from machine readable dictionaries
Abstract
Sememes are the minimum semantic units of natural languages. Words annotated with sememes are organized into Sememe Knowledge Bases (SKBs), which have been successfully applied as external knowledge bases to various high-level language processing tasks. However, existing SKBs are constructed manually or semi-manually by linguistic experts over long periods, which inhibits their widespread utilization, updating, and expansion. We propose MRD2SKB, an approach that automatically constructs an SKB from Machine-Readable Dictionaries (MRDs). Well-established MRDs are readily available, and their construction is much simpler than that of SKBs. The proposed MRD2SKB therefore allows for fast, flexible, and extendable generation of SKBs. Building upon matrix factorization and topic modeling, we propose several variants of MRD2SKB and construct SKBs fully automatically. Quantitative and qualitative results of extensive experiments demonstrate that the performance of the automatically created SKBs is on par with that of manually and semi-manually prepared SKBs.
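The abstract does not spell out how matrix factorization is applied, so the following is only a minimal sketch of the general idea, not the MRD2SKB method itself. It assumes a toy dictionary (`TOY_MRD`, invented for illustration), builds a binary headword-by-definition-term matrix, and factorizes it with plain nonnegative matrix factorization using multiplicative updates. Treating each latent component as a stand-in "sememe" is likewise an assumption made for this example.

```python
import random

# Hypothetical toy machine-readable dictionary: headword -> definition.
TOY_MRD = {
    "dog": "domestic animal that barks",
    "cat": "domestic animal that purrs",
    "car": "road vehicle with engine",
    "bus": "large road vehicle with engine",
}

def build_matrix(mrd):
    """Return (headwords, vocabulary, binary headword-by-term matrix)."""
    words = sorted(mrd)
    vocab = sorted({t for d in mrd.values() for t in d.split()})
    col = {t: j for j, t in enumerate(vocab)}
    X = [[0.0] * len(vocab) for _ in words]
    for i, w in enumerate(words):
        for t in mrd[w].split():
            X[i][col[t]] = 1.0
    return words, vocab, X

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, c)) for c in Bt] for row in A]

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Approximate X by W @ H with nonnegative factors (multiplicative updates)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update H: H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Update W: W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

def frobenius_error(X, W, H):
    """Frobenius norm of the reconstruction residual X - W @ H."""
    R = matmul(W, H)
    return sum((x - r) ** 2
               for xr, rr in zip(X, R) for x, r in zip(xr, rr)) ** 0.5

words, vocab, X = build_matrix(TOY_MRD)
W, H = nmf(X, k=2)

# Assumption for illustration: the strongest latent component of each
# headword plays the role of its (unnamed) sememe.
assignment = {w: max(range(2), key=lambda a: W[i][a])
              for i, w in enumerate(words)}
print(assignment)
```

Each row of `W` gives a headword's weights over the latent components, and each row of `H` gives a component's weights over definition terms. A real pipeline would need proper tokenization, sense disambiguation, and a principled choice of the number of components, none of which this sketch attempts.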