Effective and explainable mechanisms for natural language interface in databases

buir.advisorUlusoy, Özgür
dc.contributor.authorKarakayalı, Akifhan
dc.date.accessioned2021-10-07T09:33:16Z
dc.date.available2021-10-07T09:33:16Z
dc.date.copyright2021-09
dc.date.issued2021-09
dc.date.submitted2021-10-04
dc.departmentDepartment of Computer Engineeringen_US
dc.descriptionCataloged from PDF version of article.en_US
dc.descriptionThesis (Master's): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2021.en_US
dc.descriptionIncludes bibliographical references (leaves 38-42).en_US
dc.description.abstractStructured Query Language (SQL) is a commonly used tool to extract and present structured data stored in Relational Database Management Systems (RDBMSs), yet the inherent complexities of SQL create barriers for naive users, who are nonetheless capable of expressing their queries as natural language queries (NLQs). To overcome this barrier, we propose two different solutions: a Natural Language Interface to Database (NLIDB) pipeline with an explainable AI interface, and a semantic search strategy. The first solution introduces an NLIDB pipeline that uses SQL translation algorithms along with a keyword mapper to generate SQL queries for given NLQs. The proposed pipeline is presented to the user with an explainable AI interface so that the user can reason over the constructed query. We compared our approach with two state-of-the-art systems: NALIR+ and Pipeline+. Our approach surpasses NALIR+ on the imdb, scholar and yelp datasets, achieving 88.9%, 100% and 60.0% translation accuracy for single-table SELECT-JOIN queries and 68.6%, 87.0% and 83.6% translation accuracy for multiple-table SELECT-JOIN queries, respectively. Our approach outperforms Pipeline+ on the imdb and scholar datasets, but Pipeline+ is slightly better on the yelp dataset. The second solution proposes a semantic search approach that uses Information Retrieval based methods to retrieve related table rows for a given NLQ. The proposed approach uses a graph representation of the database in which each row and each value is represented by a node, and edges represent the relations between them. The query and the database rows are converted to vector representations using this graph representation and Graph Convolutional Networks (GCNs). A similarity calculation is performed on these vector representations, and the database rows are ranked according to their relevance to the query. The cosine distance metric is used for the similarity calculation.
We tested our approach on the college schema from the Spider dataset collection and achieved a 42.8% top-5 accuracy.en_US
dc.description.degreeM.S.en_US
dc.description.provenanceSubmitted by Betül Özen (ozen@bilkent.edu.tr) on 2021-10-07T09:33:16Z No. of bitstreams: 1 10424894.pdf: 899502 bytes, checksum: 9b204b7388a65a325411397977e22670 (MD5)en
dc.description.provenanceMade available in DSpace on 2021-10-07T09:33:16Z (GMT). No. of bitstreams: 1 10424894.pdf: 899502 bytes, checksum: 9b204b7388a65a325411397977e22670 (MD5) Previous issue date: 2021-09en
dc.description.statementofresponsibilityby Akifhan Karakayalıen_US
dc.embargo.release2022-04-01
dc.format.extentxi, 42 leaves : illustrations, charts ; 30 cm.en_US
dc.identifier.itemidB149411
dc.identifier.urihttp://hdl.handle.net/11693/76594
dc.language.isoEnglishen_US
dc.publisherBilkent Universityen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectNLIDBen_US
dc.subjectQuery translationen_US
dc.subjectExplainable AIen_US
dc.subjectSemantic searchen_US
dc.titleEffective and explainable mechanisms for natural language interface in databasesen_US
dc.title.alternativeVeritabanlarında doğal dil arayüzü için etkili ve açıklanabilir mekanizmalaren_US
dc.typeThesisen_US
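
The ranking and evaluation steps described in the abstract's second solution can be illustrated with a minimal sketch. It is not the thesis implementation: the vectors here are hand-written toy values standing in for the GCN-derived representations, the function names (cosine_similarity, rank_rows, top_k_accuracy) are hypothetical, and top-k accuracy is assumed to mean the fraction of queries whose relevant row appears among the k highest-ranked rows. Ranking by descending cosine similarity is equivalent to ranking by ascending cosine distance (1 minus similarity).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_rows(query_vec, row_vecs):
    """Return (row_id, similarity) pairs sorted from most to least similar to the query."""
    scored = [(row_id, cosine_similarity(query_vec, vec))
              for row_id, vec in row_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_k_accuracy(queries, row_vecs, k=5):
    """Fraction of (query_vec, relevant_row_id) pairs whose relevant row is in the top k.

    Assumed definition for illustration; the thesis may define the metric differently.
    """
    hits = 0
    for query_vec, relevant_id in queries:
        top_ids = [row_id for row_id, _ in rank_rows(query_vec, row_vecs)[:k]]
        if relevant_id in top_ids:
            hits += 1
    return hits / len(queries)


# Toy stand-ins for row embeddings (hypothetical values, not from the thesis).
row_vecs = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [1.0, 1.0]}
ranking = rank_rows([1.0, 0.1], row_vecs)
```

With these toy vectors, the query [1.0, 0.1] points almost in the same direction as r1, so r1 is ranked first, followed by r3 and then r2.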

Files

Original bundle
Name:
10424894.pdf
Size:
878.42 KB
Format:
Adobe Portable Document Format
Description:
Full printable version
License bundle
Name:
license.txt
Size:
1.69 KB
Format:
Item-specific license agreed upon to submission
Description: