Author: Biçer, Berat
Title: Automatic deceit detection through multimodal analysis of speech videos
Alternative title (Turkish): Konuşma videolarının çok-kipli analiziyle otomatik aldatma tespiti ("Automatic deception detection through multimodal analysis of speech videos")
Thesis (Master's): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2022.
Dates: accessioned 2022-09-16; available 2022-09-16; copyright 2022-09; issued 2022-09; submitted 2022-09-14
URI: http://hdl.handle.net/11693/110515
Cataloged from PDF version of article.
Includes bibliographical references (leaves 70-85).

Abstract: In this study we propose the use of self-attention for spatial representation learning and explore transformers as the backbone of our speech model for inferring apparent deceptive intent through multimodal analysis of speech videos. The proposed model learns separate modality-specific representations from the visual, vocal, and speech channels and then applies fusion to merge the information streams. We evaluate our method on the popular, high-stakes Real-Life Trial (RLT) dataset. We also introduce PoliDB, a novel, low-stakes, in-the-wild dataset for deceit detection, and report the first results on it. Experiments suggest the proposed design surpasses previous studies on the RLT dataset, while it achieves significant classification performance on the proposed PoliDB dataset. Following our analysis, we report that (1) convolutional self-attention successfully achieves joint representation learning and attention computation with up to three times fewer parameters than its competitors, (2) apparent deceptive intent is a continuous function of time that can fluctuate throughout a video, and (3) studying particular abnormal behaviors out of context can be an unreliable way to predict deceptive intent.

Physical description: xi, 85 leaves : illustrations (color), photography, charts ; 30 cm.
Language: English
Rights: info:eu-repo/semantics/openAccess
Keywords: Automatic deceit detection; Behavioral analysis; Affective computing; Multimodal data analysis; Deep learning
ID: B161295
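The abstract's modality-specific-encoding-then-fusion scheme can be sketched as below. This is a minimal illustrative sketch only, not the thesis's actual architecture: every dimension, weight, and encoder choice here is an assumed stand-in (simple linear+ReLU encoders and concatenation-based late fusion in place of the transformer and convolutional self-attention components described in the work).

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Toy modality-specific encoder: linear projection followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical per-modality feature vectors (dimensions are arbitrary):
visual = rng.normal(size=64)  # e.g. frame-level facial features
vocal = rng.normal(size=32)   # e.g. acoustic features
speech = rng.normal(size=48)  # e.g. transcript embeddings

d = 16  # shared representation size (assumed)
W_v, W_a, W_t = (rng.normal(size=(n, d)) * 0.1 for n in (64, 32, 48))

# Late fusion: concatenate the three modality representations,
# then score the two classes (truthful vs. apparently deceptive).
fused = np.concatenate([encode(visual, W_v),
                        encode(vocal, W_a),
                        encode(speech, W_t)])
W_out = rng.normal(size=(3 * d, 2)) * 0.1
probs = softmax(fused @ W_out)
print(probs.shape)
```

The design point illustrated is only the dataflow: each channel is reduced to a fixed-size representation independently, and the information streams are merged after, not before, representation learning.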