Automatic deceit detection through multimodal analysis of speech videos
Abstract
In this study, we propose the use of self-attention for spatial representation learning and explore transformers as the backbone of our speech model for inferring apparent deceptive intent from multimodal analysis of speech videos. The proposed model learns separate modality-specific representations from the visual, vocal, and speech channels and then applies fusion to merge these information channels. We test our method on the popular, high-stake Real-Life Trial (RLT) dataset. We also introduce a novel, low-stake, in-the-wild dataset named PoliDB for deceit detection and report the first results on it. Experiments suggest the proposed design surpasses previous studies on the RLT dataset while achieving significant classification performance on the proposed PoliDB dataset. Following our analysis, we report that (1) convolutional self-attention achieves joint representation learning and attention computation with up to three times fewer parameters than its competitors, (2) apparent deceptive intent is a continuous function of time that can fluctuate throughout a video, and (3) studying particular abnormal behaviors out of context can be an unreliable way to predict deceptive intent.
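To make the two architectural ideas in the abstract concrete, the PyTorch sketch below shows (a) a convolutional self-attention block that computes queries, keys, and values with 1x1 convolutions, so representation learning and attention computation share the same parameters, and (b) a late-fusion classifier that encodes the visual, vocal, and speech channels separately before merging them. This is a minimal illustration under assumptions: all class names, layer choices, and dimensions are hypothetical and do not reproduce the authors' implementation.

```python
# Minimal sketch (assumed design, not the authors' code).
import torch
import torch.nn as nn


class ConvSelfAttention(nn.Module):
    """Self-attention over spatial feature maps. Queries, keys, and values
    come from 1x1 convolutions, so the block jointly learns representations
    and attention weights with few parameters."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, h*w, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, h*w)
        v = self.value(x).flatten(2).transpose(1, 2)  # (b, h*w, c)
        # Scaled dot-product attention over all spatial positions.
        attn = torch.softmax(q @ k / (q.size(-1) ** 0.5), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + x  # residual connection


class LateFusionDeceitClassifier(nn.Module):
    """Separate per-modality encoders; their outputs are concatenated
    (fused) before a binary truthful/deceptive prediction."""

    def __init__(self, visual_dim: int, vocal_dim: int, text_dim: int,
                 hidden: int = 128):
        super().__init__()
        self.visual_enc = nn.Linear(visual_dim, hidden)
        self.vocal_enc = nn.Linear(vocal_dim, hidden)
        self.text_enc = nn.Linear(text_dim, hidden)
        self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(3 * hidden, 2))

    def forward(self, visual, vocal, text):
        fused = torch.cat(
            [self.visual_enc(visual), self.vocal_enc(vocal),
             self.text_enc(text)],
            dim=-1,
        )
        return self.classifier(fused)


# Usage example with dummy pooled per-modality features.
model = LateFusionDeceitClassifier(visual_dim=256, vocal_dim=64, text_dim=768)
logits = model(torch.randn(4, 256), torch.randn(4, 64), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```

Keeping the encoders separate until the final fusion step mirrors the abstract's description of modality-specific representation learning followed by a merge of the information channels; in practice each `nn.Linear` encoder would be replaced by the corresponding visual, vocal, or transformer-based speech backbone.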