Automatic deceit detection through multimodal analysis of speech videos


Date

2022-09

Advisor

Dibeklioğlu, Hamdi

Abstract

In this study, we propose the use of self-attention for spatial representation learning and explore transformers as the backbone of our speech model for inferring apparent deceptive intent through multimodal analysis of speech videos. The proposed model first learns separate modality-specific representations from the visual, vocal, and speech channels, and then applies fusion to merge these information channels. We test our method on the popular, high-stakes Real-Life Trial (RLT) dataset. We also introduce a novel, low-stakes, in-the-wild dataset for deceit detection, named PoliDB, and report the first results on it. Experiments suggest that the proposed design surpasses previous studies on the RLT dataset and achieves significant classification performance on the proposed PoliDB dataset. Following our analysis, we report that (1) convolutional self-attention successfully performs joint representation learning and attention computation with up to three times fewer parameters than its competitors, (2) apparent deceptive intent is a continuous function of time that can fluctuate throughout a video, and (3) studying particular abnormal behaviors out of context can be an unreliable way to predict deceptive intent.
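The pipeline described in the abstract (modality-specific encoders followed by fusion) can be illustrated with a minimal numpy sketch. Note this is an illustrative simplification, not the thesis implementation: each encoder here is a single scaled dot-product self-attention layer with mean pooling over time, and fusion is plain concatenation; the feature dimensions and sequence lengths are arbitrary placeholders.

```python
import numpy as np

def self_attention(x):
    # Scaled dot-product self-attention over a sequence of features.
    # x: (seq_len, dim) -> (seq_len, dim)
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # attended sequence

def encode(seq):
    # Modality-specific encoding: attend over time, then mean-pool.
    return self_attention(seq).mean(axis=0)

rng = np.random.default_rng(0)
visual = rng.normal(size=(30, 8))  # e.g. per-frame face features
vocal  = rng.normal(size=(30, 8))  # e.g. per-frame acoustic features
speech = rng.normal(size=(30, 8))  # e.g. per-token transcript features

# Late fusion: concatenate the three modality representations,
# which would then feed a classifier predicting deceptive intent.
fused = np.concatenate([encode(visual), encode(vocal), encode(speech)])
print(fused.shape)  # → (24,)
```

The sketch shows only the late-fusion structure; the actual model uses convolutional self-attention and transformer backbones as described above.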

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Language

English
