Multimodal assessment of apparent personality using feature attention and error consistency constraint

buir.contributor.author: Aslan, Süleyman
buir.contributor.author: Güdükbay, Uğur
buir.contributor.author: Dibeklioğlu, Hamdi
buir.contributor.orcid: Güdükbay, Uğur | 0000-0003-2462-6959
buir.contributor.orcid: Dibeklioğlu, Hamdi | 0000-0003-0851-7808
dc.citation.epage: 104163-9
dc.citation.spage: 104163-1
dc.citation.volumeNumber: 110
dc.contributor.author: Aslan, Süleyman
dc.contributor.author: Güdükbay, Uğur
dc.contributor.author: Dibeklioğlu, Hamdi
dc.date.accessioned: 2022-02-11T12:08:47Z
dc.date.available: 2022-02-11T12:08:47Z
dc.date.issued: 2021-06
dc.department: Department of Computer Engineering
dc.description.abstract: Personality computing and affective computing, where the recognition of personality traits is essential, have recently gained increasing interest and attention in many research areas. We propose a novel approach to recognize the Big Five personality traits of people from videos. To this end, we use four different modalities, namely, ambient appearance (scene), facial appearance, voice, and transcribed speech. Through a specialized subnetwork for each of these modalities, our model learns reliable modality-specific representations and fuses them using an attention mechanism that re-weights each dimension of these representations to obtain an optimal combination of multimodal information. A novel loss function enforces the model to assign equal importance to each of the estimated personality traits through a consistency constraint that keeps the trait-specific errors as close as possible. To further enhance the reliability of our model, we employ pre-trained state-of-the-art architectures (i.e., ResNet, VGGish, ELMo) as the backbones of the modality-specific subnetworks, complemented by multilayered Long Short-Term Memory networks to capture temporal dynamics. To minimize the computational complexity of multimodal optimization, we use two-stage modeling, where the modality-specific subnetworks are first trained individually and the whole network is then fine-tuned to jointly model multimodal data. On the large-scale ChaLearn First Impressions V2 challenge dataset, we evaluate the reliability of our model and investigate the informativeness of the considered modalities. Experimental results show the effectiveness of the proposed attention mechanism and the error consistency constraint. While facial information yields the best performance among individual modalities, with all four modalities our model achieves a mean accuracy of 91.8%, improving the state of the art in automatic personality analysis.
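The two central ideas of the abstract, dimension-wise feature attention over fused modality representations and an error consistency constraint across traits, can be illustrated with a short sketch. This is a hypothetical PyTorch illustration, not the authors' code: the module and function names (FeatureAttentionFusion, error_consistency_loss), the sigmoid gating, the L1 error, and the weighting factor lambda_consistency are all assumptions introduced for clarity.

```python
# Hypothetical sketch of the ideas described in the abstract; names,
# gating choice, and loss weighting are assumptions, not the paper's code.
import torch
import torch.nn as nn


class FeatureAttentionFusion(nn.Module):
    """Fuses modality-specific features and re-weights each feature dimension."""

    def __init__(self, feat_dim: int, num_modalities: int, num_traits: int = 5):
        super().__init__()
        fused_dim = feat_dim * num_modalities
        # Per-dimension attention weights over the concatenated representation
        # (sigmoid gating is an assumption).
        self.attention = nn.Sequential(
            nn.Linear(fused_dim, fused_dim),
            nn.Sigmoid(),
        )
        self.regressor = nn.Linear(fused_dim, num_traits)

    def forward(self, modality_feats):
        # modality_feats: list of (batch, feat_dim) tensors, one per modality
        fused = torch.cat(modality_feats, dim=-1)
        weights = self.attention(fused)           # one weight per feature dimension
        return self.regressor(fused * weights)    # Big Five trait estimates


def error_consistency_loss(pred, target, lambda_consistency: float = 0.1):
    """Mean per-trait error plus a penalty on how much trait errors deviate
    from their mean, so no single trait dominates training (assumed form)."""
    per_trait_err = (pred - target).abs().mean(dim=0)            # (num_traits,)
    base = per_trait_err.mean()
    consistency = (per_trait_err - per_trait_err.mean()).abs().mean()
    return base + lambda_consistency * consistency
```

In line with the two-stage modeling described above, such a fusion head would plausibly be trained only after the modality-specific subnetworks (ResNet, VGGish, ELMo backbones with LSTM layers) have been trained individually, with the whole network then fine-tuned jointly.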
dc.embargo.release: 2023-06-30
dc.identifier.doi: 10.1016/j.imavis.2021.104163
dc.identifier.eissn: 1872-8138
dc.identifier.issn: 0262-8856
dc.identifier.uri: http://hdl.handle.net/11693/77291
dc.language.iso: English
dc.publisher: Elsevier BV
dc.relation.isversionof: https://doi.org/10.1016/j.imavis.2021.104163
dc.source.title: Image and Vision Computing
dc.subject: Deep learning
dc.subject: Apparent personality
dc.subject: Multimodal modeling
dc.subject: Information fusion
dc.subject: Feature attention
dc.subject: Error consistency
dc.title: Multimodal assessment of apparent personality using feature attention and error consistency constraint
dc.type: Article
Files
Original bundle
Name: Multimodal_assessment_of_apparent_personality_using_feature_attention_and_error_consistency_constraint.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed to upon submission