Multimodal assessment of apparent personality using feature attention and error consistency constraint

buir.contributor.author: Aslan, Süleyman
buir.contributor.author: Güdükbay, Uğur
buir.contributor.author: Dibeklioğlu, Hamdi
buir.contributor.orcid: Güdükbay, Uğur | 0000-0003-2462-6959
buir.contributor.orcid: Dibeklioğlu, Hamdi | 0000-0003-0851-7808
dc.citation.epage: 104163-9
dc.citation.spage: 104163-1
dc.citation.volumeNumber: 110
dc.contributor.author: Aslan, Süleyman
dc.contributor.author: Güdükbay, Uğur
dc.contributor.author: Dibeklioğlu, Hamdi
dc.date.accessioned: 2022-02-11T12:08:47Z
dc.date.available: 2022-02-11T12:08:47Z
dc.date.issued: 2021-06
dc.department: Department of Computer Engineering
dc.description.abstract: Personality computing and affective computing, where the recognition of personality traits is essential, have recently gained increasing interest and attention in many research areas. We propose a novel approach to recognize the Big Five personality traits of people from videos. To this end, we use four different modalities, namely, ambient appearance (scene), facial appearance, voice, and transcribed speech. Through a specialized subnetwork for each of these modalities, our model learns reliable modality-specific representations and fuses them using an attention mechanism that re-weights each dimension of these representations to obtain an optimal combination of multimodal information. A novel loss function enforces the model to assign equal importance to each of the estimated personality traits through a consistency constraint that keeps the trait-specific errors as close as possible. To further enhance the reliability of our model, we employ pre-trained state-of-the-art architectures (i.e., ResNet, VGGish, ELMo) as the backbones of the modality-specific subnetworks, complemented by multilayered Long Short-Term Memory networks to capture temporal dynamics. To minimize the computational complexity of multimodal optimization, we use two-stage modeling, where the modality-specific subnetworks are first trained individually and the whole network is then fine-tuned to jointly model multimodal data. On the large-scale ChaLearn First Impressions V2 challenge dataset, we evaluate the reliability of our model and investigate the informativeness of the considered modalities. Experimental results show the effectiveness of the proposed attention mechanism and the error consistency constraint. While facial information yields the best performance among individual modalities, with all four modalities our model achieves a mean accuracy of 91.8%, improving the state of the art in automatic personality analysis.
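The two central ideas of the abstract, dimension-wise feature attention over fused modality representations and an error consistency constraint across traits, can be illustrated with a short sketch. This is a hypothetical PyTorch illustration, not the authors' code: the module and function names (FeatureAttentionFusion, error_consistency_loss), the sigmoid gating, the L1 error, and the weighting factor lambda_consistency are all assumptions introduced for clarity.

```python
# Hypothetical sketch of the ideas described in the abstract; names,
# gating choice, and loss weighting are assumptions, not the paper's code.
import torch
import torch.nn as nn


class FeatureAttentionFusion(nn.Module):
    """Fuses modality-specific features and re-weights each feature dimension."""

    def __init__(self, feat_dim: int, num_modalities: int, num_traits: int = 5):
        super().__init__()
        fused_dim = feat_dim * num_modalities
        # Per-dimension attention weights over the concatenated representation
        # (sigmoid gating is an assumption).
        self.attention = nn.Sequential(
            nn.Linear(fused_dim, fused_dim),
            nn.Sigmoid(),
        )
        self.regressor = nn.Linear(fused_dim, num_traits)

    def forward(self, modality_feats):
        # modality_feats: list of (batch, feat_dim) tensors, one per modality
        fused = torch.cat(modality_feats, dim=-1)
        weights = self.attention(fused)           # one weight per feature dimension
        return self.regressor(fused * weights)    # Big Five trait estimates


def error_consistency_loss(pred, target, lambda_consistency: float = 0.1):
    """Mean per-trait error plus a penalty on how much trait errors deviate
    from their mean, so no single trait dominates training (assumed form)."""
    per_trait_err = (pred - target).abs().mean(dim=0)            # (num_traits,)
    base = per_trait_err.mean()
    consistency = (per_trait_err - per_trait_err.mean()).abs().mean()
    return base + lambda_consistency * consistency
```

In line with the two-stage modeling described above, such a fusion head would plausibly be trained only after the modality-specific subnetworks (ResNet, VGGish, ELMo backbones with LSTM layers) have been trained individually, with the whole network then fine-tuned jointly.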
dc.embargo.release: 2023-06-30
dc.identifier.doi: 10.1016/j.imavis.2021.104163
dc.identifier.eissn: 1872-8138
dc.identifier.issn: 0262-8856
dc.identifier.uri: http://hdl.handle.net/11693/77291
dc.language.iso: English
dc.publisher: Elsevier BV
dc.relation.isversionof: https://doi.org/10.1016/j.imavis.2021.104163
dc.source.title: Image and Vision Computing
dc.subject: Deep learning
dc.subject: Apparent personality
dc.subject: Multimodal modeling
dc.subject: Information fusion
dc.subject: Feature attention
dc.subject: Error consistency
dc.title: Multimodal assessment of apparent personality using feature attention and error consistency constraint
dc.type: Article
Files
Original bundle
Name: Multimodal_assessment_of_apparent_personality_using_feature_attention_and_error_consistency_constraint.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed to upon submission