Attended end-to-end architecture for age estimation from facial expression videos

buir.contributor.author: Dibeklioğlu, Hamdi
dc.citation.epage: 1984
dc.citation.spage: 1972
dc.citation.volumeNumber: 29
dc.contributor.author: Pei, W.
dc.contributor.author: Dibeklioğlu, Hamdi
dc.contributor.author: Baltrušaitis, T.
dc.date.accessioned: 2021-02-18T10:21:54Z
dc.date.available: 2021-02-18T10:21:54Z
dc.date.issued: 2020
dc.department: Department of Computer Engineering
dc.description.abstract: The main challenges of age estimation from facial expression videos lie not only in modeling the static facial appearance, but also in capturing the temporal facial dynamics. Traditional approaches to this problem construct handcrafted features to separately capture the discriminative information in facial appearance and in facial dynamics, which requires sophisticated feature refinement and framework design. In this paper, we present an end-to-end architecture for age estimation, called the Spatially-Indexed Attention Model (SIAM), which simultaneously learns both the appearance and the dynamics of age from raw videos of facial expressions. Specifically, we employ convolutional neural networks to extract effective latent appearance representations and feed them into recurrent networks to model the temporal dynamics. More importantly, we leverage attention models for salience detection in both the spatial domain, within each individual frame, and the temporal domain, across the whole video. We design a spatially-indexed attention mechanism among the convolutional layers to extract the salient facial regions in each frame, and a temporal attention layer to assign an attention weight to each frame. This two-pronged approach not only improves performance by allowing the model to focus on informative frames and facial areas, but also offers an interpretable correspondence between spatial facial regions, temporal frames, and the task of age estimation. We demonstrate the strong performance of our model on a large, gender-balanced database of 400 subjects with ages spanning 8 to 76 years. Experiments show that, given sufficient training data, our model significantly outperforms state-of-the-art methods.
dc.identifier.doi: 10.1109/TIP.2019.2948288
dc.identifier.issn: 1057-7149
dc.identifier.uri: http://hdl.handle.net/11693/75442
dc.language.iso: English
dc.publisher: IEEE
dc.relation.isversionof: https://dx.doi.org/10.1109/TIP.2019.2948288
dc.source.title: IEEE Transactions on Image Processing
dc.subject: Age estimation
dc.subject: End-to-end
dc.subject: Attention
dc.subject: Facial dynamics
dc.title: Attended end-to-end architecture for age estimation from facial expression videos
dc.type: Article
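
To make the pipeline described in the abstract concrete, the sketch below illustrates its overall structure in PyTorch: per-frame convolutional features, a spatial attention map over feature-map locations, a recurrent network over the frame sequence, and a temporal attention layer that weights frames before age regression. All layer sizes, the GRU choice, and the 1x1-convolution attention scoring are illustrative assumptions for this sketch; the paper's actual SIAM architecture and attention formulations differ in detail.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SIAMSketch(nn.Module):
    """Minimal sketch of a CNN + RNN age estimator with spatial and
    temporal attention, loosely following the abstract's description.
    Layer sizes and attention scoring are assumptions, not the paper's."""

    def __init__(self, hidden_dim=128):
        super().__init__()
        # Convolutional feature extractor for per-frame appearance.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Spatial attention: a 1x1 conv scores each feature-map location
        # (a stand-in for the paper's spatially-indexed attention).
        self.spatial_score = nn.Conv2d(64, 1, kernel_size=1)
        # Recurrent network modeling temporal facial dynamics.
        self.rnn = nn.GRU(input_size=64, hidden_size=hidden_dim, batch_first=True)
        # Temporal attention: one scalar weight per frame.
        self.temporal_score = nn.Linear(hidden_dim, 1)
        # Scalar age regressor on the attention-pooled video summary.
        self.regressor = nn.Linear(hidden_dim, 1)

    def forward(self, video):                            # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        feats = self.conv(video.flatten(0, 1))           # (B*T, 64, h, w)
        # Spatial attention over locations, then weighted pooling.
        attn = F.softmax(self.spatial_score(feats).flatten(2), dim=-1)   # (B*T, 1, h*w)
        pooled = (feats.flatten(2) * attn).sum(-1)       # (B*T, 64)
        states, _ = self.rnn(pooled.view(B, T, -1))      # (B, T, hidden_dim)
        # Temporal attention over frames, then weighted summary.
        w = F.softmax(self.temporal_score(states), dim=1)   # (B, T, 1)
        summary = (states * w).sum(1)                    # (B, hidden_dim)
        return self.regressor(summary).squeeze(-1)       # (B,) age estimates

# Example: a batch of two 16-frame face videos at 64x64 resolution.
model = SIAMSketch()
ages = model(torch.randn(2, 16, 3, 64, 64))
print(ages.shape)  # torch.Size([2])

Because the attention weights are explicit softmax distributions over locations and over frames, they can be inspected directly, which is the source of the interpretable correspondence the abstract describes.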

Files

Original bundle

Name: Attended_End-to-End_Architecture_for_Age_Estimation_From_Facial_Expression_Videos.pdf
Size: 2.7 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon at submission