Unsupervised disentanglement of pose, appearance and background from images and videos
buir.contributor.author | Dündar, Ayşegül | |
dc.contributor.author | Dündar, Ayşegül | |
dc.contributor.author | Shih, K. J. | |
dc.contributor.author | Garg, A. | |
dc.contributor.author | Pottorf, R. | |
dc.contributor.author | Tao, A. | |
dc.contributor.author | Catanzaro, B. | |
dc.date.accessioned | 2022-01-31T10:52:29Z | |
dc.date.available | 2022-01-31T10:52:29Z | |
dc.date.issued | 2021-01-29 | |
dc.department | Department of Computer Engineering | en_US |
dc.description | (Early Access) | en_US |
dc.description.abstract | Unsupervised landmark learning is the task of learning semantic keypoint-like representations without expensive keypoint-level annotations. A popular approach is to factorize an image into separate pose and appearance streams, then reconstruct the image from the factorized components. The pose representation should capture a set of consistent and tightly localized landmarks in order to facilitate reconstruction of the input image. Ultimately, we wish for our learned landmarks to focus on the foreground object of interest. However, the task of reconstructing the entire image forces the model to allocate landmarks to the background as well. Using a motion-based foreground assumption, this work explores the effects of factorizing the reconstruction task into separate foreground and background reconstructions in an unsupervised way, allowing the model to condition only the foreground reconstruction on the unsupervised landmarks. Our experiments demonstrate that the proposed factorization yields landmarks that focus on the foreground object of interest when measured against ground-truth foreground masks. The rendered background quality also improves, as ill-suited landmarks are no longer forced to model this content. We demonstrate this improvement via improved image fidelity in a video-prediction task. Code is available at https://github.com/NVIDIA/UnsupervisedLandmarkLearning | en_US |
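The core idea in the abstract, reconstructing foreground and background separately and compositing them, can be sketched as a soft-mask blend. This is a minimal illustration only: the function name, array shapes, and mask source are assumptions, not the paper's implementation (which conditions the foreground branch on learned landmarks).

```python
import numpy as np

def composite_reconstruction(fg, bg, mask):
    """Blend a foreground reconstruction with a separately reconstructed
    background using a soft mask (illustrative sketch, not the paper's model).

    fg, bg: H x W x C float arrays in [0, 1].
    mask:   H x W x 1 float array in [0, 1]; ~1 selects foreground pixels.
    """
    return mask * fg + (1.0 - mask) * bg

# Toy usage: a mask of ones keeps the foreground, a mask of zeros keeps
# the background; intermediate values blend the two reconstructions.
fg = np.ones((4, 4, 3))
bg = np.zeros((4, 4, 3))
mask = np.full((4, 4, 1), 0.5)
blended = composite_reconstruction(fg, bg, mask)
```

Because the landmarks only drive the foreground branch, background pixels never pressure the model to spend landmark capacity on scene content, which is the factorization the abstract credits for the improved results.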
dc.identifier.doi | 10.1109/TPAMI.2021.3055560 | en_US |
dc.identifier.eissn | 1939-3539 | en_US |
dc.identifier.issn | 0162-8828 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/76909 | en_US |
dc.language.iso | eng | en_US |
dc.publisher | IEEE | en_US |
dc.relation.isversionof | https://doi.org/10.1109/TPAMI.2021.3055560 | en_US |
dc.source.title | IEEE Transactions on Pattern Analysis and Machine Intelligence | en_US |
dc.subject | Unsupervised landmarks | en_US |
dc.subject | Keypoints | en_US |
dc.subject | Foreground-background separation | en_US |
dc.subject | Video prediction | en_US |
dc.title | Unsupervised disentanglement of pose, appearance and background from images and videos | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: Unsupervised_disentanglement_of_pose,_appearance_and_background_from_images_and_videos.pdf
- Size: 6.67 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.69 KB
- Description: Item-specific license agreed to upon submission