Browsing by Author "Dündar, Ayşegül"
Now showing 1 - 10 of 10
Item Open Access: Benchmarking the robustness of instance segmentation models (Institute of Electrical and Electronics Engineers, 2023-08-29)
Dalva, Y.; Pehlivan, H.; Altındiş, Said Fahri; Dündar, Ayşegül
This article presents a comprehensive evaluation of instance segmentation models with respect to real-world image corruptions as well as out-of-domain image collections, e.g., images captured by a different set-up than the training dataset. The out-of-domain image evaluation shows the generalization capability of models, an essential aspect of real-world applications and an extensively studied topic of domain adaptation. These robustness and generalization evaluations are important when designing instance segmentation models for real-world applications and when picking an off-the-shelf pretrained model to use directly for the task at hand. Specifically, this benchmark study covers state-of-the-art network architectures, network backbones, normalization layers, models trained from scratch versus pretrained networks, and the effect of multitask training on robustness and generalization. Through this study, we gain several insights. For example, we find that group normalization (GN) enhances the robustness of networks across corruptions where the image contents stay the same but corruptions are added on top. On the other hand, batch normalization (BN) improves the generalization of the models across different datasets where the statistics of image features change. We also find that single-stage detectors do not generalize well to image resolutions larger than their training size, whereas multistage detectors can easily be used on images of different sizes. We hope that our comprehensive study will motivate the development of more robust and reliable instance segmentation models.
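A minimal sketch of how one might probe the GN-versus-BN comparison discussed in the benchmarking entry above: recursively swap every BatchNorm2d layer of a backbone for GroupNorm before evaluating under corruptions. The torchvision ResNet-50 backbone and num_groups=32 are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn
import torchvision


def replace_bn_with_gn(module: nn.Module, num_groups: int = 32) -> None:
    """Recursively replace every BatchNorm2d with a GroupNorm layer."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            gn = nn.GroupNorm(num_groups=num_groups, num_channels=child.num_features)
            setattr(module, name, gn)
        else:
            replace_bn_with_gn(child, num_groups)


model = torchvision.models.resnet50(weights=None)  # illustrative backbone choice
replace_bn_with_gn(model)
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))  # sanity check: forward pass still works
print(out.shape)  # torch.Size([1, 1000])
```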
Item Open Access: Fine detailed texture learning for 3D meshes with generative models (Institute of Electrical and Electronics Engineers, 2023-11-03)
Dündar, Ayşegül; Gao, J.; Tao, A.; Catanzaro, B.
This paper presents a method to achieve fine detailed texture learning for 3D models that are reconstructed from both multi-view and single-view images. The framework is posed as an adaptation problem and proceeds progressively: in the first stage we focus on learning accurate geometry, whereas in the second stage we focus on learning the texture with a generative adversarial network. The contributions of the paper lie in the generative learning pipeline, where we propose two improvements. First, since the learned textures should be spatially aligned, we propose an attention mechanism that relies on the learnable positions of pixels. Second, since the discriminator receives aligned texture maps, we augment its input with a learnable embedding which improves the feedback to the generator. We achieve significant improvements on multi-view sequences from the Tripod dataset as well as on the single-view image datasets Pascal 3D+ and CUB. We demonstrate that our method achieves superior 3D textured models compared to previous works.

Item Open Access: Image-to-image translation with disentangled latent vectors for face editing (Institute of Electrical and Electronics Engineers, 2023-08-24)
Dalva, Y.; Pehlivan, H.; Hatipoglu, O. I.; Moran, C.; Dündar, Ayşegül
We propose an image-to-image translation framework for facial attribute editing with disentangled, interpretable latent directions. The facial attribute editing task poses two challenges: targeted attribute editing with controllable strength, and disentanglement in the attribute representations so that other attributes are preserved during edits. To this end, inspired by latent space factorization works on fixed pretrained GANs, we design the attribute editing via latent space factorization, and for each attribute we learn a linear direction that is orthogonal to the others. We train these directions with orthogonality constraints and disentanglement losses. To project images to semantically organized latent spaces, we use an encoder-decoder architecture with attention-based skip connections. We compare extensively with previous image translation algorithms and with editing methods based on pretrained GANs. Our extensive experiments show that our method significantly improves over the state of the art.

Item Open Access: Learning portrait drawing with unsupervised parts (Springer New York LLC, 2024-04)
Taşdemir, Burak; Gudukbay, M. G.; Eldenk, Doğaç; Meric, A.; Dündar, Ayşegül
Translating face photos into portrait drawings takes hours for a skilled artist, which makes their automatic generation desirable. Portrait drawing is a difficult image translation task with its own unique challenges. It requires emphasizing important key features of faces while ignoring many of their details. Therefore, an image translator should have the capacity to detect facial features and output images that preserve the selected content of the photo. In this work, we propose a method for portrait drawing that learns only from unpaired data with no additional labels. Through unsupervised feature learning, our method shows good domain generalization behavior. Our first contribution is an image translation architecture that combines the high-level understanding of images via unsupervised parts with the identity preservation behavior of shallow networks. Our second contribution is a novel asymmetric pose-based cycle consistency loss. This loss relaxes the standard cycle consistency constraint, which requires an input image to be reconstructed after being transformed to a portrait and back. However, when going from an RGB image to a portrait, information loss is expected (e.g., colors, background). This is exactly what the cycle consistency constraint tries to prevent; when applied to this scenario, it results in a translation network that embeds the overall information of the RGB image into the portrait and causes artifacts in the portrait images. Our proposed loss solves this issue. Lastly, we run extensive experiments on both in-domain and out-of-domain images and compare our method with state-of-the-art approaches. We show significant improvements, both quantitatively and qualitatively, on three datasets.

Item Open Access: Partial convolution for padding, inpainting, and image synthesis (IEEE, 2022-09-26)
Liu, Guilin; Dündar, Ayşegül; Shih, Kevin J.; Wang, Ting-Chun; Reda, Fitsum A.; Sapra, Karan; Yu, Zhiding; Yang, Xiaodong; Tao, Andrew; Catanzaro, Bryan
Partial convolution weights convolutions with binary masks and renormalizes on valid pixels. It was originally proposed for the image inpainting task, because a corrupted image processed by a standard convolution often leads to artifacts. Therefore, binary masks are constructed that define the valid and corrupted pixels, so that partial convolution results are computed based only on valid pixels.
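A simplified sketch of the partial-convolution renormalization described in the entry above: convolve only valid pixels (mask == 1), rescale each output by the ratio of window size to valid inputs, and update the mask. This is a single-layer illustration under those assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartialConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=True)
        # Fixed all-ones kernel used only to count valid pixels per window.
        self.register_buffer("mask_kernel", torch.ones(1, 1, kernel_size, kernel_size))
        self.window_size = kernel_size * kernel_size

    def forward(self, x, mask):
        # mask: (N, 1, H, W) with 1 for valid pixels and 0 for corrupted ones.
        with torch.no_grad():
            valid_count = F.conv2d(mask, self.mask_kernel, padding=self.conv.padding[0])
        out = self.conv(x * mask)
        bias = self.conv.bias.view(1, -1, 1, 1)
        # Renormalize by window_size / number of valid pixels, excluding the bias.
        scale = self.window_size / valid_count.clamp(min=1.0)
        out = (out - bias) * scale + bias
        # Windows that saw no valid pixel stay masked out.
        new_mask = (valid_count > 0).float()
        return out * new_mask, new_mask


x = torch.randn(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.3).float()  # simulate corrupted pixels
layer = PartialConv2d(3, 16)
y, new_mask = layer(x, mask)
print(y.shape, new_mask.mean().item())
```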
It has also been used for the conditional image synthesis task, so that when a scene is generated, the convolution results of an instance depend only on the feature values that belong to the same instance. One of the unexplored applications of partial convolution is padding, a critical component of modern convolutional networks. Common padding schemes make strong assumptions about how the padded data should be extrapolated. We show that these padding schemes impair model accuracy, whereas partial convolution based padding provides consistent improvements across a range of tasks. In this paper, we review partial convolution applications under one framework. We conduct a comprehensive study of partial convolution based padding on a variety of computer vision tasks, including image classification, 3D-convolution-based action recognition, and semantic segmentation. Our results suggest that partial convolution based padding shows promising improvements over strong baselines.

Item Open Access: Progressive learning of 3D reconstruction network from 2D GAN data (IEEE, 2024-02)
Dündar, Ayşegül; Gao, Jun; Tao, Andrew; Catanzaro, Bryan
This paper presents a method to reconstruct high-quality textured 3D models from single images. Current methods rely on datasets with expensive annotations: multi-view images and their camera parameters. Our method instead relies on GAN-generated multi-view image datasets, which have a negligible annotation cost. However, these datasets are not strictly multi-view consistent, and GANs sometimes output distorted images, which degrades reconstruction quality. In this work, to overcome these limitations of generated datasets, we make two main contributions that lead us to state-of-the-art results on challenging objects: 1) a robust multi-stage learning scheme that gradually relies more on the model's own predictions when calculating losses, and 2) a novel adversarial learning pipeline with online pseudo-ground-truth generation to achieve fine details. Our work provides a bridge from the 2D supervision of GAN models to 3D reconstruction models and removes the expensive annotation effort. We show significant improvements over previous methods, whether they were trained on GAN-generated multi-view images or on real images with expensive annotations.

Item Open Access: Refining 3D human texture estimation from a single image (IEEE, 2024-12)
Altındiş, Said Fahri; Meric, Adil; Dalva, Yusuf; Güdükbay, Uğur; Dündar, Ayşegül
Estimating 3D human texture from a single image is essential in graphics and vision. It requires learning a mapping function from input images of humans with diverse poses into the parametric (UV) space and reasonably hallucinating invisible parts. To achieve high-quality 3D human texture estimation, we propose a framework that adaptively samples the input via a deformable convolution whose offsets are learned by a deep neural network. Additionally, we describe a novel cycle consistency loss that improves view generalization. We further propose to train our framework with an uncertainty-based pixel-level image reconstruction loss, which enhances color fidelity. We compare our method against state-of-the-art approaches and show significant qualitative and quantitative improvements.
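A sketch of an uncertainty-weighted pixel reconstruction loss of the kind mentioned in the texture-estimation entry above, where the network also predicts a per-pixel uncertainty map. The Laplacian negative-log-likelihood form below (|x - x_hat| / b + log b) is a common choice and an assumption here, not necessarily the exact formulation used in the paper.

```python
import torch


def uncertainty_l1_loss(pred: torch.Tensor,
                        target: torch.Tensor,
                        log_b: torch.Tensor) -> torch.Tensor:
    """Per-pixel L1 reconstruction loss attenuated by predicted (log) uncertainty."""
    b = log_b.exp().clamp(min=1e-6)          # per-pixel scale, kept positive
    nll = (pred - target).abs() / b + log_b  # NLL of a Laplace distribution
    return nll.mean()


pred = torch.rand(2, 3, 64, 64, requires_grad=True)     # reconstructed texture
target = torch.rand(2, 3, 64, 64)                        # ground-truth pixels
log_b = torch.zeros(2, 1, 64, 64, requires_grad=True)    # predicted uncertainty map
loss = uncertainty_l1_loss(pred, target, log_b)
loss.backward()
print(loss.item())
```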
Item Open Access: Understanding how orthogonality of parameters improves quantization of neural networks (IEEE, 2022-05-10)
Eryılmaz, Şükrü Burç; Dündar, Ayşegül
We analyze why the orthogonality penalty improves quantization in deep neural networks. Using results from perturbation theory as well as extensive experiments with Resnet50, Resnet101, and VGG19 models, we show mathematically and experimentally that the improved quantization accuracy resulting from the orthogonality constraint stems primarily from reduced condition numbers (the ratio of the largest to the smallest singular value of the weight matrices) rather than from reduced spectral norms, in contrast to the explanations in previous literature. We also show that the orthogonality penalty improves quantization even in the presence of a state-of-the-art quantized retraining method. Our results show that, when the orthogonality penalty is used with quantized retraining, the ImageNet Top-5 accuracy loss from 4- to 8-bit quantization is reduced by up to 7% for Resnet50 and up to 10% for Resnet101, compared to quantized retraining with no orthogonality penalty.
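A minimal sketch of the two quantities discussed in the entry above: a soft orthogonality penalty on a weight matrix and its condition number, i.e. the ratio of the largest to the smallest singular value. The penalty form ||W W^T - I||_F^2 is a standard choice and an assumption here, not necessarily the exact regularizer used in the paper.

```python
import torch


def orthogonality_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality regularizer ||W W^T - I||_F^2 on a flattened weight."""
    w = weight.flatten(1)                    # (out, in*k*k) for conv weights
    gram = w @ w.t()
    eye = torch.eye(gram.size(0), device=w.device)
    return ((gram - eye) ** 2).sum()


def condition_number(weight: torch.Tensor) -> torch.Tensor:
    """Ratio of the largest to the smallest singular value of the flattened weight."""
    s = torch.linalg.svdvals(weight.flatten(1))
    return s.max() / s.min().clamp(min=1e-12)


w = torch.randn(64, 3, 3, 3)  # e.g. a conv layer's weight tensor
print(orthogonality_penalty(w).item(), condition_number(w).item())
```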
Item Open Access: Unsupervised disentanglement of pose, appearance and background from images and videos (IEEE, 2021-01-29)
Dündar, Ayşegül; Shih, K. J.; Garg, A.; Pottorf, R.; Tao, A.; Catanzaro, B.
Unsupervised landmark learning is the task of learning semantic keypoint-like representations without the use of expensive keypoint-level input annotations. A popular approach is to factorize an image into a pose and an appearance data stream, then reconstruct the image from the factorized components. The pose representation should capture a set of consistent and tightly localized landmarks in order to facilitate reconstruction of the input image. Ultimately, we wish for our learned landmarks to focus on the foreground object of interest. However, reconstructing the entire image forces the model to allocate landmarks to model the background. Using a motion-based foreground assumption, this work explores the effects of factorizing the reconstruction task into separate foreground and background reconstructions in an unsupervised way, allowing the model to condition only the foreground reconstruction on the unsupervised landmarks. Our experiments demonstrate that the proposed factorization results in landmarks that are focused on the foreground object of interest when measured against ground-truth foreground masks. Furthermore, the rendered background quality is also improved, as ill-suited landmarks are no longer forced to model this content. We demonstrate this improvement via improved image fidelity in a video prediction task. Code is available at https://github.com/NVIDIA/UnsupervisedLandmarkLearning.

Item Open Access: Warping the residuals for image editing with StyleGAN (Springer New York LLC, 2024-11-18)
Yıldırım, Ahmet Burak; Pehlivan, Hamza; Dündar, Ayşegül
StyleGAN models offer editing capabilities via their semantically interpretable latent organization, which requires successful GAN inversion methods in order to edit real images. Many works have been proposed for inverting images into StyleGAN's latent space. However, their results either suffer from low fidelity to the input image or from poor editing quality, especially for edits that require large transformations. This is because low-bit-rate latent spaces lose many image details due to the information bottleneck, even though they provide an editable space. On the other hand, higher-bit-rate latent spaces can pass all the image details to StyleGAN for perfect reconstruction of images but suffer from low editing quality. In this work, we present a novel image inversion architecture that extracts high-rate latent features and includes a flow estimation module to warp these features so that they adapt to edits. This is because edits often involve spatial changes in the image, such as adjustments to pose or smile. Thus, the high-rate latent features must be accurately repositioned to match their new locations in the edited image space. We achieve this by employing flow estimation to determine the necessary spatial adjustments, followed by warping the features to align them correctly in the edited image. Specifically, we estimate the flows from the StyleGAN features of the edited and unedited latent codes. By estimating the high-rate features and warping them for edits, we achieve both high fidelity to the input image and high-quality edits. We run extensive experiments and compare our method with state-of-the-art inversion methods. Quantitative metrics and visual comparisons show significant improvements.
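A minimal sketch of the feature-warping step described in the entry above: given a dense flow field, high-rate encoder features are resampled so that they line up with the edited image. In the paper the flow is estimated from StyleGAN features of the edited and unedited latent codes; here a random flow stands in, and the bilinear grid_sample warp is an illustrative choice rather than the exact module used.

```python
import torch
import torch.nn.functional as F


def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a feature map by a per-pixel flow given in pixel units."""
    n, _, h, w = feat.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=feat.device),
        torch.linspace(-1, 1, w, device=feat.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    # Convert pixel-space flow to normalized offsets and add them to the grid.
    norm_flow = torch.stack(
        (flow[:, 0] / ((w - 1) / 2), flow[:, 1] / ((h - 1) / 2)), dim=-1
    )
    grid = base + norm_flow
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)


feat = torch.randn(1, 512, 64, 64)   # high-rate features from the encoder
flow = torch.randn(1, 2, 64, 64)     # stand-in for the predicted flow field
warped = warp_features(feat, flow)
print(warped.shape)  # torch.Size([1, 512, 64, 64])
```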