Style synthesizing conditional generative adversarial networks
Date
Authors
Editor(s)
Advisor
Supervisor
Co-Advisor
Co-Supervisor
Instructor
Source Title
Print ISSN
Electronic ISSN
Publisher
Volume
Issue
Pages
Language
Type
Journal Title
Journal ISSN
Volume Title
Attention Stats
Usage Stats
views
downloads
Series
Abstract
Neural style transfer (NST) models aim to transfer a particular visual style to a image while preserving its content using neural networks. Style transfer models that can apply arbitrary styles without requiring style-specific models or architectures are called universal style transfer (UST) models. Typically a UST model takes a content image and a style image as inputs and outputs the corresponding stylized image. It is, therefore, required to have a style image with the required characteristics to facilitate the transfer. However, in practical applications, where the user wants to apply variations of a style class or a mixture of multiple style classes, such style images may be difficult to find or simply non-existent. In this work we propose a conditional style transfer network which can model multiple style classes. While our model requires training examples (style images) for each class at training time, it does not require any style images at test time. The model implicitly learns the manifold of each style and is able to generate diverse stylization outputs corresponding to a single style class or a mixture of the available style classes. This requires the model to be able to learn one-to-many mappings, from an input single class label to multiple styles. For this reason, we build our model based on generative adversarial networks (GAN), which have been shown to generate realistic data in highly complex and multi-modal distributions in numerous domains. More specifically, we design a conditional GAN model that takes a semantic conditioning vector specifying the desired style class(es) and a noise vector as input and outputs the statistics required for applying style transfer. In order to achieve style transfer, we adapt a preexisting encoder-decoder based universal style transfer model. The encoder component extracts convolutional feature maps from the content image. These features are first whitened and then colorized using the statistics of the input style image. The decoder component then reconstructs the stylized image from the colorized features. In our adaptation, instead of using full covariance matrices, we approximate the whitening and coloring transforms using diagonal elements of the covariance matrices. We then remove the dependence to the input style image by learning to generate the statistics via our GAN model. In our experiments, we use a subset of the WikiArt dataset to train and validate our approach. We demonstrate that our approximation method achieves stylization results similar to the preexisting model but with higher speeds and using a fraction of target style statistics. We also show that our conditional GAN model leads to successful style transfer results by learning the manifold of styles corresponding to each style class. We additionally show that the GAN model can be used to generate novel style class combinations, which are highly correlated with the corresponding actual stylization results that are not seen during training.