Fine-grained Face Editing via Personalized Spatial-aware Affine Modulation
Fine-grained face editing, a special case of the image translation task, aims at modifying face attributes according to users' preferences. Although generative adversarial networks (GANs) have achieved great success in general image translation tasks, these models cannot be directly applied to the face editing problem. Ideal face editing is challenging because it has two special requirements – personalization and spatial awareness. To address these issues, we propose a novel Personalized Spatial-aware Affine Modulation (PSAM) method built on a general GAN structure. The key idea is to modulate the intermediate features in a personalized and spatial-aware manner, which corresponds to the face editing procedure. Specifically, for personalization, we take both the face image and the desired attribute as input to generate the modulation tensors. For spatial awareness, we set these tensors to the same size as the input image, allowing pixel-wise modulation. Extensive experiments on four fine-grained face editing tasks, i.e., makeup, expression, illumination, and aging, demonstrate the effectiveness of the proposed PSAM method. The synthesis results of PSAM can be further boosted by a new transferable training strategy. To facilitate research on face editing, we also construct a new large-scale makeup dataset.
Architecture of the proposed PSAM. An input image I and a target attribute a are concatenated to form the feature M, which is then fed into a modulation module to obtain two condition tensors Γ and B that share the same shape as M. Affine modulation linearly transforms M with Γ and B to produce the output image. The generated and real images are fed into the discriminator D, which classifies their sources and corresponding attributes.
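The pipeline above can be sketched in a few lines of numpy. This is a toy illustration of the data flow only – the shapes, the random stand-in for the learned modulation module, and all variable names are our assumptions, not the authors' implementation:

```python
import numpy as np

# Illustrative sizes (assumed): image H x W x C, attribute vector of length A.
H, W, C, A = 8, 8, 3, 5

rng = np.random.default_rng(0)
image = rng.random((H, W, C))  # input image I
attr = rng.random(A)           # target attribute a

# Concatenate I with the attribute broadcast to every pixel -> feature M.
attr_map = np.broadcast_to(attr, (H, W, A))
M = np.concatenate([image, attr_map], axis=-1)  # shape (H, W, C + A)

# Stand-in "modulation module": in PSAM this is a learned network
# conditioned on I and a; here we just draw random tensors of the
# right shape to show that Γ and B match M pixel-for-pixel.
def modulation_module(m):
    gamma = rng.random(m.shape)  # condition tensor Γ, same shape as M
    beta = rng.random(m.shape)   # condition tensor B, same shape as M
    return gamma, beta

gamma, beta = modulation_module(M)

# Pixel-wise affine modulation: a separate linear transform per element.
modulated = gamma * M + beta
assert modulated.shape == M.shape
```

Because Γ and B have the full spatial resolution of M, every pixel gets its own affine transform (spatial awareness), and because they are generated from both I and a, the transform differs per input (personalization).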
The results of the proposed PSAM on four fine-grained face editing tasks: face makeup editing (1st row, left column), face aging (1st row, right column), illumination editing (2nd row), and expression editing (3rd row). Note that although intra-attribute differences are subtle, PSAM produces clearly distinguishable fine-grained results.
To fully evaluate the proposed method, we experiment with four face editing tasks: makeup (Makeup-10K dataset), illumination (Multi-PIE dataset), aging (YGA dataset), and expression (CFEE dataset). Both FID and intra-FID scores are used to evaluate the quality of the generated images; the lower the FID score, the better the generated data. As shown in the table, PSAM achieves the best performance on all datasets. See the paper for more details.
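For reference, the FID metric used above is the Fréchet distance between Gaussian fits of feature statistics, FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). A minimal numpy sketch follows; real evaluation extracts Inception features from images, which we mock with random vectors, and the eigenvalue-based matrix square root is one of several numerically equivalent implementations:

```python
import numpy as np

def fid(feats_a, feats_b):
    """Fréchet distance between Gaussians fit to two feature sets (N x D)."""
    mu1, mu2 = feats_a.mean(axis=0), feats_b.mean(axis=0)
    s1 = np.cov(feats_a, rowvar=False)
    s2 = np.cov(feats_b, rowvar=False)
    # Tr((S1 S2)^{1/2}) via eigenvalues of S1 @ S2: for PSD factors the
    # eigenvalues are real and non-negative up to numerical noise.
    eigs = np.linalg.eigvals(s1 @ s2)
    tr_sqrt = np.sqrt(np.clip(eigs.real, 0, None)).sum()
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2) - 2 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))   # mock "real" features
fake = rng.normal(size=(500, 16))   # mock "generated" features

d_same = fid(real, real)  # identical sets -> distance near zero
d_diff = fid(real, fake)  # distinct sets -> positive distance
```

Intra-FID applies the same computation per attribute class and averages the scores, which is why it is sensitive to fine-grained differences.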
Code to be released
Dataset to be released
- Zhang Z, Song Y, Qi H. Age progression/regression by conditional adversarial autoencoder. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 5810-5818.
- Choi Y, Choi M, Kim M, et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8789-8797.
- Wang Z, Tang X, Luo W, et al. Face aging with identity-preserved conditional generative adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7939-7947.