Image Synthesis Term Project

Controlling GAN Synthesis of Novel Architectural Imagery

Through Latent Vector Space Manipulation

 
 

Overview

There are many examples of deep neural network applications in art, design, and architecture. Algorithms like GANs have allowed designers to synthesize new imagery in the style of large databases of existing imagery, from Renaissance paintings and diagrams of modernist furniture to photographs of particular architectural styles. However, the overwhelming majority of these projects provide no control over the synthesis, reinforcing the perception of these algorithms as untamed, unknowable black boxes.

This project aims to shed light on some of these algorithms and to show how synthesis control of architectural imagery can be achieved through direct manipulation of latent vector space. Several models, including VanillaGAN, CycleGAN, and StyleGAN2, were revisited as potential sites for architectural synthesis control. In addition to testing various control methods, a small custom dataset of 250 architectural images was used to train and test each model.

 

 
 
 

Lots of variation, but no control...

• No way of controlling the synthesis output
• Style, geometry, environment, and camera angle are left to chance
• Latent space is not engaged
• The algorithm remains an uncontrollable black box
 

The Dataset

A custom 250-image dataset was used to train all models in this experiment. The dataset consists of images of the Wangjing SOHO building in Beijing, China, designed by Zaha Hadid Architects. This building was chosen for its strong geometric gestures, its recognizable style, and the large volume of images of it available online.

The images in the dataset were collected using Archi_Base, an online tool I developed for the autonomous construction of large architectural image datasets. In operation, Archi_Base first collects user-specified images from public online databases, sorts them with an image-classifier algorithm, and then labels them according to class. The result is a large, sorted image database ready for DNN applications.
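As a rough illustration of that collect-sort-label workflow (not the actual Archi_Base code), the sketch below assumes a classifier fine-tuned on hypothetical class names, along with hypothetical file paths and a checkpoint name; it simply scores each downloaded image and files it into a folder named after its predicted class.

```python
# Hypothetical sketch of the collect -> classify -> label workflow described above.
# Paths, class names, and the classifier checkpoint are assumptions, not the
# actual Archi_Base implementation.
from pathlib import Path
import shutil

import torch
from PIL import Image
from torchvision import models, transforms

CLASSES = ["wangjing_soho", "other_building", "not_architecture"]  # assumed labels

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# A ResNet fine-tuned on the assumed classes (checkpoint path is hypothetical).
model = models.resnet18(num_classes=len(CLASSES))
model.load_state_dict(torch.load("archi_classifier.pt", map_location="cpu"))
model.eval()

def sort_downloaded_images(raw_dir="downloads", out_dir="dataset"):
    """Score each collected image and copy it into a folder named after its class."""
    for path in Path(raw_dir).glob("*.jpg"):
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            label = CLASSES[model(img).argmax(dim=1).item()]
        dest = Path(out_dir) / label
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, dest / path.name)

sort_downloaded_images()
```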

A custom image-augmentation tool was then used to apply flipping, warping, and environmental lighting changes to the images, increasing the overall dataset size and improving training.
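The augmentation tool itself is custom, but a minimal sketch of the same idea, using torchvision's standard transforms for flipping, perspective warping, and lighting jitter, might look like this (the folder paths and the number of copies per image are assumptions):

```python
# Minimal augmentation sketch (not the actual custom tool): random flips,
# perspective warps, and lighting changes to multiply the 250-image dataset.
from pathlib import Path

from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomPerspective(distortion_scale=0.3, p=0.7),             # mild geometric warp
    transforms.ColorJitter(brightness=0.4, contrast=0.3, saturation=0.2),  # lighting shifts
])

def expand_dataset(src_dir="dataset/wangjing_soho", out_dir="dataset/augmented", copies=4):
    """Write several augmented variants of every source image."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        for i in range(copies):
            augment(img).save(out / f"{path.stem}_aug{i}.jpg")

expand_dataset()
```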

 
 
 

 

Control Method 1

Image to Image Translation

model: CycleGAN

At its core, CycleGAN provides a technique for translating one image into the general style of another. For example, it can transform an image of a horse galloping in a field into an image of a zebra galloping in a field: the background and the pose remain the same, but the horse has become a zebra. This translation method gives us a great deal of control over the synthesis of new images. Not only can we create new content, we can also control its overall shape or pose, and its background as well.

To the right are newly synthesized images of the Wangjing SOHO building that take on the volumetric language, or massing, of the target image: the Heydar Aliyev Centre (the white, wavy building), also designed by Zaha Hadid. The shape of the Heydar Aliyev Centre in that particular image thus drives the shape of the SOHO building synthesized in the new image.

 
 
 

How do CycleGANs work?

Here, the CycleGAN algorithm is trained to translate between two kinds of buildings: the Wangjing SOHO building and the Heydar Aliyev Center. [2]

“The generator in the CycleGAN has layers that implement three stages of computation: 1) the first stage encodes the input via a series of convolutional layers that extract the image features; 2) the second stage then transforms the features by passing them through one or more residual blocks; and 3) the third stage decodes the transformed features using a series of transposed convolutional layers, to build an output image of the same size as the input. The residual block used in the transformation stage consists of a convolutional layer, where the input is added to the output of the convolution. This is done so that the characteristics of the output image (e.g., the shapes of objects) do not differ too much from the input.” [2]
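Translated into code, that three-stage generator could be sketched roughly as follows; the channel widths and residual-block count are illustrative assumptions rather than the exact CycleGAN configuration:

```python
# Rough PyTorch sketch of the encode -> transform -> decode generator described
# in the quote above. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Convolutional block whose input is added to its output, so the
    transformed features stay close to the original image structure."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self, n_res_blocks=6):
        super().__init__()
        # 1) Encode: convolutions that extract and downsample image features.
        self.encode = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # 2) Transform: residual blocks that restyle the features.
        self.transform = nn.Sequential(*[ResidualBlock(256) for _ in range(n_res_blocks)])
        # 3) Decode: transposed convolutions that rebuild a same-size image.
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=7, padding=3),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.decode(self.transform(self.encode(x)))
```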

 
 
 
 

 

Manipulating Latent Vector Space

models: VanillaGAN & StyleGAN2

One of the key aspects of GAN control is accessing and manipulating latent-space vectors. Latent space is simply a representation of compressed data in which similar data points sit closer together and dissimilar data points sit further apart within a multi-dimensional mathematical space. For images, data points describing the content of an image are mapped to this latent space, so images with similar content (e.g. images of skyscrapers) have data points that cluster together. Besides converting images into data points, we can also reverse the process and use the corresponding latent-space points to reconstruct the original images.

The diagram below represents two steps. First, pixel-based data points are extracted from an image and mapped into latent space. Then those data points are used to reconstruct an image that looks similar to the original. This “reconstructed” image is in fact a new, synthesized image: not the original, but very close to it. Beyond reconstruction, we can also mix or blend data points that originated from different images, producing a single image that cohesively combines previously disparate features. A minimal code sketch of these steps follows the diagram.

 

Latent Space Encoding & Reconstruction

 
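The sketch below illustrates the encode-reconstruct-blend idea, assuming a hypothetical encoder/generator pair rather than any specific StyleGAN2 tooling:

```python
# Hypothetical sketch of latent-space encoding, reconstruction, and blending.
# `encoder` and `generator` stand in for whatever model pair is used; they are
# assumptions here, not a specific library API.

def reconstruct(image, encoder, generator):
    """Map an image to its latent point, then synthesize a near-copy from it."""
    z = encoder(image)       # step 1: image -> latent data point
    return generator(z)      # step 2: latent data point -> reconstructed image

def blend(image_a, image_b, encoder, generator, alpha=0.5):
    """Mix the latent points of two images and synthesize a single image
    that cohesively combines features from both."""
    z_a, z_b = encoder(image_a), encoder(image_b)
    z_mix = (1 - alpha) * z_a + alpha * z_b
    return generator(z_mix)
```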
 

 

Control Method 2

Latent Space Interpolation

model: StyleGAN 2

GANs encode images as collections of vectors within a complex multi-dimensional latent vector space. Though difficult to visualize, particular points or regions within this latent space relate to particular features within images. For example, the patterns that define the outline of a building are located in one area of latent space, while the rules that control window grids sit in another. Beyond parts and pieces, similar images tend to cluster together in latent space: images of tall skyscrapers in one area, images of two-storey houses in another.

StyleGAN2 not only allows us to locate an image’s position within latent space, it also allows us to traverse between two or more images and explore the areas in between. For example, we might explore images of buildings that contain elements of both skyscrapers and two-storey residential buildings. Such direct control over latent-space positioning lets us control how, and what, we synthesize in novel imagery.
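As a sketch, interpolating between the latent codes of two projected images with a StyleGAN2-style generator could look like the following; the generator handle and its call signature are assumptions, since they differ between StyleGAN2 ports:

```python
# Sketch of latent interpolation: walk from building A's latent code to
# building B's and synthesize the images in between. The `generator` object
# and its call signature are assumptions.
import torch

@torch.no_grad()
def interpolate(generator, w_a, w_b, steps=8):
    """Return images sampled evenly along the line from w_a to w_b."""
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        w = (1 - t) * w_a + t * w_b    # linear blend of the two latent codes
        frames.append(generator(w))    # synthesize the in-between building
    return frames
```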

 
 
 
 

 

Control Method 3

Feature Extraction for Fine Detail Manipulation

model: StyleGAN 2

In addition to image reconstruction and interpolation between images, latent space gives us the means to control finer image details such as content shape, positioning, colour, and texture. This is done with StyleGAN2’s feature-extraction abilities, which search for and target the areas of latent space that control these finer image features. Applied to the Wangjing SOHO dataset, this tool can control things like building height, width, angle, and texture through targeted latent-space manipulation.
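In code, applying one of those extracted feature directions reduces to nudging an image’s latent code along a direction vector before synthesis. The sketch below is generic; the direction itself would come from whatever feature-extraction analysis is used, and the `generator` call is an assumption:

```python
# Sketch of fine-detail control: push an image's latent code along a
# discovered feature direction (e.g. one controlling tower height).
# `w`, `direction`, and `generator` are assumptions from the surrounding text.
import torch

@torch.no_grad()
def edit_feature(generator, w, direction, strength):
    """Move the latent code along `direction` by `strength` and re-synthesize.
    Positive strength might raise a tower, negative might lower it."""
    direction = direction / direction.norm()   # keep the step size interpretable
    return generator(w + strength * direction)

# Example: sweeping the edit produces frames like the tower-height animations below.
# frames = [edit_feature(generator, w, height_direction, s)
#           for s in torch.linspace(-3.0, 3.0, 12)]
```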

 
 

Height Change (Left Tower)

The StyleGAN2 feature-extraction tool was able to pinpoint the latent vectors that control the height of the left tower. By manipulating these vectors, direct control over the left-hand tower’s height can be achieved.

 
 
 
 

Height Change (Right Tower)

The StyleGAN2 feature-extraction tool was able to pinpoint the latent vectors that control the height of the right tower. By manipulating these vectors, direct control over the right-hand tower’s height can be achieved.

 
 
 

 

Next Steps…

Control Method 4: Sketch-to-Image Troubleshooting

model: StyleGAN 2

Various constraints can also be applied within StyleGAN in order to provide further synthesis control. One example is a sketch constraint, which forces the generator to synthesize an image that matches the content of a hand-drawn sketch. This constraint is tricky to apply, and my model did not produce very satisfactory results (see below). Though the model was capable of producing fairly accurate sketch-to-image generations of cats when using the pre-trained cat generator, it did not produce good results with my generator trained on the custom Wangjing SOHO dataset.

As you can see below, the resulting images, though they match the sketch’s colour and general composition, fail to capture and reproduce the finer architectural, vegetation, and sky details present in the training dataset. The reason for this is not yet clear, as the code appears to be set up correctly. Perhaps the generator is not trained well or long enough on the dataset, the dataset may be too small (currently only 250 images), the hyper-parameters or runtime arguments may not have been set correctly, or there may be an unresolved coding issue I have not yet found.
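For reference, one generic and deliberately simplified way to impose such a sketch constraint is to optimize a latent code so that the generated image, reduced to a rough sketch-space rendering, matches the hand-drawn target. This is only a sketch of the idea under assumed `generator` and `to_sketch` functions, not the project’s actual code:

```python
# Generic sketch-constraint idea (not the project's implementation): optimize a
# latent code so the generated image, mapped into sketch space, matches the
# hand-drawn target. `generator` and `to_sketch` are assumptions.
import torch

def fit_to_sketch(generator, to_sketch, target_sketch, w_init, steps=300, lr=0.05):
    """Gradient-descend on the latent code until the generated image's
    sketch-space rendering matches the target drawing."""
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        image = generator(w)
        loss = torch.nn.functional.l1_loss(to_sketch(image), target_sketch)
        loss.backward()
        opt.step()
    return w.detach()
```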

Either way, sketch-to-image is one of the more promising synthesis-control methods and deserves further inquiry and application in the future. Check back: those results will be posted once these issues have been resolved.