I had often read that AI can write texts or paint pictures. Until now, I had always thought that this could only be tried out via web services such as GPT-3 or DALL-E from (commercial) providers.
I was therefore pleasantly surprised to learn that there are also GANs that run on a home computer - as long as the graphics card is powerful enough and special drivers such as CUDA are installed.
A few years ago, I had already successfully experimented with the related topic of "deepfakes" on an even older graphics card. So I was in good spirits 😀.
On my first attempt in 2021, I wanted to go "all out" and train a GAN myself. It is important to know that training the model is usually the more computationally intensive part; using it later requires comparatively little computing power. My PC has a GeForce GTX 1080 Ti with 11 gigabytes of GPU memory. Although this card already performs quite well, it is mediocre at best for AI training. But that didn't deter me, because I thought to myself: if the graphics card doesn't have enough power, I will just have to wait a little longer.
My own selfie AI
My first idea was: why don't I train an AI to invent selfie photos of me - then I don't have to take my own selfies anymore 😉. Coincidentally, I had been doing just that for the last few years, namely (spurred on by a corresponding smartphone app) regularly taking selfies. As a result, I already had a lot of training material at my disposal.
The appropriate tool for this was a StyleGAN based on TensorFlow. Setting up such a GAN was surprisingly easy - even though I realized almost too late the first time that you "mess up" the PC with all the Python dependencies. It is better to use virtual environments or the Anaconda tool.
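To illustrate the point about virtual environments, here is a minimal sketch using Python's built-in venv module. The function name create_isolated_env and the folder name "stylegan-env" are my own placeholders, not part of any StyleGAN setup, and pip is left out here only to keep the example fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path) -> Path:
    """Create a virtual environment in env_dir and return the path to its config file."""
    # with_pip=False keeps this sketch fast and offline;
    # for real work you would pass with_pip=True to be able to install packages.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    # A temporary folder is used so that the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_isolated_env(Path(tmp) / "stylegan-env")
        print("environment created:", cfg.exists())
```

Everything installed into such an environment stays inside its folder, so deleting the folder is enough to get rid of all dependencies again.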
However, the waiting time during training was sobering, even though my PC was calculating at full power (and full fan speed!) day and night. Even after two days, the model was only able to produce very vague hints of faces:
After two more weeks (!) you could already roughly see where it was going. However, the difference "glasses/no glasses" could perhaps become a problem. The results were starting to get a bit creepy...
Another week later, an interesting mix emerged: A few pictures were already pretty close to a real photo, while many still had quite obvious mistakes. The glasses, on the other hand, didn't seem to be a big problem after the first training results.
After another week of training, the training runs began to abort at seemingly random points with numerical calculation errors. This could be remedied by jumping back to an earlier state of the model and retraining from this state. From then on, however, this problem hung over the training like a sword of Damocles and cost even more computing time than before. If you regularly see in the morning that the training stopped in the middle of the night without any result, it demotivates you a bit.
As a proof of concept and to satisfy my own curiosity, however, the result was enough for me, so I did not start a new training run lasting several weeks. The last successful model was able to create selfies like this one (after a total of 4.5 weeks of training). Still pretty creepy - but also amazing what an AI can learn on its own:
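The idea of jumping back to an earlier state can be sketched roughly as follows. This is a toy example and not the StyleGAN training code: the "model" is a single number w, the loss is w squared, and the parameter faulty_steps is a hypothetical way of simulating the random calculation errors (each one occurs only once).

```python
import copy
import math


def train_with_rollback(steps, checkpoint_every, faulty_steps, lr=0.1):
    """Toy gradient descent on loss = w**2 that returns to the last checkpoint on a bad loss."""
    state = {"step": 0, "w": 5.0}
    checkpoint = copy.deepcopy(state)
    pending_faults = set(faulty_steps)  # hypothetical: steps at which the loss blows up once
    rollbacks = 0

    while state["step"] < steps:
        loss = state["w"] ** 2
        if state["step"] in pending_faults:
            pending_faults.discard(state["step"])
            loss = float("nan")  # simulate a numerical failure

        if not math.isfinite(loss):
            # Throw away the progress since the last checkpoint and retrain from there.
            state = copy.deepcopy(checkpoint)
            rollbacks += 1
            continue

        state["w"] -= lr * 2 * state["w"]  # gradient of w**2 is 2*w
        state["step"] += 1

        if state["step"] % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)

    return state, rollbacks


if __name__ == "__main__":
    final_state, rollbacks = train_with_rollback(20, 5, faulty_steps=[7, 13])
    print(final_state["step"], rollbacks)
```

The sketch also shows why the problem costs computing time: every rollback repeats all steps since the last saved state, so saving more often wastes less work per failure.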
What I find exciting about this is that it is often said that trained AI models do not allow any conclusions to be drawn about the training material. Instead, they supposedly only "somehow" process the material abstractly and then create completely new things from what they have learned, which no longer have anything to do with the individual raw data. At least for this type of network, this does not seem to be entirely true from my point of view. In most of the pictures, I can also recognize the background: both the whiteboard in the office covered with sticky notes and the blue Ikea shelf and the wallpaper in my study appear again and again in recognizable form. This seems logical in itself, because the AI must also be able to generate this component of the image from its trained model.
Generating images from text input
I also find AIs that generate an image based on a text-only description exciting. Depending on the quality of the images, it is sometimes difficult to explain to someone that the computer has generated these images from training data on the basis of purely mathematical-statistical methods - and has not really "understood" what the given text means in terms of content.
Here, too, there are some programs and models that you can run locally and without an Internet connection on your own PC. With most of these AIs, you can also do things like "style transfer", i.e. have an existing image or photo redrawn in the style of a particular artist. Or you can retouch parts of an image by letting the AI regenerate the missing area.
Here I have dealt exclusively with the function "Create image from text". I had each of the different AIs produce images for the following prompts:
- French fries on the beach with sailboats in the background.
- A cat laying on a car in front of a beautiful sunset.
- A squirrel at a computer in a server room, with lots of colorful lights.
- Nikola Tesla holding a battery on a hill during a thunderstorm.
Optionally, they were output as a painting and/or as a photo - depending on the ability of the AI.
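As a small sketch of how such a test series can be put together, the following snippet combines the four prompts with the two output types. The exact style wording ("as a painting", "as a photo") and the function name build_prompts are assumptions for illustration; each tool has its own way of specifying the style.

```python
from itertools import product

PROMPTS = [
    "French fries on the beach with sailboats in the background.",
    "A cat laying on a car in front of a beautiful sunset.",
    "A squirrel at a computer in a server room, with lots of colorful lights.",
    "Nikola Tesla holding a battery on a hill during a thunderstorm.",
]

# Hypothetical style suffixes, not taken from any of the tools
STYLES = {"painting": "as a painting", "photo": "as a photo"}


def build_prompts(prompts, styles):
    """Return (style_name, full_prompt) pairs for every prompt/style combination."""
    result = []
    for prompt, (name, suffix) in product(prompts, styles.items()):
        text = prompt.rstrip(".")  # drop the final period before appending the style
        result.append((name, f"{text}, {suffix}"))
    return result


if __name__ == "__main__":
    for style_name, full_prompt in build_prompts(PROMPTS, STYLES):
        print(f"[{style_name}] {full_prompt}")
```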
VQGAN-CLIP - creates rather abstract images
For my first attempts, I used VQGAN-CLIP, which is based on PyTorch. There are numerous trained models available, such as vqgan_imagenet, wikiart_16384, sflckr or coco.
The results were rather abstract in style - perhaps best described as resembling the motifs of very artistic postcards:
French fries on the beach with sailboats in the background.
A cat laying on a car in front of a beautiful sunset.
A squirrel at a computer in a server room, with lots of colorful lights.
Nikola Tesla holding a battery on a hill during a thunderstorm.
Latent Diffusion Models (LDM)
The next images were produced by Latent Diffusion Models, which are based on PyTorch and taming-transformers. LDMs can produce recognizable motifs, some of which look quite realistic. I find it interesting that some images also have a white, illegible caption or a kind of "Shutterstock" watermark.
Here are the results of the LDMs:
French fries on the beach with sailboats in the background.
A cat laying on a car in front of a beautiful sunset.
A squirrel at a computer in a server room, with lots of colorful lights.
Nikola Tesla holding a battery on a hill during a thunderstorm.
DALL·E mini
DALL·E mini can also be run free of charge on your own PC. There is also a free online version on huggingface.co.
In my opinion, the generated images tend to be a little less successful than those of the Latent Diffusion Models. The motifs often tend to be abstract or sometimes look like a child's drawing. Every now and then, however, there are almost photorealistic ones, like here with the cats.
French fries on the beach with sailboats in the background.
A cat laying on a car in front of a beautiful sunset.
A squirrel at a computer in a server room, with lots of colorful lights.
Nikola Tesla holding a battery on a hill during a thunderstorm.
Disco Diffusion v5 Turbo
The last AI I tried was Disco Diffusion v5 for Windows.
The themes of the motifs are usually easily recognizable in the pictures, but often become a bit psychedelic and then remind me of DeepDream. The motifs are then worked into the structures of the picture several times and in overlapping form. In my experiments, the results (especially with the photo output) were nowhere near as realistic as those of the LDMs or DALL·E mini. However, some results like the "French Fries" have inspired me quite a bit - especially the paintings look quite artistic in my opinion.
Here are the results for Disco Diffusion v5 Turbo:
French fries on the beach with sailboats in the background.
A cat laying on a car in front of a beautiful sunset.
A squirrel at a computer in a server room, with lots of colorful lights.
Nikola Tesla holding a battery on a hill during a thunderstorm.
Useful links to Disco Diffusion v5 Turbo
- get started with disco diffusion to create AI generated art
- Disco Diffusion Modifiers
- Disco Diffusion AI Guide
- Zippy's Disco Diffusion Cheatsheet v0.3
- Generate a Music Video from song lyrics
- Disco Diffusion: How I Play With Prompts
- progrockdiffusion to run in console
How it works
If you are interested in how it works - here is a nicely done explanation: