As explained in the how and why section, the motivation and inspiration for this generator came from our own hobby. Since both of us work in the IT industry and try to keep up with state-of-the-art developments in our field (Data Engineering, Data Science), we were pretty excited when Nvidia released the source code for their new StyleGAN2.
To be an effective generator, StyleGAN2 requires a lot of training images, so it is not easy to develop a well-functioning model on your own: most people simply don't have access to that much training data (as with every machine learning problem). What made our own generator possible at all was the release of StyleGAN2-ada.
In essence, the newer model needs far fewer training images to produce good results (for art-like images, "good" is mostly defined by the eye test). It randomly augments the provided images (with pre-defined augmentation rules) to artificially increase the amount and diversity of the training input. This opened up the possibility of training our generator with a pretty small training set.
Creating our own dataset was, and still is, the most difficult task of the whole creation process. In a project as small as ours, you never really stop looking for new data and adding it to your dataset and therefore to your model. The most time-consuming part is the acquisition of new images, because in our case you need at least 1000 images per race to get the model to work well enough (we will explain later why so few are already a solid base). We didn't want to create this site without the support of real artists and wanted to honor them for their great work. So we wrote emails to many artists and got some positive replies allowing us to use their pictures for our project.
Additionally, we started by searching for and downloading free-to-use images, but quickly realized we needed the help of artists, as the supply of such images is very limited. After collecting the first images, we decided to focus only on portraits of fantasy characters, as generating whole characters would need far more training images (we would need to train a model from scratch, see https://www.gwern.net/Faces#data for comparison).
At first we tried to work with automatic face detection models, but the results were not very good. So we decided on a different approach that would give us quick results and also let us train our own face detection algorithm for fantasy images later. With the free tool labelImg, we drew bounding boxes over the faces of the characters in the images.
Then we used the bounding box metadata to crop the faces with a simple Python script. After resizing all faces to 512x512 and manually sorting out low-quality crops, our little dataset was good to go. As a bonus, we now have this labelled data for training our own face detection algorithm to reduce manual work in the future.
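To give an idea of what such a cropping script can look like, here is a minimal sketch (not our exact script). It assumes the labelImg annotations were saved as Pascal VOC XML files, that Pillow is installed for the actual image work, and that faces should be cut out as squares before resizing. The function names read_boxes, to_square and crop_faces are made up for this example.

```python
import xml.etree.ElementTree as ET


def read_boxes(xml_text):
    """Return (xmin, ymin, xmax, ymax) for every <object> in a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append(tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        ))
    return boxes


def to_square(box, img_w, img_h):
    """Grow a box to a square around its centre, shifted to stay inside the image."""
    xmin, ymin, xmax, ymax = box
    # The square side is the longer box edge, capped by the image size.
    side = min(max(xmax - xmin, ymax - ymin), img_w, img_h)
    cx, cy = (xmin + xmax) // 2, (ymin + ymax) // 2
    left = min(max(cx - side // 2, 0), img_w - side)
    top = min(max(cy - side // 2, 0), img_h - side)
    return (left, top, left + side, top + side)


def crop_faces(image_path, xml_path, out_prefix, size=512):
    """Crop every annotated face and save it as a size x size PNG (assumes Pillow)."""
    from PIL import Image  # third-party: pip install Pillow

    with open(xml_path, encoding="utf-8") as fh:
        boxes = read_boxes(fh.read())
    with Image.open(image_path) as img:
        for i, box in enumerate(boxes):
            face = img.crop(to_square(box, img.width, img.height))
            face = face.convert("RGB").resize((size, size), Image.LANCZOS)
            face.save(f"{out_prefix}_{i}.png")
```

Cropping a square region first avoids distorting the face when it is resized to 512x512. The manual quality check afterwards is still needed, since small boxes get scaled up and can end up blurry.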
As explained above, our dataset only contains a small number of images (small in a machine learning context), so we needed to find a model we could use for transfer learning, i.e. building on an already trained model. A pretrained model of fantasy characters did not seem to be available, so we decided to try out the anime face model from https://www.gwern.net/Faces#stylegan-2 as a starting point. To our own surprise, the results were already pretty good after a small number of training iterations (i.e. no more anime faces and a shift to the art style of our dataset), and therefore we stuck to that plan.
Again, a huge shout-out to the YouTube channel Artificial Images for providing free tutorials and notebooks to train custom StyleGAN2-ada models on Google Colab. We used Google Colab to train our model because it is free, pretty simple to set up with the notebooks provided by Artificial Images and, most importantly, provides GPUs that are big enough for training. Go check it out yourself!
As we collect new images, we constantly add them to our dataset by preparing them as explained above. Whenever we feel like we have added enough images, we train a new model, using transfer learning on top of our best-performing model.
In every project there will be some failures or hurdles and we had our fair share of them!
For training StyleGAN2(-ada) models, your training images are required to be of a specific size (square, with a side length that is a power of two), and they need to be stored in one single folder. The image converter provided in the StyleGAN2(-ada) repository then reads that folder and converts all images into the required data format, TFRecords. You can imagine how often one needs to redo the dataset if not reading the full instructions first...
Your images must also match the resolution of the model you plan to use for transfer learning, but luckily that requirement was pretty obvious.
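A small check like the following would have saved us a few rounds of redoing the dataset. It is only a sketch of the size rules described above (square, power-of-two side length, same resolution as the model used for transfer learning). The function names and the model_resolution parameter are made up for this example and are not part of the StyleGAN2(-ada) repository.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (a positive integer with exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def check_size(width, height, model_resolution=None):
    """Return a list of problems with an image size; an empty list means it is fine."""
    problems = []
    if width != height:
        problems.append(f"not square: {width}x{height}")
    elif not is_power_of_two(width):
        problems.append(f"side length {width} is not a power of two")
    # Only checked when the resolution of the pretrained model is given.
    if model_resolution is not None and (width, height) != (model_resolution, model_resolution):
        problems.append(f"does not match the model resolution {model_resolution}x{model_resolution}")
    return problems
```

Running check_size over the width and height of every image in the dataset folder before the conversion step points out the offending files early, for example check_size(500, 500) reports that 500 is not a power of two.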
While using Google Colab is cool because everyone likes free stuff, it comes with its own share of problems. You need to keep the browser tab with Colab open and running (and therefore your computer), and you should avoid internet disconnects. Colab also understandably limits the free use of its GPUs to around 8-10 hours, and you may be unable to connect to a GPU on consecutive days. But as our project was just a hobby, we gladly accepted these restrictions in order to train our model for free (comparable training hours on paid platforms would already cost a lot of money, and the cost would only increase as we constantly try to improve our model).
...surely to be continued
(This is not meant to be a step-by-step guide to creating your own model, but rather a documentation of the work we have done. We are glad to have received so much attention and feedback on https://www.reddit.com/r/rpg for a side project that we just didn't want to end up forgotten somewhere on our hard drives.) Feel free to contact us directly if you have detailed questions or want to share thoughts on this topic.