The Future Of AI Image Synthesis
Shortly after the start of 2021, the Media Synthesis community1 at Reddit became even more psychedelic than usual.
The board became saturated2 with unearthly images depicting rivers of blood3, Picasso’s King Kong4, a Pikachu chasing Mark Zuckerberg5, Synthwave witches6, acid-induced kittens7, an inter-dimensional portal8, the industrial revolution9 and the possible child of Barack Obama and Donald Trump10.
These bizarre images were generated by inputting short phrases into Google Colab notebooks (web pages from which a user can access the formidable machine learning resources of the search giant), and letting the trained algorithms compute possible images based on that text.
In most cases, the best results arrived within minutes, and repeated attempts at the same phrase usually produced wildly different images.
In the image synthesis field, this free-ranging facility of invention is something new: not just a bridge between the text and image domains, but an early look at comprehensive AI-driven image generation systems that don’t need hyper-specific training in very limited domains. Examples of such narrow systems include NVIDIA’s landscape generation framework GauGAN (on which, more later), which can turn sketches into landscapes, but only into landscapes, and the various sketch-to-face Pix2Pix projects, which are likewise ‘specialized’11.