
A comprehensive guide to the state of the art in how AI is transforming the visual effects (VFX) industry

By Martin Anderson

New machine learning techniques being pioneered at the major visual effects studios promise to transform the visual effects industry in a way not seen since the CGI revolution.


A lot has changed in the VFX AI scene since this article was published in early 2019, so do also check out our more recent feature on The Future Of AI Image Synthesis.


It’s over twenty-five years since the ground-breaking CGI effects of Jurassic Park usurped 100 years of visual effects tradition. When Steven Spielberg showed the first rushes of computer-generated dinosaurs to acclaimed traditional stop-motion animator Phil Tippett (who had been hired to create the dinosaurs in the same way they had been done since the 1920s), Tippett announced “I think I’m extinct.” It’s a line so significant that it made it into the movie itself, in reference to a paleontologist envisaging a world where no one would need him to theorize about dinosaurs any longer.

Though a quantum leap, the visual effects of Jurassic Park did not represent an overnight upheaval. They had been presaged sporadically throughout the 1970s, and at greater length in the 1980s, in cinematic curios such as Tron, The Last Starfighter and Flight of the Navigator. In the few years directly prior, James Cameron had brought renewed interest to the possibilities of CGI with the ‘liquid’ effects of The Abyss and Terminator 2: Judgment Day.

But Jurassic Park was different: computers had achieved the ability to generate solid, photo-real objects, promising to consign the uncomfortable burdens of the physical, photochemical VFX world to history. It set the trend for the decades ahead, and reinvented the visual effects industry, though not without many casualties among the old guard.

Many influential movie makers and VFX studios were unable or unwilling to read the signs of the times in the years leading up to Jurassic Park. It now seems that the water is rippling again for the current state of the art in visual effects, as new machine learning techniques slowly encroach on the now-established workflows of CGI, and that a new ‘disruptive event’ may be coming to shake up the industry.

1: DeepFakes

Born in porn

In late 2017, not for the first time, porn proved a prime mover for a relatively obscure new technology. In that period a new sub-Reddit appeared, dedicated to publishing short pornographic video clips which had been convincingly altered to feature the faces of celebrities.

This apparent alchemy, now packaged by a pseudonymous user into a public code repository called DeepFakes, had been achieved with the use of a Convolutional Neural Network (CNN) and autoencoders (but not, as widely reported in mainstream articles and in Wikipedia, using a Generative Adversarial Network [GAN] – a machine learning technique first proposed by lead Google researcher Ian Goodfellow over three years earlier, which was then gaining traction in other image-generation projects¹).

Something seismic had begun to occur in machine learning research in this period. Recent advances in GPU-based machine learning had begun to facilitate the processing of large amounts of data in increasingly efficient time-frames. Almost overnight (in terms of the often-hindered history of AI), a great deal of global governmental and industry-led research into computer vision and object recognition, research centered around well-funded sectors such as robotics, logistics and security footage analysis, had become actionable and accessible to less ‘serious’ purposes.

GANs, CNNs and autoencoders began to crop up in headline-grabbing experiments around style transfer, and in the inference and generation, using data from the public domain, of ‘unreal’ yet photorealistic images.

Research into Deep Photo Style Transfer in 2017, a collaboration between Cornell University and Adobe. Reference images and stylized secondary images are combined via a neural network to output photorealistic combinations.

But DeepFakes, which allowed casual users to assault our longstanding faith in the authenticity of everyday video footage, became the villainous totem which revealed the extent and nature of the coming revolution.


We take a deeper and more up-to-date dive into the topic of deepfakes in The Limited Future Of Deepfakes (March 2021), by the same author.


The port of the code was rough, but solid. It soon led to the availability of various, slightly more user-friendly DeepFake applications. However, those wishing to create videos were (and are) required to become (ironically) meticulous data scientists, gathering and curating large face-sets of celebrities to feed into the neural network, and then waiting up to a week for the neural net to cook the data into a model capable of making the transformation from one face to another.

Face-swapping software in action, as a journalist trains a machine learning model designed to swap faces between Jeff Bezos and Elon Musk. The preview window shows how near the model is getting to achieving a photorealistic swap between the two subjects. Full training can take anywhere between six hours and a week, depending on the configuration. Once trained, the model can spit out face-swapped images in seconds, and videos in minutes.
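
For a sense of the stages involved, the outline below sketches that workflow in Python. It is purely illustrative: the function names, paths and default values are hypothetical placeholders, not the interface of any particular DeepFake application.

```python
# Purely illustrative outline of the DeepFake workflow described above.
# Every function name and path here is a hypothetical placeholder.

def build_faceset(image_dir: str, output_dir: str) -> None:
    """Detect, crop and align every face found in image_dir into output_dir."""
    ...

def train(faceset_a: str, faceset_b: str, model_dir: str, iterations: int = 200_000) -> None:
    """Train a face-swap model on the two curated face-sets.
    On a consumer GPU this stage typically runs for hours to days."""
    ...

def convert(video_in: str, model_dir: str, video_out: str) -> None:
    """Apply the trained model frame by frame to produce the swapped video."""
    ...

if __name__ == "__main__":
    build_faceset("raw/celebrity_photos", "facesets/a")      # faces to be inserted
    build_faceset("raw/base_footage_frames", "facesets/b")   # faces in the original footage
    train("facesets/a", "facesets/b", "models/a_to_b")
    convert("base_clip.mp4", "models/a_to_b", "swapped_clip.mp4")
```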

That notwithstanding, results which were now stunning and shocking the world could be obtained via some diligence, publicly available photos and a mid-level PC with an NVIDIA graphics card.

As would soon become clear, the technology seemed capable of rivalling or exceeding any comparable work out of Hollywood, within the limits of its own ambition: the convincing digital manipulation and transposition of faces in video footage.

Permission to fake

Though the internet ruminates on it almost daily, the implications of post-truth video are a subject for another day. However, it’s worth noting, in a legal climate that has yet to catch up with DeepFake technology, that the New York bill attempting to criminalize DeepFakes has been opposed by the MPAA. The organization believes such a sweeping law would limit Hollywood’s ability to replicate historical personages, even with pre-DeepFake technology. A more general bill is currently working its way through Congress, though this relates exclusively to the criminal usage of DeepFake tech.

But even in the event that the New York bill passes, it seems reasonable to assume that U.S.-based film-makers will be able to obtain permission to replicate actors using machine learning. For sure, the possibilities that AI-based technologies offer to the visual effects industry far exceed the aims of the implementations that made them famous.

Automation of VFX roles

Much as the march of AI threatens radiologists more than other kinds of doctor, the use of machine learning imperils certain trades within the VFX industry more than others, at least in the early years of tentative adoption. Its eventual scope, however, extends to nearly every facet of VFX production currently handled under a traditional CGI pipeline.

As with most trades which AI is encroaching upon, it’s the layer of ‘interpretation’ which is most subject to automation: the artisanal process of collating and creatively manipulating data into the desired results.

In terms of CGI vs. AI, it’s useful to understand which parts of the process are susceptible to a machine learning approach.

The difference between a CGI mesh and a Deep Learning ‘model’

A traditional CGI approach to generating a human face involves creating or generating a 3D ‘mesh’ of the person, and mapping appropriate texture images onto the model. If the face needs to move, such as blinking or smiling, these variations will have to be painstakingly sculpted into the model as parameters. Muscle and skin simulations may need to be devised, in addition to hair or fur systems, to simulate eyebrows and facial hair.

A traditional CGI head comprising vector-based mesh, point information and texture detailing, among other facets
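
To make the contrast concrete, a rig of this kind can be thought of as explicit geometry plus hand-authored parameters. The sketch below is a minimal, hypothetical representation in Python (using numpy); real production rigs are vastly more elaborate, but the essentials are the same.

```python
import numpy as np

# A minimal, illustrative sketch (not production code) of the data a
# traditional CGI face rig carries: a fixed vertex mesh, texture (UV)
# coordinates, and hand-sculpted blendshape deltas for expressions
# such as 'blink' or 'smile'. All names here are hypothetical.

class FaceMesh:
    def __init__(self, vertices, faces, uvs):
        self.vertices = np.asarray(vertices, dtype=np.float32)  # (V, 3) rest-pose positions
        self.faces = np.asarray(faces, dtype=np.int32)           # (F, 3) triangle indices
        self.uvs = np.asarray(uvs, dtype=np.float32)             # (V, 2) texture coordinates
        self.blendshapes = {}                                     # name -> (V, 3) vertex offsets

    def add_blendshape(self, name, deltas):
        """Store a sculpted expression as per-vertex offsets from the rest pose."""
        self.blendshapes[name] = np.asarray(deltas, dtype=np.float32)

    def pose(self, weights):
        """Blend the rest pose with weighted expression deltas, e.g. {'smile': 0.7}."""
        out = self.vertices.copy()
        for name, w in weights.items():
            out += w * self.blendshapes[name]
        return out
```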

A machine learning-based model is much more abstract in nature. It’s created by the process of analyzing and assimilating thousands of real-world source images of the two subjects being processed (the ‘target’ person who will feature in the final work, and the ‘source’ person that they will be transformed from).
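
The open-source face-swapping tools that grew out of the DeepFakes code generally use an arrangement along these lines: a single convolutional encoder shared between the two subjects, with a separate decoder for each identity. The PyTorch sketch below is a simplified illustration of that idea; the layer sizes and 64×64 input resolution are arbitrary assumptions chosen to keep the example short, not the settings of any particular tool.

```python
import torch
import torch.nn as nn

# Illustrative shared-encoder / dual-decoder autoencoder for face swapping.
# Layer counts, channel sizes and the 64x64 crop size are assumptions.

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
        )

    def forward(self, x):   # x: (N, 3, 64, 64) aligned face crops
        return self.net(x)  # shared latent representation of pose and expression

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z)

encoder = Encoder()
decoder_a = Decoder()   # learns to rebuild faces of subject A
decoder_b = Decoder()   # learns to rebuild faces of subject B

# Training reconstructs each subject through the shared encoder and their own
# decoder; at conversion time, a face of A is encoded and pushed through
# decoder_b, producing B's likeness with A's pose and expression.
def convert(face_a):
    with torch.no_grad():
        return decoder_b(encoder(face_a))
```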

During extraction of the data from source images, the software applies facial pose estimation to gain an understanding of the angle and expression of the face in each image (see colored lines in the image-set below). These ‘facial landmarks’ are used to make effective conversions, and, optionally, to train the model more efficiently.
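
As a rough illustration of that extraction step, the snippet below uses dlib’s 68-point landmark predictor, one common choice for this task; the various DeepFake tools use a range of detectors, so treat this as an assumption rather than a description of any specific package. The ‘shape_predictor_68_face_landmarks.dat’ model file must be downloaded separately.

```python
import cv2
import dlib

# Hedged sketch of landmark extraction with dlib's 68-point predictor.
# The .dat model file is assumed to be present in the working directory.

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(image_path):
    """Return a list of 68 (x, y) landmark points for each face found."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)          # 1 = upsample once to catch smaller faces
    results = []
    for rect in faces:
        shape = predictor(gray, rect)
        points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
        results.append(points)
    return results
```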