AI and the Rise of Digital Doppelgängers
Examining Hollywood's current AI issues, Part II: The Actors
In the last post, I discussed the AI concerns of the Writers Guild of America (WGA) and how those concerns relate to the current and near future state of AI technology.
In this post I’m going to examine the AI concerns of the Screen Actors Guild — American Federation of Television and Radio Artists (SAG-AFTRA) in their ongoing strike against the Alliance of Motion Picture and Television Producers (AMPTP). The AMPTP is the trade association that negotiates union contracts for the studios and production companies that make movies and tv shows.
Technically speaking, these AI concerns fall into two broad categories: those related to digital replicas of actors and those related to the digital manipulation of an actor’s appearance and performance. All of this falls into the purview of what’s termed visual effects.
First, let’s dive into where we are today in this area and the history of how we got here.
Digital Replicas
Digital replication of actors has been used in movies and television for several decades now, starting around the early nineties. This scene from Jurassic Park, in which the T-Rex chomps down on the lawyer character in a bathroom and swings him around, is an early example of a full digital double. As the computer-generated T-Rex lifts the man up, the actor is transitioned into a digital double that the T-Rex can then swing around in the air.
A similar technique was used in the early days of visual effects by the renowned visual effects creator Ray Harryhausen. In movies like The Seventh Voyage of Sinbad, he’d replace an actor with a stop-motion animated replica when a stop-motion animated monster grabbed the actor.
Sometimes only the face of an actor is duplicated. This is often done when a stunt performer executes a stunt close to camera: the face of the actor being doubled is digitally duplicated and placed onto the stunt performer.
Duplicating an actor to create a twin has also been done a number of times and with a variety of techniques. Sometimes this is as simple as carefully combining separate shots of the actor, while at other times the face of the actor is digitally placed onto a second actor who is performing the movements of the twin, such as in this scene from The Social Network.
Occasionally, there are more unusual uses of a digital double, such as when the digitally duplicated and aged head of Brad Pitt was put on the body of a smaller actor for The Curious Case of Benjamin Button. Or, when Paul Walker died in a car crash with some outstanding scenes still to be shot for the movie Furious 7, and digitally recreated versions of his face were put onto actors doubling for him (usually one of his brothers).
More recently, at least in action-oriented scenes not involving close-ups, an actor injury or lack of availability might result in the scene being completed using a digital double.
Replication of background actors for crowd scenes has been going on for many more decades. Originally, this involved locking down the camera and repeatedly shooting a small group of people. Then, these same people would be moved to an adjacent area in the frame, rearranged, and filmed again. After doing this a number of times, the small sections filled with people would be combined in post production into one large area filled with a crowd.
More recently, background actors have been digitally duplicated either as a collection of images on two-dimensional “cards” or as relatively low resolution three-dimensional digital people. This is how sports stadiums are filled and massive armies created.
Digital Manipulation
Manipulating the appearance of actors digitally has been a reality for nearly thirty years. Manipulating their appearance through other means, such as make-up, costumes, and lighting, has been around almost as long as movies themselves.
Some common types of digital manipulation are aging or de-aging and removing blemishes and imperfections. For example, here’s a clip of how Michael Douglas was de-aged for a scene in Ant-Man. In recent years, digital make-up effects have been applied in scenes that were shot without practical make-up effects, and practical make-up effects have been digitally enhanced.
Digital manipulation of an actor’s actual performance has typically been pretty minimal, usually limited to something like adjusting an actor’s eyeline to match an element that was added in later, or fixing the eyeline of a background actor who accidentally looked at the camera.
The Creation of a Digital Double
The digital replicas mentioned above can be and have been created in a variety of ways. For more forgiving applications (like replicating background crowds), a few photos of each background performer are enough. However, for more demanding applications, a digital scan of the actor is used.
This has typically involved expensive and complex equipment and a knowledgeable technical team. Over the years, the technology has dramatically improved in quality, ease of use, and cost, and a quick and dirty scan can even be done by anyone with a recent high-end iPhone or iPad.
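To make the scanning step a little more concrete, here’s a minimal sketch of one way raw scan data might be turned into a static mesh, assuming the open-source Open3D library and a hypothetical point-cloud file exported from a phone scanning app. It’s purely illustrative, not a depiction of any particular studio’s pipeline.

```python
import open3d as o3d  # open-source 3D processing library (assumed installed)

# Load the scanned point cloud ("actor_scan.ply" is a hypothetical file
# exported from a phone or scanner app).
pcd = o3d.io.read_point_cloud("actor_scan.ply")

# Surface reconstruction needs per-point normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30)
)

# Poisson reconstruction turns the points into a triangle mesh: a static
# shape with no rig, no animation controls, and no ability to move.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9
)

o3d.io.write_triangle_mesh("actor_scan_mesh.ply", mesh)
```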
It’s important to note that simply scanning an actor does not create a digital double; it creates what amounts to a three-dimensional still photo. To make a digital duplicate of an actor that can move, even if it’s just the actor’s face, requires sophisticated software tools and an experienced team of visual effects artists. And that’s just to create a face or full body that is capable of being animated.
To actually make the digital duplicate move requires a visual effects animator, or more likely, a team of them. Sometimes the animation is driven by capturing the motion of either the original actor or another actor, although this usually still requires refinement by animators.
Once the digital duplicate has been animated, it still must be put into each frame of the movie or tv show. This means that a photorealistic image of the digital double must be created for each frame, one that matches the lighting, focus, and camera movement of the original footage. This image must be rendered to accurately replicate the physics of light interactions in the real world, which can get pretty complicated.
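For those curious what “accurately replicating the physics of light” means in practice, renderers are essentially computing numerical approximations of the rendering equation, which describes how much light leaves a surface point in a given direction:

```latex
% The rendering equation: outgoing radiance equals emitted radiance plus all
% incoming light reflected toward the viewer, weighted by the surface's
% reflectance (the BRDF) and the angle of incidence.
\[
L_o(\mathbf{x}, \omega_o) = L_e(\mathbf{x}, \omega_o)
  + \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\,
    L_i(\mathbf{x}, \omega_i)\,(\omega_i \cdot \mathbf{n})\, d\omega_i
\]
```

Approximating that integral for every pixel of every frame, with realistic skin, hair, and cloth, is a big part of why this gets so complicated.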
Creating the rendered frames for a shot used to require many racks of computer servers and a lot of processing time, but hardware and software have continued to improve over the years and cut down on these requirements.
To create the final frame, a compositing artist uses yet more sophisticated software to pull all the elements together and blend them into the same frame so that they match properly. There are countless other details that must be created and mixed in between the scanning of the actor and the creation of the final frame, including shadows, reflections, objects passing between the camera and where the digital double is supposed to be, and anything that the digital double interacts with.
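At its core, that layering comes down to repeated applications of the classic Porter-Duff “over” operation, sketched below using NumPy with premultiplied-alpha images (the values and array shapes are made up for illustration). Production compositing packages do far more, but this is the basic building block.

```python
import numpy as np

def over(fg: np.ndarray, bg: np.ndarray) -> np.ndarray:
    """Composite a premultiplied-alpha RGBA foreground over a background."""
    fg_alpha = fg[..., 3:4]            # foreground alpha channel
    return fg + bg * (1.0 - fg_alpha)  # Porter-Duff "over"

# Example: a semi-transparent rendered element over an opaque gray plate.
element = np.zeros((4, 4, 4))
element[..., 0] = 0.5   # premultiplied red
element[..., 3] = 0.5   # 50% alpha
plate = np.full((4, 4, 4), 0.5)
plate[..., 3] = 1.0     # fully opaque background
final_frame = over(element, plate)
```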
AI and Actors
So as can be seen from the above, duplicating actors, manipulating their appearance, and, to a lesser extent, manipulating their performances have all been going on for some time.
It’s worth noting that no AI was required for any of this. In fact, until the last couple of years, very little machine learning was used for any of the above. One early exception, which employed rudimentary machine learning techniques, was the crowd-simulation software (appropriately called Massive) originally developed for Peter Jackson’s Lord of the Rings trilogy and used in the video clip above showing three-dimensional digital crowds.
In general, though, all of the above has been created by large teams of artists laboring for many hours, days, weeks, months, and sometimes years. Anyone who’s stayed for the end credits of movies, particularly big Hollywood blockbusters, has surely noticed the vast sea of names listing those responsible for the visual effects.
Today, AI techniques are definitely being used more and more in visual effects; they’re being woven into both existing and new tools of the trade, allowing artists to do things faster and more efficiently and frequently with better results. These tools are used for a wide variety of tasks, most of which are pretty obscure to the general public.
One use of AI that is somewhat better known to the general public is the creation of deepfakes. A deepfake is a duplicate of a person’s face created using AI techniques, and it can look unsettlingly realistic. Deepfakes can be created using photos of an actor from several angles or video footage that covers several angles.
This information is manipulated by software employing generative AI techniques to match the source face onto the movement and lighting of the target face. In most current applications, an artist is still needed to do some clean-up on the results to create a seamless blend, but the results can be very impressive.
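For a rough sense of how the underlying generative model works, here’s a highly simplified sketch of the classic deepfake architecture: a shared encoder with one decoder per identity, trained as autoencoders on aligned face crops, with the “swap” performed by decoding one person’s footage through the other person’s decoder. This assumes PyTorch, uses made-up tensor sizes, and leaves out the alignment, masking, and color-matching steps that real tools (and the clean-up artists mentioned above) handle.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared encoder: maps a 64x64 face crop to a 512-dim latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 512),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Per-identity decoder: reconstructs a face crop from the latent code."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(512, 128 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 16, 16))

encoder = Encoder()
decoder_a = Decoder()  # learns person A's faces (the face to be transplanted)
decoder_b = Decoder()  # learns person B's faces (the footage being modified)

# One training step (sketch): each person's faces are reconstructed through
# the shared encoder and that person's own decoder. Random tensors stand in
# for batches of aligned face crops.
faces_a = torch.rand(8, 3, 64, 64)
faces_b = torch.rand(8, 3, 64, 64)
params = list(encoder.parameters()) + list(decoder_a.parameters()) + list(decoder_b.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)
loss = nn.functional.l1_loss(decoder_a(encoder(faces_a)), faces_a) \
     + nn.functional.l1_loss(decoder_b(encoder(faces_b)), faces_b)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The "swap": person B's footage is encoded, then decoded as person A.
with torch.no_grad():
    swapped_faces = decoder_a(encoder(faces_b))
```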
And deepfake technology isn’t limited to just an actor’s face. Audio deepfakes can turn anyone’s voice into an extremely realistic reproduction of someone else’s voice. Some recent examples of deepfakes that have gone viral on YouTube and TikTok have been created by artists and programmers at Deep Voodoo and Metaphysic, two companies that are diving into the deepfake market for movies and tv.
While most of the discussion about deepfake technology involves duplicating actors (or other famous people), there are other uses for the technology, and one that is gaining momentum is translating shows from one language to another.
Right now, generating a version of a show in another language means using either subtitles or dubbing. Subtitles let you hear the original voice of the actor and all the emotion behind it, but many people just don’t like having to read subtitles.
Dubbing solves the reading issue, but it’s usually pretty clunky, as it’s difficult to make one language match the mouth movements and timing of another. In fact, sometimes the dialogue is actually changed just to help with matching the new language to the original actor’s facial movements. It’s also hard to get the correct tonality and emotion that was present in the original.
Using deepfake technology, a movie can be translated into another language using the original actor’s voice and tonality, and it will look and sound like they’re actually saying the words in the new language.
The Proposals
SAG-AFTRA and AMPTP have both made public statements about their proposals for an agreement between the two parties regarding AI. For the purpose of examining the issue of AI and its relationship to the SAG-AFTRA strike, I considered the two documents listed below.
I should point out that there is a discrepancy between how SAG-AFTRA characterizes AMPTP’s counterproposals and how AMPTP itself describes those counterproposals in its own press release.
A list of proposals issued by SAG-AFTRA along with SAG-AFTRA’s description of AMPTP’s responses.
The above list along with AMPTP’s own description of their responses issued in an AMPTP press release on 7/21/23.
Here’s what SAG-AFTRA proposed:
ARTIFICIAL INTELLIGENCE: Establish a comprehensive set of provisions to protect human-created work and require informed consent and fair compensation when a “digital replica” is made of a performer, or when their voice, likeness, or performance will be substantially changed using AI.
Here are the AMPTP proposals from their press release:
Must obtain a background actor’s consent to use a “digital replica” other than for the motion picture for which the background actor was hired. Producers told SAG-AFTRA they would agree to apply the same provisions that the Producers proposed would apply to performers, so that consent and separate bargaining for payment must take place at the time of use.
Cannot use “digital replicas” of background actors in lieu of hiring the required number of covered background actors under the Agreement.
Must obtain a performer’s consent to create a “digital replica” for use in a motion picture.
Must obtain a performer’s consent to digitally alter the performance beyond typical alterations that have historically been done in post-production.
Must obtain a performer’s consent and bargain separately for use of a “digital replica” other than for the motion picture for which the performer was hired.
Producers told SAG-AFTRA they would agree to SAG-AFTRA’s proposal that consent to use a “digital replica” must include a “description of the intended use.” Likewise, consent to digital alterations must include a “description of the intended alterations.”
The main concern of SAG-AFTRA about AI seems to be that studios will be able to use AI to generate a duplicate of an actor and place that actor into a scene even when the actor hasn’t been contracted and paid to do that scene or even that show. A lesser concern seems to be that the studios will use AI to modify the performance of an actor after the fact without the actor’s consent.
Practically Speaking
It seems that the real issue here is not necessarily AI but digital duplication and manipulation of actors, something that has been going on for some time. AI will just make it easier, more efficient, and better looking. As an example, here’s a video that a single artist created using off-the-shelf, open-source deepfake software to recreate the same de-aging of Michael Douglas in Ant-Man mentioned above — it’s arguably better than what appeared in the movie.
Since digital duplicates and digital manipulation have already been used many times in movies without AI, it seems worthwhile to consider the end result rather than how that result was achieved.
Of course, anything that is easier, more efficient, and better looking will likely be used more often and more widely. AI will eventually be able to create duplicates of actors or manipulate their performances much more directly and without anyone ever having been on set or on a motion capture stage and without the need for highly skilled animators. This is still some years away at the very least but will likely be possible at some point in the future.
One thing that stands out in the two documents quoted above is that AMPTP’s proposals, unlike SAG-AFTRA’s, do not use the term AI. This may be an important point, given that digital duplication and manipulation have been going on for some time without AI. In other words, SAG-AFTRA’s concerns would likely need to be addressed more specifically in modified and/or additional language in the SAG-AFTRA Minimum Agreement, but it’s worth keeping in mind that neither digital duplication nor digital manipulation of actors requires AI technology.
One More Thing
One of the concerns SAG-AFTRA representative Duncan Crabtree-Ireland expressed was:
…that our background performers should be able to be scanned and get paid for one day’s pay and their companies should own that scanned image, their likeness to be able to use it for the rest of eternity in any project they want with no consent and no compensation…
As mentioned above, AMPTP denied that they had proposed this, and that denial seems to be reflected in the proposals listed above from their press release. However, as a practical matter, there are a few things worth considering.
When replicating people to build up a large crowd, part of what you’re trying to replicate is the look of the crowd, specifically what they’re wearing. This is a big roadblock to reusing background actors from one movie in another. The faces in the crowd, on the other hand, are typically at such low resolution in the final frame that a handful of random faces will usually work.
Generative AI techniques already make it quite possible to create generic photorealistic faces of people that have never existed. Similar techniques will likely make it possible in the near future to match the wardrobe as well. This means that for large crowds, no photos or scans of background actors will be needed.
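As one illustration of that capability, here’s a minimal sketch that generates a face of a person who has never existed, using an off-the-shelf text-to-image diffusion model through the Hugging Face diffusers library. The specific model, prompt, and file name are just assumptions for the example; purpose-built crowd tools would work differently, but the underlying generative capability is the same.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a publicly available text-to-image checkpoint (a GPU is assumed).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate a generic, never-existed face suitable for a distant crowd member.
image = pipe(
    "photorealistic candid portrait of an ordinary person in a crowd",
    num_inference_steps=30,
).images[0]

image.save("background_face.png")
```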
However, the closer a digital character gets to camera, the more expensive and time consuming it is to create. Right now, the effort to get realistic looking background characters that are relatively close to camera is prohibitive — it’s less expensive to just use real people. But some number of years in the future, AI tools will likely reduce the expense and time needed to have realistic background characters that are close to camera. This probably won’t be in the next few years, but it will likely be possible some time soon after that.
Tools and Results
As things stand now, and setting aside those background crowds, creating a digital duplicate of a key actor to place into a scene where no actor was on set requires extreme effort and expense, and good results are difficult to attain. It will likely take some time before creating a digital duplicate of an actor in a scene doesn’t require significantly more time and money than using the actor directly.
Deepfake technology does allow one actor’s face to be placed on another actor in a faster and more convincing manner than previously possible, and this will likely lead to more widespread use of the technique. However, there’s still an actor driving the performance, and face replacement itself doesn’t require deepfake technology or any other form of AI.
As mentioned in the last post with regard to AI systems writing screenplays, getting 80% of the way to your goal is usually quite a bit easier than getting that last 20%. The last 20% of the effort needed to reach economically viable, fully AI-generated lead performers could take years, so it may be worth concentrating more on the end results that are causing concern rather than on the tools used to get those results.