Breaking Benjamin —

This wild, AI-generated film is the next step in “whole-movie puppetry”

Results are admittedly limited due to a 48-hour crunch—but hint at a wild future.

Two years ago, Ars Technica hosted the online premiere of a weird short film called Sunspring, which was mostly remarkable because its entire script was created by an AI. The film's human cast laughed at odd, computer-generated dialogue and stage direction before performing the results in particularly earnest fashion.

That film's production duo, director Oscar Sharp and AI researcher Ross Goodwin, have returned with another AI-driven experiment that, on its face, looks decidedly worse. Blurry faces, computer-generated dialogue, and awkward scene changes fill out this year's Zone Out, a film created as an entry in the Sci-Fi-London 48-Hour Challenge—meaning, just like last time, it had to be produced in 48 hours and adhere to a specific set of prompts.

That 48-hour limit is worth minding, because Sharp and Goodwin went one step further this time: they let their AI system, which they call Benjamin, handle the film's entire production pipeline.

The Benjamin that wouldn’t die

The requirements for Zone Out, as given to the filmmakers by Sci-Fi-London once the 48-hour clock started ticking.

Just like in 2016, the duo was given a series of requirements for their short film. (Their 2018 requirements are shown above.) This time, they wanted Benjamin to take that data and run with it.

In order to achieve their goal of having Benjamin "write, direct, perform and score" this short film within 48 hours, without any human intervention, the duo began pre-planning for the festival by developing a workflow, Sharp said in an Ars interview. That meant expanding Benjamin's workload well beyond scriptwriting. Their plan required Benjamin to do the following: cobble together footage from public-domain films, face-swap the duo's database of human actors into that footage, insert spoken voices to read Benjamin's script, and score the film.

This was all on top of writing the screenplay, a process that has been refined since Benjamin's 2016 splash. The AI continues to rely on an LSTM (long short-term memory) recurrent neural network, which Ars' Annalee Newitz described previously:

To train Benjamin, Goodwin fed the AI with a corpus of dozens of sci-fi screenplays he found online—mostly movies from the 1980s and '90s. Benjamin dissected them down to the letter, learning to predict which letters tended to follow each other and from there which words and phrases tended to occur together. The advantage of an LSTM algorithm over a Markov chain is that it can sample much longer strings of letters, so it's better at predicting whole paragraphs rather than just a few words. It's also good at generating original sentences rather than cutting and pasting sentences together from its corpus. Over time, Benjamin learned to imitate the structure of a screenplay, producing stage directions and well-formatted character lines.
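
For the curious, here is roughly what that technique looks like in practice: a minimal character-level LSTM sketch in PyTorch. The corpus path, model sizes, and training loop below are illustrative assumptions, not Benjamin's actual code.

```python
# Minimal character-level LSTM language model -- a sketch of the general
# technique described above, not Benjamin's actual implementation.
# Assumes a plain-text screenplay corpus at "corpus.txt" (hypothetical path).
import torch
import torch.nn as nn

text = open("corpus.txt").read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()
data = torch.tensor([stoi[c] for c in text])

seq_len = 128
for step in range(1000):  # toy training loop; a real run would be far longer
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i : i + seq_len].unsqueeze(0)          # input characters
    y = data[i + 1 : i + seq_len + 1].unsqueeze(0)  # next-character targets
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Generate a "screenplay" one character at a time by sampling from the
# model's predicted next-character distribution.
idx, state, out = torch.tensor([[stoi["\n"]]]), None, []
for _ in range(500):
    logits, state = model(idx, state)
    probs = torch.softmax(logits[0, -1], dim=-1)
    idx = torch.multinomial(probs, 1).unsqueeze(0)
    out.append(chars[idx.item()])
print("".join(out))
```

Because the model samples one letter at a time, it can produce well-formatted stage directions and character lines without copying sentences verbatim from its corpus, which is exactly the behavior Newitz describes.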

Zone Out's script, just like Sunspring's, teeters on the edge of inanity and emotion—which, honestly, puts it right up there with the best of the sci-fi canon. (A dialogue example taken directly from the film, which almost sounds like Benjamin's criticism of his masters: "Why don't you tell me what... you say is true that the human being will be able to reenforce the destruction of a human being?") This time, the script's odd, not-quite-human results are only amplified by having so many other film-production tasks automated by AI.

Snags arose during production as the duo struggled to find public-domain film footage that they could safely use in their own potentially commercial enterprise. The challenge wasn't just about copyright; the footage had to contain a significant number of shots with sole actors facing directly toward the camera, which Benjamin could more easily snip and insert into whatever it composed. Between their deep dive into a public domain film database and conversations with a lawyer, Goodwin and Sharp settled on two films: The Last Man on Earth and The Brain That Wouldn't Die.

"Badly dubbed"... for now

The most striking part of the film is its reliance on face-swapping technologies to adapt existing films to Benjamin's will. Face-swapping has become a pretty hot topic in pop culture, particularly after an altered video of President Barack Obama went viral in 2017 (and a follow-up take, with director/comedian Jordan Peele filling in as an impersonator, rekindled the viral fire in April). Still, the technology's limitations are quite apparent, especially when time limits factor into any production. An April attempt to insert actor John Cho into popular films illustrated the immense amount of computational time needed to refine a face swap, and Zone Out's production team ran into similar issues while having Benjamin parse pre-recorded footage of actors Thomas Middleditch, Elisabeth Gray, and Humphrey Ker.

The impact of the time crunch is quite apparent in the final product, and Sharp admits that computational limits hamstrung the team's vision of a decent-looking, decent-sounding product. An open source version of Tacotron was initially considered for synthesizing speech from the duo's own human-recorded dialogue and samples; human actors spoke reams of dialogue that Benjamin would have automatically inserted where appropriate. But this proved too computationally expensive for the time limit, so the duo fell back on off-the-shelf synthetic voice generation.
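
For a sense of what that fallback looks like, here is a trivial sketch using the off-the-shelf pyttsx3 text-to-speech library. The duo hasn't said which synthesizer they actually used, so treat this as purely illustrative.

```python
# Off-the-shelf text-to-speech as a stand-in for the film's fallback voice
# track. pyttsx3 is an assumption here -- the duo hasn't named their tool.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)  # speaking rate in words per minute
# Render one line of Benjamin's dialogue to an audio file for the edit.
engine.save_to_file("Why don't you tell me what you say is true?", "line.wav")
engine.runAndWait()
```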

The three actors relied upon for face-swap content.

Similar issues arose with the face-swap and face-puppeteering systems. "We eventually had to accept that the film would just look 'badly dubbed,'" Sharp said, as tools such as a generative adversarial network and an open source version of face2face could only muster "first-draft" face-rendering results in the time allotted.
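
To see why face-swapping on a deadline is hard, consider even the crudest baseline: detect a face, paste another over it, and blend. The sketch below (OpenCV, hypothetical file names) produces exactly the "badly dubbed" look Sharp describes; the team's GAN and face2face pipeline is far more sophisticated but needs far more compute.

```python
# The crudest possible face-swap baseline: Haar-cascade detection plus
# Poisson blending. This is NOT the team's GAN/face2face pipeline -- just
# an illustration of the starting point. File names are hypothetical.
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def naive_swap(actor_face_path, frame_path, out_path):
    face = cv2.imread(actor_face_path)   # the actor's face to paste in
    frame = cv2.imread(frame_path)       # a frame from the public-domain film
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        patch = cv2.resize(face, (w, h))
        mask = 255 * np.ones(patch.shape, patch.dtype)
        center = (x + w // 2, y + h // 2)
        # Poisson blending hides the seam but not mismatched pose/lighting.
        frame = cv2.seamlessClone(patch, frame, mask, center, cv2.NORMAL_CLONE)
    cv2.imwrite(out_path, frame)

naive_swap("middleditch.jpg", "last_man_frame.jpg", "swapped.jpg")
```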

To their credit, one of the duo's original plans worked out swimmingly: a completely robo-composed score, built on the Jukedeck platform, that "analyzed the emotional content of the screenplay," Sharp said. The result is a sparse but solid piano soundtrack that helps distract from Zone Out's odd voice synthesis.

“A goal for next time”

The biggest failure in the automation process came from an attempt to have a different AI system, a convolutional neural network, automate the process of selecting footage from the public-domain films for Benjamin to edit. "There were neither sufficient object descriptors in the screenplay nor sufficient numbers of unique objects in the shots," Sharp said, which meant the auto-editing system didn't have enough data to latch onto. Sharp and Goodwin were careful at this point to essentially obey the AI's decisions as a "director" and pick film scenes, shot lengths, and casting assignments that hewed to Benjamin's apparent artistic vision.
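
A minimal sketch of that shot-selection idea, assuming a pretrained object detector and simple word matching (the paths, threshold, and scoring rule below are our assumptions, not the team's code):

```python
# A sketch of the shot-selection idea: run a pretrained object detector over
# candidate frames and score each by how many detected labels also appear in
# the screenplay. Paths, threshold, and scoring rule are assumptions.
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    fasterrcnn_resnet50_fpn,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # COCO class names

screenplay_words = set(open("screenplay.txt").read().lower().split())

def shot_score(frame_path, threshold=0.7):
    """Count distinct detected objects that the screenplay also mentions."""
    img = weights.transforms()(read_image(frame_path))
    with torch.no_grad():
        pred = detector([img])[0]
    labels = {
        categories[int(l)]
        for l, s in zip(pred["labels"], pred["scores"])
        if float(s) >= threshold
    }
    return len(labels & screenplay_words)

# Rank candidate shots; with few unique objects per frame and few object
# nouns in the script, scores collapse toward zero -- exactly the failure
# mode Sharp describes.
shots = ["shot_001.png", "shot_002.png"]
print(sorted(shots, key=shot_score, reverse=True))
```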

"Editor Jono Chanin and I worked under the presumption that this was the story Benjamin was trying to tell and edited accordingly, while also keeping verbatim to Benjamin's screenplay," Sharp said. "So here, some human interpretation finally broke in, despite my hope to purge it completely from this iteration of the Benjamin saga. That remains a goal for next time."

Indeed, while the resulting film (and its reliance on subpar voice synthesis) is bizarre, it includes a fair share of appreciably emotional moments, particularly when Benjamin's script aligns with public-domain footage of a showdown between a suffering couple. Greater computational efficiency and refined data-parsing tools may very well make this kind of 48-hour, computer-crunched filmmaking a real possibility in the future.

And Sharp clearly isn't done trying. "In the days since this experiment, Ross has already uncovered some new technologies he thinks could lead us to fully automated editing—and something else we’re nicknaming 'whole-movie puppetry,'" he said to Ars. "Exciting stuff."

