What better way to spend a weekend than to put together a fake nature documentary teaser trailer using as many AI tools as I could? Please take a look at the official teaser trailer for “Wings: The Unexpected History of The Gorillahawk.”
The tools I used were all free for me, but to be fair, that’s because I already own or subscribe to a couple of them. Here’s a breakdown of what I used, and my estimate of how the workload was split between me and the particular AI tools involved:
STABLE DIFFUSION, DALL-E: AI-generated base gorillahawk images. I spent a lot of time working on the prompts for these images. The image generation tools still struggle with several things that made my life difficult. In particular, non-singular elements tend to confuse even the newest tools, so a group of gorillahawks seemed impossible until just the past few weeks; even now, lots of post-generation work is needed. [My work: 25 percent, AI: 75 percent]
GIGAPIXEL: Image upscaling. Original generations tend to be 1024x1024, about half the width of full HD (1920x1080), and I wanted to do significant Ken Burns-style motion-control effects, so I needed a much bigger canvas to work with. Gigapixel gave me a 6x upscale with almost no work on my part beyond some fiddling with settings. [My work: 5 percent, AI: 95 percent]
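The pixel budget here is worth making explicit. A back-of-the-envelope sketch (the numbers are illustrative, not the project’s exact settings) of how much pan-and-zoom headroom a 6x upscale buys over a 1080p timeline:

```python
# Headroom check for Ken Burns-style moves over a 1080p (1920x1080) timeline.
SRC = 1024          # typical side length of a square AI generation
SCALE = 6           # upscale factor used in Gigapixel
HD_W, HD_H = 1920, 1080

up = SRC * SCALE    # side length after upscaling: 6144 px

# Maximum zoom-in before the 1080p frame outruns the source pixels;
# the limiting dimension is the frame width (1920 > 1080).
max_zoom = up / HD_W

print(f"Upscaled side: {up}px, max 1:1 zoom over 1080p: {max_zoom:.1f}x")
```

In other words, a 6x upscale leaves room to push in a little over 3x before the frame starts asking for pixels that aren’t there.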
PHOTOSHOP: Image extension and editing. Most of the images started in a square aspect ratio, but the final product needed to be 16:9. So after the upscale, I used Photoshop’s AI-based generative fill feature to fill in the edges. It was a toss-up whether this would be a single button press or a herculean struggle to get what I needed; the same went for using generative fill to ‘inpaint’ areas of the image that needed help. So after any AI generation, I would step in and do a certain amount of old-school Photoshop retouching to get the images where they needed to be. [My work: 40 percent, AI: 60 percent]
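Extending a square image to 16:9 means generating brand-new pixels on the left and right. A minimal sketch of the margin math, assuming the square is kept at full height and centered (the helper name is mine, not a Photoshop API):

```python
def outpaint_margins(side: int, target_ar: tuple[int, int] = (16, 9)) -> tuple[int, int]:
    """Pixels to generate on each side to take a centered square image
    to the target aspect ratio at full height.
    Returns (target_width, per_side_pad); any 1px remainder is ignored."""
    w, h = target_ar
    target_width = round(side * w / h)
    pad = (target_width - side) // 2
    return target_width, pad

# A 6144x6144 upscaled image needs roughly 2.4k of generated pixels per side:
print(outpaint_margins(6144))  # -> (10923, 2389)
```

That is a lot of invented image on each flank, which is why the results ranged from one-click wins to retouching marathons.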
MICROSOFT COPILOT (aka ChatGPT): Script. I had an idea for the voiceover script, so I entered it into Copilot and did a few rerolls to get enough content I liked. I then rearranged, edited, and added material to get the final version. In the end, I did more of the work than the AI on this part. [My work: 65 percent, AI: 35 percent]
GENNY BY LOVO: Voiceover. After spending some time fine-tuning the voiceover and rerolling with different voices and other settings, the AI created a nearly perfect voiceover for the video. If I had been using the paid version, I would have gone deeper to try to get better intonation, emphasis, and phrasing on some of the lines, but overall I was surprised at how close I got to what I wanted. [My work: 3 percent, AI: 97 percent]
AIVA: Music. In creating the music bed for the video, there were very few options; I just chose a style, a speed, and an approximate length, then rerolled a couple of times. It’s not John Williams, but it is passable and required effectively no effort on my part. [My work: 1 percent, AI: 99 percent]
DAVINCI RESOLVE STUDIO: Editing, VFX, graphics, color grading. For the most part, this was standard post-production work. The one very small exception: for a couple of shots, I used two new Resolve plugins that claim to use AI, Relight and Depth Map. These just sped up the rotoscoping process and didn’t add anything I could not have done without them. [My work: 99.5 percent, AI: 0.5 percent]
I expect that video creation will only get easier from here on out. I’d like to have actually animated these images rather than just applying motion control and simple VFX to them. That capability has only just entered the discussion in the past few weeks and is not ready for prime time yet, but I expect it will be in six months to a year. I’d also love to be able to get more consistent styles, particularly photo-realistic ones. But given how much things have changed in the past 18 months, I’m sure this will keep improving over the coming year or so. A little further out will be consistent 3D characters that can be animated in precise and controllable ways via text prompt, then lit and ‘photographed’ with similar precision, all in an intuitive user interface. I’d be surprised if we don’t see that sometime in 2025. I can’t wait to see what we can create then!