'This Year, You'll Be Able to Direct AI Video in Real Time Like You're on a Film Set'

Interview with Joshua Davies, Chief Innovation Officer at Artlist

Joshua Davies is the Chief Innovation Officer at Artlist, the creative platform behind some of the most advanced AI video generation tools available today.

I got the chance to interview him at the Artlist Studio launch in NYC. Artlist had already caught my attention for how seriously they're approaching the AI side of video creation, but talking to Joshua made it clear just how far ahead they're thinking. He has a genuinely rare grasp of where this technology is going, and this conversation is full of insights I hadn't heard anywhere else.

I test Artlist regularly for my AI video generator comparison, so it was great to get a sense of where the platform is headed in the near future.

The Interview

Lili: As Chief Innovation Officer at Artlist, you must have a much better sense of where video AI is actually heading. Can you talk about that?

Joshua: Right now, you still hit this wall where you click a button, wait several minutes, and a video comes out that isn't quite what you had in mind. But that's changing fast. Pretty much this year, if not early next year, companies like Google will have world models where you can interact with the AI in real time. You'll be able to put all your pieces together live, position the camera where you want, direct your actors how you want. The closest thing we have right now is doing that in 3D inside a game engine, but this will be generative AI running in real time. It will still be software, because you're still going to want all the familiar controls: the lens, the camera, the staging. But it's going to require everyone in this space to move really, really fast.

Lili: And where do AI agents fit into all of this?

Joshua: AI agents are really about helping people who don't want to think about AI at all, which is hopefully most of us. Right now, if you want to edit an image, you use one tool. If you want to generate an image, you use another. That's ridiculous, and we shouldn't have to do it. The agent handles that. You just tell it what you want, and it figures out which tool to use behind the scenes.

Lili: What about audio? If we're making something right now, even with tools like Veo 3 with native audio, everything still has to be managed separately. The sounds, the ambience, all of it.

Joshua: There's definitely work happening there. We're doing things inside Studio to give users more control. And there's strong evidence that the next versions of Veo and similar models will let you feed in a reference audio, so the output actually generates in the voice or sound profile you want. The truth is, pretty much every problem creators are running into is shared by hundreds of thousands of people around the world. Everyone knows what needs to be fixed. It's just not moving as fast as we'd like, because we want it now.

Lili: And on the other hand, it's all moving so fast already. Like, what do we even expect?

Joshua: Yeah, it's wild. It's a lot. But it's very exciting.
