I Tested the 6 Best AI Video Models on Artlist - Part 1

I would have bet against these results...

Lili Marocsik
April 10, 2026
Blog
Video Generators
5 min

TL;DR

❤️ Before we get started, I'd like to thank you for using my affiliate links to sign up for free trials. LLMs are constantly stealing my content, and you help me stay afloat and keep creating content like this for AI enthusiasts and small business owners. ❤️

If you've been on the fence about which AI video model to use on Artlist, I get it. The options keep growing and most comparisons out there don't actually show you what happens with a real prompt on the first try.

That's exactly what I did here. I ran the same prompts across all six models, and I always show you the first generation: no prompt fine-tuning, no cherry-picking. That's the only way to see where each model actually stands.

One thing worth mentioning: I would have loved to include Seedance 2.0 in this test, but since it currently restricts videos with humans, I had to leave it out.

To find the best AI video models, I tested six: Kling 3.0, Sora 2, Veo 3.1, Wan 2.7, LTX 2 Pro and Runway 4.5.
In this post, I'm covering prompts 1 and 2.

Wanna jump ahead? Prompts 3 and 4 are in part 2 — link here.

Prompt 1: Robot Meets Human

The prompt: Animate a warm handshake between a humanoid robot and a woman in slow motion, emphasizing the connection. The robot's LED lights softly glow in sync with its movements, while the woman's expression shows joy and curiosity. The Milky Way moves fast from right to left behind them, as if time stopped when they met. In the background, a neon sign reads: "aitoolssme.com really takes every opportunity for advertisement...". The woman says warmly: "Nice to meet you." The robot responds in a robotic voice: "The pleasure is mine. Lili says hi." Use multiple camera angles and shots throughout the scene.

What I was testing: multishot capability and text rendering in video.

This was a deliberately tricky prompt. It stacks a lot of requirements at once: synchronized LED motion, a moving Milky Way, legible background text, voiceovers from two characters with different voices, and multiple camera angles. I wanted to push each model hard from the start.


🥇 Veo 3.1

Veo 3.1 came closest to nailing this prompt. The characters look believable, it shifts perspective once without overdoing it, and the neon text in the background renders correctly. It's the only model that handled almost every requirement in a single generation. The one real letdown is the audio, which didn't work despite being switched on. I even ran a second generation to check; same result. If the sound had delivered, this would have been close to perfect.

Sora 2

Sora turned this into an unintentional comedy and honestly, it's kind of charming. The robot has aggressively painted-on eyebrows that give it a permanently furious expression, which clashes hilariously with the romantic tone of the scene. The hand grip is a bit off and the background text is misspelled, but the overall composition holds together. Think "The Room" but make it sci-fi.

Kling 3.0

Kling 3.0 struggled most with the text element. It added a large backdrop behind the neon sign that covers most of the space background, and even then the text is illegible and doesn't make sense. The robot looks more like a toy than a humanoid, and while the audio is passable, the final voiceover line lands awkwardly. Camera movement floats around the subjects but never actually cuts to a different angle or offers a closeup, so multishot is a miss here.

Wan 2.7

Wan 2.7 has a known strength with text rendering and that's partly why I included the neon sign in the prompt. It mostly delivered on that, though it dropped one word so the sign doesn't quite make sense. The Milky Way moves, which is good, but in the wrong direction. The camera movement is rough: it zooms into the handshake, snaps back to the original framing, then slightly pulls out. And true to form, Wan defaulted to its animation style without being prompted otherwise.

LTX 2 Pro

LTX 2 Pro overshoots the brief on camera movement to the point where the whole thing feels like a fairground ride. The setting is unclear, it reads more like a film studio with practical floor lighting than deep space. The background text is completely incoherent, the robot looks unfinished, and there's a continuity error mid-clip where the woman's hands shift position between cuts. The ending frame has a background that looks like it belongs at a carnival, which at least brings the rollercoaster metaphor full circle.

Runway 4.5

Runway's output is static. Multishot isn't something it can handle yet, and it has no audio capability for this type of generation. The robot has six fingers, the text placement is poor and contains errors, and there's an awkward moment with the woman's second hand that's hard to look away from. Not Runway's strongest prompt type.

Prompt 2: The Breakdancing Nun

The prompt: A nun dances breakdance like in the 80s in the middle of young, urban-looking spectators. The background is a public square, the crowd of spectators is surrounding them like breakdancers usually do. Everything looks very urban and a bit gangster. The crowd cheers and whistles. The style is photorealistic, and everything has enhanced shadows.

What I was testing: motion accuracy, audio and SFX.

This is one of my favourite prompts for stress-testing the best AI video models because it combines complex human motion with crowd dynamics and audio, three things most models still struggle with.

This prompt is deceptively hard. Breakdancing involves very specific body mechanics, and most models either freeze up or produce something that looks vaguely dance-adjacent but nothing close to actual footwork or headspins. Add a nun's habit into the mix and you've got a real stress test for how well each model handles complex human motion.


🥇 Sora 2

Sora 2 is the only model here that actually generates a breakdancing nun. Not a nun swaying awkwardly, not a nun doing something that might be dancing if you squint. An actual breakdancing nun. The crowd framing feels natural, the spectators react authentically, and the audio fits the scene without feeling bolted on. This is where Sora 2 pulls ahead and it's not particularly close.

Kling 3.0

Kling's nun ends up performing something closer to a Cossack dance than breakdance, which is genuinely funny to watch. The audio is mediocre and the background lacks the kind of urban grit the prompt calls for. The scene doesn't have much depth or detail, but there's an unintentional charm to it.

Veo 3.1

Veo 3.1 can't crack the breakdance movement either. The nun's legs do their own thing, and none of it reads as breakdancing. The crowd is also too choreographed: everyone moves in near-perfect unison, which kills the energy of what should be a spontaneous street scene. The audio and scenery are decent, but not enough to compensate for the failed motion.

Wan 2.7

After Wan defaulted to animation style in prompt 1, I added a photorealistic specification here. It didn't fully fix it, but the result is at least grounded in live action. The nun's dance moves are chaotic and hard to label, but the character stays consistent throughout the clip. A nice touch: Wan added multishot cuts on its own, which none of the other models did for this prompt.

LTX 2 Pro

LTX rendered the whole scene in black and white, which wasn't prompted but actually works with the urban aesthetic. The problem is there's no breakdancing happening. Nobody in the frame is breakdancing, the central character included. The movement is vague and the model clearly struggles with the specific physicality the prompt requires.

Runway 4.5

Runway's nun has the same leg problem as most of the others, the lower body movement is incoherent. The crowd is a genuine strength here though: good ethnic diversity, natural individual reactions, nobody moving in lockstep. The overall scene composition is fine, but the core of the prompt just doesn't land.

What's Coming in Part 2

Prompts 3 and 4 go in a completely different direction. Prompt 3 tests cinematic aesthetic and creative scene-building, and prompt 4 tests start-to-end frame consistency, which is where things get really interesting. I'll also wrap up with a full breakdown of which model is best for what use case across all four prompts.

Jump to Part 2

Author:
Lili Marocsik
Lili Marocsik has tested 400+ AI tools since 2023, back when most of them were more hype than help. Before building this site, she spent years as a video marketer creating YouTube Ads for brands like HelloFresh and Revolut. She started aitoolssme.com because every tool was getting five stars and glowing writeups, but nobody was telling the truth about what actually works. Beyond the site, she hosts the German AI podcast KI Plausch, organizes the AI Enthusiasts Berlin meetup group, and is an active member of Women in AI. When she's not testing tools or running events, she's looking after 30 houseplants and hunting down modern art.