AI Video Is Here, But It’s Not Ready for the Classroom
Google's new AI video generator is incredible, but frustrating to use.
On May 20, 2025, Google announced its next generation of AI media models, including Veo 3, a new video-generating model that produces audio and speech alongside the video.
Over the past week, the internet has been flooded with incredible examples of what this new model is able to create.
Videos like this are truly extraordinary. The combination of AI-generated video with speech and lip syncing was incredibly difficult to create just a few weeks ago.
Now, you just describe your video in a text prompt and an 8-second clip pops out with music, sound effects, and speech. From my vantage point, this really does represent a significant moment in AI-generated media and in our ability to trust whether the videos we see online are “real.”
Trying Veo 3 for myself…
As an online astronomy professor, I make a lot of videos for my students. So, naturally, I was eager to experiment with this new tool to see what I could create. Would it be possible for me to make anything useful for my students?
But first, in order to access the new model, I had to subscribe to one of Google’s paid AI plans (I went for Ultra, actually, to unlock more credits). It was expensive - something like $129/month for three months.
My experience with Veo 3 was, in a word, frustrating.
My first attempt with Veo 3 was just a silly idea. Not for the classroom, but for my own enjoyment. I imagined a young woman on a first date with a local TV weatherman. Instead of making normal conversation, he forecasts the rest of their date as though he is on TV.
I was amazed and delighted by the first few clips that I received from Veo 3, but it took me many tries (and maybe 45 minutes) to get a result that had the look, feel, and sound that I was hoping for. But the real trouble came when I tried to extend the conversation.
I wanted the woman to respond, confused, and then have the weatherman continue his forecast, getting more and more ridiculous.
Unfortunately, the Veo 3 model (the only one that can generate audio) does not yet allow you to extend a clip beyond the initial 8 seconds. Google’s other video-generating model, Veo 2, does allow you to extend clips, but it doesn’t include any sound.
Building a more complex scene, then, would require separately prompting a series of “shots” that I could stitch together to form a complete scene. For example, a shot of the man speaking. Then a shot of the woman. Then another shot of the man. But here I ran into more problems.
Since generative AI is inherently statistical and randomized, every time I entered the same prompt, my actors looked and sounded different.

I tried giving more and more detailed prompts, specifying the exact appearance of the actors. But after experimenting for a few hours, I made an important realization.
This is impossible!
Only after experimenting with Veo 3 for several hours did I finally realize what all of those incredible Veo 3 videos circulating the web had in common:
A dynamic cast of characters who all looked very different from one another.
A montage of short clips featuring each character saying something pithy.
Another popular example from the web (below) illustrates the specific use case where Veo 3 shines - a montage of many brief clips in which many different people are speaking. (And it is crazy good at that.)
The problems with Veo 3… (and Veo 2)
So, to review, I’ve encountered two problems with Veo 3:
You cannot extend a clip and continue generating audio.
You cannot readily use the same prompt to generate a character with the same appearance (it is like rolling the dice to see what you get).
But Google’s Veo 2 model is able to extend scenes. While it cannot generate audio, it can take the last frames from an existing video and use that as the starting point for the next clip.
So, I was curious whether Veo 2 can keep a persistent memory of a character’s appearance from one clip to the next. I decided to put it to the test.
I asked Veo 3 to create a video of a character saying hello. I then used Veo 2 to extend the clip, asking the character to turn all the way around. I was curious: would Veo be able to “remember” what the character’s face looked like while he wasn’t facing the camera?
Well, the answer is no. Check out the video below (it’s kind of hilarious).
In speaking with other Veo 3 creators on Reddit, I learned that some of the most popular examples spreading around the web required their creators to spend $500+ on credits to generate clips in which the characters looked “close enough” for continuity… just to build a 1-minute video.
That said, the trendline seems clear. AI video generation continues to improve at an incredible pace. One benchmark for measuring the progress in AI video generation is creating videos of Will Smith eating spaghetti (don’t ask me why - it’s an internet thing). The video below illustrates this progress over the past two years.
What does this mean for education?
As a tool for teachers, I don’t see much value in Veo 3. It’s expensive, frustrating, and the results are unreliable.
Even in a niche educational application, like creating videos for e-learning, I think the tool still isn’t useful enough compared to stock footage or just filming a video yourself.
But I still think that Veo 3 is an important milestone that impacts us as educators, parents, and consumers of online information.
We’ve been conditioned over many years to know that we can’t trust everything that we see online. We know that images can be Photoshopped, and that videos can be deep-faked. But it used to take significant knowledge, effort, and expense to create a credible fake (especially fake videos). Today, it just takes a couple hours and about $130. That means much more “fake” content will be generated in the months and years ahead.
It is more important than ever that teachers be able to communicate these developments to students.
As scary as that is (and I think it is scary), I also want to challenge myself to cast a positive vision of how this technology could impact education for good. I have a few ideas, but I’m really curious to hear what you think. Can you imagine any positive outcomes for education from this kind of technology?
Here’s an idea: Choose your own teacher…
About a decade ago, I was building an online algebra course when I discovered an amazing set of educational videos. What made this collection unique was that every lesson topic had three different versions of the same video lesson - each one made by a different teacher.
Students could choose which instructor they wanted to learn from.
Maybe that’s one positive use case for AI-generated video in education. What if students could choose from an array of AI characters or historical figures to be the narrator or guide through their online curriculum?
Here’s a little vision of what that could look like.
Tell me what you think in the comments: Is this a positive vision for the future of education or a nightmare?