What the AI World Can Learn From OpenAI’s Sora Debut

More and more, people are starting to understand that what comes out of a generative AI system isn’t a finished product. A lot of that has to do with the disastrous results that often occur when people try to go straight from AI output to published content, skipping (or greatly shortchanging) the step where humans check the robot’s work.

But every now and then, the AI industry helps this understanding along by stating or demonstrating the limitations of the seemingly magical systems that instantaneously create content — text, images, video, and more — on command. In its announcement of Sora on Thursday, OpenAI did exactly this: It didn’t just unveil an impressive new product; it talked about what the tool does wrong almost as much as what it does right.

In case you missed it, Sora is an AI video generator. From simple text prompts, Sora can create a short but extremely realistic-looking video of what you specify (at least it’s supposed to — OpenAI hasn’t made it widely available yet). If you spend a few minutes browsing the examples on OpenAI’s debut page for the tool, I think it’s hard to come away thinking the results are anything other than incredibly high quality. Just look at the first one: a synthetic video of a woman walking down a Tokyo street at night. The page is littered with similarly breathtaking examples.


But after you browse for a minute or two, you start to see the weaknesses in these videos. You see them because OpenAI explicitly points them out, describing not just what Sora does well, but what it struggles with. One video of wild gray wolf pups, for example, shows the group spontaneously multiplying at various points, with the caption underneath noting this as a common problem when generating videos with several animals. Similar caveats accompany several of the videos.

In tech releases, pointing out a product’s flaws at the moment you announce it is all but unheard of. But OpenAI doing it here is a very wise move, since it allows the company to capitalize on the amazing visuals that Sora creates while being honest about their problems.

I would also argue this kind of transparency is healthy for the AI ecosystem. It simultaneously shows the promise of AI and what humans need to do to leverage it. Yes, AI can create robust “first drafts” of content, but then a real, live person needs to take that draft and perfect it. For text, that often means correcting the occasional made-up fact. For video, you’ll need to drag those extra pups to the trash.

Not a video editing expert? Don’t worry: software is getting better at that, too, with tools like Generative Fill in Photoshop simplifying the path from the AI’s first stab to something usable. For the best quality, you’ll still need experts, but everyone else will get much closer than they could before.

This democratization of video generation has big implications, and Sora and similar tools, like Pika, have the potential to upend our information ecosystem in positive and negative ways. But Sora’s debut sets a good precedent, and shows that you don’t have to sacrifice the “wow” factor of your new toy by being honest about its problems.
