Now Hear This: The Rise of the Spoken Article

Credit: DALL-E

We’re wrapping up yet another Tech Week here in NYC (it’s like Amazon Prime Day — it happens whenever it wants to happen), and AI is of course in the air. I attended an excellent discussion about the use of synthetic voice in news media, exploring both the practical and the ethical sides of the trend, which I talk about extensively below.

Before I get to that, a few things: First, welcome new subscribers from the Daily Mail, the CBC, CNN, Foothill Ventures, Adobe, and more. The cross-section of online and broadcast media, tech, and business subscribers tells me I’m doing my job of bringing valuable insights that connect all those areas, like my comprehensive overview of how to think about the problem of deepfakes from earlier this week, which just peeked out from behind the paywall (how about that?).

I also took some time earlier this week to share my perspective on how AI isn’t just changing media, but also how public relations interfaces with media in a Q&A with Kersa Haughey of Ink Communications. We talked about the new AI-driven ecosystem that’s emerging as Google deploys AI Overviews, what the consequences are of OpenAI making deals with publishers, and how companies and brands who cultivate relationships with media should think about all this. I encourage you to check it out.

And finally, if you’re in PR and looking to elevate your knowledge and skill with AI, please consider attending The Media Copilot’s upcoming class on AI Fundamentals for Media, Marketing, and PR, happening June 20. THIS MONTH ONLY, the class will focus specifically on skills, prompts, and tools curated for PR work. Sign up before June 10 with the code AISPRING for a 50% discount.

Now we’ll just pay one more bill, and on with the show.

AI Scams Are Rising. Here’s How You Can Protect Yourself.

Scammers just got even more dangerous thanks to AI. It’s become incredibly easy to copy someone’s voice or create a deepfake. Many cases have already been reported of people impersonating family members asking for money.

Here’s how you can help prevent that: Incogni is a personal data removal service that scrubs your personal information from the web. Incogni:

Protects you from identity theft and scammers taking out loans in your name.

Prevents strangers from buying your personal information on search sites.

Get 55% off with the code COPILOT. And if you’re not happy, get a full refund within 30 days.

Try Incogni

The power of the spoken word is something media has traditionally been great at harnessing.

From news to radio dramas to podcasting, there is something unique about listening to another human speak to you. Even though most voices are pre-recorded, the human mind engages the imagination in a way that creates a connection between speaker and listener, one that’s fundamentally different from other media. It’s a big reason that podcasts experienced a surge in popularity in the early 2010s, one that shows no signs of abating.

But what if the voice speaking to you isn’t human at all, but a wholly convincing simulation? That’s exactly what generative AI technology does, and it’s not just a theory or a promise. It’s happening. Companies like ElevenLabs have risen in the last couple of years, enabling users of their tech to generate scripts and create artificial voices, complete with inflection, emotion, and personality, much more cheaply than previous solutions.

The tech has caught the attention of media companies, and some major publishers are applying it to transform their primary product, text, into audio. The Washington Post now includes spoken-word versions of its articles on its website and app, and CNN has experimented with voice articles, too. The New York Times began rolling out AI-generated spoken articles earlier this year. You might say this is media’s way of going multimodal.

Speaking to Readers

Whereas earlier applications of voice tech to “read” articles were primarily about assisting the visually impaired, the new push is more than that. Although it’s early days, there is clearly a product intent this time: one that takes the ability of podcasts to captivate listeners (as well as drive high CPM ad rates) and applies it, at least in part, to content at scale.

“We’re all super busy people… how much time you want to read news is probably very different than the amount of time you actually have to read news,” Noa Yadidi, Senior Product Manager at The Washington Post, said at the Tech Week event, hosted by ElevenLabs. “But when you’re listening to audio, you get to also work out, or go to the grocery store, or clean your house. Now the Washington Post is accessible to you in those times.”

Noa Yadidi, Senior Product Manager of The Washington Post (left), and Christine Yi, CNN’s Head of Business Development and Strategy, discuss the use of AI-generated voices in news media. Credit: Pete Pachal

You can easily imagine the Post or the Times curating a set of key articles into a playlist that summarizes the news of the day, or gives a complete overview of a meaty news topic. You could even use AI to automatically create personalized playlists for subscribers, packaging them together into a daily news audio package they could listen to like a podcast.

Because the voice is synthetic, and the nature of news content is less “personal,” you won’t get quite the same connection that I alluded to in the introduction. But you do get something different — and presumably more valuable — than just a click. As CNN’s Christine Yi noted at the event, pushing play on an audio article demonstrates a level of engagement greater than scrolling another half-inch down the page.

“When people choose to click to listen, that’s like a massive dedication of time,” said Yi, who is CNN Digital’s head of business development and strategy. “I’m going to dedicate my time to listening to this for at least 10 minutes, maybe an hour. That’s way more engagement than deciding to read an article for a minute, two minutes, three minutes. It’s an incredible engagement tool.”

Finding the Right Voice

Once you start down the voice path, there’s an obvious question: whose voice do you use? After much discussion, Yadidi and Yi said, both their publications look for a voice that sounds “neutral,” one that won’t bias or distract the listener. That’s a little different from the approach taken by the Kevin Systrom-backed news aggregator Artifact (which shuttered earlier this year). That app famously gave users the option to listen to news read by AI-generated voices that resembled celebrities like Snoop Dogg and Gwyneth Paltrow.

By contrast, the Times and the Post don’t give the reader a selection of voices. But that doesn’t mean all articles sound the same – when a Post reader taps the Play button, they’ll hear one from a selection of four different ElevenLabs voices the Post created, two male and two female.

Outside of its synthetic voices, The New York Times sometimes has reporters read articles in their own voice, recorded the old-fashioned way. So far no publication has made a move to do what seems like the obvious next step: cloning the reporter’s voice and applying it to all articles they’ve written.

“It’s one of those things that our newsroom doesn’t feel quite comfortable doing yet,” the Post’s Yadidi said. “We sort of kicked around that idea, but it hasn’t really been something that our users have asked for. So it hasn’t really been something we’ve discussed much further.”

Turning Up the Dial on AI Voices

The big question: Is voice here to stay? When I was Chief of Staff at CoinDesk, we considered offering voice articles, but the intent was to be assistive, and our research showed most people with visual impairments already had technology for reading articles out loud. Without that purpose, the idea was shelved since there was no other long-term vision.

Now there is. As more readers find and listen to more news articles, the next step will be to “productize” voice. The challenge will be to layer advertising onto the audio experience in a way that isn’t too intrusive and ends up being more lucrative than what the publication would otherwise have gotten through ads on the page.

Certainly, such a model won’t work for all content. After all, how do you click on an affiliate link while listening to an audio review (sorry, Wirecutter)? And editorial teams will likely demand veto power over presenting a voice option on certain articles, which is part of the Post’s process, according to Yadidi.

Voice articles aren’t new, but with generative AI, they’re newly scalable. That’s creating opportunities for deeper engagement, more loyal audiences, and maybe even a little money. At a time when media companies are largely getting swept up in the unbridled progress of AI, spoken articles could be a step toward taking back control.
