This post originally appeared in The Media Copilot newsletter. Subscribe here.
How people will search the web with AI — and how content creators will get paid — looks a lot clearer today.
That’s because of two notable developments. First, OpenAI teased its long-in-the-works search product, SearchGPT. Labeled a “prototype” and only available to a select few testers (I, along with a few million friends, am on the waitlist), SearchGPT pairs a search engine with the power of AI to summarize answers — even about current events — not just serve up a list of links.
If that sounds a lot like what Perplexity does, you’re right. The company behind the so-called “answer engine” has some news of its own that might be even more influential in defining how AI search works going forward: the official unveiling of the Perplexity Publishers’ Program, which opens the door for advertising revenue on the service and establishes a system where Perplexity will share that revenue with publishers and content creators. The publishers also get free access to both Perplexity Pro and Perplexity’s APIs so they can create AI search bots for their own sites.
Taken together, the two announcements confirm AI-powered search is a highly competitive market, not just for dollars but also for headlines contemplating who’s going to be the “next Google.” (It’s also doubtful the near-simultaneous timing of the announcements is a mere coincidence.) And since they’re coming from two of the biggest players in AI, they point toward a new economy emerging around AI search, one that includes a key party: publishers of original, human-authored content.
Perplexity’s announcement comes just a few weeks after both Forbes and Wired publicly criticized the company, both for failing to attribute its answers properly and for the quality of those answers. In the resulting firestorm, Perplexity revealed the existence of the partnership program. Now we have details, including confirmation that the fuel of the new AI search economy will be, wait for it, advertising. The key to the new program is revenue sharing: If a publisher’s content is used in any particular answer, that publisher will get a cut of any ad revenue generated by that answer.
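To make the revenue-sharing idea concrete, here is a minimal sketch of how a per-answer split could work. Perplexity has not published its formula here, so everything in it is an assumption for illustration: the function name `split_ad_revenue`, the 25% publisher pool rate, the pro-rata split by citation count, and the placeholder publisher IDs.

```python
from collections import Counter


def split_ad_revenue(ad_revenue_cents, cited_publishers, publisher_pool_rate=0.25):
    """Split one answer's ad revenue among the publishers it cites.

    Hypothetical model, not Perplexity's actual terms: a fixed fraction of
    the revenue (publisher_pool_rate) goes into a pool, and the pool is
    divided in proportion to how many times each publisher is cited.
    """
    if not cited_publishers:
        return {}  # no sources cited, so nothing to share
    pool = ad_revenue_cents * publisher_pool_rate
    counts = Counter(cited_publishers)
    total_citations = sum(counts.values())
    return {
        publisher: pool * count / total_citations
        for publisher, count in counts.items()
    }


# One answer earned 1000 cents in ads and cited pub_a three times, pub_b once.
payouts = split_ad_revenue(1000, ["pub_a", "pub_a", "pub_a", "pub_b"])
print(payouts)
```

A real program would likely weigh other factors, such as how prominently a source is used, but the basic accounting question is the same: which answers cited which publishers, and what did each answer earn.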
Initial partners include TIME, Der Spiegel, The Texas Tribune, and a few others. Automattic is also a launch partner, which is notable since it extends the option to share revenue to smaller publishers with WordPress.com domains (a small subset of sites that use WordPress, the blogging software that powers more than 40% of the web). In an interview that The Media Copilot will publish later this week, Perplexity Chief Business Officer Dmitry Shevelenko told me Perplexity plans to scale the program and ultimately to make it “self-serve” for publishers.
The scalability of Perplexity’s approach is where it diverges from OpenAI’s. OpenAI has been signing deals with publishers left and right for months, but they’re all with major media companies, including News Corp, TIME, Dotdash Meredith, Axel Springer, Reuters, and several others. If you’re not of a certain size, you’re not on the radar. For smaller publishers, that doesn’t leave many options outside of blocking crawlers from your content, suing, or both.
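For publishers weighing the blocking option, the usual mechanism is a robots.txt file that names the crawlers to exclude. The sketch below uses Python’s standard `urllib.robotparser` to check which crawlers a sample file turns away. `GPTBot` (OpenAI) and `CCBot` (Common Crawl) are the user-agent names those organizations publish; the sample rules, the helper `blocked_agents`, and the example.com URL are assumptions for illustration. Keep in mind that robots.txt is a voluntary convention, so it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block two AI-related crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url):
    """Return the agents from `agents` that the rules bar from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


blocked = blocked_agents(
    ROBOTS_TXT,
    ["GPTBot", "CCBot", "Googlebot"],
    "https://example.com/story",  # placeholder URL
)
print(blocked)
```

Running a check like this against your own robots.txt is a quick way to confirm that the rules do what you intended before relying on them.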
There are, however, a handful of startups trying to create ways for smaller pubs to get paid by creating marketplaces where they can “sell” their content to AI engines. TollBit, for example, creates what amounts to a paywall for web scrapers — if they want to crawl your content, they have to pay up. Dappier, which recently announced $2 million in seed funding, is another solution.
Now Perplexity has entered the chat. It’s unclear if the partnership program would be compatible with toll-based payment solutions, and there’s a good chance that — in whatever economy emerges out of all this — publishers will need to implement both. With licensing revenue from the creators of foundational models, ad partnerships with AI search engines like Perplexity, and toll revenue from other AI scrapers, balance sheets at media companies might get pretty complicated in a few years.
Let’s be clear: That’s a good problem. Given that the media industry has been cratering over the past year due to referral traffic drying up and a lousy ad market, a picture of new future revenue is welcome — even a hazy and undeveloped one.
One player that’s been strangely absent from the new economy around AI search is Google. Of course, Google is pushing hard into its own AI-powered search feature, AI Overviews, but so far it has given no indication that it ever intends to create or participate in a new way to compensate publishers or content creators. The company has, in fact, shown open hostility to solutions that would require Google to pay for what appears in search results (to be fair, many of the proposed solutions involve the long arm of the law acting as an enforcement mechanism).
Given that Google has built a multitrillion-dollar company on being able to freely crawl the web to power its search engine, you can see why it’s not in a hurry to upset that status quo. But its obstinacy here may be shortsighted. From its perch on top of the search market, Google may not see clearly that Perplexity, OpenAI, and others are beginning to define how the AI search economy will work. And they’re doing it by actually involving publishers rather than simply expecting them to surrender to a new reality.
Let’s not delude ourselves: The hill to climb to become the next Google is high, perhaps insurmountable. What seems to be clearer by the day, though, is that whoever wins the future of search will need a system that provides not just the best answers but also the means to encourage more high-quality information. And right now the energy around that idea is in companies that don’t start with G.
This post originally appeared in The Media Copilot newsletter. Subscribe here.
In the struggle between content creators and the AI builders who scrape that content for their tools, a new front has opened up. Instead of targeting the tech companies that create and manage the AI models that use their data, creators are going after the data itself.
AI companies rely on public data sets — huge compilations of content scraped from the internet that had previously been used mostly for research — the most popular one being Common Crawl. Although training AI wasn’t the original purpose of these information troves, they quickly became go-to sources for training data for most of the major models.
The thing is, if you’re a content creator, you probably don’t mind your work being used for research, and you probably see the benefit of having your data crawled since it’s the same process Google uses to index the web and then link people to your content in Google Search.
AI changes that calculation. Now, if the AI gets access to your content but just ends up summarizing it for users of that AI, there’s essentially no benefit to you as the content creator in having your data crawled. You may as well opt out.
That seems to be the logic of publishers such as The New York Times, a whole bunch of media outlets in Denmark, and other publications, all of whom have requested that their data be removed from Common Crawl, according to Wired. Apparently, the data harvester had never received a request for removal prior to 2023, but now is fielding a bunch of them. There is a case to be made that the public nature of the data makes Common Crawl’s actions legally defensible. But because the organization is a nonprofit and can’t withstand any lawsuits, it’s simply complying with every request for deletion.
It’s an understandable move on the part of the publishers. However, outside of big organizations like the Times, which has massive brand recognition and a very successful business, it strikes me as a bit suicidal. Common Crawl serves all kinds of purposes besides providing AI with training data, and it helps power several search engines other than Google. Deciding to opt out to end-run the threat of AI is a classic “cutting off your nose to spite your face” situation.
It’s also a dead end, in my view. The fact is, AI is a part of our media ecosystem now, and there’s a simple reason why: Summarization is simply an attractive product for many news consumers.
The whole act of searching for a topic, getting some blue links, and then clicking on a bunch to form your impression of the right answer was never the most efficient process. AI is actually an excellent intermediary here because it does that work for you, removing friction in the process. Even though, yes, AI sometimes hallucinates, it’s a lot less likely to do so when it’s adapting existing text instead of going into its knowledge base to write something “original.”
But the point is summarization, whether done by a chatbot or automatically, isn’t going away and can only grow. Artifact might have left the building, but its generative takeaways live on in the new Yahoo News app, not to mention news aggregators like Otherweb and a new player on the scene, Particle. Perplexity, Google’s AI Overviews, and the coming ChatGPT Search are all moving in this direction. It’s inevitable that there will be a large and growing portion of news consumption that exists at the AI summary level.
Publishers need to understand that in order to adapt their content strategy to this new world. Does that mean providing your content essentially “free” to AI summarizers in order to be a part of that world? Maybe not in every case, but isolating your content from the AI summary-industrial complex by opting out of data sets like Common Crawl is a defensive move, one that won’t work without also playing offense.
What does that mean? For larger publishers, we already know: signing deals with AI builders like OpenAI, which is in turn altering incentives in the marketplace. If you’re a small to midsize publisher, that’s probably not an option, but there are other ways to take control of your own destiny.
Rather than simply surrendering to the mercy of AI models and aggregators, you can start building AI experiences into your own platform. While that won’t affect any external forces, it will start pointing your content strategy in the right direction, and get your operations to start indexing toward the unique value your content provides in the marketplace.
What would really move the needle, though, is a marketplace for publishers to get good value for their content from those interested in summarizing it for audiences. That’s exactly what TollBit is trying to do. It’s slow going, but with more publisher participation it could become the offensive complement to the defensive move of opting out of data sets like Common Crawl.
Right now there’s a lot of fear, uncertainty, and doubt around what AI summarization will do to the media world. One thing is not in doubt, though: It’s happening, and content providers need to adapt. Standing still isn’t an option.