What Perplexity Gets Wrong About Aggregation

It’s been a lousy couple of weeks for Perplexity.

Ever since the AI-powered “answer engine” was caught republishing significant parts of a Forbes scoop on a drone company backed by former Google CEO Eric Schmidt — including the artwork — the media have been in full pile-on mode. Wired did a full inventory of Perplexity’s sins, which included bypassing a common internet standard (the Robots Exclusion Protocol, better known as robots.txt) and getting answers wrong to such a degree that the publication declared Perplexity a “bullshit machine.”

More recently, The Verge said what Perplexity is doing amounts to “grand theft AI,” dissecting recent comments from CEO Aravind Srinivas on the Lex Fridman podcast and accusing him and his company of pillaging copyrighted and paywalled information, or at the very least turning a blind eye to the practice.

It’s pretty wild how quickly the media turned on Perplexity and made it the AI bogeyman of the moment. Just a couple of months ago, when the company secured a $63 million funding round, there was the usual breathless coverage of a fast-rising AI company, and plenty of wondering whether Perplexity could become the next Google. In the AI classes we teach at The Media Copilot, we almost always show off Perplexity and its strengths as a research assistant.


The Media Copilot’s upcoming AI Fundamentals class on July 18 will zero in on marketing and design. It will incorporate the latest developments and a whole suite of tools for leveling up your work.

The class quickly covers the basics and dives into practical applications — prompt frameworks for specific workflows, instruction on using ChatGPT’s advanced features, and several platforms hand-picked for their utility in marketing and design. Did we mention the class is 100% live?

Sign up for the July 18 AI Fundamentals class here, and use the discount code AIMARKET before July 12 to save 50%.


Now Perplexity has essentially supplanted OpenAI as public enemy No. 1 in the eyes of the media (that New York Times lawsuit feels so long ago, doesn’t it?). That likely won’t slow it down: the company reportedly has another SoftBank-led funding round in the works that would value it at $3 billion. And in the wake of the scandal, the company’s chief business officer revealed it has licensing deals with media companies in the works, which may help rehabilitate its image among journalists and media executives.

Whatever happens to the company, I think it’s worth examining what Perplexity got wrong here with respect to aggregating stories and how it compares to how humans do it. From there, I’d like to reverse-engineer a best practice for Perplexity and other AI-powered “answer engines” when the information they’re summarizing is a scoop or something original behind a paywall.


The Subtle Art of Aggregating Scoops

For the purposes of this thought exercise, I’m putting aside the accusations of direct plagiarism. Those are egregious, no question, but my goal isn’t to point the same finger so many others already have. The general consensus (indeed, the law) declares this a no-no, and Srinivas’s comments seem to indicate Perplexity will adjust what it does to prevent outright copying. (It will of course still need to be held accountable for any previous acts.)

But even if Perplexity had been careful to rewrite the story so the language was sufficiently different from the original, there’s still a question around what AI-powered aggregation should look like. Since AI summaries are unquestionably useful and their use will certainly grow in the future, getting this right is important.

Credit: Freddy Kearney, Unsplash

The original story at the source of the scandal was behind a hard paywall, meaning anyone who wanted to read it needed a paid subscription. Many news sites also use metered paywalls, which allow for a certain number of “free” articles every month. Regardless of the type of paywall, the article is copyrighted to the publisher. However, the information within isn’t, which means it can be re-reported.

That’s exactly what other publications do, especially with scoops. This of course is the act of aggregation, and pretty much everybody does it. A reporter at Site B will access the paywalled article on Site A via a subscription, then write a summary for Site B’s audience (typically with no paywall), crediting the original reporting and linking to it on Site A.

It’s conceptually simple, but there’s a set of unwritten rules that publications adhere to when re-reporting a scoop or piece of original reporting. Those rules are rooted in respect for the original journalists because getting worthwhile scoops is hard work.

For starters, the headline will make clear the story was originally reported somewhere else. Often the first word in the headline is “Report,” followed by a colon, and if the publication is well known, it’ll sometimes get name-checked (e.g. “The Information:”).

The story will also include a prominent credit and a link to the original story in the lede (i.e. the first couple of paragraphs). There’s careful use of language here, even when the re-reported story has significant new information. Typically you’ll see the words “first reported by” to credit the first story so the reader understands that it’s the original source.

Finally, for a purely aggregated story, there’s a general practice of not re-reporting the entire thing, blow by blow. The article on Site B is usually significantly shorter, summarizing just the main takeaways and most news-making quotes. This encourages readers to go to the original article to get the entirety of the report.


Can AI Aggregate With Respect?

These standards — again, created by journalists to give deference to real journalism — are so common and hard-wired into newsrooms that they’re muscle memory for most reporters and editors. While sometimes an aggregated story can garner more page views than the original (usually due to stronger SEO), that’s a quirk of platforms and domains. It’s generally tolerated as an occasional, if unfortunate, byproduct of the practice, not the way the system is supposed to work.

Credit: DALL-E

AI engines, however, obviously have no sense of deference. They don’t distinguish between a new scoop on a timely topic and a Life magazine article from 1973 — it’s all just information. And their programmers have clearly not taken into account the subtle art of scoop attribution.

Could they? Is it possible to teach an AI to understand what a scoop is, and train it to attribute the information correctly? It should be. Since the rules are pretty straightforward, it would mostly require some fine-tuning: You would need to define what a news site looks like, establish a way to detect where information was first reported (detecting the “atoms” of journalism), and get clarity on the rules (prominent attribution, careful language, and not doing a comprehensive rewrite).
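Those rules are concrete enough to sketch in code. Here’s a minimal, hypothetical post-generation check — every function name, field, and threshold below is illustrative, not drawn from any real answer engine — that a system could run on an aggregated summary before publishing it:

```python
# Hypothetical sketch: the three aggregation rules expressed as a
# post-generation check. Data shapes and the length threshold are
# illustrative assumptions, not any real product's implementation.

def check_attribution(summary: dict, original: dict) -> list[str]:
    """Return a list of rule violations for an AI-aggregated summary."""
    violations = []

    # Rule 1: the headline flags the re-report ("Report:" or the outlet's name)
    headline = summary["headline"]
    if not (headline.startswith("Report:") or original["outlet"] in headline):
        violations.append("headline does not flag the original outlet")

    # Rule 2: credit and a link appear in the lede (first two paragraphs)
    lede = " ".join(summary["paragraphs"][:2])
    if "first reported by" not in lede.lower() or original["url"] not in lede:
        violations.append("lede lacks 'first reported by' credit and link")

    # Rule 3: the summary is substantially shorter than the original
    if len(" ".join(summary["paragraphs"])) > 0.5 * len(original["body"]):
        violations.append("summary is not substantially shorter than original")

    return violations
```

A real system would need far more nuance — detecting which outlet actually broke the story is the hard part — but the point stands: the norms themselves are simple enough to be enforced mechanically.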

The big question: Would that make what Perplexity does OK? Considering it would be doing exactly what humans do, it certainly seems like there’d be less to complain about. An AI rewrite would also shrink the window of exclusivity: While it usually takes a little time for aggregated stories to appear, a machine could read, process and republish an aggregated version far faster than a human could.

That said, if the link in the AI summary is prominent, that speed could actually serve the original source. For a news site, it would mean that instead of reaching just your own platform’s audience, everyone on Perplexity or ChatGPT Search or Google AI Overviews would see your story almost immediately.

Ultimately whether the media regards AI aggregation as “OK” depends greatly on the portion of those readers who, based upon the aggregated summary, then decide to have a direct relationship with the publisher instead of just the AI intermediary. As with the current system that pairs aggregation with search, what you give up in terms of exclusivity you make up for in reach.

Will AI aggregation serve the same goal? We don’t know the answer yet, but if the decisions that product designers make align with the respect that journalists have traditionally held for each other, it’ll at least have a shot.

