What’s So Concerning About the Perplexity Mess


It’s been a wild week for Perplexity, with various publications accusing the “answer engine” of some pretty serious stuff. But I think most stories are missing the most far-reaching implication of the whole affair.

More on that in a minute, but before I jump in, a quick shout out to anyone on Long Island, NY: I’ll be speaking at Crypto Mondays in Westhampton this evening around 7 p.m., chatting about our collective negative bias against AI writing. If you’re around, swing by and say hi. I might even have something fun to show you.

On Wednesday, you can catch me at a webinar from Local Media Association at 2 p.m. Eastern Time, where I’ll be talking to John Sumpter of Nota about The Power of AI in Storytelling. If you’re part of a smaller newsroom, I think you’ll get a lot out of the session.

OK, just one more thing before the main event: GO OILERS!

Now, let’s talk about Perplexity… as soon as I pay a bill.

AI Scams Are Rising. Here’s How You Can Protect Yourself.

Scammers just got even more dangerous thanks to AI. It’s become incredibly easy to clone someone’s voice or create a deepfake, and there have already been many reported cases of scammers impersonating family members to ask for money.

Here’s how you can help prevent that: Incogni is a personal data removal service that scrubs your personal information from the web. Incogni:

Protects you from identity theft and scammers taking out loans in your name.

Prevents strangers from buying your personal information on search sites.

Get 55% off with the code COPILOT. And if you’re not happy, get a full refund within 30 days.

Try Incogni

The honeymoon is over for Perplexity.

If the AI company ever enjoyed any sort of favored status among journalists, it looks like that’s over. After Forbes accused the so-called answer engine of plagiarizing a story about secret tests of a drone startup backed by former Google CEO Eric Schmidt, Wired tagged in and came out swinging, calling Perplexity a “bullshit machine.”

There’s been a lot of back and forth over this, and I won’t go over all the details here. (Casey Newton over at Platformer has a pretty good rundown.) The wrongdoing Perplexity is accused of, as I see it, comes down to three things:

Plagiarizing others’ articles, both in its regular summaries and in its new Pages feature, which lets users customize and publish the answers Perplexity gives them as new, shareable web pages.

Ignoring the standard signals that sites use to tell web crawlers like Perplexity’s that their content is off limits (the company’s CEO denies this, it should be noted).

Straight-up making stuff up in its answers, and attributing those hallucinations to real news sites.

That sounds bad, but most of it isn’t surprising. For starters, AI chatbots making stuff up is par for the course at this point. It should be clear to anyone using one of these large language models that hallucinations — where a chatbot imagines untruths and states them as fact with confidence — are a statistical reality. In other words, all LLMs are bullshit machines.

There’s an extra bit of nuance here when part of the fib is pinning it on a news source. Putting aside the legal technicalities of the matter, it’s hard for me to see a news source suffering long-term reputational damage from any theoretical incident in which someone believes a lie a chatbot attributed to it. Just like your Aunt Bea telling you “at least that’s what I heard,” it would pretty clearly be the AI’s fault.

The plagiarism is potentially serious, and what Perplexity CEO Aravind Srinivas has said in the wake of the accusations — that his product has “rough edges” — sounds like a weak defense. That said, this feels like a very solvable problem. As with The New York Times lawsuit against OpenAI, I think Perplexity will skirt around this by adjusting its language processing so that directly copied or lightly paraphrased passages from the original article won’t appear in answers. The Pages feature might need to be rethought or recalled.


The Real Perplexity Scandal

But remember what I said: the concerns are only “mostly” not surprising. What should concern everyone in media is Perplexity brazenly — and allegedly — ignoring the standard signal that websites use to tell crawlers to back off: robots.txt.

Nearly every website has a robots.txt file that most visitors never see, but web crawlers are expected to check. The file tells crawlers whether the site’s owner allows its content to be indexed — by search engines like Google, by research tools like Common Crawl, and by AI companies like OpenAI hoovering up the internet for training data.
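To make that concrete, here’s what a hypothetical robots.txt might look like (the crawler names are the publicly documented user agents for OpenAI’s and Perplexity’s bots, but the paths and policies here are purely illustrative, not pulled from any real site):

```
# Block specific AI crawlers from the whole site
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Everyone else may index everything except drafts
User-agent: *
Disallow: /drafts/
```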

To be clear, Perplexity is more search engine than AI company; its technology plugs a mix of large language models into a search engine designed for AI. You might think that makes things a bit gray, and you might be right — at least from a legal perspective. Respecting robots.txt preferences is not a technical requirement; web crawlers can choose to ignore it at will. And while media websites can track the IP addresses that access their content, taking action against any of them is expensive, and it’s difficult to verify who is behind any given crawler.
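For a sense of just how voluntary this is, here’s a minimal sketch in Python of the check a polite crawler runs before fetching a page, using the standard library’s robotparser. The bot name and URLs are hypothetical:

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt (the URL is a made-up example).
rp = robotparser.RobotFileParser()
rp.set_url("https://example-news-site.com/robots.txt")
rp.read()

# A polite crawler asks permission before fetching; "ExampleBot" is invented.
article = "https://example-news-site.com/2024/06/drone-startup-scoop"
if rp.can_fetch("ExampleBot", article):
    print("Allowed: fetch and index the page.")
else:
    print("Disallowed: skip the page.")
    # Nothing stops a rogue crawler from fetching the page anyway;
    # robots.txt is a request, not an enforcement mechanism.
```

The whole system rests on crawlers choosing to run that check and honor the answer.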

But the honor system works because of the very scandal we’re talking about: when you violate it and you have a public-facing product, the results are there for everyone to see. Perplexity is clearly indexing articles that media sites are telling it not to. In a true facepalm moment, when Wired asked Perplexity about the accusatory article it had just published, Perplexity produced a summary that would have been hard to create without access to it. By contrast, Wired reported, ChatGPT — which OpenAI claims respects robots.txt — said it couldn’t access the article.

If something good can come out of this particular AI scandal, it’s that content owners might better understand just how much robots.txt runs on mutual trust. While Perplexity appears to have been caught red-handed here, it’s not difficult to imagine any number of actors (who probably don’t have big, public consumer products) hoovering up off-limits content on the regular.

In the past, that reality might have been met with a “so what?” But in this new era of media, where content — especially the human-generated variety — has suddenly become valuable raw material for the AI industrial complex, the rules matter more than ever before. And it’s very apparent they could use some teeth.

