Missing the Point on AI Safety


The popular conversation around AI safety — which tends to polarize into “slow down” and “get real” camps — has always struck me as oversimplified and off target. I think it’s one area where the media’s role in shaping that conversation could use some adjustment.

A recent government-commissioned report that warned about AI posing an “extinction-level” threat to humanity has been a flashpoint in the ongoing safety debate. While I don’t have an opinion on the report itself, or the weighty national-security issues it explores, I do think all the consternation over AI safety tends to put an outsize focus on the safety features of the models themselves rather than the rest of the information ecosystem they live in.

Danger Zones

So, some level-setting. There’s a lot of information out there, and large language models (LLMs) ingest it all: the good, the bad, the extremely ugly. Without safety features, something particularly dangerous — say, detailed instructions on building a “dirty bomb” — could be coaxed out of the model via simple prompting. The transformative features of generative systems can be misused as well, putting the power to create offensive imagery in the hands of everyone.

That may sound bad, but you could have written virtually the exact same thing in the early days of the internet and photo-editing software. And people did, but nobody was making the case to build content filters directly into fundamental internet protocols or the JPEG codec. That’s effectively what LLM developers are being asked to do: put in blockers to prevent models from doing exactly what they were designed to do — serve up the best output that satisfies the prompt.

In this way, AI companies are victims of their own success. Since they tend to suck up all the oxygen (and funding) in the cultural conversation around AI, they naturally get all the credit… and the blame. When something like Sora debuts, it’s mind-blowing emojis all the way down. But if someone coaxes a model into creating offensive images, they’re roasted all over social media, not to mention on Capitol Hill.

Whether it’s fear of bad press or a genuine desire to protect users, AI companies have aligned and fine-tuned their models in the name of safety. In the case of “forbidden knowledge” — say, how to create a weapon, or access to copyrighted content — this is a very difficult thing to get right: safeguards that prevent responses from serving up “unsafe” output have a good chance of also blocking or muddying the underlying information that fuels that output, making the models less effective for benign uses.


A Recalibration

And all that attention on safety may not even make much difference. A recent piece at the AI Snake Oil Substack persuasively makes the case that safety concerns about AI systems spouting potentially dangerous knowledge should be directed downstream of the LLM. In many cases, safeguards would be more effective at the places where that knowledge is practically applied, whether it’s email providers scanning for automated phishing emails or the physical resources someone would need to actually produce and deploy a bioweapon.
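To make the “downstream” idea concrete, here’s a minimal, purely hypothetical sketch of a safeguard that sits at the email provider rather than inside the model. The pattern list, the `looks_like_phishing` function, and the threshold are all invented for illustration; real providers rely on far more sophisticated classifiers and reputation signals.

```python
import re

# Hypothetical, simplified illustration of a downstream safeguard: an
# outbound-mail filter that flags likely phishing text regardless of
# whether it was written by a human or generated by an LLM.
SUSPICIOUS_PATTERNS = [
    r"verify your (account|password|identity)",
    r"urgent(ly)? action required",
    r"click (the|this) link (below|immediately)",
    r"wire transfer",
]

def looks_like_phishing(message: str, threshold: int = 2) -> bool:
    """Flag a message when it matches at least `threshold` suspicious patterns."""
    hits = sum(
        bool(re.search(pattern, message, re.IGNORECASE))
        for pattern in SUSPICIOUS_PATTERNS
    )
    return hits >= threshold

if __name__ == "__main__":
    draft = (
        "Urgent action required: please verify your account "
        "within 24 hours to avoid suspension."
    )
    print(looks_like_phishing(draft))  # True -- hold for review before delivery
```

The point isn’t the filter itself but where it sits: at the point of delivery, it catches the attack whether the text came from a model, a template, or a human.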

None of this is to say safety shouldn’t be a factor for companies developing LLMs. They have a role to play, and to the extent they can discourage bad outputs while not compromising good ones, they should act.

But they are not the only cops on this beat, and possibly not even the most important ones. Lawmakers, law enforcement, the people developing systems for applying AI, and yes, the users all have responsibilities, too.

The media largely shapes the conversation around AI safety, and journalists understandably scrutinize the multibillion-dollar companies shaping the industry and our future. But the myopic focus on foundation models in the safety debate has created unrealistic expectations and let other parties evade responsibility. A recalibration is in order.
