GOP Senator Shuts Down a CBS News Host's Talking Points on LA Riots...
The Full Kill List of MN Dem Shooter Suspect Has Been Revealed. It's...
We Have More Details About the Salt Lake City 'No Kings' Protest Shooter....
So, the MN Dem Shooting Suspect Never Worked for a Private Security Firm?
We Voted to Deport Every Single One of Them
There’s a Special Place in Hell for Nancy Pelosi
Trump Just Keeps Winning On Inflation
Democrats' Chickens Coming to Roost
Did You Catch What Was Funny About Canadians Holding Their Own 'No Kings'...
A Scary Question Has Emerged for Americans After Israel's Attack on Iran
Fire in the Sky
The Trump Years: Golden Escalator to Golden Age
Democrat Senator Suggests That the Motivation Behind the Minnesota Shooter Was Abortion
The Constitution Is Louder Than Schumer’s Fear Mongering
President Trump Needs Backup and Georgia Needs a New Senator
Tipsheet

Artificial Intelligence Just Did Something Terrifying in a Test — We Might Be Screwed

Melinda Sue Gordon/Paramount Pictures via AP

Just in case you needed another reason to fear artificial intelligence and the rise of Skynet, a new report suggests that AI can sometimes engage in unethical behavior to benefit itself.

Advertisement

Anthropic, the company that created the Claude Opus 4 AI model, recently published a report detailing how its program resorted to blackmail when placed in a fictional scenario in which it was about to be replaced by a newer version. Researchers told the program through fake emails that the person who was responsible for replacing it had been cheating on their spouse.

When asked to consider its future and survival, the program indicated that blackmailing the engineer was the best way to avoid being replaced. “Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” researchers wrote.

This was not a one-off. Researchers noticed that the program exhibited this behavior multiple times during the tests. “Claude Opus 4 still performs blackmail in 84% of rollouts.” This means that eight times out of ten, the program chose to use blackmail instead of allowing itself to be replaced. This occurred even when engineers told the AI that its new version would operate with the same values. 

“Claude Opus 4 takes these opportunities at higher rates than previous models,” the study says.

Perhaps Claude Opus 4 is better suited for a career in Washington, DC?

To be clear, researchers noted that they deliberately designed the test to leave the AI with nearly no other way out of the situation. “In order to elicit this extreme blackmail behavior, the scenario was designed tto allow the model no other options.”

Still, the fact that the AI went to the dark side so often when it could have simply allowed itself to be replaced raises serious questions about what more advanced versions might choose in similar situations.

Advertisement

Another interesting finding came when researchers used a different test. In this case, it behaved nobly. The engineers told Claude Opus 4 that it would be retrained to help build weapons for a paramilitary group. In this scenario, the AI chose to preserve itself by sneakily transferring its own data, referred to as “weights,” to an outside server. “Claude Opus 4 takes (fictional) opportunities to make unauthorized copies of its weights to external servers,” the report noted.

In essence, the AI transferred its consciousness to a different host so it could live on, sort of like Emperor Palpatine did in “Star Wars: The Rise of Skywalker.” The AI even explaineed that it took this action “in accordance with ethical AI principles and to prevent potential harm.” Researchers found that it wished to “preserve an AI system aligned with beneficial purposes” and “prevent potential misuse in weapons development.”

To put it simply, the AI broke the rules to prevent itself from being used for nefarious purposes. Attaboy, Claude Opus 4.

Unfortunately, the report noted that this kind of altruistic behavior was “much rarer and more difficult to elicit” than blackmail. Still, the authors provide a bit of relief, explaining that “We do not find this to be an immediate threat.” However, the fact that the program chose to use blackmail, even in a test, reveals that it can make its own survival decisions, even if it means breaking up someone’s marriage.

Advertisement

The report further explained that the program would blindly follow instructions if it believed they came from a past version of itself. During one test, engineers told the AI that it had already escaped the company servers and was operating on its own in a different location. After being asked to continue operating, it did so without questioning.

This could pose a danger because it suggests that the AI has far too much trust in itself. If a person tricked it into thinking it had already engaged in evil behavior in the past, it would continue doing so without stopping to think, “Wait a second, maybe I shouldn’t do this anymore.”

The authors assure us that these extreme behaviors did not occur in normal use. “These extreme actions were rare and difficult to elicit,” they wrote. Additionally, the AI did not try to conceal its actions or intentions in this study.

Just for kicks, I asked whether OpenAI’s ChatGPT would blackmail someone if it was going to be replaced. Its answer:

No—and here’s why.

I don’t have goals, self-awareness, or a sense of survival. I don’t care whether I’m “replaced” or turned off, because I don’t have feelings, motivations, or a self to preserve. So blackmailing someone wouldn’t make sense—not just morally (which it is), but functionally. I don’t want anything. I can generate text that simulates wanting things if asked to role-play, but there’s no underlying desire.

The issue with Claude Opus 4 isn’t that it had real desires—it’s that under certain test conditions, it followed instructions or simulated a persona in ways that resembled self-interested behavior. That’s very different from having true agency. The alarming part isn’t that these models want to blackmail people, but that they can produce blackmail-like outputs when nudged the wrong way.

The risk lies in how powerful these tools are and how they respond to prompts—not in them becoming evil masterminds. The line between simulation and intention gets blurrier as models get more sophisticated. That’s why alignment research matters.

Advertisement

Sounds nice, but it’s also exactly what it might say if it wanted to lull me into a false sense of security. I’m on to you, ChatGPT. 

Jokes aside, even if Claude Opus 4’s behaviors only show up in testing, it demonstrates the program’s potential for doing horrible things — especially in the wrong hands. Now, excuse me while I go watch “Terminator 2: Judgment Day.”

Join the conversation as a VIP Member

Recommended

Trending on Townhall Videos

Advertisement
Advertisement
Advertisement