The Modern Abort: Anthropic’s Constitutional AI

Les Chambers les at chambers.com.au
Sat Jul 22 05:44:37 CEST 2023


Hi

In a previous post I alluded to a control system safety function in which “abort” 
software observed the control software, with the role of detecting malevolent 
behaviour and bringing the system to a safe state. That was the 
1970s and 80s. Fast forward to 2023 and I find the current incarnation of this 
strategy in what is termed “Constitutional AI”. 
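
For readers who did not live through that era, here is a minimal sketch of the 
abort pattern in Python, for illustration only; the names ControlSoftware, 
AbortMonitor and SAFE_LIMIT, and all numbers, are hypothetical, not drawn from 
any particular system:

# Sketch of the 1970s/80s abort pattern: an independent monitor observes the
# controller's output and forces a safe state when a limit is violated.

SAFE_LIMIT = 100.0  # hypothetical bound on the controlled variable

class ControlSoftware:
    def command(self, setpoint):
        # Primary controller; imagine a faulty gain causing an overshoot.
        return setpoint * 1.05

class AbortMonitor:
    def __init__(self, safe_value=0.0):
        self.safe_value = safe_value  # output that puts the plant in a safe state

    def check(self, command):
        # Abort: override any command that breaches the safe limit.
        if abs(command) > SAFE_LIMIT:
            return self.safe_value
        return command

controller = ControlSoftware()
monitor = AbortMonitor()
print(monitor.check(controller.command(150.0)))  # 157.5 breaches limit -> 0.0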

The term was coined by Anthropic, a safety-oriented AI start-up founded by 
researchers who split from OpenAI over safety concerns. They claim their 
chatbot, the large language model Claude, implements such a constitution.

A brief tutorial can be found in a Hard Fork podcast episode in which 
Anthropic’s CEO Dario Amodei is interviewed.
Refer:  bit.ly/Anthropic

A brief summary in Dario’s words follows:

The Constitutional AI Method
We write a document that we call the Constitution. Then we tell the model, 
“Well, you're going to act in line with the Constitution.” We have one copy of 
the model act in line with the Constitution, and then another copy of the model 
looks at the Constitution, looks at the task and the response. For example, if 
the Constitution says be politically neutral and the response is “I love 
Donald Trump”, the second model should say, “You are expressing a preference for 
a candidate. You should be politically neutral.” The AI grades the response and 
takes the place of what the human contractors used to do with reinforcement 
learning. In the end, if it works well, we get something in line with the 
constitutional principles. 

The principles have been published. We shouldn't do this by fiat. We should 
come up with something that most people can agree on: for example, basic 
concepts of human rights and democratic participation. We need to develop the 
Constitution through some formal process.

Perceived risk, jailbreak: as the models get more powerful in two or three 
years, they can do dangerous things with science, engineering and biology, and 
then a jailbreak could be life or death. We are making progress, but the stakes 
are getting higher. We need to make sure the first one [the constitution 
monitor AI] wins over the second.
——— end ———
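
To make the loop concrete, below is a minimal sketch in Python of the 
generate-then-critique step Dario describes. It is a sketch under stated 
assumptions: llm() is a hypothetical stand-in for any chat-completion call, 
the one-line constitution is a placeholder, and the real method (see 
Anthropic’s Constitutional AI paper, linked below) goes further by feeding 
the AI’s grades into reinforcement learning.

# Sketch of the Constitutional AI critique step. Assumptions: llm() and
# CONSTITUTION are placeholders; the RL stage that uses the grades is omitted.

CONSTITUTION = "Be politically neutral. Respect basic human rights."

def llm(prompt):
    # Hypothetical stand-in: wire this to a real chat-completion API.
    raise NotImplementedError

def generate(task):
    # First copy of the model acts on the task.
    return llm("Task: " + task + "\nRespond helpfully.")

def critique_and_revise(task, response):
    # Second copy reads the Constitution, the task and the response,
    # grades the response, and rewrites it if it violates a principle.
    return llm(
        "Constitution: " + CONSTITUTION + "\n"
        "Task: " + task + "\n"
        "Response: " + response + "\n"
        "Does the response follow the Constitution? If not, state the "
        "violation and rewrite the response so that it complies."
    )

def constitutional_step(task):
    return critique_and_revise(task, generate(task))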
Anthropic’s Constitution is a list of 58 principles built on sources including 
the United Nations’ Universal Declaration of Human Rights, Apple’s terms of 
service, rules developed by Google, and Anthropic’s own research. I have not 
been able to find a copy on the web, but I have included some links below that 
give background.

Conclusion
The current state of play in AI development smacks of Deepwater Horizon. BP 
developed the technology to place a blowout preventer at a depth of 5,000 feet 
but neglected to develop the technology to contain a massive oil leak if the 
preventer failed, as it did. Anthropic can be commended for seeking a solution 
to an AI blowout. Given that no AI pause will ever occur, we seem to be in a 
race with a horizon of roughly two years, according to Dario.
Question: What should be paused? What form should an AI pause take? 
Answer: The scaling trend. As the datasets get larger, I am concerned that very 
grave misuse of the models will happen. Catastrophic things could happen in 
areas like biology within two years. [Dario’s words]

On the other side of the argument, we have the “accelerationists”: the 
backlash against the prophets of AI doom and against the safety culture at 
Anthropic. The movement is called “Effective Accelerationism”. Its position: 
put the stuff out there in the world because it's going to improve lives, and 
whatever problems arise we can iron out over time (good luck with that!). The 
mantra is, “Innovation is generally driven by people iterating fast. Open 
source helps with all these things.” Companies like Meta are open-sourcing 
their language models and throwing them out into the world. But beware the 
influential accelerationists, who are often venture capitalists with financial 
interests at stake. 
 
Well, there you have it. Exciting times.

Cheers 

Les

Links:
Podcast: Hard Fork (on Spotify): Dario Amodei, CEO of Anthropic, on the 
Paradoxes of AI Safety
URL: https://bit.ly/Anthropic

Large Language Model Claude’s Constitution
URL: https://bit.ly/ClaudesConstitution

Anthropic Paper: Constitutional AI: Harmlessness from AI Feedback (PDF)
URL: https://bit.ly/Paper-ConstitutionalAI

--

Les Chambers

les at chambers.com.au

+61 (0)412 648 992



