[SystemSafety] AI and the virtuous test Oracle - action now!

Sun Jul 23 05:27:07 CEST 2023

Brendan
I assume your “massively complex critical systems” include AI agents. So 

If Continuous Authority to Operate (CATO)  refers to the continuous monitoring 
and evaluation of an organization's information systems and processes to ensure 
they meet the necessary security requirements and maintain an authorized state 
of operation. 
 the solution we are approaching is Constitutional AI, where we 
set an AI to monitor an AI for malevolent behaviour, where that malevolent 
behaviour is defined as non-compliance with a Constitution under the control of 
humans.  As per my previous post, Anthropic is working on this, but the problem 
is not yet solved.

Anthropic is also working on Mechanistic Interpretability - what is this?
Dario Amodei’s words: “It's the science of figuring out what is going on inside 
the models. Explaining why and how it came up with the solutions it is 
providing. Important when it's creating output that we didn't expect. Like a 
brain scan to find out what's going on inside. “
Dario reported that this is still a research project. It's not commercial. They 
may not get results for a year or so. You can currently ask an AI to go through 
a problem step by step and expose its logic but this is not 100% reliable. We 
can safely say that there is no foolproof method for asking an AI how it came 
up with a solution or how it will approach a problem beforehand.

Given that an AI is likely to change its behaviour as a function of the data it 
senses in its environment, there is a need for continuous validation. This can 
only be achieved with a monitoring AI that is a permanent feature of a system 
for its operational life. An adult in the room if you like.

Further, we need principles laid down for AI development and deployment. I 
refer to some excellent points made by Stuart Russell in his book Human 
Compatible:
“Machines are beneficial to the extent that their actions can be expected to 
achieve our [human] objectives. [not their own objectives]” 

“The machine’s only objective is to maximize the realization of human 
preferences. The machine is initially uncertain about what those preferences 
are. The ultimate source of information about human preferences is human 
behaviour.”

“Uncertainty about objectives implies that machines will necessarily defer to 
humans: they will ask permission, they will accept correction, and they will 
allow themselves to be switched off.[amen to that]”

We already have the switched-off issue with the Blockchain running Bitcoin. It 
runs on more than 12,000 servers – try turning that off.

Then there is the issue of AIs that can modify themselves by writing their own 
code and replicating themselves. 

My conclusion is that if you are contemplating using an AI for anything that 
could damage human life or property just don't. 

Les

> > On 27 Jun 2023, at 1:45 pm, Les Chambers <les at chambers.com.au> wrote:
> > 
> > [â€¦.] cling 
> > to regulating processes that have ceased to exist - are likely to be 
overrun 
> > and made redundant. 
> > 
> > In favour of organisations such as:
> > 
> > - The Center for Human-Compatible AI at UC Berkeley
> > - The Future of Life Institute
> > - The Center for AI Safety (CAIS)
> > - Stanford Center for AI Safety
> > 
> > My view is that this is not a steady-as-she-goes situation
> 
> Late to the party, but I am moved to raise a few points.
> 
> â€¢ The question of whether AI raises a safety/security threat is 
fundamentally about the nature of our environmental control systems. 
>   They have many dangerous features with questionable functionality that are 
tolerable only under the presupposition of the good intentions of operators.
> 
> â€¢ My view of the state-or-art is that automated reasoning shines in two 
areas.
> 	- Plan-space search - making automated systems hard to beat for 
tactical and even strategic â€œgame" play
>         - Classification - making automated systems hard to beat for 
â€œinsightsâ€ into large data sets 
> 
> â€¢ The ML community is particularly â€œentrepreneurialâ€ and doesnâ€™t 
like being told to think before they leap. In particular, if [the] AI 
[community] is the problem, [the] AI [community] must be the solution.
> 
> Since I have not seen this discussed here, Iâ€™d like to raise a question 
that has recently been on my mind.
> 
> What do we think of the big push to authorise the deployment of massively 
complex critical systems through using automated [low] assurance techniques? 
The so called â€œcontinuous authority to operateâ€ cATO.

--

Les Chambers

les at chambers.com.au

+61 (0)412 648 992