[SystemSafety] AI and the virtuous test Oracle - action now!
Les Chambers
les at chambers.com.au
Sun Jul 23 05:27:07 CEST 2023
Brendan
I assume your massively complex critical systems include AI agents. So
If Continuous Authority to Operate (CATO) refers to the continuous monitoring
and evaluation of an organization's information systems and processes to ensure
they meet the necessary security requirements and maintain an authorized state
of operation.
the solution we are approaching is Constitutional AI, where we
set an AI to monitor an AI for malevolent behaviour, where that malevolent
behaviour is defined as non-compliance with a Constitution under the control of
humans. As per my previous post, Anthropic is working on this, but the problem
is not yet solved.
Anthropic is also working on Mechanistic Interpretability - what is this?
Dario Amodeis words: It's the science of figuring out what is going on inside
the models. Explaining why and how it came up with the solutions it is
providing. Important when it's creating output that we didn't expect. Like a
brain scan to find out what's going on inside.
Dario reported that this is still a research project. It's not commercial. They
may not get results for a year or so. You can currently ask an AI to go through
a problem step by step and expose its logic but this is not 100% reliable. We
can safely say that there is no foolproof method for asking an AI how it came
up with a solution or how it will approach a problem beforehand.
Given that an AI is likely to change its behaviour as a function of the data it
senses in its environment, there is a need for continuous validation. This can
only be achieved with a monitoring AI that is a permanent feature of a system
for its operational life. An adult in the room if you like.
Further, we need principles laid down for AI development and deployment. I
refer to some excellent points made by Stuart Russell in his book Human
Compatible:
Machines are beneficial to the extent that their actions can be expected to
achieve our [human] objectives. [not their own objectives]
The machines only objective is to maximize the realization of human
preferences. The machine is initially uncertain about what those preferences
are. The ultimate source of information about human preferences is human
behaviour.
Uncertainty about objectives implies that machines will necessarily defer to
humans: they will ask permission, they will accept correction, and they will
allow themselves to be switched off.[amen to that]
We already have the switched-off issue with the Blockchain running Bitcoin. It
runs on more than 12,000 servers try turning that off.
Then there is the issue of AIs that can modify themselves by writing their own
code and replicating themselves.
My conclusion is that if you are contemplating using an AI for anything that
could damage human life or property just don't.
Les
> > On 27 Jun 2023, at 1:45 pm, Les Chambers <les at chambers.com.au> wrote:
> >
> > [â¦.] cling
> > to regulating processes that have ceased to exist - are likely to be
overrun
> > and made redundant.
> >
> > In favour of organisations such as:
> >
> > - The Center for Human-Compatible AI at UC Berkeley
> > - The Future of Life Institute
> > - The Center for AI Safety (CAIS)
> > - Stanford Center for AI Safety
> >
> > My view is that this is not a steady-as-she-goes situation
>
> Late to the party, but I am moved to raise a few points.
>
> ⢠The question of whether AI raises a safety/security threat is
fundamentally about the nature of our environmental control systems.
> They have many dangerous features with questionable functionality that are
tolerable only under the presupposition of the good intentions of operators.
>
> ⢠My view of the state-or-art is that automated reasoning shines in two
areas.
> - Plan-space search - making automated systems hard to beat for
tactical and even strategic âgame" play
> - Classification - making automated systems hard to beat for
âinsightsâ into large data sets
>
> ⢠The ML community is particularly âentrepreneurialâ and doesnât
like being told to think before they leap. In particular, if [the] AI
[community] is the problem, [the] AI [community] must be the solution.
>
> Since I have not seen this discussed here, Iâd like to raise a question
that has recently been on my mind.
>
> What do we think of the big push to authorise the deployment of massively
complex critical systems through using automated [low] assurance techniques?
The so called âcontinuous authority to operateâ cATO.
--
Les Chambers
les at chambers.com.au
+61 (0)412 648 992
More information about the systemsafety
mailing list