[SystemSafety] Fwd: Re: How Many Miles of Driving Would It Take to Demonstrate Autonomous Vehicle Reliability?

Les Chambers les at chambers.com.au
Sat Apr 23 06:34:07 CEST 2016


Hi Phil
Your paper covers a fascinating subject. I'm glad you're looking into it.
I agree that the core problem is validating non-deterministic algorithms.

The proactive measures you're suggesting, such as fault injection, should definitely be in the toolbox. An untested bit flip was responsible for one of the two near misses in my career. A more robust system that triggered a safety-related control action with a 32-bit word instead of a single bit would have made the system impervious to this kind of fault. I sincerely hope that no one programming a stores management system drops a bomb or fires a missile from a warplane with anything less than a 64-bit unique command word - and so too for the control that applies the brakes to a vehicle.
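
To make that concrete, here is a minimal Python sketch of the unique-command-word idea (the constant, the names and the brake example are mine, purely illustrative): the safety-related action fires only on an exact match of the full 32-bit pattern, so no single flipped bit can command it.

    APPLY_BRAKES_CMD = 0xA5C317E9  # arbitrary 32-bit arming pattern

    def brake_command_valid(command_word):
        # Exact equality: any single-bit corruption fails the check,
        # whereas a boolean flag would actuate on one flipped bit.
        return command_word == APPLY_BRAKES_CMD

    def actuate_brakes(command_word):
        if brake_command_valid(command_word):
            pass  # energise the brake actuator
        else:
            pass  # log and ignore; a corrupted word must never actuate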

I can't help thinking, though, that this is all pretty low-level stuff, in the realm of best practice that we should be practising already. The Monitor/Actuator architecture is also a good idea. We were practising this in the 1970s on chemical reactor control systems. A process specialist worked independently of the control system development team, identifying unsafe states of the plant (I think this is what you were referring to as the "deductively-generated safety envelope"). When an unsafe state was detected (we called them abort conditions), the monitor software took over control and unconditionally restored the plant to a safe state. Fortunately for us, the failover mechanism was often simple: de-energising the outputs to the final control elements, which caused the return springs to close the control valves. In chemical processing, just shutting a valve or putting a reactor on full cooling is enough to preserve safety. This is clearly not the case with driverless cars, which are orders of magnitude more complex.
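
In code, the pattern might look like the following minimal Python sketch (the plant interface, the condition and the threshold are invented for illustration): an independently derived list of abort conditions is checked every scan cycle, and any hit unconditionally drives the plant to its safe state.

    def reactor_over_temperature(state):
        # Illustrative abort condition; the threshold is invented.
        return state["reactor_temp_C"] > 180.0

    ABORT_CONDITIONS = [reactor_over_temperature]

    def monitor_scan(plant_state, de_energise_outputs):
        # Returns True if the monitor seized control this cycle.
        if any(cond(plant_state) for cond in ABORT_CONDITIONS):
            # Fail safe: with outputs de-energised, the return springs
            # close the control valves.
            de_energise_outputs()
            return True
        return False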

Which leads me to my major point on your paper: you mentioned that "Vehicle-level testing won't be enough to ensure safety" - I agree, it's necessary but not sufficient. But I'd point out that it is more necessary than ever before and needs to be put on steroids. I'm referring not to physical vehicle testing but to simulated vehicle testing. The behaviour of non-deterministic algorithms, in terms of pass/fail, becomes deterministic when they hit the real world, or at least the simulated real world. A sensor system mistakes a human being for a paper bag: yes/no. A sensor recognises a bus in heavy rain with lightning flashes: yes/no. As you pointed out, the infinite variety of possible scenarios can never be tested by driving a vehicle around the streets. No matter how many vehicles you deploy, you will never see enough black swans to fully test your system. With simulators, however, you can whistle up a different simulated black swan every few milliseconds. I learnt this on a project where substantial effort was sunk into an automated test rig. The three things I learnt from it that are relevant to driverless cars are:
1. The time scaling that becomes possible with automated testing can expose a system to orders of magnitude more bad scenarios (black swans) than it will ever see in its operational life. With enough computing power you could expose a vehicle sensor system to a few hundred years' worth of human beings, buses, paper bags, plastic bags and so on - overnight (see the sketch after this list).
2. Without automated testing it would have been impossible for us to properly regression test all the software modifications that were coming through during the life of the project. This is particularly relevant to automobiles in the current environment, where Elon Musk is routinely providing vehicle owners with upgrades and, like lemmings, his customers lap them up. It's accepted because it's part of the culture now; they think they are driving mobile phones.
3. Effective automated test rigs are expensive to build. You need a whole team working on them, and they also require maintenance. This inevitably puts the V&V group in conflict with the project manager and any salesman or managing director who has a perverted need to make a profit out of a project (I just can't stand those guys).
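
Here is a minimal Python sketch of the time-scaled scenario testing in point 1 (the scene generator and the classify() function under test are hypothetical stand-ins for a real perception pipeline): every generated scene carries its ground truth, so each run yields a deterministic pass/fail verdict, and a seeded generator makes every failure reproducible.

    import random

    OBJECT_TYPES = ["pedestrian", "bus", "paper_bag", "plastic_bag"]
    WEATHER = ["clear", "heavy_rain", "fog", "lightning"]

    def generate_scene(rng):
        return {"object": rng.choice(OBJECT_TYPES),
                "weather": rng.choice(WEATHER),
                "range_m": rng.uniform(5.0, 120.0)}

    def run_campaign(classify, n_scenarios, seed=0):
        # Seeded RNG so any failing scenario can be replayed exactly.
        rng = random.Random(seed)
        failures = 0
        for _ in range(n_scenarios):
            scene = generate_scene(rng)
            if classify(scene) != scene["object"]:
                failures += 1
        return failures / n_scenarios

Run a few million of these overnight on a rig, against one chance encounter at a time on the road.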

The next obvious question is: how do you build an effective test rig for a sensor system built on machine learning? My solution is to leverage the special-effects technology that is now incredibly mature in the movie business. I watched the bonus "making of" material on Game of Thrones season five last night and was blown away by how far they have progressed. Anyone who has watched the episode featuring the massacre at Hardhome would have to agree. Millions of dollars must have been poured into a 20-minute sequence, which demonstrates that anything can be done these days with the will and enough money. Surely this is justified if the world is going to embrace self-driving vehicle technology. Aviation couldn't function without simulators; why not automobiles?

Full disclosure: I am heavily biased towards simulators. After the above case-study project I had the pleasure of flying a strike jet into the ground at Mach 2.6. I survived, apparently. I'm therefore in furious agreement with your pronouncement: "Thus, alternate methods of validation are required, potentially including approaches such as simulation ...". "Potentially"? Not potentially - it must be a major area of focus.

As a sidebar (and I hope I have your meaning correct), your comment: "One way to manage the complexity of requirements is to constrain operational concepts and engage in a phased expansion of requirements." Your pronouncement is sensible and eminently logical but, as we speak, it is being gleefully ignored by companies such as Tesla. Have you seen the horror videos of people sitting in the back seat of their cars having breakfast while the vehicle drives them to work? The Internet is replete with videos of near misses in hands-off-the-wheel scenarios. The "phased expansion" is happening on Internet time, and the driving public clearly does not have the maturity or the training to absorb it safely. Contrast this with the years of training required to qualify a person to operate an automated air vehicle. Even Tesla is advising its customers to "be careful", which is a bit like a casino advising problem gamblers to "gamble responsibly". I've seen this human-factors phenomenon several times in my career and here it is happening again: technology creep. Technology creeps into a new application domain, is given unjustifiable trust by incumbent bunnies blinded by the light of "cool", and accidents happen. What can we do but endure?

Thanks for the paper.

Cheers
Les

-----Original Message-----
From: systemsafety [mailto:systemsafety-bounces at lists.techfak.uni-bielefeld.de] On Behalf Of Philip Koopman
Sent: Saturday, April 23, 2016 8:58 AM
To: systemsafety at techfak.uni-bielefeld.de
Subject: [SystemSafety] Fwd: Re: How Many Miles of Driving Would It Take to Demonstrate Autonomous Vehicle Reliability?


I presented a paper on exactly this set of related problems at the SAE
World Congress last week.  Validating machine learning is for sure a
tough problem. So is deciding how ISO 26262 fits in.  And the quality of
the training data.  And some other problems besides. Below are the
abstract and a pointer to the paper and presentation slides.
Constructive feedback is welcome for the follow-on work we are doing,
although I will likely reply individually rather than to the list. (Note
that this paper was camera-ready before the RAND report was public.
Several folks have been thinking about this topic for quite a while, and
only now are the results becoming public.)

http://betterembsw.blogspot.com/2016/04/challenges-in-autonomous-vehicle.html

Challenges in Autonomous Vehicle Testing and Validation
        Philip Koopman & Michael Wagner
        Carnegie Mellon University; Edge Case Research LLC
        SAE World Congress, April 14, 2016

Abstract:
Software testing is all too often simply a bug hunt rather than a well
considered exercise in ensuring quality. A more methodical approach than
a simple cycle of system-level test-fail-patch-test will be required to
deploy safe autonomous vehicles at scale. The ISO 26262 development V
process sets up a framework that ties each type of testing to a
corresponding design or requirement document, but presents challenges
when adapted to deal with the sorts of novel testing problems that face
autonomous vehicles. This paper identifies five major challenge areas in
testing according to the V model for autonomous vehicles: driver out of
the loop, complex requirements, non-deterministic algorithms, inductive
learning algorithms, and fail operational systems. General solution
approaches that seem promising across these different challenge areas
include: phased deployment using successively relaxed operational
scenarios, use of a monitor/actuator pair architecture to separate the
most complex autonomy functions from simpler safety functions, and fault
injection as a way to perform more efficient edge case testing. While
significant challenges remain in safety-certifying the type of
algorithms that provide high-level autonomy themselves, it seems within
reach to instead architect the system and its accompanying design
process to be able to employ existing software safety approaches.


Cheers,
-- Phil

-- 
Phil Koopman -- koopman at cmu.edu -- www.ece.cmu.edu/~koopman


-------- Forwarded Message --------
Subject:     Re: [SystemSafety] How Many Miles of Driving Would It Take
to Demonstrate Autonomous Vehicle Reliability?
Date:     Fri, 22 Apr 2016 12:10:56 +0100
From:     Mike Ellims <michael.ellims at tesco.net>
To:     'Matthew Squair' <mattsquair at gmail.com>, 'Martyn Thomas'
<martyn at 72f.org>
CC:     'Bielefield Safety List' <systemsafety at techfak.uni-bielefeld.de>


Hi Matthew,



   >  Really if ever there was a solid economic argument for deploying
industrial scale formal method and proofs this would be it.



To a machine learning system? How would you provide a formal proof that
such a system had learnt the right response for all possible
circumstances? I can conceive that it could be applied to the algorithms
for learning, but not to the learning itself. That is, you could show
that the learning system does what it was specified to do, assuming that
the specification is correct; but not that it was taught correctly or
completely. For that I suspect you will need some sort of statistical
approach. How to do that is of course a major problem.
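
As one concrete instance of such a statistical approach - my illustration, not something Mike spells out - consider the zero-failure argument behind the mileage question in this thread's subject line: with zero observed failures in N independent miles, demonstrating a failure rate of at most lam per mile with confidence C requires N >= -ln(1 - C) / lam. A minimal Python sketch:

    import math

    def miles_required(failure_rate_per_mile, confidence):
        # Zero-failure demonstration: require exp(-lam * N) <= 1 - C,
        # i.e. N >= -ln(1 - C) / lam.
        return -math.log(1.0 - confidence) / failure_rate_per_mile

    # E.g. at most 1 failure per 100 million miles, 95% confidence:
    print("%.3g" % miles_required(1e-8, 0.95))  # ~3e8 failure-free miles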



And Hi Martyn



   > Recertification after software change.  Or do we just accept the
huge attack surface that a fleet of AVs presents?



For "recertification", Google's approach to date seems to be to rerun
all the driving done so far via simulation... I'm not sure what you're
implying with the comment on attack surfaces. So far, as far as I can
tell, aside from updates there is no vehicle-to-vehicle communication.
GPS is probably vulnerable to spoofing and jamming, which could be an
issue, but one would hope that has been accounted for, as it would count
as a sensor failure...
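
As a minimal sketch of that replay-style recertification (all names invented; I am assuming logged drives can be replayed frame by frame through the software under test): rerun every recorded drive through the new build and flag any step where it diverges from the previously certified build. A divergence is not necessarily a regression, but it is what gets reviewed.

    def regression_replay(recorded_drives, old_build, new_build):
        # Yields (drive_id, step) wherever the new software's decision
        # differs from the certified baseline on the same sensor frame.
        for drive_id, frames in enumerate(recorded_drives):
            for step, sensor_frame in enumerate(frames):
                if new_build(sensor_frame) != old_build(sensor_frame):
                    yield drive_id, step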



   > The way in which AVs could change the safety of the total road
transport system. Is anyone studying total accidents rather than AV
accidents?



Yes, lots and lots of people, mostly the government bodies that collect
the accident data in the first place, and they tend to commission
detailed studies from outside organizations (which don't quite answer
the question you're interested in). There are also a few
manufacturer/academic partnerships that study major road accidents in
forensic detail alongside the police (I know of one in Germany and one
in the UK), which are intended to address many of the limitations of
police investigations. Some of the big auto manufacturers have their own
departments, e.g. VW has its own statistics department looking at this.
And there is a large academic community concerned with examining traffic
accidents.



As an aside, some time ago we were discussing wheels falling off cars. I
attempted to track down an answer to this from the online traffic stats,
as there is a field for it in the STATS19 form (filled out by police).
However, after some digging via email and a couple of phone calls to the
Dept. for Transport, it stopped dead with no answer, because it's a
write-in field on the form and the data isn't transferred to any of the
computer systems. If it's not on the computer, they don't want to know.



Cheers.

_______________________________________________
The System Safety Mailing List
systemsafety at TechFak.Uni-Bielefeld.DE



