[SystemSafety] Agile methods

Tue Sep 3 15:33:55 CEST 2013

From the number of responses to the question on the use of "agile" methods in our domain, I assume that there is a fair amount of interest in the possibility.
I have not used recent "agile" methods, but as a manager of teams of developers, I had experience over several years in using - and, indeed, developing - an early forerunner to what is now called 'agile'. So I'll report on some of my key experiences and offer a few thoughts based on them.

Having done my reporting and offered my thoughts, I see that I have written rather a lengthy discourse. Sorry for that, but I hope that a few of you follow it through.

Initial consideration suggests that it is improbable that agile methods would be deemed suitable for safety-related applications.
But before coming to a conclusion, it makes sense to consider:
What problems we would wish a "new" method to overcome;
Whether it would overcome them - and, thus, make it worthwhile to move to it;
What disadvantages (if any) the agile method would introduce;
Whether they would affect our prime objectives (e.g. achieving safety in an effective, efficient and economic way);
Whether the disadvantages outweigh (or, at least, potentially outweigh) the advantages and, if not, whether the advantages make it worthwhile to try to eliminate or alleviate the disadvantages.

In the 1980s, I led teams that employed what we then referred to as "evolutionary" or "incremental" "development" or "delivery". Delivery was the goal, development was the means, so the name used at any given time reflected the context, or perspective, rather than a method ("ology" - ugg).

When we commenced, there was no method and there was no past experience on which we could rely. There was one academic that I was aware of, and one eminent consultant, whom I knew well, who advocated it. But they did not do so from positions of experience but, rather, from musing on how the then pervasive waterfall model could be improved on. I recall my friend assuring me that "you can make a new delivery every week", and, although such glibness alerted me to his ignorance, I also knew that my teams needed a new approach. We were mired in large projects with flawed specifications and only vague prospects of completion.

So we embarked on an evolutionary experiment.

Of course, we did so naively, and our hopes - and expectations - of a smooth path into a new world of advanced-level software engineering were soon shocked and shaken. I won't take time and space here to tell of all our adventures, but I'll remark on some of the lessons that we learned - lessons that I think it would be wise to address when considering agile methods for the development of safety-related software.

First, what were the advantages that we expected to achieve from incremental and evolutionary development and delivery?
The principal one was:
1. To overcome the classic waterfall ("big-bang") problems of (a) taking a long development time to arrive at a working system; and (b) then discovering that what had been developed - and tested - did not meet the customers' real (and often, new) requirements.
Other advantages along the way were:
2. To develop the first working system quickly
And, thus, by providing early utility, to gain the customers' credibility and good will.
3. To obtain rapid feedback from customers and their users
And thus determine early if what was delivered was what they wanted, so that rapid corrective action could be taken.
4. To iterate easily towards "the" correct and complete specification
As now, in those days it was almost impossible to derive a correct and complete specification in the first place. Neither business-level customers nor users knew what they really wanted, or even needed. So, though we hoped that each delivery would match the real requirements for the functions that it provided, we guessed that, in actuality, each would be a prototype on the basis of which we would quickly home in on the real requirements.
5. By achieving the other hoped-for advantages, we expected to gain the esteem and confidence of the customers and their users.
6. As development manager, I hoped for improved morale in my staff
We, the developers, were so often dealt a bad hand - in the form of unspecified projects with short expected delivery times - that we always knew, from the start of each project, that there would be late delivery and then further requirements for a great deal of change, and so there were not many "high points" for us. So I hoped that new confidence from the customers and users would elevate, and maintain, the levels of the developers' morale.

At this point, before I discuss the lessons that we learned, I'll ask again: what advantages do proposers of agile methods anticipate (or hope for) in their use in the development of safety-related software? Remember C? Surely, we would not advocate using agile methods for safety-related software just because they are widely used in other domains? Surely, we must have specific and proven advantages in mind - and, of course, I mean advantages that apply to us and not to some other domain, such as console games?

I'll now move on to some of the lessons that we learned, at least some of which might be considered disadvantages - and, certainly, problematic. Of course, they overlap and, in some cases, are inter-dependent, so my listing them separately and sequentially does not mean that I consider them to be discrete.

1. Specification. 
The prospect of incremental development resulted in many customers only specifying, in the first place, those features that initially came into their minds, or those that they most wanted. A big problem was when we had to start with requirements specified only by users; then we had no feel for the strategic whole of the system and, therefore, no context - or, at least, insufficient context - for effective planning.
This problem was probably of greater consequence in those days than now, for then the choice of hardware and the allowances for other resources, such as memory, were more critical.
Giving a thought to safety-related systems, it is difficult to imagine experienced planners and developers choosing to take a leap into the dark and commence development of this or that function without having a clear picture of the full system.
Without a specification, how is a project costed? How is it contracted-out? How is success determined?

2. Estimation
If the duration and cost of a project are not of any consequence, then it wouldn't matter if there was no basis for estimation of these quantities. But if they matter, and if confidence is to be placed in their planning, then confidence needs also to be placed in their estimation - the basis of which is a clear understanding of exactly what is to be done.
How could we determine what safety features are necessary if we cannot envisage the working system?

3. The Nature of Deliveries
"You can make a new delivery every week," was what my friend assured me. This, of course, was an exaggeration, but it meant that he perceived the frequent "addition" of new parts of the system. But the assumptions behind this must necessarily have included (a) the functions to be developed are small; (b) each is well defined; (c) a delivery consists only of one or a small number of discrete functions.
Yet, a stated advantage of the method is that early feedback can be generated by customers and their users and acted on by developers, so that quick change is made to the developing system. So, by definition, a delivery is almost certain to contain a new version of one or more functions already in use, due to requirements for change and also because of the need for maintenance - i.e. change due to discovering deviation from specification.
So a delivery is rather complex, and has far greater and wider implications than suggested by the glib patter of those advocating evolutionary (or incremental) methods.

4. Re-validation and the Frequency of Deliveries
There are those who would test only the new additional functions and implicitly assume that (a) the other changes are correct, and (b) no effect on the existing (now previous) system has been caused.
But, given what I just said about the composition of deliveries, many, if not most, of us would immediately deduce that the new version of the working system, following a delivery, must necessarily be re-validated.
Now, early in a project, when the working system is small, re-validation may be rapidly achieved. But when the system becomes larger and the interconnections of its functions more complex, re-validation requires more planning, more design, more execution, more resources, and more time.
It wasn't long before we had to allow six weeks for it. Now, that was in the '80s, when everything was slower, when automation was minimal, and when we had to design and create our own testing tools.
So, these days, it should require a shorter time while, at the same time, being more thorough. But it still requires a great deal of planning. With each delivery introducing changes to the working system, as well as new functions, re-validation cannot be based only on regression testing.
So, an important conclusion is that a new delivery cannot be made every week, or at any short intervals. One cannot (or, should not) be made at shorter intervals than are required for re-validation of the system. Indeed, if the delivery interval is close to the re-validation time, then some staff must permanently be engaged in re-validation - and in most organisations, this would not be considered efficient or effective use of those staff.
With a re-validation time of six weeks, we settled on a delivery interval of three months. (And, by ensuring punctuality, we did gain the respect of our customers.)
And, one further point, which I believe to be pertinent to the safety domain:
Can it be considered safe to have a safety-related system subject to almost permanent change? The only periods during which we could be confident in its safety are those between the end of re-validation and the introduction of the subsequent delivery - a minority of its "working" time. And not to re-validate could not be an option.
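As a rough illustration of the arithmetic behind this point, here is a minimal sketch. The function names are hypothetical, three months is assumed to be about 13 weeks, and re-validation is assumed to occupy a fixed number of weeks per delivery; none of this comes from the original projects.

```python
def revalidation_load(revalidation_weeks: float, delivery_interval_weeks: float) -> float:
    """Weeks of re-validation needed per week of calendar time between deliveries."""
    if delivery_interval_weeks <= 0:
        raise ValueError("delivery interval must be positive")
    return revalidation_weeks / delivery_interval_weeks


def is_feasible(revalidation_weeks: float, delivery_interval_weeks: float) -> bool:
    """A delivery interval is workable only if re-validation fits inside it."""
    return revalidation_load(revalidation_weeks, delivery_interval_weeks) <= 1.0


if __name__ == "__main__":
    REVALIDATION_WEEKS = 6  # the six weeks mentioned above
    # Candidate intervals: weekly, equal to re-validation time, roughly three months.
    for interval in (1, 6, 13):
        load = revalidation_load(REVALIDATION_WEEKS, interval)
        verdict = "feasible" if is_feasible(REVALIDATION_WEEKS, interval) else "not feasible"
        print(f"interval {interval:>2} weeks: load {load:.2f} ({verdict})")
```

Under these assumptions, a weekly delivery would demand six weeks of re-validation for every week of calendar time, an interval equal to the re-validation time would keep re-validation running permanently, and an interval of about 13 weeks leaves re-validation occupying a little under half of each cycle.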

5. Project Accounting
If development is for an in-house project, it may not be of great concern to the company whether its development staff (planners, designers, programmers, testers, etc.) are engaged in development according to a specification or in change due to new requests or maintenance. On the other hand, these distinctions may be of interest to such a company, and they are often essential to a project manager's accounting. They should certainly be of concern when the project is contracted out.
When the developers' normal role is to receive and act quickly on customers' feedback, development becomes a combination of development according to an original specification (however slight it may be), change according to new requests, and change due to the need for maintenance. How are they distinguished, in effort, in time, and in cost?
Companies typically bid low for a project, confident that they will be able to charge high for changes, which they "know" to be inevitable. So it is crucial that a customer considers these matters in contracting. The best way to reduce the risk of spiralling costs is for the customer also to have a specification in which he has confidence. Does this defeat the purpose, or the value, of agile? I suspect not.
Every week, or so, I hear on the news of some project - frequently a government IT project - whose costs are "spiralling". This has always been the case. But it seems that the development of development methods has not addressed, and has not reduced, this loss to companies and, perhaps mostly, to taxpayers. But then, when the spirals of banknotes leave the customer, where do they go to? It would be sensible to be careful in contracting-out a project based on agile development. 

6. Control of Change
Requests for change overwhelmed us. Users were delighted for us to be working closely with them, they were delighted to have a working system early, and they were uncharacteristically willing to be involved in specifying what they wanted. A prototype can have that effect. But now the users were doing the specifying that they had neglected to do when it should have been done. And, also, they were rectifying the gaps which, to be fair to them, they could not have filled because they had not known what the computer could provide. But also, the system was giving them new ideas, and they were translating these into requirements.
We faced the threat of an infinite project.
Moreover, with the huge number of "requests for change", whether to improve what had been provided or for new features, we were in danger of not finding time to develop the features defined in the original specification - that is to say, of refining, or tweaking, what already worked at the expense of pushing the project forward.
We had to develop processes for managing change, for distinguishing between the various types of change request, for prioritising them, and for planning their implementation. We had to define rules for accepting and rejecting them. And, as indicated earlier, we had to define rules for accounting for them. 

7. Strategic Basis for the System
A step that we had to take was to get the customer to provide a high-level strategic planner to distinguish between requests for change that should be accepted and those that were inappropriate to the project in hand. We had previously understood the importance of strategic definition of a project, but the inundation by requests for change, brought on by our new method, brought home to us the reality that we now could not live without it.
It was not easy to convince the customer that involvement in the project of a high-level planner was a good idea, and it probably would not be easy now. But, without clear strategic definition of a project's boundaries, on what basis can you reject requests for the introduction of new features, most of which would appear to be very useful?
But obtaining strategic involvement was only the first step. He now needed a basis for his decisions, and he realised that he had none: there had not been clear strategic boundaries placed on the project and the system. Indeed, this had been typical of all projects. And it was not easy to rectify because, typically, a system would be part-owned by several senior managers or directors, and bringing them together to define their joint system was not something that they had previously contemplated, or that they readily agreed was necessary. How it was achieved is another story.
One of my best remembered, and applied, lessons was the importance of strategic definition.

8. Configuration Management
Once the project is underway, there are, necessarily, several versions of the system, for example, the version in service, the one it has replaced, at least two under development, and perhaps several historic versions. Further, with a delivery introducing changes to several, and perhaps numerous, software modules, the complexity of configuration control is considerable. But this is not unknown to most of you, so I don't need to elaborate on it. We had to design and create software libraries and control procedures to meet our needs, but, since then, tools have been created to take a great deal of the burden off the shoulders of humans.

9. Documentation
It was extremely difficult to maintain correct documentation. One very obvious contributory factor was the rate of change. This was exacerbated by the number of versions of the system in existence at any one time. For example, considering only a single feature, we might have a version in service containing a temporary fix following the discovery of a fault, the version, now out of service, that it has replaced, and a third version, with a permanent fix, planned for introduction in six months' time (scheduled for the next-but-one delivery). Typically, knowing that the temporary version will only be in service for a short time, staff might decide not to adjust the documentation to record it - resulting in the current documentation not matching the current system. And then, of course, there is the matter of finding time to record the permanent change, get the new documentation inspected and signed-off, and so on. Much of it did not get done. And I suspect that much of it wouldn't get done now.
Documentation almost never gets proper attention. With evolutionary processes, with high rates of change, the causes of un-maintained documentation are magnified, with obvious results.

As I recall, these were the most demanding of the problems thrown up by our evolutionary method. It was an experiment for us, and we had no textbooks or other guidance to steer by, so every new difficulty caused us to have to design a remedy and to divert staff from development duties to implement it. It may be that modern agile methods have solutions smoothly designed into them. Or it may not. 
Many of the problems that we encountered may not be perceived as being impediments to the use of agile in safety-related software development. But many certainly are. I imagine that safety-minded engineers and managers would want to pause and give some thought to matters of specification, strategic definition, validation and re-validation, the rate of change and change control, and documentation, among others.
But my purpose here is not to pass judgment, merely to offer some notes on my own experiences.
I wrote a book on them in the 90s, but it did not make the best-seller lists and is now out of print.

I imagine that I have written too much and that few will have read to this point. But having written it, I'll leave it written. To those of you who make it this far, many thanks for your endurance.

Felix.

On 30 Aug 2013, at 18:02, René Senden wrote:

> Dear all,
> 
> Do any of you have practical experience with reconciling established agile
> software development with software safety requirements (e.g. IEC-61508 or
> DO-178..) ?
> 
> Best regards,
> Rene
> 
> _______________________________________________
> The System Safety Mailing List
> systemsafety at TechFak.Uni-Bielefeld.DE