[SystemSafety] Qualifying SW as "proven in use" [Measuring Software]

Matthew Squair mattsquair at gmail.com
Wed Jun 26 04:15:33 CEST 2013


Reading the presentation of Les Hatton's 2008 paper, "The role of
empiricism in improving the reliability of future software": using
empirical techniques in a large-scale study (of the NAG Fortran and C
libraries), he found that cyclomatic complexity was 'effectively useless'
and that no metric strongly correlated (some were actually weakly
anti-correlated).

So it does seem that there is a basis on which we can empirically judge the
efficacy of software metrics.

So perhaps we should not use metrics, period?



On Wed, Jun 26, 2013 at 10:54 AM, Derek M Jones <derek at knosof.co.uk> wrote:

> Steve,
>
> > I think we both strongly agree that there really needs to be a lot more
> > evidence.
>
> Yes.  No point quibbling over how little "little" might be.
>
> > But I'm not looking for a correlation of overall, total code base
> > cyclomatic complexity to overall defects. I'm looking for the correlation
> > of cyclomatic complexity within a single function/method to the defect
> > density within that same single function/method.
>
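To make that concrete, a minimal sketch of the per-function analysis being
described might look like the following (Python, with scipy assumed; every
function name, complexity value and defect count below is invented, and the
numbers are presumed to have already been extracted from a static analyser
and the defect tracker):

    from scipy.stats import spearmanr

    # Hypothetical per-function data: cyclomatic complexity and the number
    # of logged defects whose fix touched that function.
    complexity = {"parse_header": 3, "dispatch": 12, "format_report": 27}
    defects    = {"parse_header": 0, "dispatch": 2,  "format_report": 9}

    names = sorted(complexity)
    rho, p_value = spearmanr([complexity[n] for n in names],
                             [defects[n] for n in names])
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")

With real data the same call can be repeated for each candidate metric
(lines of code, fan-out, nesting depth, ...) to see whether any of them adds
information beyond the others.
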
> Left to their own devices developers follow fairly regular patterns
> of code usage.  An extreme outlier of any metric is suspicious and
> often worth some investigation; it might be the case that
> the developer had a bad day, or perhaps that function has to implement
> some complicated application functionality, or something else.
>
> Outliers are the low hanging fruit.
>
> The problems start, or rather the time wasting starts, when
> specific numbers get written into documents and are used to
> judge what developers produce.
>
> > along, what we need in the end is a balancing of a collection of syntactic
> > complexity metrics. When functions/methods are split, it always increases
> > fan out. When functions/methods are merged, it always decreases fan out.
> > The complexity didn't go away, it just moved to a different place in the
> > code. So having a limit in only one place easily allows people to squeeze
> > it into any other place. Having a set of appropriate limits means there's
> > a lot less chance of it going unnoticed somewhere else.
>
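A toy illustration of that point (the code below is entirely invented):
splitting a function brings its own cyclomatic complexity under any
per-function limit, but adds a call edge, so the decision logic has merely
moved rather than disappeared.

    def classify(x):            # cyclomatic complexity 4 (three decisions)
        if x < 0:
            return "negative"
        if x == 0:
            return "zero"
        if x < 10:
            return "small"
        return "large"

    # After splitting: classify_split() has complexity 3, the new helper
    # has complexity 2, and classify_split() gains one outgoing call
    # (fan-out +1).  The total number of decisions is unchanged.
    def classify_positive(x):
        if x < 10:
            return "small"
        return "large"

    def classify_split(x):
        if x < 0:
            return "negative"
        if x == 0:
            return "zero"
        return classify_positive(x)
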
> Yes, what we need is lots of good quality data for lots of code
> attributes so we can start looking at these trade-offs.
> Unfortunately the only good quality data I have involves small
> numbers of attributes.
>
> Having seen what a hash some researchers make of analysing the data
> they have, I am loath to accept findings where the data is not made
> available.
>
> > accident. Just the same, I'm basically arguing for more professionalism in
> > the software industry. I mean seriously, the programmer who was
> > responsible for that single C++ class with a single method of 3400 lines
> > of code with a cyclomatic complexity over 2400 is a total freaking moron
> > who has no business whatsoever in the software industry.
>
> We are not going to move towards professionalism until there are fewer
> software development jobs than half-competent developers.  Hiring
> people based on their ability to spell 'software' is not an
> environment where professionalism takes root.
>
> I keep telling people that the best way to reduce faults in code is
> to start sending developers to prison.  Nobody takes me seriously (ok,
> yes, it would probably be a difficult case to bring).
>
> > And, we will also always need semantic evaluation of code (which, as I
> > said earlier, has to be done by humans) because syntax-based metrics alone
> > will probably always be game-able.
>
> Until strong AI arrives, that will not happen.
> Even the simpler issue of identifier semantics is still way beyond our
> reach.  See:
> http://www.coding-guidelines.com/cbook/sent792.pdf
> for more than you could ever want to know about identifier selection
> issues.
>
> >
> > Regards,
> >
> > -- steve
> >
> >
> >
> >
> > -----Original Message-----
> > From: Derek M Jones <derek at knosof.co.uk>
> > Organization: Knowledge Software, Ltd
> > Date: Tuesday, June 25, 2013 4:21 PM
> > To: "systemsafety at techfak.uni-bielefeld.de"
> > <systemsafety at techfak.uni-bielefeld.de>
> > Subject: Re: [SystemSafety] Qualifying SW as "proven in use"
> > [Measuring Software]
> >
> > Steve,
> >
> > ...
> >> "local vs. global" categories, it's just that nobody has yet published
> >> any
> >> data identifying which ones should be paid attention to and which ones
> >> should be ignored.
> >
> > So you agree that there is no empirical evidence.
> >
> > Your statement is also true of almost every metrics paper published
> > to date.
> >
> > With so many different metrics having been proposed, at least one of
> > them is likely to agree with the empirical data that is yet to be
> > published.
> >
> > You cited the paper: “A Practical Guide to Object-Oriented Metrics”
> > as the source of the cyclomatic complexity vs fault correlation
> > claim.  Fig 4 looks like it contains the data.  No standard
> > deviation is given for the values, but this would have to be
> > very large to ruin what looks like a reasonable correlation.
> >
> > Such a correlation can often be found, however:
> >
> >      o cyclomatic complexity is just one of many 'complexity'
> > metrics that have a high correlation with quantity of code,
> > so why not just measure lines of code?
> >
> >      o once developers know they are being judged by some metric
> > or other they can easily game the system by actions such as
> > splitting/merging functions.  If the metric has a causal connection
> > to the quantity of interest, e.g., faults, then everybody is happy
> > for developers to do what they will to reduce the metric,
> > but if the connection is simply a correlation (based on code
> > written by developers not trying to game the system) then
> > developers doing whatever it takes to improve the metric value
> > is at best wasted time.
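A rough sketch of the first point, for what it is worth: both cyclomatic
complexity and lines of code are mechanical counts over the same functions,
which is one reason they tend to move together.  The example below is an
approximation only (Python 3.8+ assumed, McCabe's value estimated as one
plus the number of decision points), not a measurement tool.

    import ast

    # Node types treated as decision points in this approximation.
    DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp,
                 ast.ExceptHandler, ast.BoolOp)

    def per_function_metrics(source):
        """Yield (name, approximate cyclomatic complexity, lines of code)."""
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                cc = 1 + sum(isinstance(n, DECISIONS)
                             for n in ast.walk(node))
                loc = node.end_lineno - node.lineno + 1
                yield node.name, cc, loc

Run over a code base, file by file, the two columns can then be correlated
with defect counts in exactly the same way, which is where the "why not just
measure lines of code?" question comes from.
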
> >
> >>
> >> -----Original Message-----
> >> From: Todd Carpenter <todd.carpenter at adventiumlabs.com>
> >> Date: Monday, June 24, 2013 7:20 PM
> >> To: "systemsafety at techfak.uni-bielefeld.de"
> >> <systemsafety at techfak.uni-bielefeld.de>
> >> Subject: Re: [SystemSafety] Qualifying SW as "proven in use"
> >> [Measuring Software]
> >>
> >> ST> For example, the code quality measure "Cyclomatic Complexity"
> >> ST> (reference: Tom McCabe, “A Complexity Measure”, IEEE Transactions
> >> ST> on Software Engineering, December, 1976) was validated many years
> >> ST> ago by simply
> >>
> >> DMJ> I am not aware of any study that validates this metric to a
> >> DMJ> reasonable standard.  There are a few studies that have found a
> >> DMJ> medium correlation in a small number of data points.
> >>
> >> Les Hatton had an interesting presentation in '08, "The role of
> >> empiricism
> >> in improving the
> >> reliability of future software" that shows there is a strong correlation
> >> between source-lines-of-code and cyclomatic complexity, and that defects
> >> follow a power law distribution:
> >>
> >>
> >>
> >> http://www.leshatton.org/wp-content/uploads/2012/01/TAIC2008-29-08-2008.pdf
> >>
> >> Just another voice, which probably just adds evidence to the argument
> >> that we haven't yet found a trivial metric to predict bugs...
> >>
> >> -TC
> >>
> >> On 6/24/2013 6:38 PM, Derek M Jones wrote:
> >>> All,
> >>>
> >>>> Actually, getting the evidence isn't that tricky, it's just a lot of
> >>>> work.
> >>>
> >>> This is true of most things (+ getting the money to do the work).
> >>>
> >>>> Essentially all one needs to do is to run a correlation analysis
> >>>> (correlation coefficient) between the proposed quality measure on the
> >>>> one hand, and defect tracking data on the other hand.
> >>>
> >>> There is plenty of dirty data out there that needs to be cleaned up
> >>> before it can be used:
> >>>
> >>>
> >>>
> >>> http://shape-of-code.coding-guidelines.com/2013/06/02/data-cleaning-the-next-step-in-empirical-software-engineering/
> >>>
> >>>
> >>>> For example, the code quality measure "Cyclomatic Complexity"
> >>>> (reference:
> >>>> Tom McCabe, “A Complexity Measure”, IEEE Transactions on Software
> >>>> Engineering, December, 1976) was validated many years ago by simply
> >>>
> >>> I am not aware of any study that validates this metric to a reasonable
> >>> standard.  There are a few studies that have found a medium
> >>> correlation in a small number of data points.
> >>>
> >>> I have some data whose writeup is not yet available in a good enough
> >>> draft form to post to my blog.  I only plan to write about this
> >>> metric because it is widely cited and is long overdue for relegation
> >>> to the history of good ideas that did not stand the scrutiny of
> >>> empirical evidence.
> >>>
> >>>> finding a strong positive correlation between the cyclomatic complexity
> >>>> of functions and the number of defects that were logged against those
> >>>> same
> >>>
> >>> Correlation is not causation.
> >>>
> >>> Cyclomatic complexity correlates well with lines of code, which
> >>> in turn correlates well with number of faults.
> >>>
> >>>> functions (i.e., code in that function needed to be changed in order to
> >>>> repair that defect).
> >>>
> >>> Changing the function may increase the number of faults.  Creating two
> >>> functions where there was previously one will reduce an existing peak
> >>> in the distribution of values, but will it result in fewer faults
> >>> overall?
> >>>
> >>> All this stuff with looking for outlier metric values is pure hand
> >>> waving.  Where is the evidence that the reworked code is better, not
> >>> worse?
> >>>
> >>>> According to one study of 18 production applications, code in functions
> >>>> with cyclomatic complexity <=5 was about 45% of the total code base but
> >>>> this code was responsible for only 12% of the defects logged against the
> >>>> total code base. On the other hand, code in functions with cyclomatic
> >>>> complexity of >=15 was only 11% of the code base but this same code was
> >>>> responsible for 43% of the total defects. On a per-line-of-code basis,
> >>>> functions with cyclomatic complexity >=15 have more than an order of
> >>>> magnitude increase in defect density over functions measuring <=5.
> >>>>
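Working through those quoted figures: the relative defect density is roughly
(0.43 / 0.11) / (0.12 / 0.45), i.e. about 3.9 / 0.27, or about 14.7 times
higher, so "more than an order of magnitude" does follow from the numbers
given.
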
> >>>> What I find interesting, personally, is that complexity metrics for
> >>>> object-oriented software have been around for about 20 years and yet
> >>>> nobody (to my knowledge) has done any correlation analysis at all (or,
> >>>> at a minimum they have not published their results).
> >>>>
> >>>> The other thing to remember is that such measures consider only the
> >>>> "syntax" (structure) of the code. I consider this to be *necessary* for
> >>>> code quality, but far from *sufficient*. One also needs to consider the
> >>>> "semantics" (meaning) of that same code. For example, to what extent is
> >>>> the code based on reasonable abstractions? To what extent does the code
> >>>> exhibit good encapsulation? What are the cohesion and coupling of the
> >>>> code? Has the code used "design-to-invariants / design-for-change"? One
> >>>> can have code that's perfectly structured in a syntactic sense and yet
> >>>> it's garbage from the semantic perspective. Unfortunately, there isn't a
> >>>> way (that I'm aware of, anyway) to do the necessary semantic analysis in
> >>>> an automated fashion. Some other competent software professionals need
> >>>> to look at the code and assess it from the semantic perspective.
> >>>>
> >>>> So while I applaud efforts like SQALE and others like it, one needs to
> >>>> be careful that it's only a part of the whole story. More work--a lot
> >>>> more--needs to be done before someone can reasonably say that some
> >>>> particular code is "high quality".
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> -- steve
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Peter Bishop <pgb at adelard.com>
> >>>> Date: Friday, June 21, 2013 6:04 AM
> >>>> To: "systemsafety at techfak.uni-bielefeld.de"
> >>>> <systemsafety at techfak.uni-bielefeld.de>
> >>>> Subject: Re: [SystemSafety] Qualifying SW as "proven in use"
> >>>> [Measuring Software]
> >>>>
> >>>> I agree with Derek
> >>>>
> >>>> Code quality is only a means to an end.
> >>>> We need evidence to show the means actually helps to achieve the
> >>>> ends.
> >>>>
> >>>> Getting this evidence is pretty tricky, as parallel developments for the
> >>>> same project won't happen.
> >>>> But you might be able to infer something on average over multiple
> >>>> projects.
> >>>>
> >>>> Derek M Jones wrote:
> >>>>> Thierry,
> >>>>>
> >>>>>> To answer your questions:
> >>>>>> 1°) Yes, there is some objective evidence that there is a
> >>>>>> correlation between a low SQALE index and quality code.
> >>>>>
> >>>>> How is the quality of code measured?
> >>>>>
> >>>>> Below you say that SQALE DEFINES what is "good quality" code.
> >>>>> In this case it is to be expected that a strong correlation will
> >>>>> exist between a low SQALE index and its own definition of quality.
> >>>>>
> >>>>>> For example ITRIS has conducted a study where the "good quality"
> >>>>>> code is statistically linked to a lower SQALE index, for industrial
> >>>>>> software actually used in operations.
> >>>>>
> >>>>> Again how is quality measured?
> >>>>>
> >>>>>> No, there is not enough evidence; we wish there were more people
> >>>>>> working on getting the evidence.
> >>>>>
> >>>>> Is there any evidence apart from SQALE correlating with its own
> >>>>> measures?
> >>>>>
> >>>>> This is a general problem, lots of researchers create their own
> >>>>> definition of quality and don't show a causal connection to external
> >>>>> attributes such as faults or subsequent costs.
> >>>>>
> >>>>> Without running parallel development efforts that
> >>>>> follow/don't follow the guidelines it is difficult to see how
> >>>>> reliable data can be obtained.
> >>>>>
> >>>>
> >>>
> >>
> >> _______________________________________________
> >> The System Safety Mailing List
> >> systemsafety at TechFak.Uni-Bielefeld.DE
> >>
> >>
> >
>
> --
> Derek M. Jones                  tel: +44 (0) 1252 520 667
> Knowledge Software Ltd          blog:shape-of-code.coding-guidelines.com
> Software analysis               http://www.knosof.co.uk
> _______________________________________________
> The System Safety Mailing List
> systemsafety at TechFak.Uni-Bielefeld.DE
>



-- 
Matthew Squair
Mob: +61 488770655
Email: MattSquair at gmail.com