[SystemSafety] Qualifying SW as "proven in use" [Measuring Software]

Matthew Squair mattsquair at gmail.com
Fri Jun 28 02:41:00 CEST 2013


Thanks. Interestingly, one of the other results found by Hatton was evidence
for the clustering of defects. It would be interesting to run a study on the
NAG library to see what the fine-scale metrics look like and how they
correlate with defect clusters.
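
A rough sketch of what such a study might look like, assuming one had a
per-component table of metric values and logged defect counts (the file
name and column names below are purely illustrative, not the NAG data):

    # Hypothetical input: one row per library component, with per-component
    # metric values and a defect count.
    import pandas as pd
    from scipy.stats import spearmanr

    df = pd.read_csv("nag_components.csv")  # hypothetical file

    for metric in ["cyclomatic_complexity", "sloc", "fan_out"]:
        rho, p = spearmanr(df[metric], df["defect_count"])
        print(f"{metric}: Spearman rho = {rho:.2f} (p = {p:.3f})")

    # Crude look at clustering: what share of all defects sits in the 10%
    # of components with the most defects?
    top = df.nlargest(max(1, len(df) // 10), "defect_count")
    print("defect share of top 10% of components:",
          top["defect_count"].sum() / df["defect_count"].sum())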


On Fri, Jun 28, 2013 at 1:10 AM, Steve Tockey <Steve.Tockey at construx.com> wrote:

>
>  Matthew,
> Yes, presuming the components aren't method-sized, that's exactly what
> I'm saying. Consider a simple example of 2 classes, A and B, both of which
> have 5 methods. Class A's 5 methods are all of average cyclomatic
> complexity and have correspondingly average defect densities. When you add
> up the defect counts for the five class A methods and divide that by the
> sum of the cyclomatic complexities of those same methods, the
> defects-per-complexity-point will be, as expected, fairly average.
>
>  On the other hand, four of class B's methods have very low cyclomatic
> complexity while one method has very high complexity. Correspondingly, the
> four low-complexity methods have low defect densities while the
> high-complexity method has a high defect density. When you add up the
> defect counts for the five class B methods and divide that by the sum of
> the cyclomatic complexities of those same methods, the
> defects-per-complexity-point will also tend to the same average.
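>
>  A toy numerical sketch of that averaging effect (every per-method number
> below is invented purely for illustration):
>
>      # (cyclomatic complexity, defects) for each of the five methods.
>      class_a = [(6, 3), (6, 3), (6, 3), (6, 3), (6, 3)]
>      class_b = [(2, 1), (2, 1), (2, 1), (2, 1), (22, 11)]
>
>      for name, methods in [("A", class_a), ("B", class_b)]:
>          defects = sum(d for _, d in methods)
>          complexity = sum(c for c, _ in methods)
>          print(name, defects / complexity)  # both classes print 0.5
>
>  At class level the two ratios are identical, even though class B's
> complexity and defects are concentrated in the one large method.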
>
>  So the correlation of defects to cyclomatic complexity at the class (or
> higher) level will always tend to an average, because the low defect counts
> for the low-complexity methods will tend to compensate for the high defect
> counts in the high-complexity methods. The bigger the "component" (i.e.,
> the more functions/methods in the set), the stronger this averaging effect
> becomes. I'm not at all surprised by people not seeing any correlation
> beyond the function/method level, and I continue to wonder why people even
> keep looking for it there.
>
>  I would be willing to bet that if the data used in the Hatton study were
> broken down to the function/method level, a clear correlation would
> appear.
>
>
>  Regards,
>
>  -- steve
>
>
>
>
>   From: Matthew Squair <mattsquair at gmail.com>
> Date: Wednesday, June 26, 2013 8:42 PM
> To: Steve Tockey <Steve.Tockey at construx.com>, Bielefield Safety List <
> systemsafety at techfak.uni-bielefeld.de>
>
> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring
> Software]
>
>   Thanks Steve,
>
>  The full paper can be found at the link below; note that the metrics
> were applied to each of the three-thousand-odd components in the library. I
> take it you'd say that even at component level (presuming components
> are not method-sized) the view is still too wide to generate a
> meaningful correlation?
>
>  http://www.leshatton.org/wp-content/uploads/2012/01/NAG01_01-08.pdf
>
>
> On Wed, Jun 26, 2013 at 10:18 PM, Steve Tockey <Steve.Tockey at construx.com> wrote:
>
>>
>>  "Reading the presentation of Les Hatton's 2008 paper, "The role of
>> empiricism in improving the reliability of future software" he found using
>> empirical techniques in a large scale study (of NAG Fortran and C
>> libraries) that cyclomatic complexity was 'effectively useless' and that no
>> metric strongly correlated (some actually weakly anti-correlated)."
>>
>>  I looked at the slide deck on his web site and it appears to me that
>> he's making the same mistake I referred to earlier:
>>
>>  ----- begin cut here -----
>>
>>   Maybe we are using different applications of cyclomatic complexity to
>> code? Yes, sure, increasing the total number of lines of code in some code
>> base will almost certainly increase the total number of decisions in that
>> code base, and probably by roughly an equal proportion. 10,000 lines of
>> code with 2000 decisions almost certainly implies close to 4000 decisions
>> in 20,000 lines of code.
>>
>>   But I'm not looking for a correlation of overall, total code base
>> cyclomatic complexity to overall defects. I'm looking for the correlation
>> of cyclomatic complexity within a single function/method to the defect
>> density within that same single function/method. Figure 4 in the Schroeder
>> paper shows a strong correlation of function/method-level cyclomatic
>> complexity and function/method-level defect density. Again, reverse
>> engineering from the numbers in Figure 4 shows that the defect density
>> goes up by more than an order of magnitude between cyclomatic complexity
>> less than/equal to 5 vs greater than/equal to 15 ***within a single
>> function***.
>>
>>  ----- end cut here -----
>>
>>  My interpretation of Hatton's results is that he's looking at total
>> cyclomatic complexity in the entire code base. It's not relevant at that
>> level. Look at it at the function/method level and it becomes relevant.
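>>
>>  A minimal sketch of what a function/method-level look might involve (the
>> per-function numbers, column names, and bin edges below are hypothetical,
>> not taken from the Schroeder paper):
>>
>>      import pandas as pd
>>
>>      # Hypothetical per-function records: complexity, size, defect count.
>>      funcs = pd.DataFrame({
>>          "cc":      [2, 3, 4, 7, 9, 12, 16, 21, 30],
>>          "kloc":    [0.05, 0.04, 0.06, 0.10, 0.12, 0.15, 0.20, 0.25, 0.40],
>>          "defects": [0, 0, 1, 1, 2, 3, 6, 9, 15],
>>      })
>>
>>      # Bin functions by cyclomatic complexity, then compare defect density
>>      # (defects per KLOC) across the bins.
>>      funcs["cc_bin"] = pd.cut(funcs["cc"], bins=[0, 5, 14, 10**6],
>>                               labels=["<=5", "6-14", ">=15"])
>>      grouped = funcs.groupby("cc_bin")
>>      print(grouped["defects"].sum() / grouped["kloc"].sum())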
>>
>>  "So perhaps we should not use metrics, period?"
>>
>>  That a tool gets misapplied is not the fault of the tool; it's the
>> fault of the tool user. People should be educated in the proper use of
>> tools before they use them…
>>
>>
>>  Regards,
>>
>>  -- steve
>>
>>
>>
>>   From: Matthew Squair <mattsquair at gmail.com>
>> Date: Tuesday, June 25, 2013 7:15 PM
>> To: Bielefield Safety List <systemsafety at techfak.uni-bielefeld.de>
>>
>> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring
>> Software]
>>
>>   Reading the presentation of Les Hatton's 2008 paper, "The role of
>> empiricism in improving the reliability of future software" he found using
>> empirical techniques in a large scale study (of NAG Fortran and C
>> libraries) that cyclomatic complexity was 'effectively useless' and that no
>> metric strongly correlated (some actually weakly anti-correlated).
>>
>>  So it does seem that there is a basis on which we can empirically judge
>> the efficacy of software metrics.
>>
>>  So perhaps we should not use metrics, period?
>>
>>
>>
>>  On Wed, Jun 26, 2013 at 10:54 AM, Derek M Jones <derek at knosof.co.uk> wrote:
>>
>>> Steve,
>>>
>>> > I think we both strongly agree that there really needs to be a lot more
>>> > evidence.
>>>
>>>  Yes.  No point quibbling over how little 'little' might be.
>>>
>>> > But I'm not looking for a correlation of overall, total code base
>>> > cyclomatic complexity to overall defects. I'm looking for the
>>> > correlation of cyclomatic complexity within a single function/method
>>> > to the defect density within that same single function/method.
>>>
>>>  Left to their own devices, developers follow fairly regular patterns
>>> of code usage.  An extreme outlier of any metric is suspicious and
>>> often worth some investigation; it might be the case that the developer
>>> had a bad day, or perhaps that function has to implement some
>>> complicated application functionality, or something else.
>>>
>>> Outliers are the low hanging fruit.
>>>
>>> The problems start, or rather the time wasting starts, when
>>> specific numbers get written into documents and are used to
>>> judge what developers produce.
>>>
>>> > along, what we need in the end is a balancing of a collection of
>>> > syntactic complexity metrics. When functions/methods are split, it
>>> > always increases fan out. When functions/methods are merged, it always
>>> > decreases fan out. The complexity didn't go away, it just moved to a
>>> > different place in the code. So having a limit in only one place
>>> > easily allows people to squeeze it into any other place. Having a set
>>> > of appropriate limits means there's a lot less chance of it going
>>> > unnoticed somewhere else.
>>>
>>>  Yes, what we need is lots of good quality data for lots of code
>>> attributes so we can start looking at these trade-offs.
>>> Unfortunately the only good quality data I have involves small
>>> numbers of attributes.
>>>
>>> Having seen what a hash some researchers make of analysing the data
>>> they have, I am loath to accept findings where the data is not made
>>> available.
>>>
>>> > accident. Just the same, I'm basically arguing for more
>>> > professionalism in the software industry. I mean seriously, the
>>> > programmer who was responsible for that single C++ class with a single
>>> > method of 3400 lines of code with a cyclomatic complexity over 2400 is
>>> > a total freaking moron who has no business whatsoever in the software
>>> > industry.
>>>
>>>  We are not going to move towards professionalism until there are fewer
>>> software development jobs than half-competent developers.  Hiring
>>> people based on their ability to spell 'software' does not create an
>>> environment where professionalism takes root.
>>>
>>> I keep telling people that the best way to reduce faults in code is
>>> to start sending developers to prison.  Nobody takes me seriously (OK,
>>> yes, it would probably be a difficult case to bring).
>>>
>>> > And, we will also always need semantic evaluation of code (which, as I
>>> > said earlier, has to be done by humans) because syntax-based metrics
>>> > alone will probably always be game-able.
>>>
>>>  Until strong AI arrives that will not happen.
>>> Even the simpler issue of identifier semantics is still way beyond our
>>> reach.  See:
>>> http://www.coding-guidelines.com/cbook/sent792.pdf
>>> for more than you could ever want to know about identifier selection
>>> issues.
>>>
>>> >
>>> > Regards,
>>> >
>>> > -- steve
>>> >
>>> >
>>> >
>>> >
>>> > -----Original Message-----
>>> > From: Derek M Jones <derek at knosof.co.uk>
>>> > Organization: Knowledge Software, Ltd
>>> > Date: Tuesday, June 25, 2013 4:21 PM
>>> > To: "systemsafety at techfak.uni-bielefeld.de"
>>> > <systemsafety at techfak.uni-bielefeld.de>
>>> > Subject: Re: [SystemSafety] Qualifying SW as "proven in use"
>>> > [Measuring Software]
>>> >
>>> > Steve,
>>> >
>>> > ...
>>> >> "local vs. global" categories, it's just that nobody has yet published
>>> >> any
>>> >> data identifying which ones should be paid attention to and which ones
>>> >> should be ignored.
>>> >
>>> > So you agree that there is no empirical evidence.
>>> >
>>> > Your statement is also true of almost every metrics paper published
>>> > to date.
>>> >
>>> > With so many different metrics having been proposed at least one of
>>> > them is likely to agree with the empirical data that is yet to be
>>> > published.
>>> >
>>> > You cited the paper: “A Practical Guide to Object-Oriented Metrics”
>>> > as the source of the cyclomatic complexity vs fault correlation
>>> > claim.  Fig 4 looks like it contains the data.  No standard
>>> > deviation is given for the values, but this would have to be
>>> > very large to ruin what looks like a reasonable correlation.
>>> >
>>> > Such a correlation can often be found, however:
>>> >
>>> >      o cyclomatic complexity is just one of many 'complexity'
>>> > metrics that have a high correlation with quantity of code,
>>> > so why not just measure lines of code?
>>> >
>>> >      o once developers know they are being judged by some metric
>>> > or other they can easily game the system by actions such as
>>> > splitting/merging functions.  If the metric has a causal connection
>>> > to the quantity of interest, e.g., faults, then everybody is happy
>>> > for developers to do what they will to reduce the metric,
>>> > but if the connection is simply a correlation (based on code
>>> > written by developers not trying to game the system) then
>>> > developers doing whatever it takes to improve the metric value
>>> > is at best wasted time.
>>> >
>>> >>
>>> >> -----Original Message-----
>>> >> From: Todd Carpenter <todd.carpenter at adventiumlabs.com>
>>> >> Date: Monday, June 24, 2013 7:20 PM
>>> >> To: "systemsafety at techfak.uni-bielefeld.de"
>>> >> <systemsafety at techfak.uni-bielefeld.de>
>>> >> Subject: Re: [SystemSafety] Qualifying SW as "proven in use"
>>> >> [Measuring Software]
>>> >>
>>> >> ST> For example, the code quality measure "Cyclomatic Complexity"
>>> >> (reference:
>>> >> ST> Tom McCabe, "A Complexity Measure", IEEE Transactions on Software
>>> >> ST> Engineering, December, 1976) was validated many years ago by simply
>>> >>
>>> >> DMJ> I am not aware of any study that validates this metric to a
>>> >> DMJ> reasonable standard.  There are a few studies that have found
>>> >> DMJ> a medium correlation in a small number of data points.
>>> >>
>>> >> Les Hatton had an interesting presentation in '08, "The role of
>>> >> empiricism in improving the reliability of future software" that
>>> >> shows there is a strong correlation between source-lines-of-code and
>>> >> cyclomatic complexity, and that defects follow a power law
>>> >> distribution:
>>> >>
>>> >> http://www.leshatton.org/wp-content/uploads/2012/01/TAIC2008-29-08-2008.pdf
>>> >>
>>> >> Just another voice, which probably just adds evidence to the argument
>>> >> that we haven't yet found a trivial metric to predict bugs...
>>> >>
>>> >> -TC
>>> >>
>>> >> On 6/24/2013 6:38 PM, Derek M Jones wrote:
>>> >>> All,
>>> >>>
>>> >>>> Actually, getting the evidence isn't that tricky, it's just a lot of
>>> >>>> work.
>>> >>>
>>> >>> This is true of most things (+ getting the money to do the work).
>>> >>>
>>> >>>> Essentially all one needs to do is to run a correlation analysis
>>> >>>> (correlation coefficient) between the proposed quality measure on
>>> >>>> the one hand, and defect tracking data on the other hand.
>>> >>>
>>> >>> There is plenty of dirty data out there that needs to be cleaned up
>>> >>> before it can be used:
>>> >>>
>>> >>>
>>> >>>
>>> >>> http://shape-of-code.coding-guidelines.com/2013/06/02/data-cleaning-the-next-step-in-empirical-software-engineering/
>>> >>>
>>> >>>
>>> >>>> For example, the code quality measure "Cyclomatic Complexity"
>>> >>>> (reference:
>>> >>>> Tom McCabe, "A Complexity Measure", IEEE Transactions on Software
>>> >>>> Engineering, December, 1976) was validated many years ago by simply
>>> >>>
>>> >>> I am not aware of any study that validates this metric to a
>>> >>> reasonable standard.  There are a few studies that have found a
>>> >>> medium correlation in a small number of data points.
>>> >>>
>>> >>> I have some data whose writeup is not yet available in a good enough
>>> >>> draft form to post to my blog.  I only plan to write about this
>>> >>> metric because it is widely cited and is long overdue for relegation
>>> >>> to the history of good ideas that did not stand the scrutiny of
>>> >>> empirical evidence.
>>> >>>
>>> >>>> finding a strong positive correlation between the cyclomatic
>>> >>>> complexity of functions and the number of defects that were logged
>>> >>>> against those same
>>> >>>
>>> >>> Correlation is not causation.
>>> >>>
>>> >>> Cyclomatic complexity correlates well with lines of code, which
>>> >>> in turn correlates well with number of faults.
>>> >>>
>>> >>>> functions (i.e., code in that function needed to be changed in
>>> >>>> order to repair that defect).
>>> >>>
>>> >>> Changing the function may increase the number of faults.  Creating
>>> >>> two functions where there was previously one will reduce an existing
>>> >>> peak in the distribution of values, but will it result in fewer
>>> >>> faults overall?
>>> >>>
>>> >>> All this stuff with looking for outlier metric values is pure hand
>>> >>> waving.  Where is the evidence that the reworked code is better not
>>> >>> worse?
>>> >>>
>>> >>>> According to one study of 18 production applications, code in
>>> >>>> functions with cyclomatic complexity <=5 was about 45% of the total
>>> >>>> code base but this code was responsible for only 12% of the defects
>>> >>>> logged against the total code base. On the other hand, code in
>>> >>>> functions with cyclomatic complexity of >=15 was only 11% of the
>>> >>>> code base but this same code was responsible for 43% of the total
>>> >>>> defects. On a per-line-of-code basis, functions with cyclomatic
>>> >>>> complexity >=15 have more than an order of magnitude increase in
>>> >>>> defect density over functions measuring <=5.
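>>> >>>>
>>> >>>> A quick check of that last claim (relative defect density is just
>>> >>>> the share of defects divided by the share of code):
>>> >>>>
>>> >>>>     high = 43 / 11   # CC >= 15: ~3.9 times its "fair share" of defects
>>> >>>>     low = 12 / 45    # CC <= 5:  ~0.27 times its "fair share"
>>> >>>>     print(high / low)  # ~14.7, i.e. more than an order of magnitude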
>>> >>>>
>>> >>>> What I find interesting, personally, is that complexity metrics for
>>> >>>> object-oriented software have been around for about 20 years and yet
>>> >>>> nobody (to my knowledge) has done any correlation analysis at all
>>> >>>> (or, at a minimum they have not published their results).
>>> >>>>
>>> >>>> The other thing to remember is that such measures consider only the
>>> >>>> "syntax" (structure) of the code. I consider this to be *necessary*
>>> >>>> for code quality, but far from *sufficient*. One also needs to
>>> >>>> consider the "semantics" (meaning) of that same code. For example,
>>> >>>> to what extent is the code based on reasonable abstractions? To what
>>> >>>> extent does the code exhibit good encapsulation? What are the
>>> >>>> cohesion and coupling of the code? Has the code used
>>> >>>> "design-to-invariants / design-for-change"? One can have code that's
>>> >>>> perfectly structured in a syntactic sense and yet it's garbage from
>>> >>>> the semantic perspective. Unfortunately, there isn't a way (that I'm
>>> >>>> aware of, anyway) to do the necessary semantic analysis in an
>>> >>>> automated fashion. Some other competent software professionals need
>>> >>>> to look at the code and assess it from the semantic perspective.
>>> >>>>
>>> >>>> So while I applaud efforts like SQALE and others like it, one needs
>>> >>>> to be careful that it's only a part of the whole story. More work--a
>>> >>>> lot more--needs to be done before someone can reasonably say that
>>> >>>> some particular code is "high quality".
>>> >>>>
>>> >>>>
>>> >>>> Regards,
>>> >>>>
>>> >>>> -- steve
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> -----Original Message-----
>>> >>>> From: Peter Bishop <pgb at adelard.com>
>>> >>>> Date: Friday, June 21, 2013 6:04 AM
>>> >>>> To: "systemsafety at techfak.uni-bielefeld.de"
>>> >>>> <systemsafety at techfak.uni-bielefeld.de>
>>> >>>> Subject: Re: [SystemSafety] Qualifying SW as "proven in use"
>>> >>>> [Measuring Software]
>>> >>>>
>>> >>>> I agree with Derek.
>>> >>>>
>>> >>>> Code quality is only a means to an end.
>>> >>>> We need evidence to show the means actually helps to achieve the
>>> >>>> ends.
>>> >>>>
>>> >>>> Getting this evidence is pretty tricky, as parallel developments for
>>> >>>> the same project won't happen.
>>> >>>> But you might be able to infer something on average over multiple
>>> >>>> projects.
>>> >>>>
>>> >>>> Derek M Jones wrote:
>>> >>>>> Thierry,
>>> >>>>>
>>> >>>>> To answer your questions:
>>> >>>>> 1°) Yes, there is some objective evidence that there is a
>>> >>>>> correlation between a low SQALE index and quality code.
>>> >>>>>
>>> >>>>> How is the quality of code measured?
>>> >>>>>
>>> >>>>> Below you say that SQALE DEFINES what is "good quality" code.
>>> >>>>> In this case it is to be expected that a strong correlation will
>>> >>>>> exist between a low SQALE index and its own definition of quality.
>>> >>>>>
>>> >>>>> For example ITRIS has conducted a study where the "good quality"
>>> >>>>> code is statistically linked to a lower SQALE index, for industrial
>>> >>>>> software actually used in operations.
>>> >>>>>
>>> >>>>> Again how is quality measured?
>>> >>>>>
>>> >>>>> No, there is not enough evidence; we wish there were more people
>>> >>>>> working on getting the evidence.
>>> >>>>>
>>> >>>>> Is there any evidence apart from SQALE correlating with its own
>>> >>>>> measures?
>>> >>>>>
>>> >>>>> This is a general problem: lots of researchers create their own
>>> >>>>> definition of quality and don't show a causal connection to
>>> >>>>> external attributes such as faults or subsequent costs.
>>> >>>>>
>>> >>>>> Without running parallel development efforts that
>>> >>>>> follow/don't follow the guidelines it is difficult to see how
>>> >>>>> reliable data can be obtained.
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >> _______________________________________________
>>> >> The System Safety Mailing List
>>> >> systemsafety at TechFak.Uni-Bielefeld.DE
>>> >>
>>> >
>>>
>>> --
>>> Derek M. Jones                  tel: +44 (0) 1252 520 667
>>> Knowledge Software Ltd          blog:shape-of-code.coding-guidelines.com
>>> Software analysis               http://www.knosof.co.uk
>>> _______________________________________________
>>> The System Safety Mailing List
>>> systemsafety at TechFak.Uni-Bielefeld.DE
>>>
>>
>>
>>
>>  --
>> Matthew Squair
>> Mob: +61 488770655
>> Email: MattSquair at gmail.com
>>
>
>
>
>  --
> Matthew Squair
> Mob: +61 488770655
> Email: MattSquair at gmail.com
>



-- 
Matthew Squair
Mob: +61 488770655
Email: MattSquair at gmail.com