Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]

From: Todd Carpenter
Date: Mon, 24 Jun 2013 21:20:37 -0500

ST> For example, the code quality measure "Cyclomatic Complexity" (reference:
ST> Tom McCabe, "A Complexity Measure", IEEE Transactions on Software
ST> Engineering, December, 1976) was validated many years ago by simply

DMJ> I am not aware of any study that validates this metric to a reasonable
DMJ> standard. There are a few studies that have found a medium
DMJ> correlation in a small number of data points.

Les Hatton had an interesting presentation in '08, "The role of empiricism in improving the reliability of future software" that shows there is a strong correlation between source-lines-of-code and cyclomatic complexity, and that defects follow a power law distribution:

http://www.leshatton.org/wp-content/uploads/2012/01/TAIC2008-29-08-2008.pdf

Just another voice, which probably just adds evidence to the argument that we haven't yet found a trivial metric to predict bugs...
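
To make the size confound concrete, here is a minimal sketch, not anything taken from Hatton's slides. The data is synthetic and invented purely for illustration: both complexity and defects are generated from lines of code plus independent noise, so complexity has no effect of its own by construction. The names pearson and partial_corr and all the coefficients are my own assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic, made-up data (an assumption for illustration, not real project data).
# Complexity and defects each depend on function size only, plus independent noise.
rng = random.Random(42)
sloc = [rng.uniform(10, 500) for _ in range(500)]
complexity = [0.1 * s + rng.gauss(0, 3) for s in sloc]
defects = [0.02 * s + rng.gauss(0, 1) for s in sloc]

r_raw = pearson(complexity, defects)
r_partial = partial_corr(complexity, defects, sloc)

print(f"raw correlation, complexity vs defects:      {r_raw:.2f}")
print(f"partial correlation, controlling for SLOC:   {r_partial:.2f}")
```

On data built this way the raw correlation comes out high while the partial correlation sits near zero. That says nothing about real code bases; it only shows that a strong complexity/defect correlation by itself cannot separate complexity from plain size.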

-TC

On 6/24/2013 6:38 PM, Derek M Jones wrote:
> All,
>
>> Actually, getting the evidence isn't that tricky, it's just a lot of work.
>
> This is true of most things (+ getting the money to do the work).
>
>> Essentially all one needs to do is to run a correlation analysis
>> (correlation coefficient) between the proposed quality measure on the one
>> hand, and defect tracking data on the other hand.
>
> There is plenty of dirty data out there that needs to be cleaned up
> before it can be used:
> http://shape-of-code.coding-guidelines.com/2013/06/02/data-cleaning-the-next-step-in-empirical-software-engineering/
>
>
>> For example, the code quality measure "Cyclomatic Complexity" (reference:
>> Tom McCabe, "A Complexity Measure", IEEE Transactions on Software
>> Engineering, December, 1976) was validated many years ago by simply
>
> I am not aware of any study that validates this metric to a reasonable
> standard. There are a few studies that have found a medium
> correlation in a small number of data points.
>
> I have some data whose writeup is not yet available in a good enough
> draft form to post to my blog. I only plan to write about this
> metric because it is widely cited and is long overdue for relegation
> to the history of good ideas that did not stand the scrutiny of
> empirical evidence.
>
>> finding a strong positive correlation between the cyclomatic complexity of
>> functions and the number of defects that were logged against those same
>
> Correlation is not causation.
>
> Cyclomatic complexity correlates well with lines of code, which
> in turn correlates well with number of faults.
>
>> functions (I.e., code in that function needed to be changed in order to
>> repair that defect).
>
> Changing the function may increase the number of faults. Creating two
> functions where there was previously one will reduce an existing peak
> in the distribution of values, but will it result in fewer faults
> overall?
>
> All this stuff with looking for outlier metric values is pure hand
> waving. Where is the evidence that the reworked code is better not
> worse?
>
>> According to one study of 18 production applications, code in functions
>> with cyclomatic complexity <=5 was about 45% of the total code base but
>> this code was responsible for only 12% of the defects logged against the
>> total code base. On the other hand, code in functions with cyclomatic
>> complexity of >=15 was only 11% of the code base but this same code was
>> responsible for 43% of the total defects. On a per-line-of-code basis,
>> functions with cyclomatic complexity >=15 have more than an order of
>> magnitude increase in defect density over functions measuring <=5.
>>
>> What I find interesting, personally, is that complexity metrics for
>> object-oriented software have been around for about 20 years and yet
>> nobody (to my knowledge) has done any correlation analysis at all (or, at
>> a minimum they have not published their results).
>>
>> The other thing to remember is that such measures consider only the
>> "syntax" (structure) of the code. I consider this to be *necessary* for
>> code quality, but far from *sufficient*. One also needs to consider the
>> "semantics" (meaning) of that same code. For example, to what extent is
>> the code based on reasonable abstractions? To what extent does the code
>> exhibit good encapsulation? What are the cohesion and coupling of the
>> code? Has the code used "design-to-invariants / design-for-change"? One can
>> have code that's perfectly structured in a syntactic sense and yet it's
>> garbage from the semantic perspective. Unfortunately, there isn't a way
>> (that I'm aware of, anyway) to do the necessary semantic analysis in an
>> automated fashion. Some other competent software professionals need to
>> look at the code and assess it from the semantic perspective.
>>
>> So while I applaud efforts like SQALE and others like it, one needs to be
>> careful that it's only a part of the whole story. More work--a lot
>> more--needs to be done before someone can reasonably say that some
>> particular code is "high quality".
>>
>>
>> Regards,
>>
>> -- steve
>>
>>
>>
>>
>>
>> -----Original Message-----
>> Date: Friday, June 21, 2013 6:04 AM
>> To: "systemsafety_at_xxxxxx
>> Subject: Re: [SystemSafety] Qualifying SW as "proven
>> in use" [Measuring Software]
>>
>> I agree with Derek
>>
>> Code quality is only a means to an end
>> We need evidence to show the means actually helps to achieve the ends.
>>
>> Getting this evidence is pretty tricky, as parallel developments for the
>> same project won't happen.
>> But you might be able to infer something on average over multiple projects.
>>
>> Derek M Jones wrote:
>>> Thierry,
>>>
>>>> To answer your questions:
>>>> 1°) Yes, there is some objective evidence that there is a correlation
>>>> between a low SQALE index and quality code.
>>>
>>> How is the quality of code measured?
>>>
>>> Below you say that SQALE DEFINES what is "good quality" code.
>>> In this case it is to be expected that a strong correlation will exist
>>> between a low SQALE index and its own definition of quality.
>>>
>>>> For example ITRIS has conducted a study where the "good quality" code
>>>> is statistically linked to a lower SQALE index, for industrial
>>>> software actually used in operations.
>>>
>>> Again how is quality measured?
>>>
>>>> No, there is not enough evidence, we wish there would be more people
>>>> working on getting the evidence.
>>>
>>> Is there any evidence apart from SQALE correlating with its own
>>> measures?
>>>
>>> This is a general problem, lots of researchers create their own
>>> definition of quality and don't show a causal connection to external
>>> attributes such as faults or subsequent costs.
>>>
>>> Without running parallel development efforts that
>>> follow/don't follow the guidelines it is difficult to see how
>>> reliable data can be obtained.
>>>
>>
>
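
The "more than an order of magnitude" figure quoted above from the 18-application study can be checked from the four percentages alone. This is a rough sketch that assumes the "share of the code base" figures are measured in lines of code; the helper name relative_density is hypothetical.

```python
# Percentages as quoted in the message above (study of 18 production applications).
low = {"code_share": 0.45, "defect_share": 0.12}   # cyclomatic complexity <= 5
high = {"code_share": 0.11, "defect_share": 0.43}  # cyclomatic complexity >= 15


def relative_density(bucket):
    """Share of defects divided by share of code; 1.0 is the code-base average."""
    return bucket["defect_share"] / bucket["code_share"]


ratio = relative_density(high) / relative_density(low)

print(f"relative defect density, complexity <= 5:  {relative_density(low):.2f}")
print(f"relative defect density, complexity >= 15: {relative_density(high):.2f}")
print(f"ratio of the two:                          {ratio:.1f}")
```

The ratio works out to roughly 14.7, which matches the quoted claim. It is still a per-line comparison between two buckets, so it does not address the point made earlier in the thread that complexity and function size move together.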



The System Safety Mailing List
systemsafety_at_xxxxxx
Received on Tue Jun 25 2013 - 04:20:58 CEST
