[SystemSafety] Qualifying SW as "proven in use" [Measuring Software]

Tue Jun 25 20:38:08 CEST 2013

Todd wrote:

"Just another voice, which probably just adds evidence to the argument
that we haven't yet found a trivial metric to predict bugs..."

Yes, precisely. I believe it can never be a single number anyway. It will
have to be a set of numbers that will need to be balanced overall.

Elaborating on my previous post: I think that the relevant syntactic
complexity measures can be divided into two categories. Cyclomatic
complexity and depth of (decision) nesting would be in the first category,
measures of "local complexity". By this, I mean that the scope of the
complexity measure is internal to a single function (e.g., method in OO
programming). Fan out would be in the second category, measures of "global
complexity". Here, I mean measuring complexity of how that function
relates to other functions in its environment.
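
To make the three measures concrete, here is a rough sketch (not a
validated tool) that computes them for each function in a piece of Python
source, using the standard ast module. The counting rules are simplifying
assumptions of mine: one point per if/for/while/except/conditional
expression plus one per extra boolean operand, nesting counted over
if/for/while/try, and fan out taken as the number of distinct names
called. The function names (cyclomatic_complexity, max_nesting, fan_out,
measure) are made up for this illustration.

```python
import ast

# Assumed counting rules for this sketch; real tools differ in the details.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)
NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try)


def cyclomatic_complexity(func):
    """Local measure: 1 + number of decision points inside the function."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths
            score += len(node.values) - 1
    return score


def max_nesting(node, depth=0):
    """Local measure: deepest nesting of decision/loop statements.

    Note: Python's AST stores "elif" as an If nested in the else branch,
    so each elif counts as one extra level in this simple version.
    """
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def fan_out(func):
    """Global measure: number of distinct names this function calls."""
    callees = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Call):
            target = node.func
            if isinstance(target, ast.Name):
                callees.add(target.id)
            elif isinstance(target, ast.Attribute):
                callees.add(target.attr)
    return len(callees)


def measure(source):
    """Return the three measures for every function defined in source."""
    tree = ast.parse(source)
    return {
        node.name: {
            "cyclomatic": cyclomatic_complexity(node),
            "nesting": max_nesting(node),
            "fan_out": fan_out(node),
        }
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }
```

Calling measure() on the text of a source file gives one small record per
function, which is the form one would need before comparing any of these
numbers against defect data.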

As I said, those three measures are the only ones for which I have
reasonable data (and experience) justifying their relevance. I'm convinced
that there is a whole set of other relevant measures out there; I just
don't know what they are because nobody has published the results of any
correlation analysis. I'm aware of two different proposed complexity
metrics suites for OO software:

Mark Lorenz and Jeff Kidd, Object Oriented Software Metrics, Prentice
Hall, 1994

Shyam Chidamber and Chris Kemerer, “A Metrics Suite for Object Oriented
Design”, IEEE Transactions on Software Engineering, vol 20, no 6, June
1994 (can be found at
http://faculty.salisbury.edu/~stlauterburg/COSC425/MetricForOOD_ChidamberKemerer94.pdf)

All of the measures in both of those sets can be categorized into the
"local vs. global" categories; it's just that nobody has yet published any
data identifying which ones should be paid attention to and which ones
should be ignored. And, as I said, the trick is to strike an appropriate
overall balance between the local measures and the global ones.
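
The kind of correlation analysis that would supply that data can be
sketched roughly as below. This is only an illustration under my own
assumptions: it uses the Pearson coefficient (the thread only says
"correlation coefficient"), the function names pearson and
partial_correlation are made up, and the inputs would be per-function
metric values and defect counts that somebody still has to collect and
clean. The second function controls for a third variable such as lines of
code, which is the confounder Derek raises further down the thread.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / sqrt(sxx * syy)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs.

    For example: xs = metric value per function, ys = defects logged per
    function, zs = lines of code per function.
    """
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    # max() guards against tiny negative values caused by rounding
    denom = sqrt(max(0.0, 1 - r_xz ** 2) * max(0.0, 1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("zs fully determines xs or ys; nothing left to compare")
    return (r_xy - r_xz * r_yz) / denom
```

If a metric still shows a clear partial correlation with defects once
lines of code are held fixed, that is at least a hint that it carries
information beyond size; if not, it is probably just counting lines.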

Cheers,

-- steve

-----Original Message-----
From: Todd Carpenter <todd.carpenter at adventiumlabs.com>
Date: Monday, June 24, 2013 7:20 PM
To: "systemsafety at techfak.uni-bielefeld.de"
<systemsafety at techfak.uni-bielefeld.de>
Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]

ST> For example, the code quality measure "Cyclomatic Complexity" (reference:
ST> Tom McCabe, "A Complexity Measure", IEEE Transactions on Software
ST> Engineering, December, 1976) was validated many years ago by simply

DMJ> I am not aware of any study that validates this metric to a reasonable
DMJ> standard.  There are a few studies that have found a medium
DMJ> correlation in a small number of data points.

Les Hatton had an interesting presentation in '08, "The role of empiricism
in improving the reliability of future software", that shows there is a
strong correlation between source-lines-of-code and cyclomatic complexity,
and that defects follow a power law distribution:

http://www.leshatton.org/wp-content/uploads/2012/01/TAIC2008-29-08-2008.pdf

Just another voice, which probably just adds evidence to the argument that
we haven't yet found a trivial metric to predict bugs...

-TC

On 6/24/2013 6:38 PM, Derek M Jones wrote:
> All,
>
>> Actually, getting the evidence isn't that tricky, it's just a lot of
>> work.
>
> This is true of most things (+ getting the money to do the work).
>
>> Essentially all one needs to do is to run a correlation analysis
>> (correlation coefficient) between the proposed quality measure on the one
>> hand, and defect tracking data on the other hand.
>
> There is plenty of dirty data out there that needs to be cleaned up
> before it can be used:
> 
> http://shape-of-code.coding-guidelines.com/2013/06/02/data-cleaning-the-next-step-in-empirical-software-engineering/
>
>
>> For example, the code quality measure "Cyclomatic Complexity" (reference:
>> Tom McCabe, "A Complexity Measure", IEEE Transactions on Software
>> Engineering, December, 1976) was validated many years ago by simply
>
> I am not aware of any study that validates this metric to a reasonable
> standard.  There are a few studies that have found a medium
> correlation in a small number of data points.
>
> I have some data whose writeup is not yet available in a good enough
> draft form to post to my blog.  I only plan to write about this
> metric because it is widely cited and is long overdue for relegation
> to the history of good ideas that did not stand the scrutiny of
> empirical evidence.
>
>> finding a strong positive correlation between the cyclomatic complexity of
>> functions and the number of defects that were logged against those same
>
> Correlation is not causation.
>
> Cyclomatic complexity correlates well with lines of code, which
> in turn correlates well with number of faults.
>
>> functions (i.e., code in that function needed to be changed in order to
>> repair that defect).
>
> Changing the function may increase the number of faults.  Creating two
> functions where there was previously one will reduce an existing peak
> in the distribution of values, but will it result in fewer faults
> overall?
>
> All this stuff with looking for outlier metric values is pure hand
> waving.  Where is the evidence that the reworked code is better not
> worse?
>
>> According to one study of 18 production applications, code in functions
>> with cyclomatic complexity <=5 was about 45% of the total code base but
>> this code was responsible for only 12% of the defects logged against the
>> total code base. On the other hand, code in functions with cyclomatic
>> complexity of >=15 was only 11% of the code base but this same code was
>> responsible for 43% of the total defects. On a per-line-of-code basis,
>> functions with cyclomatic complexity >=15 have more than an order of
>> magnitude increase in defect density over functions measuring <=5.
>>
>> What I find interesting, personally, is that complexity metrics for
>> object-oriented software have been around for about 20 years and yet
>> nobody (to my knowledge) has done any correlation analysis at all (or, at
>> a minimum they have not published their results).
>>
>> The other thing to remember is that such measures consider only the
>> "syntax" (structure) of the code. I consider this to be *necessary* for
>> code quality, but far from *sufficient*. One also needs to consider the
>> "semantics" (meaning) of that same code. For example, to what extent is
>> the code based on reasonable abstractions? To what extent does the code
>> exhibit good encapsulation? What are the cohesion and coupling of the
>> code? Has the code used "design-to-invariants / design-for-change"? One can
>> have code that's perfectly structured in a syntactic sense and yet it's
>> garbage from the semantic perspective. Unfortunately, there isn't a way
>> (that I'm aware of, anyway) to do the necessary semantic analysis in an
>> automated fashion. Some other competent software professionals need to
>> look at the code and assess it from the semantic perspective.
>>
>> So while I applaud efforts like SQALE and others like it, one needs to be
>> careful that it's only a part of the whole story. More work--a lot
>> more--needs to be done before someone can reasonably say that some
>> particular code is "high quality".
>>
>>
>> Regards,
>>
>> -- steve
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Peter Bishop <pgb at adelard.com>
>> Date: Friday, June 21, 2013 6:04 AM
>> To: "systemsafety at techfak.uni-bielefeld.de"
>> <systemsafety at techfak.uni-bielefeld.de>
>> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]
>>
>> I agree with Derek
>>
>> Code quality is only a means to an end
>> We need evidence to show the means actually helps to achieve the ends.
>>
>> Getting this evidence is pretty tricky, as parallel developments for the
>> same project won't happen.
>> But you might be able to infer something on average over multiple projects.
>>
>> Derek M Jones wrote:
>>> Thierry,
>>>
>>>> To answer your questions:
>>>> 1°) Yes, there is some objective evidence that there is a correlation
>>>> between a low SQALE index and quality code.
>>>
>>> How is the quality of code measured?
>>>
>>> Below you say that SQALE DEFINES what is "good quality" code.
>>> In this case it is to be expected that a strong correlation will exist
>>> between a low SQALE index and its own definition of quality.
>>>
>>>> For example ITRIS has conducted a study where the "good quality" code
>>>> is statistically linked to a lower SQALE index, for industrial
>>>> software actually used in operations.
>>>
>>> Again how is quality measured?
>>>
>>>> No, there is not enough evidence, we wish there would be more people
>>>> working on getting the evidence.
>>>
>>> Is there any evidence apart from SQALE correlating with its own
>>> measures?
>>>
>>> This is a general problem, lots of researchers create their own
>>> definition of quality and don't show a causal connection to external
>>> attributes such as faults or subsequent costs.
>>>
>>> Without running parallel development efforts that
>>> follow/don't follow the guidelines it is difficult to see how
>>> reliable data can be obtained.
>>>
>>
>
