[SystemSafety] McCabe's cyclomatic complexity and accounting fraud

Steve Tockey Steve.Tockey at construx.com
Thu Mar 29 20:10:17 CEST 2018


Derek,
With all due respect, I am very conscious of and concerned about the
Accounting Fraud issue you focus so much on. I even agree that it is a
serious problem today. However, I don’t think we should take the approach
of “since we don’t know everything that needs to be known about code
complexity metrics today, we should abandon everything that has anything
to do with them”.

As I said, I have seen a single C++ method of over 3400 SLOC and a
Cyclomatic Complexity of over 2400. I routinely see functions in the 150
to 350 range. I have seen functions with over 50 parameters. Are you
saying that code with as many as 2400+ decisions in a single function is
acceptable? Are you saying that code with over 50 parameters is
acceptable? Unless there is SOME limit, developers will continue to
write code like that.
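To make the numbers concrete: cyclomatic complexity is essentially the count of decision points in a function, plus one. A rough sketch in Python (not the tool I use, and the node list is a simplification, not a complete implementation):

```python
import ast

# Constructs that add a decision point. Each extra operand of an
# 'and'/'or' chain also adds a path under the strict definition.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """McCabe cyclomatic complexity of a code fragment: decisions + 1."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            decisions += len(node.values) - 1  # extra and/or operands
    return decisions + 1

src = """
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(3):
        if x > 10 and x < 100:
            return "mid"
    return "other"
"""
print(cyclomatic_complexity(src))  # two ifs + one for + one 'and' -> 5
```

A function with a complexity of 2400 has, by this same arithmetic, on the order of 2400 such decision points in one body.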


To address your specific questions:

“A vector of what, low viability metrics?” ― no, of course not. I can’t
tell you what that vector needs to look like today because nobody has done
enough empirical research. I am convinced that a meaningful set of
high-value code complexity metrics does exist; it’s just that nobody knows
exactly what it is yet.


“The whole point is to commit accounting fraud?” ― again, no. Once that
meaningful vector has been established, it won’t allow accounting fraud.
The fault you keep raising with cyclomatic complexity (that refactoring
just squeezes the complexity somewhere else) would be solved if a
sufficient set of other code complexity metrics were known to catch it
and prevent too much complexity being pushed there.
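The “squeezing” effect is easy to demonstrate. In this invented example (my own toy code, for illustration only), splitting one function into helpers lowers each function’s cyclomatic complexity, but the caller’s fan-out rises correspondingly ― which is exactly why fan-out needs its own limit:

```python
# Before: one function, cyclomatic complexity 5 (four ifs + 1), fan-out 0.
def shipping_cost_v1(weight, express, fragile):
    cost = 5.0
    if weight > 10:
        cost += 2.0
    if express:
        cost *= 1.5
    if fragile:
        cost += 3.0
    if cost > 20:
        cost = 20.0
    return cost

# After: per-function complexity drops (each piece has complexity <= 2),
# but the caller's fan-out rises from 0 to 3. The decisions did not
# disappear; they moved into the call structure.
def weight_surcharge(cost, weight):
    return cost + 2.0 if weight > 10 else cost

def express_markup(cost, express):
    return cost * 1.5 if express else cost

def cap(cost):
    return min(cost, 20.0)

def shipping_cost_v2(weight, express, fragile):
    cost = weight_surcharge(5.0, weight)
    cost = express_markup(cost, express)
    if fragile:
        cost += 3.0
    return cap(cost)

# Both versions compute the same result.
print(shipping_cost_v1(12, True, False), shipping_cost_v2(12, True, False))
```

Measuring only cyclomatic complexity rewards the second version without noticing what it cost; measuring both metrics together does not.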


“What is an "appropriate balance"?  Do you have a formula for this?” ― No,
unfortunately, I don’t have a complete formula for it. Yet. Again, more
empirical research is required here. What I am pretty confident of today
is:
*) Function-level cyclomatic complexity certainly less than 15, ideally
less than 10
*) Function-level decision nesting certainly less than 7, ideally less
than 4
*) Parameters on a function certainly fewer than 7, ideally fewer than 4
*) Function fan out certainly less than 11, ideally less than 7
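Encoded as a simple gate, those four limits look like this (the structure and names here are mine, not any standard tool’s; a function’s metrics would come from a static analyzer):

```python
# (hard limit, ideal limit) per metric, mirroring the four
# "certainly/ideally" thresholds listed above.
LIMITS = {
    "cyclomatic": (15, 10),
    "nesting":    (7, 4),
    "parameters": (7, 4),
    "fan_out":    (11, 7),
}

def assess(metrics: dict) -> dict:
    """Classify each metric: 'fail' at/over the hard limit,
    'warn' at/over the ideal limit, 'ok' below both."""
    result = {}
    for name, (hard, ideal) in LIMITS.items():
        value = metrics[name]
        if value >= hard:
            result[name] = "fail"
        elif value >= ideal:
            result[name] = "warn"
        else:
            result[name] = "ok"
    return result

# The real functions I described would fail two gates outright.
print(assess({"cyclomatic": 2400, "nesting": 3,
              "parameters": 52, "fan_out": 8}))
```

The point is not these exact numbers; it is that a vector of limits, checked together, is what keeps complexity from simply being squeezed from one dimension into another.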

Should there be other metrics with other limits? Yes. What should they be?
I don’t know. I have some suspicions, but I’m not ready to set limits on
them. Yet.

I can detail how I think we need to push the research to find that whole
formula (essentially multivariate correlation analysis). The problem is
that the research has not been done yet. But, rather than saying “this whole
field is crap”, I think we should be saying, “Here is what we think we
know today, but we really do need to do a lot more work in this area
before we can claim to know everything. Are you interested enough in the
subject to help us push it forward?"
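The starting point of that analysis is nothing exotic: correlate each candidate metric against an outcome such as defect count, then extend to the multivariate case. A minimal sketch, on invented numbers (the data here is made up purely to show the computation, not a research result):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-function measurements: cyclomatic complexity
# and defect counts (illustrative values only).
complexity = [3, 8, 15, 22, 40]
defects    = [0, 1, 2, 4, 7]
print(round(pearson(complexity, defects), 3))
```

The real research would do this across many metrics and many codebases at once, controlling for the metrics’ correlations with each other ― which is exactly the part nobody has funded yet.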


“What does having an appropriate balance buy you?” ― it buys you
syntactically well-structured code that is a whole lot easier to write,
read, and maintain. Are you really willing to assert that code with single
functions that have over 150 decisions is as easy to write, read, and
maintain as code where no single function has over 14 decisions? Are you
really willing to assert that code with single functions having over 50
parameters is as easy to write, read, and maintain as code where no single
function has over 6 parameters?

Further, such code is also highly likely to contain fewer defects than
otherwise because it is so well-structured. Are you really willing to
claim that code with single functions that have over 150 decisions has no
more defects than code where no single function has over 14 decisions? Are
you really willing to claim that code with single functions having over 50
parameters has no more defects than code where no single function has over
6 parameters?

Ok, I freely admit that I cannot support this (today) with a ton of
empirical research. But having been in the software industry for over 40
years and having worked directly or indirectly with thousands of projects,
I’m pretty damn confident.


“Are you not embarrassed, having to rely on this figure?” ― I certainly
wish there were more, and more reliable, data that I could cite. But this
is all that seems to be available at this point. When more, and more
reliable, empirical data is available I will certainly switch to it
instead.


So let me ask you, “How do you propose to get developers to create code of
any reasonable quality at all?”




Cheers,

― steve




-----Original Message-----
From: Derek M Jones <derek at knosof.co.uk>
Organization: Knowledge Software, Ltd
Date: Wednesday, March 28, 2018 at 3:01 PM
To: Steve Tockey <Steve.Tockey at construx.com>,
"systemsafety at lists.techfak.uni-bielefeld.de"
<systemsafety at lists.techfak.uni-bielefeld.de>
Subject: Re: [SystemSafety] McCabe's cyclomatic complexity and accounting
fraud

Steve,

> "Software complexity is not a number, it is a vector"

A vector of what, low viability metrics?  Throw enough in and
some pattern will emerge?

> Of course one can simply refactor code to reduce Cyclomatic Complexity
> and yet the inherent complexity didn't go away. It just moved. But that's
> kinda the whole point. Knowing that, can I call it, "local complexities"

The whole point is to commit accounting fraud?
The box has to be ticked and everybody else does it.

> like Cyclomatic Complexity and Depth of Decision Nesting can be traded
> for, can I call it, "global complexities" like Fan Out, the developer's
> goal should be to strike an appropriate balance between them. Not too
> much local complexity balanced with not too much global complexity.

What is an "appropriate balance"?  Do you have a formula for this?
What does having an appropriate balance buy you?

> This is covered in a lot more detail in Appendix N of the manuscript for
> my new book, available at:
>
> https://www.dropbox.com/sh/jjjwmr3cpt4wgfc/AACSFjYD2p3PvcFzwFlb3S9Qa?dl=0

You cite an author name for what I take to be this paper:
https://pdfs.semanticscholar.org/e3d6/6c47ee0ddb37868c51ca30840084263ee1f1.pdf

More of a semi-puff piece paper than a description of serious research.
Anyway, you are relying on Figure 4 to back up your claims (reproduced
in your figure N-2).

"Figure 4 illustrates the results of using the cyclomatic complexity
metric to analyze one of the PowerBuilder systems."

What about the other 17 systems that the paper refers to?
Do they show very different behavior?

There are several ways of interpreting that plot.  It could be
a percentage of lines of code or a percentage of methods.  The
original paper is not clear.

There are lots of Not Availables in Table 1.

Are you not embarrassed, having to rely on this figure?

>
>
> -----Original Message-----
> From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de>
> on behalf of Derek M Jones <derek at knosof.co.uk>
> Organization: Knowledge Software, Ltd
> Date: Wednesday, March 28, 2018 at 7:21 AM
> To: "systemsafety at lists.techfak.uni-bielefeld.de"
> <systemsafety at lists.techfak.uni-bielefeld.de>
> Subject: Re: [SystemSafety] McCabe¹s cyclomatic complexity and accounting
> fraud
>
> Paul,
>
>> There is the reported McCabe Complexity value for each function in a
>> system. Yes, you can do things to reduce individual function complexity,
>> and probably should. However, you then need to take the measure a step
>> further. For every function that calls other functions, you have to sum
>> the
>
> I agree that the way to go is to measure a collection of functions
> based on their caller/callee relationship.
>
> This approach makes it much harder to commit accounting fraud and
> might well produce more reproducible results.
>
>> for the entire system on this basis. It becomes clear when you have too
>> many
>> functions with high complexity factors as it pushed up the average
>> complexity
>> value disproportionately. It still should not be the only measure
>>though.
>
> Where do the decisions in the code (that create this 'complexity') come
> from?  The algorithm that is being implemented.
>
> If the algorithm has a lot of decision points, the code will contain
> lots of decision points.  The measurement process needs to target the
> algorithm first, and then compare the complexity of the algorithm with
> the complexity of its implementation.  The code only needs looking at
> if its complexity is much higher than that of the algorithm.
>

--
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com

