[SystemSafety] Kolmogorov complexity measures

Olwen Morgan olwen at phaedsys.com
Mon Nov 19 20:57:54 CET 2018



First a quote from WikiPedia:


"Inalgorithmic information theory 
<https://en.wikipedia.org/wiki/Algorithmic_information_theory>(a 
subfield ofcomputer science 
<https://en.wikipedia.org/wiki/Computer_science>andmathematics 
<https://en.wikipedia.org/wiki/Mathematics>), the*Kolmogorov 
complexity*of an object, such as a piece of text, is the length of the 
shortestcomputer program 
<https://en.wikipedia.org/wiki/Computer_program>(in a 
predeterminedprogramming language 
<https://en.wikipedia.org/wiki/Programming_language>) that produces the 
object as output. It is a measure of thecomputational 
<https://en.wikipedia.org/wiki/Computation>resources needed to specify 
the object, and is also known as*descriptive 
complexity*,*Kolmogorov–Chaitin 
<https://en.wikipedia.org/wiki/Gregory_Chaitin>complexity*,*algorithmic 
complexity*,*algorithmic entropy*, or*program-size complexity*. It is 
named afterAndrey Kolmogorov 
<https://en.wikipedia.org/wiki/Andrey_Kolmogorov>, who first published 
on the subject in 1963.^[1] 
<https://en.wikipedia.org/wiki/Kolmogorov_complexity#cite_note-1> ^[2]" 
<https://en.wikipedia.org/wiki/Kolmogorov_complexity#cite_note-2>

^(Wikipedia should be ashamed for not calling him, "Andrei Nikolaevich 
Kolmogorov". Incorrect transliteration and omission of patronymics is 
inexcusable.)

^Now a C program that attempts to obey in C the kinds of rules that 
you'd have to follow in SPARK Ada. Look at the codings for gcd1 and gcd2:
^

#include <stdio.h>
#include <assert.h>

/* function to compute gcd of two positive integers a, b */
int gcd1(int a, int b)
{
     /* note the single-assignment style:
      *
      * (1): p, q, r are all defined in the smallest possible scope,
      * (2): p, q, r are all initialised at declaration, and
      * (3): each of p, q, r is modified in exactly one place in their 
respective scopes.
      *
      * These usages are intended to make the code easy to analyse.
      */

     int p = a;
     int q = b;
     int r = p%q;

     while (r > 0) /*The local loop invariant can be written down at 
sight from this code */
     {
         p = q;
         q = r;
         r = p%q;
     }

     /* Note: it doesn't matter if a < b when the function is called, 
since the
      * first iteration through the loop will just swap them and then 
carry on
      */

     return q;
}

/* recursive version: shorter than gcd1 but not admissible

  * in circumstances where recursive coding is prohibited
  */


int gcd2(int a, int b)
{
     int r = a%b;

     return (r == 0 ? b : gcd2(b, r));
}


int main (void)
{
     int s = 35;
     int t = 77;

     assert(t != 0);    /* halts if t == 0 to avoid division by 0 in 
functions */

     printf("gcd1(%i, %i) = %i\n", s, t, gcd1(s, t));
     printf("gcd2(%i, %i) = %i\n", s, t, gcd2(s, t));

     return 0;
}


Obviously the recursive implementation would be banned in critical code 
to ensure that storage requirements are compile-time computable. So gcd2 
will not be regarded as acceptable for estimating Kolmogorov complexity.

BUT

can anyone come up with a shorter one than gcd1 that complies with 
reasonable C language subset requirements for critical systems development?


I'm guessing that it's reasonable to measure length in terms of the 
number of lexical tokens (otherwise you'd have to regard as more complex 
an implementation that used longer variable names). If you used the same 
algorithm as gcd1 in SPARK Ada, would it be longer based on lexeme count?


Just an amusing diversion for techies,

Olwen








-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.techfak.uni-bielefeld.de/mailman/private/systemsafety/attachments/20181119/decb7b11/attachment-0001.html>


More information about the systemsafety mailing list