Apr 1, 2016

Happy Bleeding Edge

Good day to you, dear friends!

The C++19 standard is coming very soon, and word is that it will have many exciting new features!

Just to mention a few:

  • The language becomes garbage-collected. It means you never need to use "delete" anymore. For backwards compatibility, it's not an error if you still do, though
  • The curly braces syntax becomes deprecated (in an attempt to get rid of the "curly brace languages" club bad rep). The Python-like syntax is now preferred
  • One Definition Rule is now relaxed. You can define all you want; the winner is picked randomly at runtime using the Mersenne Twister
  • Going forward, friends are no longer allowed to see each other's privates
  • It is a compilation error to have tabs in the code. If comment lines make up less than 33.3% of a file, that is also a compilation error (this should force good coding practices)
  • Substitution Failure Is Now An Error
  • Resource Acquisition Is No Longer an Initialization
  • The syntax for the main() function has been modernized (the correct syntax is now "public static void main ...")
  • A new modifier keyword, thread_safe, has been introduced. Insert it anywhere into your code to make it thread-safe
  • The standard library has been extended with many new useful algorithms. For example, std::fast_sort sorts arrays in constant time (the precondition is that the input is already ordered)
  • Now it is OK to divide by zero, dereference NULL pointers and compare floating point values for equality: nothing bad will ever happen!

In order to keep our skills up to date and bleeding edge, let's practice some C++19.

I've prepared a little C++19 quiz for you:

What does this code do?

// note that this code requires a modern C++19 compliant compiler
#include <stdio.h>

int O = 5
char buf[6]
public static void main() thread_safe:
    float m[] = { 0.16437186, 0.57526314, 0.20729951, 0.13258759, 0.23461139 }
    for (int i = O; i --> 0;):
        for (int j = O, _= *(int*)(i + m); j --> 0; _>>=O):
            j[buf] = O*13 + (_&31)
        printf("%s\n", buf)

Enjoy!

NOTE: if you don't have access to a bleeding-edge C++19 compiler, the code can easily be backported to the C++98 standard - just replace the Pythonic code blocks and the Javaesque main() signature with the old C++ ones!


UPDATE, APR 6 2016:

Posted on the 1st of April, the list of “new C++19 standard features” was indeed an "obviously non-funny" April Fools' joke.

Of course, there is no such thing as C++19, nor would the committee ever consider features as ridiculous as those mentioned (which is a shame, I would personally love to have tabs banished from existence :)).

As silly as the joke was, I tried to allude to a few topics there, in particular about “syntax vs semantics” in programming languages.

Or “form vs shape”, to put it in a philosophical perspective.

One can often hear arguments about the language syntax. The discussions are usually more heated at the levels of the “curly braces”, since this is the area where everyone is entitled to their own opinion (pretty much like on “tabs vs spaces”).

Let’s look at several forms of the same piece of code, in some hypothetical programming languages:

snippet 1:

for (i = 0; i < 5; i++) {
    for (j = 0; j < 5; j++) {
        println(i*j);
    }
}

snippet 2:

(for [i 0] (< i 5) (inc i) 
    (for [j 0] (< j 5) (inc j)
        (println (* i j))))

snippet 3:

for i = 0; i < 5; i++:
    for j = 0; j < 5; j++:
        println(i*j)

snippet 4:

for i = 0, i < 5, i = i + 1 do
    for j = 0, j < 5, j = j + 1 do
        println(i*j)
    end
end

snippet 5:

DO 1 i=0,4
   DO 1 j=0,4
      1 PRINT *,i*j

snippet 6:

For i = 0 To 4 Step 1
    For j = 0 To 4 Step 1
        Print(i*j)
    Next j
Next i

Which one is the “best”?..

Whatever your opinion is, let’s not forget that language syntax is purely a human, psychological construct!

In that light, it’s a straightforward code transformation to come back from the “C++19” syntax to a valid C++ syntax. We just replace Python-style code blocks with C++ curly braces, and of course remove the pseudo-Java main() function signature:

#include <stdio.h>

int O = 5;
char buf[6];
int main() {
    float m[] = { 0.16437186, 0.57526314, 0.20729951, 0.13258759, 0.23461139 };
    for (int i = O; i --> 0;) {
        for (int j = O, _ = *(int*)(i + m); j --> 0; _>>=O) {
            j[buf] = O*13 + (_&31);
        }
        printf("%s\n", buf);
    }
}

This code does not require a “C++19-compliant compiler” anymore. In fact, it compiles with a C99 compiler, runs, and prints something!

It’s still not comprehensible, though. So let’s remove some more cruft, including the infamous “--> operator” and quirky C array indexing and try to find better names for the variables:

#include <stdio.h>

const int   BITS_PER_CHAR = 5;
const int   MASK = 31; // binary 11111: BITS_PER_CHAR one-bits
const float WORDS_F[] = { 0.16437186, 0.57526314, 0.20729951, 0.13258759, 0.23461139 };
const int   NUM_WORDS = 5;
const int   NUM_CHARS = 5;

char buf[6];
int main() {
    for (int i = NUM_WORDS - 1; i >= 0; i--) {
        int bits = *(const int*)&WORDS_F[i]; // reinterpret float value as int, bitwise (assumes both are 32 bits wide)
        for (int j = NUM_CHARS - 1; j >= 0; j--) {
            buf[j] = 'A' + (bits&MASK);
            bits >>= BITS_PER_CHAR;
        }
        printf("%s\n", buf);
    }
}
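
For the record, neither of the two quirks removed above is special syntax. As far as the compiler is concerned, "i --> 0" is just "(i--) > 0", and "j[buf]" is "*(j + buf)", which is the same thing as "buf[j]". Here is a minimal sketch to check that; the helper names countdown and reversed_index are mine, made up for illustration:

```cpp
#include <cassert>
#include <vector>

// Collects the values of i seen by the body of `for (int i = n; i --> 0;)`.
std::vector<int> countdown(int n) {
    std::vector<int> seen;
    for (int i = n; i --> 0;) {   // parsed as (i--) > 0: compare first, then decrement
        seen.push_back(i);
    }
    return seen;
}

// Reads element j of an array using the reversed subscript form.
char reversed_index(const char* buf, int j) {
    return j[buf];                // identical to *(j + buf), i.e. buf[j]
}
```

So countdown(3) visits 2, 1, 0, exactly like the rewritten "for (int i = n - 1; i >= 0; i--)" loop.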

Now we can see (well, more or less) what the program tries to do.

It extracts bit patterns (5 bits each) from the binary representation of the floating point values, interprets them as characters and prints that as words, one word per floating point value.

The floating point values are picked specifically to encode a message.

Assuming that float is a single precision IEEE-754 floating point value (which most of the time is the case, but is not ultimately guaranteed), it is represented via 32 bits of data:

  • Mantissa in the lowest 23 bits
  • 8 bits for the biased exponent (stored with an offset of 127, so that it can also represent negative powers of two)
  • 1 bit for sign in the highest bit
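
To make the layout above concrete, here is a minimal sketch that pulls the three fields out of a float. The FloatFields struct and the split_float function are hypothetical names of mine, and the static_assert spells out the IEEE-754 assumption. It copies the bits with memcpy instead of a pointer cast, which sidesteps the aliasing questions the quiz code happily ignores:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 32-bit IEEE-754 single precision value.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes a 32-bit IEEE-754 float");

// Hypothetical helper type holding the three raw bit fields.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, as stored (biased by 127)
    std::uint32_t mantissa;  // 23 bits
};

FloatFields split_float(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // bitwise copy, no pointer cast
    FloatFields fields;
    fields.sign     = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.mantissa = bits & 0x7FFFFFu;
    return fields;
}
```

For example, split_float(0.23461139f) gives sign 0, exponent 124 and mantissa 0x703DF8.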

Each character in the text is encoded via 5 bits, which is an offset from the character 'A' (i.e. 0 is 'A', 1 is 'B', 2 is 'C' etc.)

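So in total each text word, such as "HAPPY", takes 25 bits. That is the whole 23 bits of the mantissa, plus two more bits cut out of the low end of the exponent.

Note that the six unused exponent bits are padded with "011111" (the sign bit stays 0). This keeps the exponent between 124 and 127, so every value lands somewhere between 0.125 and 2, which is done in order to have the magic numbers look "nice".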
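
Going the other way is a good sanity check of the scheme. Below is a minimal sketch of an encoder and a matching decoder, under the same assumptions as the quiz itself (32-bit IEEE-754 float, exactly five letters in the range 'A'..'Z'). The function names are hypothetical, not part of the original quiz:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Packs a five-letter uppercase word into 32 bits:
// 7 padding bits (sign 0 + "011111"), then 5 bits per letter.
std::uint32_t encode_word(const std::string& word) {
    std::uint32_t bits = 0x1Fu;  // binary 0011111: the padding
    for (char c : word) {
        bits = (bits << 5) | static_cast<std::uint32_t>(c - 'A');
    }
    return bits;
}

// Reinterprets a 32-bit pattern as a float (assumes sizeof(float) == 4).
float bits_to_float(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Recovers the five letters from the lowest 25 bits of a float.
std::string decode_word(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    std::string word(5, 'A');
    for (int j = 4; j >= 0; --j) {
        word[j] = static_cast<char>('A' + (bits & 31u));
        bits >>= 5;
    }
    return word;
}
```

With this, encode_word("HAPPY") yields 0x3E703DF8, which is the bit pattern of 0.23461139, the last value in the array (the outer loop runs backwards, so it is printed first).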

So in the end, given a few assumptions we make about the current platform, the code would print:

HAPPY
APRIL
FIRST
RGRDS
CQUIZ

In addition to the "syntax argument fallacy", I'll throw in a few more maxims I was trying to allude to:

  • Garbage collection does not necessarily magically help to solve all the memory problems
  • It is essential to have a deterministic behaviour in the code
  • Thread safety is not something that can be achieved by magic; one still has to understand what's going on in a concurrent environment
  • Encapsulation is essential in design, but just having a "private" keyword in the language does not magically enable it
  • One has to know what the complexity guarantees of the standard (or any other) library algorithms are, if any. Otherwise it's just relying on magic (see above)
  • Certain things are fundamental to the way hardware works, and a programming language can't be blamed for them not working "as expected"
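
As a small illustration of the last point, recall the joke about comparing floating point values for equality. A minimal sketch, assuming IEEE-754 double arithmetic (the helper names are made up):

```cpp
#include <cassert>
#include <cmath>

// Exact comparison: true only if both values have the same numeric value.
bool exactly_equal(double a, double b) {
    return a == b;
}

// Comparison with an explicit absolute tolerance chosen by the caller.
bool nearly_equal(double a, double b, double tolerance) {
    return std::fabs(a - b) <= tolerance;
}
```

On such a platform, exactly_equal(0.1 + 0.2, 0.3) is false, while nearly_equal(0.1 + 0.2, 0.3, 1e-12) is true. No amount of language syntax is going to change that.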

There are many more fallacies related to programming language design that people, myself included, may fall for.

I believe that working with such a complicated language as C++ does not necessarily make one a better programmer.

But it may, in a sense, force one into a certain awareness about such fallacies' existence.
