Jan 27, 2010

Code opera: using Gource to watch the story of a Perforce depot

Gource is a neat open-source tool that visualizes the dynamics of source code repositories, using a planar, spring-based tree layout ("balloon tree").

Showing a P4 depot
Why be interested in Perforce in particular, a proprietary, non-free (and rather expensive) piece of software? For a couple of reasons:
* It's quite commonly used in big corporate software shops (including the one I work at), and in theory it can still be used in open-source projects for free (as in beer)
* Gource does not seem to have official support for it yet (it supports Git/Mercurial out of the box, and SVN/CVS via contrib Python scripts)

As an example, we can look at the Adobe Source Libraries:
export P4PORT=stlab.adobe.com:10666
export P4USER=guest
echo "AdobeGuest" | p4 login
perl gource-p4.pl //adobe_source_libraries/ > gil.log
gource --log-format custom gil.log
(it may take a little while to gather the log file).

Another example is the depot of Jam (an open-source build system). This time on the Windows command line:
p4 set P4PORT=public.perforce.com:1666
p4 set P4USER=anon
p4 login
perl gource-p4.pl //public/jam/src/ > jam.log
gource --log-format custom jam.log
The gource-p4.pl script looks like this:
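#!/usr/bin/perl
# Usage: perl gource-p4.pl //depot/path/ > out.log
# Prints one Gource custom-log line (time|user|action|file|) per file revision.

$root = $ARGV[0];
# Walk the changelists oldest first ("p4 changes" lists the newest first).
map { my($change, $user) = $_ =~ /Change (\d+) on.*by (.*?)@.*/;
    # "p4 fstat -e" describes every file touched by the changelist.
    map { my($file, $act, $time) = 
            $_ =~ /\Q$root\E(.*)\n.+headAction (.*)\n.+\n.+headTime (.*)\n.*/; 
        # Translate Perforce actions into Gource's A(dded)/M(odified)/D(eleted).
        $act =~ s/edit|integrate|branch/M/; 
        $act =~ s/delete/D/; $act =~ s/add/A/; 
        $line = $time."|".$user."|".$act."|".$file."|\n";
        print $line if $file;
    } split(/\.+ depotFile /, `p4 fstat -e $change $root...`);
} reverse split(/\n+/, `p4 changes $root...`);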
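
Each line the script prints follows Gource's custom log format: a Unix timestamp, a user name, a one-letter action (A, M or D) and a file path, separated by "|", with a trailing field for an optional colour left empty. As a minimal sketch of the same mapping outside Perl, here it is in Python. The function name and the sample values are made up for illustration, and the action table simply mirrors the substitutions in the script above:

```python
# Hypothetical helper mirroring gource-p4.pl; not part of Gource or Perforce.
P4_TO_GOURCE = {
    "add": "A",
    "edit": "M",
    "integrate": "M",
    "branch": "M",
    "delete": "D",
}


def gource_line(timestamp, user, p4_action, path):
    """Format one entry of a Gource custom log: time|user|action|path|"""
    # Actions outside the table are passed through unchanged.
    action = P4_TO_GOURCE.get(p4_action, p4_action)
    return "{}|{}|{}|{}|".format(timestamp, user, action, path)


if __name__ == "__main__":
    # Sample values are invented for the example.
    print(gource_line(1264550400, "bill", "edit", "src/jam.c"))
```

Gource expects the entries in chronological order, which is why the Perl script reverses the output of "p4 changes" before walking it.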


Granted, neither of these visualizations of open-source Perforce depots looks very exciting, because neither depot seems to be a primary development depot, and most of the commits are made by a few maintainers.

However, I've also tried it at the company where I work, watching a year's worth of evolution of the main code branch, and that was quite insightful.

Gource also lets you supply a folder of images to be used as user portraits (the --user-image-dir option), which adds personality to the show, so I used that as well.

Stories in code
And I was quite amazed by what I saw: the code tells stories.

To an outsider it might look like just some boring pictures flying around and zapping colored circles, tagged with file names, with lasers.
And generally that's how it is: plain boring. Boring when observed out of context, that is.

Even though one of those flying pictures was me, I was quite underwhelmed at first.

But then small stories began to pop into my head:

"Oh yes, I remember that refactoring when part of public API was renamed and it triggered the chain reaction..."
"Here, ... Bill integrates his branch into ours"
"Oh, he found the reason for that bug everyone was puzzled about... see, three seemingly unrelated folders were changed to fix it"
"We have quite a lot of people working at the same time on that clump of files, maybe we should improve modularity there?.. and then some people have to touch too many different files at once, it seems... could it be bad cohesiveness of code?"
"Yeah... there, Peter makes that commit with the funny comments we all laughed at together"
"Right... Christmas. No commits, obviously... And then, see - what a rush! People fly around like crazy, apparently having a few days off was a good idea"
"And here Mark and Frank are working on that new API together... it's nice to see some activity in the unit tests' folders"
"Oh, see, John has just committed his new system there, in that new tidy bunch of files... cool stuff, he knows his job... surgical changes"
"Ah, that... such a pity we had to revert that... And now - ha-ha, see, Brandon started to fly around like crazy, such a productivity again, huh?.. What was that? A new girlfriend or something?"

And so on.

That really reminds me of emergent storytelling in games, where players create their own stories and enjoy them:
Human beings like stories. Our brains have a natural affinity not only for enjoying narratives and learning from them, but also for creating them. In the same way that your mind sees an abstract pattern and resolves it into a face, your imagination sees a pattern and resolves it into a story.
Games have always had a close affinity with story-making. Adding a few lines of description to a video game or a background and artwork to an abstract board game gives dramatic context and an added sense of depth, allowing the player to create an internal narrative as the game progresses.
(from "Second Person: Role-Playing and Story in Games and Playable Media")

Here it's not a game, but rather a meta-game of a kind (it's quite ironic that this is rather a process of making a game in this particular case, because I work in a game development company). There are still quite a few parallels, it seems.

Quite frankly, Gource provides only the most basic display. It does not tell much about the code structure or about how well different areas of work are divided between contributors (even though a well-thought-out folder structure is quite often a sign of good overall code organization).

But I can imagine even more narrative devices being added, for example:
* Fetching information from the continuous integration system and displaying some crazy particle explosions when the build was broken
* Doing something similar, but even more intense, when automated regression tests were broken
* Integrating with the bug tracker and giving some visual cues about the nature of the work being done (bugs, regressions, crashes, new features)
* Visually tracking the physical code dependencies as well (like #include graphs in C++) and their evolution

I believe all this can be useful in a practical sense as well.
It seems that entertaining things generally have bigger cognitive potential. Andy Hunt says in "Pragmatic Thinking and Learning":
...In fact, additional studies have shown exactly that: positive emotions are essential to learning and creative thinking. Being "happy" broadens your thought processes and brings more of the brain's hardware online.
Aesthetics make a difference, whether it's an user interface, the layout of your code and comments, the choices of variable names, the arrangement of your desktop or whatever.
Nice things matter, especially if they provide a useful view, yet another mental projection of the code base, which lets one extend one's mental models of this code in a fun, aesthetically pleasing way.

Work does not necessarily have to be boring.

So what?..
From a practical perspective, one quite satisfying thing for me was that such a generally useless one-evening exercise gave me a bit of experience with several (seemingly unrelated) things:

* Reading the source code of Gource (C++), compiling and running it both on Linux and Windows
* Making up a somewhat artificial problem in this context (how to use it with Perforce) and figuring out what steps can be taken to solve it
* Getting to know a little bit of Perl (yes, this is my first Perl script ever... it must be quite obvious from the code, anyway), while solving a real-life problem
* Trying to do it in a (somewhat) functional programming style
* Improving my regular expression skills
* Getting to know a bit of the p4 command line interface, again while solving a real-life problem
* Getting some insight into the Adobe GIL and Perforce Jam open-source code bases
* Using Gource to visualize the evolution of the code base at the company where I work
* Reflecting on it

I am certainly not sure whether regularly solving such small artificial problems (and learning a few random things while doing that) makes you a better developer or something.
Perhaps it does. But if it does not - who cares?..
It was fun, after all.

Jan 20, 2010

Boost C++ libraries and game engines


There is a neat development cost calculator on the Boost web site. Pretty easy to use. You enter "code only" into the "include" combo box and get your magic answer:
$188,229,523
Wow. I mean... Wow! You get almost two hundred million for free, at your personal disposal!
That alone makes it worth including Boost in your project, whatever you develop. Right?..

If we talk about game engines in particular, there is certainly no need for all the excessive functionality the Boost libraries provide.
However, there are surprisingly many parts of it that I've seen reimplemented in small and big game engines over and over again (and, frankly, I have implemented quite a few bits of that functionality myself).
Such as:
 - these usually find their place in the "core" part of the game engine. Also there are more specialized parts, like:
All of these facilities (and many more) can be considered of direct use for your typical modern game engine.

Also, there is no doubt that Boost code is of very high quality: tried and true, well-tested and portable. Besides, it is slowly moving into the C++ standard. It uses template metaprogramming, which presumably can improve efficiency in certain cases through aggressive inlining and doing some work at compile time. The code is also considered generic, so one is supposed to be able to flex it to a high degree, adapting it to one's own needs.

Jason Gregory mentions the topic in his book Game Engine Architecture (a rather good one, in my opinion):
  • Boost provides a lot of useful facilities not available in STL.
  • In some cases, Boost provides alternatives to work around certain problems with STL's design or implementation.
  • Boost does a great job of handling some very complex problems, like smart pointers. (Bear in mind that smart pointers are complex beasts, and they can be performance hogs. Handles are usually preferable; see Section 14.5 for details).
  • The Boost libraries' documentation is usually excellent. Not only does the documentation explain what each library does and how to use it, but in most cases it also provides an excellent in-depth discussion of the design decisions, constraints, and requirements that went into constructing the library. As such, reading the Boost documentation is a great way to learn about the principles of software design.
If you are already using STL, then Boost can serve as an excellent extension and/or alternative to many of STL's features. However, be aware of the following caveats:
  • Most of the Boost classes are templates, so all one needs in order to use them is the appropriate set of header files. However, some of the Boost libraries build into rather large .lib files and may not be feasible for use in very small-scale game projects.
  • While the world-wide Boost community is an excellent support network, the Boost libraries come with no guarantees. If you encounter a bug, it will ultimately be your team's responsibility to work around or fix it.
  • Backward compatibility may not be supported.
  • The Boost libraries are distributed under the Boost Library License. Read the license information carefully to be sure it is right for your engine.
But frankly, while I agree about smart pointers and documentation, there are bigger concerns that usually pop up regarding Boost:
  • Compile times, due to all the inter-dependencies and heavy template use
  • Code readability
  • Performance (usually as a tradeoff for flexibility and safety; there are quite a few horror stories)
  • Ease of misuse, and of building extra complexity out of nothing
  • Versioning problems
  • The huge size of the library itself, when used as a third-party dependency
It turns out that there is still a lot of controversy regarding the very topic of using Boost in games. People are quite often cautious about it, and in many cases avoid using it altogether.

Of course, this does not come as much of a surprise for the game industry as a specific branch of software development. The mental model of your typical game engine architecture, sketched directly from my head into an (almost) UML diagram, looks like this:

...which hopefully explains a lot.

Recently I started porting some five-year-old code to Linux, and on the quest to eradicate windows.h dependencies I found a class called "FilePath", which does some basic filesystem operations and uses the WinAPI directly.

It occurred to me that its functionality is an ad-hoc implementation of a subset of what the boost::filesystem library provides. Except that the latter is portable, has more functionality, and is more stable and better documented.

I have certain biases, and due to all the aforementioned factors (including the NIH syndrome), the decision did not seem that simple to make.
Another mental model of mine has crystallized with time, and here's the sketch of it:


I remember Joe Armstrong saying in Coders at Work:
Seibel: But do you think it's really feasible to really open up all those black boxes, look inside, see how they work, and decide how to tweak them to one's own needs?
Armstrong: Over the years I've kind of made a generic mistake and the generic mistake is not to open the black box. To mentally think, this black box is so impenetrable and so difficult that I won't open it[...] But it's not actually difficult. [...] ...you should certainly consider the possibility of opening them.
Shattering the personal biases by means of opening black boxes sounds like a good plan.
So, where do all the dependencies come from?
Here's the layout of the whole Boost include directory (which is roughly 50 MB, while the whole Boost directory is 200 MB), made with StepTree:



We don't really need all of that. Luckily, there is the bcp utility, which is part of the Boost distribution and does exactly what we want: it extracts only the subset of code needed. So, if I am interested in the boost::filesystem library:
cd boost
mkdir ../boost_fs
bcp --boost=. filesystem.hpp ../boost_fs
This fills the boost_fs folder with only the code needed to compile and use this particular library. However, the folder is around 8 MB, which is a bit more than could be expected (keep in mind that most of this code is header files, which are most certainly going to be included into the project). The layout looks like this (this time rendered in WinDirStat):

The boost::filesystem code itself takes just about 3%! The rest is occupied mostly by:
  • mpl - which is an "all-around C++ template tricks" kind of library
  • preprocessor - which is a library that does the copy-pasting job for the programmer in a very smart way
  • type_traits - which lets you get and use some basic information about the types passed to templated classes
Granted, code reuse is generally a good thing, and Boost (having the goal of being a highly reusable library itself) is known to do it heavily, cross-referencing between the libraries. But this looks a bit extreme.
Ironically, though, the development guidelines for the Boost libraries do discuss the topic of excessive library interdependencies.
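
The proportions above were read off the WinDirStat picture. As a rough cross-check, the same kind of breakdown can be computed with a few lines of Python. This is only a sketch: the function names are invented, and the boost_fs/boost path assumes the layout produced by the bcp call earlier.

```python
import os


def scan(root):
    """Yield (path relative to root, size in bytes) for every file under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root), os.path.getsize(full)


def share_by_top_dir(files):
    """Return {top-level directory: percentage of total bytes}."""
    totals = {}
    for rel_path, size in files:
        parts = rel_path.replace("\\", "/").split("/")
        # Files lying directly in the root are grouped under ".".
        top = parts[0] if len(parts) > 1 else "."
        totals[top] = totals.get(top, 0) + size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: 100.0 * size / grand_total for name, size in totals.items()}


if __name__ == "__main__":
    # Assumed location of the headers extracted by bcp.
    shares = share_by_top_dir(scan(os.path.join("boost_fs", "boost")))
    for name, percent in sorted(shares.items(), key=lambda kv: -kv[1]):
        print("{:6.2f}%  {}".format(percent, name))
```

Keeping the arithmetic in share_by_top_dir separate from the directory walk makes it easy to try on a hand-written list of (path, size) pairs first.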
Let's run bcp again, this time in "report mode":
bcp --report filesystem.hpp fs_report.html
This generates an HTML report file with all the dependencies gathered. At the bottom of this file all the include chains are listed (the file is big, mind you).
This representation is only helpful in showing that there are very many header inclusion paths starting from boost/filesystem.hpp, and that most of them end in one of the other aforementioned Boost libraries.
Let's try to build an inclusion graph using Doxygen:
sudo apt-get install doxygen graphviz
cd ../boost_fs
doxygen -g
emacs Doxyfile
Then edit the file to ensure parameters are set:
HAVE_DOT = YES
RECURSIVE = YES
EXTRACT_ALL = YES
Save, and then:
doxygen Doxyfile
Part of the inclusion graph for the main header, boost/filesystem.hpp, looks like this:


The gate to the flurry of includes seems to be opened by boost/iterator/iterator_facade.hpp, which has another graph of its own:

Browsing these graphs already gives some insight, but to make things even clearer we can inspect yet another representation of the source code... the source code itself.

Doing that, it becomes clearer where these dependencies come from. For example, there is a directory iterator, which is templated on the path type, which is fair enough, as the path may have a Unicode representation. Iterator classes are generic in the sense that they don't make many assumptions about the particular types used, and thus require the general metaprogramming facilities provided by mpl and type_traits. Mpl, in turn, uses the preprocessor library for its needs. And then there are preprocessed headers for several compilers... quite a lot of things are happening in order to provide functionality which you may not actually need.
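
How such a chain grows can be sketched in a few lines of Python. The snippet is only an illustration under stated assumptions: the header names and the tiny include map are invented stand-ins rather than the real contents of Boost, and the regular expression handles plain #include lines only (no macros, no conditional compilation).

```python
import re

# Matches lines such as: #include <boost/mpl/if.hpp>  or  #include "x.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def direct_includes(source_text):
    """Return the headers named by #include directives in one file's text."""
    return INCLUDE_RE.findall(source_text)


def transitive_includes(include_map, start):
    """Return every header reachable from start by following include_map."""
    seen = set()
    pending = [start]
    while pending:
        header = pending.pop()
        for dep in include_map.get(header, []):
            if dep not in seen:
                seen.add(dep)
                pending.append(dep)
    return seen


if __name__ == "__main__":
    # Invented miniature of the situation described above.
    include_map = {
        "filesystem.hpp": ["iterator_facade.hpp"],
        "iterator_facade.hpp": ["mpl.hpp", "type_traits.hpp"],
        "mpl.hpp": ["preprocessor.hpp"],
    }
    print(sorted(transitive_includes(include_map, "filesystem.hpp")))
```

A single include at the top is enough to pull in everything below it, which is the effect the Doxygen graphs show at a much larger scale.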

In this particular case there is, for example, the physfs library, which might suit the needs even better, without providing nearly all the generic facilities that Boost does.

Does it mean that there is no use for boost? Of course not:
  • One can still use it, but tracking physical dependencies is even more important than usual. In the case of filesystem, it could be wrapped in an additional interface, so that as few .cpp translation units as possible include boost/filesystem.hpp and everything that is pulled in with it.
  • One can also take the parts of interest and "strip" the unneeded generic parts out of them. Here's a good example: OgreAny.h
  • One can read the documentation and, most importantly, the code to borrow good practices and ideas (and to examine the "bad" ones as well).
Whatever you do, just make sure you crack the black box open first.

Jan 11, 2010

Programmer's arrogance graph

We know that the majority of programmers, even the most ingenious ones, are egotistical bastards, driven by arrogance.

Programmer's arrogance is both a blind, powerful driving force and the reason for many failures.
For example, anecdotal evidence suggests that a programmer's initial estimate has to be at least doubled to get a "realistic estimate":
Why are competent coders so bad at estimating? There are a number of reasons.  The main ones are:
  • Unforeseeable problems:
Many of the problems that come up during software development are unforeseeable.  If you have ever started a "simple" home improvement project and later found it was much more complicated than you realized...then you know first hand how programming can be...even for the experts.

  • Misunderstood/unclear requirements:
When the requirements are unclear, the programmer usually underestimates what it takes to build your software.  To use an analogy, they may estimate building your software as if it were a comfortable house.  Only mid-project do they realize that you were expecting the Taj Mahal! 
Or they just fall victim to their own arrogance. Which is a somewhat simpler explanation.
The same goes for the "Not Invented Here", "Invented here but not by myself" and "My co-workers are all jerks" syndromes. And many more. Oh well.

Now, I've got a theory (built on empirical experience, of course): if we imagine that the amount of "arrogance" can be measured with a scalar value (let's call it the "magnitude of arrogance"), and if we plot how this value changes over the lifetime of a programmer, we might get a pretty similar shape in 80% of cases:



The person (remember that we are talking about a future programmer) starts at some level of initial arrogance as a child/teenager and goes to school, where everything is very new and unknown at first (the point marked "S" on the lifeline, which is the horizontal axis).

But then he (not being sexist, just intentionally taking only males here) suddenly realizes that he's the "smartest kid in the class". The next thing he figures out is that his programming teacher "does not know sh*t". Besides some boring, irrelevant and ages-old stuff, that is. And surely the teacher does not have any bleeding-edge knowledge about, say, patching KDE under FreeBSD.

Naturally, this makes the level of arrogance skyrocket, and it keeps fluctuating somewhere at the top until our geek graduates and gets a job ("J").

Here the arrogance might drop slightly again, because things are new and unfamiliar.
However, being used to the "best kid in the class" status, our soon-to-become programmer catches up quite quickly.

This might happen because the initial tasks are moderately doable and the initial responsibility is not very high, so this "gee, I can do the stuff" feeling warms up the ego.

Also, when coming to a corporate environment, young programmers often get a maintenance job in some legacy codebase. Knowing the nature of (most) legacy code bases, one would not be surprised if our kid quite soon comes to a realization.

See, it turns out that all these "experienced" folks out there don't know sh*t about programming either. As Gerald M. Weinberg puts it:
We often find material in programs that is [...] really present because of the history of the development of the program. For example, once the [...] function is changed to an [...] function, there is no longer any reason for the program [...] to appear. Nevertheless, things being what they are in the programming business, it is unlikely that anyone is going to delve into a working program and modify it just because the definition of [...] function has been changed. And so, some years later, a novice programmer who is given the job of modifying this program will congratulate himself for knowing more about [the subject] than the person who originally wrote this program. Since that person is probably his supervisor, an unhealthy attitude may develop - which, incidentally, is another psychological reality of programming life which we shall have to face eventually.
Soon he realizes, though, that things do not quite work the way they appeared to at first. Several failures happen, which he fortunately realizes he should blame on himself, and he suddenly faces the understanding that he knows, in fact, nothing about his job.

Then comes a long and painful period of learning, getting better and better every day, and finally reaching the next level of personal development, where one can look back and say: "see, I am not nearly as lame as I used to be when I started".

So we get another boost of overconfidence, which may or may not be followed by multiple similar abrupt drops and subsequent slow rises.

What usually happens next is the point "P".
This is a promotion to a lead position (there do not seem to be too many ways to get "promoted" in the corporate programming business without becoming a manager), or becoming an entrepreneur, or moving to a considerably more "advanced" company. You name it.

The fresh feeling of power gives another arrogance boost. It only lasts so long, though, and soon the disappointments come again.

And so on.

After many bumps in the road, if one is lucky, passionate, persistent and introspective, one might slowly approach the level marked "H" on the vertical axis.

This is "The Humble Programmer", as E. W. Dijkstra puts it:
We shall do a much better programming job, provided that we approach the task with a full appreciation of its tremendous difficulty, provided that we stick to modest and elegant programming languages, provided that we respect the intrinsic limitations of the human mind and approach the task as Very Humble Programmers.
The question is how close the projection of this point onto the horizontal axis will be to the point "R" (retirement).

And to the point "D", which comes next.