Jan 20, 2010

Boost C++ libraries and game engines

There is a neat development cost calculator on the Boost web site. It is pretty easy to use: you select "code only" in the "include" combo box and get your magic answer:
Wow. I mean... Wow! You get almost two hundred million for free, at your personal disposal!
That alone makes it worth including Boost in your project, whatever you develop. Right?..

If we talk about game engines in particular, there is certainly no need for all the functionality the Boost libraries provide.
However, there are surprisingly many parts of it that I've seen reimplemented in game engines small and big, over and over again (and, frankly, I have implemented quite a few of those bits myself).
Such as:
 - these usually find their place in the "core" part of the game engine. Also there are more specialized parts, like:
All of these facilities (and many more) can be considered of direct use for your typical modern game engine.

Also, there is no doubt that Boost code is of very high quality: tried and true, well tested, portable, and, besides, slowly moving into the C++ standard. It uses template metaprogramming, which in certain cases can presumably improve efficiency through aggressive inlining and by doing some of the work at compile time. The code is also considered to be generic, so one is supposed to be able to bend it quite far to fit one's own needs.
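To illustrate what "doing some of the work at compile time" can mean, here is a minimal sketch in plain standard C++ (no Boost headers involved; the names Factorial, square and run_checks are made up for this example). The recursion is resolved by the compiler, so the result ends up as a constant in the generated code. Whether a call to a small template like square is really inlined is still up to the compiler.

```cpp
#include <cassert>

// Compile-time factorial via recursive template instantiation.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case that terminates the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// A small function template. Its definition is visible at every call site,
// so the compiler is free to inline it.
template <typename T>
inline T square(T x) { return x * x; }

// Verified by the compiler itself, with no run-time cost.
static_assert(Factorial<5>::value == 120, "evaluated at compile time");

// Run-time usage of the same facilities.
inline void run_checks() {
    assert(Factorial<6>::value == 720);
    assert(square(3) == 9);
}
```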

Jason Gregory touches on the topic in his book Game Engine Architecture (a rather good one, in my opinion):
  • Boost provides a lot of useful facilities not available in STL.
  • In some cases, Boost provides alternatives to work around certain problems with STL's design or implementation.
  • Boost does a great job of handling some very complex problems, like smart pointers. (Bear in mind that smart pointers are complex beasts, and they can be performance hogs. Handles are usually preferable; see Section 14.5 for details).
  • The Boost libraries' documentation is usually excellent. Not only does the documentation explain what each library does and how to use it, but in most cases it also provides an excellent in-depth discussion of the design decisions, constraints, and requirements that went into constructing the library. As such, reading the Boost documentation is a great way to learn about the principles of software design.
If you are already using STL, then Boost can serve as an excellent extension and/or alternative to many of STL's features. However, be aware of the following caveats:
  • Most of the Boost classes are templates, so all one needs in order to use them is the appropriate set of header files. However, some of the Boost libraries build into rather large .lib files and may not be feasible for use in very small-scale game projects.
  • While the world-wide Boost community is an excellent support network, the Boost libraries come with no guarantees. If you encounter a bug, it will ultimately be your team's responsibility to work around or fix it.
  • Backward compatibility may not be supported.
  • The Boost libraries are distributed under the Boost Software License. Read the license information carefully to be sure it is right for your engine.
But frankly, while I agree about smart pointers and documentation, there are bigger concerns that usually pop up regarding Boost:
  • Compile times - due to all the inter-dependencies and heavy template use
  • Code readability
  • Performance (usually traded off for flexibility and safety; there are quite a few horror stories)
  • Ease of misuse and of building extra complexity out of nothing
  • Versioning problems
  • Huge size of the library itself, when used as a third-party dependency
It turns out that there is still a lot of controversy around the very topic of using Boost in games. People are quite often cautious about it, and in many cases avoid using it altogether.

Of course, this does not come as much of a surprise, given that the game industry is a rather specific branch of software development. The mental model of your typical game engine architecture, sketched straight from my head into an (almost) UML diagram, looks like this:

...which hopefully explains a lot.

Recently I started porting some five-year-old code to Linux, and on the quest to eradicate windows.h dependencies I found a class called "FilePath", which performs some basic filesystem operations and uses the WinAPI directly.

It occurred to me that its functionality is an ad-hoc implementation of a subset of what the boost::filesystem library provides, except that the latter is portable, has more functionality, and is more stable and well documented.
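For a rough idea of what the portable counterpart of such a class looks like, here is a small sketch. It uses C++17's std::filesystem as a stand-in, since that standard library component was modeled on Boost.Filesystem; recent Boost versions expose largely the same calls under boost::filesystem, though that should be checked against the Boost version at hand. The helper names (extension_of, file_name_of, parent_of, is_existing_directory) are hypothetical and are not taken from the original FilePath class.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Extension including the leading dot, e.g. ".png" (empty if there is none).
std::string extension_of(const std::string& p) {
    return fs::path(p).extension().string();
}

// Last path component, e.g. "rock.png".
std::string file_name_of(const std::string& p) {
    return fs::path(p).filename().string();
}

// Everything before the last component, always with '/' separators.
std::string parent_of(const std::string& p) {
    return fs::path(p).parent_path().generic_string();
}

// Queries the real filesystem; the error_code overload does not throw.
bool is_existing_directory(const std::string& p) {
    std::error_code ec;
    return fs::is_directory(fs::path(p), ec);
}

// Usage of the pure path helpers (these never touch the disk).
inline void run_checks() {
    assert(extension_of("data/textures/rock.png") == ".png");
    assert(file_name_of("data/textures/rock.png") == "rock.png");
}
```

None of this needs windows.h, which is the whole point of the exercise.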

I have certain biases of my own, and because of all the aforementioned factors (including the NIH syndrome), the decision did not seem that simple to make.
Another mental model of mine has crystallized over time, and here's a sketch of it:

I remember Joe Armstrong saying in Coders at Work:
Seibel: But do you think it's really feasible to really open up all those black boxes, look inside, see how they work, and decide how to tweak them to one's own needs?
Armstrong: Over the years I've kind of made a generic mistake and the generic mistake is not to open the black box. To mentally think, this black box is so impenetrable and so difficult that I won't open it[...] But it's not actually difficult. [...] ...you should certainly consider the possibility of opening them.
Shattering the personal biases by means of opening black boxes sounds like a good plan.
So, where do all the dependencies come from?
Here's the layout of the whole Boost include directory (which is roughly 50 MB, while the whole Boost directory is 200 MB), made with StepTree:

We don't really need all of that. Luckily, there is the bcp utility, which is part of the Boost distribution and does exactly what we want: it extracts only the subset of code that is actually needed. So, if I am interested in the boost::filesystem library:
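# run from the root of the Boost tree
cd boost
mkdir ../boost_fs
# copy filesystem.hpp and everything it depends on into the sibling folder
bcp --boost=. filesystem.hpp ../boost_fs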
This fills the boost_fs folder with only the code needed to compile and use this particular library. However, the folder is around 8 MB, which is a bit more than one might expect (keep in mind that most of this code is header files, which will most certainly end up being included into the project). The layout looks like this (this time rendered in WinDirStat):

The boost::filesystem code itself takes just about 3%! The rest is occupied mostly by:
  • mpl - an "all-around C++ template tricks" kind of library
  • preprocessor - a library that does the copy-pasting job for the programmer in a very smart way
  • type_traits - a library that lets templated code query and use basic information about the types it is instantiated with
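As a small illustration of what the type_traits kind of facility is for, here is a sketch using the standard <type_traits> header, which offers traits with the same names as their Boost.TypeTraits counterparts (std::is_integral versus boost::is_integral, and so on). The functions describe_type, half and run_checks are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Query basic information about a type from inside a template.
template <typename T>
std::string describe_type() {
    if (std::is_pointer<T>::value) return "pointer";
    if (std::is_integral<T>::value) return "integral";
    if (std::is_floating_point<T>::value) return "floating point";
    return "other";
}

// Use that information to select an implementation at compile time:
// this overload only exists for integral types...
template <typename T>
typename std::enable_if<std::is_integral<T>::value, T>::type half(T v) {
    return v / 2;  // integer division, truncates
}

// ...and this one only for floating point types.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type half(T v) {
    return v * T(0.5);
}

// Usage: the right overload is picked from the argument type alone.
inline void run_checks() {
    assert(half(7) == 3);
    assert(half(7.0) == 3.5);
}
```

Generic library code such as iterators leans on exactly this kind of query, which is one reason these headers get pulled in so often.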
Granted, code reuse is generally a good thing, and Boost (whose goal is to be a highly reusable library itself) is known to do it heavily, cross-referencing between its libraries. But this looks a bit extreme.
Ironically, though, the development guidelines for the Boost libraries do discuss the topic of excessive library interdependencies.
Let's run bcp again, this time in "report mode":
bcp --report filesystem.hpp fs_report.html
This generates an HTML report file listing all the dependencies. At the bottom of the file there are all the include chains that were found (the file is big, mind you).
This representation is only helpful in showing that there are a great many header inclusion paths starting from boost/filesystem.hpp, and that most of them end in one of the other aforementioned Boost libraries.
Let's try to build an inclusion graph using Doxygen:
sudo apt-get install doxygen graphviz
cd ../boost_fs
doxygen -g
emacs Doxyfile
Then edit the file to ensure parameters are set:
Save, and then:
doxygen Doxyfile
Part of the inclusion graph for the main header, boost/filesystem.hpp, looks like this:

The gate to the flurry of includes seems to be opened via boost/iterator/iterator_facade.hpp, which has another graph of its own:

Browsing these graphs already gives some insight on its own, but to make things even clearer we could inspect yet another representation of the source code... the source code itself.

Doing that, it becomes clearer where these dependencies come from. For example, there is a directory iterator, which is templated on the path type, which is fair enough, as the path may have a Unicode representation. Iterator classes are generic in the sense that they don't make many assumptions about the particular types used, and thus require the general metaprogramming facilities provided by mpl and type_traits. Mpl, in turn, uses the preprocessor library for its needs. And then there are preprocessed headers for several compilers... quite a lot is happening in order to provide functionality which you may not actually need.

In this particular case there is, for example, the physfs library, which might suit the needs even better without providing nearly as many generic facilities as Boost does.

Does it mean that there is no use for boost? Of course not:
  • One can still use it, but tracking physical dependencies becomes even more important than usual. In the case of filesystem, it could be wrapped in an additional interface, so that as few .cpp translation units as possible include boost/filesystem.hpp and everything that gets pulled in together with it.
  • One can also possibly take the parts of interest and "strip" the unneeded generic parts out of them. Here's a good example: OgreAny.h
  • One can read the documentation and, most importantly, the code to borrow good practices and ideas (and to examine the "bad" ones as well).
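The wrapping idea from the first point could look roughly like the sketch below. It is shown as a single listing, with comments marking where the header and the source file would be split. The engine namespace and its three functions are hypothetical, and std::filesystem (C++17) is used as a stand-in for boost/filesystem.hpp so that the sketch builds without Boost installed.

```cpp
#include <cassert>

// ---- file_system.h (hypothetical engine header) ----
// Only lightweight standard headers here; no filesystem library is visible
// to the translation units that include this file.
#include <string>
#include <vector>

namespace engine {
bool file_exists(const std::string& path);
std::string file_extension(const std::string& path);
std::vector<std::string> list_directory(const std::string& path);
}  // namespace engine

// ---- file_system.cpp ----
// The only translation unit that pays for the heavy include.
#include <algorithm>
#include <filesystem>  // stand-in for boost/filesystem.hpp
#include <system_error>

namespace engine {
namespace fs = std::filesystem;

bool file_exists(const std::string& path) {
    std::error_code ec;
    return fs::exists(fs::path(path), ec);
}

std::string file_extension(const std::string& path) {
    return fs::path(path).extension().string();
}

// Returns the sorted entry names; empty if the directory cannot be read.
std::vector<std::string> list_directory(const std::string& path) {
    std::vector<std::string> names;
    std::error_code ec;
    for (fs::directory_iterator it(fs::path(path), ec), end;
         !ec && it != end; it.increment(ec)) {
        names.push_back(it->path().filename().string());
    }
    std::sort(names.begin(), names.end());
    return names;
}
}  // namespace engine

// Usage from the rest of the engine: plain strings in, plain strings out.
inline void run_checks() {
    assert(engine::file_extension("level01.map") == ".map");
    assert(engine::file_exists("."));
}
```

With this layout, swapping the underlying library (or replacing it with something like physfs) touches a single source file, and the rest of the engine never sees the include chains discussed above.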
Whatever you do, just make sure you crack the black box open first.

