#### Prologue

Once upon a time, there was a kind and happy farmer, named Bob.

Every year he planted a row of ten thousand carrots, and that would yield a few truckloads of the crunchy vegetables.

Thing is, the carrots needed to be planted very carefully.

He was a firm believer that in order to get a great crop, one should plant the carrots at exactly 1.25 inches distance from each other, so that every carrot gets enough room for growth, but at the same time does not stand too far away from the neighbors.

Like this:

(also, the carrot row was offset from the start of the field by exactly 1254.8 inches).

Year after year, he was doing it manually, until once he discovered that he could learn to program computers and automate the carrot planting process with their help.

#### Bob's program to plant the carrots

Armed with this new knowledge and with a C++ compiler, he jumped right into writing the code (it did look like quite a simple task, mind you):

So, there was a piece of code to set up the constants:

```
const int NUM_CARROTS = 10000; // total number of carrots to plant
const float OFFSET_WIDTH = 1254.8f; // distance to the start of the carrot patch (inches)
const float PLANT_INTERVAL = 1.25f; // the optimal planting interval (inches)
```

And the actual planting code:

```
float carrot_pos[NUM_CARROTS]; // carrot positions go here
// plants a single carrot
void plant_carrot(int idx) {
carrot_pos[idx] = OFFSET_WIDTH + idx*PLANT_INTERVAL;
}
// plants all carrots
void plant_carrots() {
for (int i = 0; i < NUM_CARROTS; i++) plant_carrot(i);
}
```

And the code to print the resulting carrot positions:

```
void print_carrots() {
for (int i = 0; i < NUM_CARROTS; i++)
printf("Carrot %d goes to %.2f inches\n", i, carrot_pos[i]);
}
```

Easy! When running, it would print:

```
Carrot 0 goes to 1254.80 inches
Carrot 1 goes to 1256.05 inches
Carrot 2 goes to 1257.30 inches
Carrot 3 goes to 1258.55 inches
...
```

Up until carrot number 9999. Precisely as Bob expected, totally matching his previous manual planting experience.

Inspired by the great success, Bob decided to pursue the software development best practices, and write a test for this code.

The test would take every carrot and check the distance to the previous one (this distance must be exactly 1.25 inches, as already mentioned):

```
// finds if a single carrot is planted correctly
bool is_planted_ok(int idx) {
return (carrot_pos[idx] - carrot_pos[idx - 1]) == PLANT_INTERVAL;
}
// checks that all of the carrots are planted correctly
void test_carrots() {
int num_bad = 0;
for (int i = 1; i < NUM_CARROTS; i++) {
if (!is_planted_ok(i)) {
printf("The carrot %d is planted incorrectly!!!\n", i);
num_bad++;
}
}
if (num_bad == 0) printf("All carrots placed well!!!\n");
else printf("%d of %d carrots placed incorrectly... :(\n", num_bad, NUM_CARROTS);
}
```

#### Meet the Angry Carrot

Confident, Bob run the test.

And imagine his dismay! Instead of the expected "All carrots placed well", to see this:

```
The carrot 2273 is planted incorrectly!!!
1 of 10000 carrots placed incorrectly... :(
```

The carrot number 2273! A single, an only one from 10000! Why, in the whole world?..

Not believing his own eyes, he tried and double checked the printout:

```
...
Carrot 2271 goes to 4093.55 inches
Carrot 2272 goes to 4094.80 inches
Carrot 2273 goes to 4096.05 inches
Carrot 2274 goes to 4097.30 inches
Carrot 2275 goes to 4098.55 inches
...
```

The numbers look perfectly fine on the screen! So why does the test fail???

Despaired, Bob turned to the online forums.

There were a lot of very helpful people there. They immediately suggested him to read a fundamental work called "What every carrot planter should know about carrot planting", where he'd presumably find answers to all of his questions.

Except he didn't; the fundamental work was undeniably full of wisdom (judging by the amount of Greek symbols in there, for one thing) but it never mentioned a single carrot, neither the possible repercussions of their planting thereof.

Suffice to say, that was a real disaster.

#### Meet the Floating Point Fairy

Indeed, that's when the Floating Point Fairy appears:

"Hello, Bob. I am the Floating Point Fairy, and I popped out from the thin air in order to help you. Or at least, try to. You are not the first one experiencing these troubles, you know."

#### Decimal in binary and inexact representation

Let's start from a revelation:

*Number 1254.8 is not what what you think it is. Neither is 0.8, nor a lot of other normally-looking, nice decimal fractional numbers.*

You know, 1254.8 is a decimal value, but inside computers all numbers are always binary (well, except maybe for Soviet Russia). And they use a special binary floating point representation: there is a bit for the sign, a few bits for the "exponent", and the rest is used for the so-called "mantissa"... but first let's see how do we get there.

There are certain rules of how this decimal to binary number conversion happens. We can look at the whole part (before the dot) and the fractional part (after the dot) separately: those two get simply combined afterwards, to get the resulting number in the binary representation.

##### Decimal to binary (whole numbers)

For the whole number the decimal->binary conversion algorithm is to iteratively divide the number by 2, until we get to 0. At every step, the modulo of division contributes to the final binary number (added from the left side).

For example, to convert decimal number 135 to binary we would perform a sequence of steps:

Current Decimal | Current Binary |
---|---|

135 | |

67 | 1 |

33 | 11 |

16 | 111 |

8 | 0111 |

4 | 00111 |

2 | 000111 |

1 | 1000111 |

10000111 |

Thus, 135 (Decimal) == 10000111 (Binary).

Note that the resulting binary representation is always finite (since the original decimal number is finite, and since we are dividing by 2, we'll eventually get to 0).

##### Decimal to binary (fraction, sum of powers of 1/2)

For the fractional part, the algorithm is a kind of reverse: we iteratively multiply by 2 (until we get to 1), and the leftmost digit of the multiplication result (the one that happens to be before the dot) contributes to the rightmost side of the resulting binary number.

For example, for the decimal 0.6875 the steps are:

Current Decimal | Current Binary |
---|---|

0.6875 | |

[1]0.375 | 1 |

[0]0.75 | 10 |

[1]0.5 | 101 |

[1] | 1011 |

Thus, 0.6875 (Decimal) == 0.1011 (Binary). Which is also finite.

##### Decimal to binary (fraction, inexact)

Now, let's look at another number, 0.8, and try to apply the same algorithm to it:

Current Decimal | Current Binary |
---|---|

0.8 | |

[1]0.6 | 1 |

[1]0.2 | 11 |

[0]0.4 | 110 |

[0]0.8 | 1100 |

[1]0.6 | 11001 |

... | ... |

See what happens?.. While multiplying by 2, we keep cycling through 0.8, 0.6, 0.2, 0.4... it goes into an infinite loop!

Thus, 0.8 (Decimal) == 0.11001100(1100)...

#### Rounding errors

What happens to those infinite binary tails?.. They get chopped!

So, as you now see, what we store in the computer for those numbers, is not an exact representation, because the exact one in this case would take an infinite number of digits, and we can't afford that many. We only have 23 bits for the blue part (which is called mantissa), so we need to chop and round.

Different rounding rules may be used for this chopping. And that's how we get those red digits, which are, in a sense, garbage, since they don't exactly correspond to the number that we want.

#### Error cancellation (and the catastrophic one)

What may happen if we add a chopped number with the "red garbage" (OFFSET_WIDTH, which is 1254.8) to a number without garbage (multiple of PLANT_INTERVAL, which is 1.25)? It will always be a number with red garbage (i.e. non-exact number, affected by the rounding error).

What happens afterwards, when we subtract two such numbers (with the "red garbage")? In our particular case two things happen:

- Either the red garbage tails from these two numbers miraculously "cancel" each other, so that we again incidentally get the number without garbage (and it's equal to 1.25). This happens in 9999 out of 10000 cases (again, for our particular example)
- Or, in a single case, for the carrot number 2273 the bits in the subtracted garbage tails happen to be totally different (due to the rounding specifics) and what happens instead is so-called catastrophic cancellation, when garbage suddenly gets promoted to what is supposed to be significant digits.

The interesting thing is that carrots 2272 and 2273 are corresponding to the offsets 4094.80 and 4096.05. Note how 4096, which is 2^12, is right between them.

Anyway, the conclusion is: whenever there is an inexact decimal number representation involved (together with arithmetic operations), there is a high chance that you'll be badly affected by the rounding errors.

#### Let's fix it: Exact arithmetics

"I know!!!" - exclaimed Bob - "If the problem is that we get rounding errors because of the inexact number representation, just let's make sure that all the numbers are always exact, and we avoid rounding errors altogether!":

```
const float OFFSET_WIDTH = 1254.5f; // <== NOTE: slightly changed, and now is exact in binary
```

And here we go, all carrots placed well!

"Sure," - replied the Fairy - "it worked for you now, but don't you think it's a fragile approach? You have to keep a close eye on every number and operation (granted, your arithmetic is very simple here)... besides, you changed the offset of the carrot patch, haven't you? Don't you think it might upset the carrots?"

#### Let's fix it: Use an epsilon

"Alright," - said Bob - "I know what I'll do. I won't compare the floating numbers exactly, but rather that they are close enough... that's what many people on the forums suggested, anyway".

"Will you, now?.. And how close do you think is close enough?" - mused the Fairy.

"Well... very close. I'll pick some small number with enough zeros after the dot":

```
const float EPSILON = 0.000001f; // kind of small
bool is_planted_ok(int idx) {
float d = carrot_pos[idx] - carrot_pos[idx - 1];
// check that numbers are not further away than EPSILON
return (fabs(d - PLANT_INTERVAL) <= EPSILON);
}
```

However, that didn't work. The angry carrot was still angry.

"Hmm... maybe that was too many zeros, after all. Let's remove one":

```
const float EPSILON = 0.00001f; // should be big enough?..
```

...unfortunately, that did not work either.

"Alright, one more zero, then":

```
const float EPSILON = 0.0001f; // maybe now it's big enough?..
```

Still failing! And that's already quite a big epsilon, mind you!

We can't keep removing zeros after the dot just like that!

Let's go more fine-grained:

```
const float EPSILON = 0.0002f; // this MUST be enough
```

Guess what? It still did not help. Alright, a final attempt:

```
const float EPSILON = 0.0004f; // WHAT ABOUT NOW?.. Q_Q
```

Whew. That finally worked!

"But how reliable is that?" - asked the Fairy - "You sure it will work with different numbers? Are you going to pick this number every time you tweak your constants?"

#### Let's fix it: Relative epsilon

"Uhm. Nope, that does not feel right" - said Bob - "But I found a better way on the Internet! Let's use a relative epsilon":

```
const float EPSILON = 0.0002f; // the "relative" epsilon
inline bool float_close_relative(float x, float y, float eps) {
return (fabs(x - y) <= std::max(fabs(x), fabs(y))*eps);
}
bool is_planted_ok( int idx) {
float d = carrot_pos[idx] - carrot_pos[idx - 1];
return float_close_relative(d, PLANT_INTERVAL, EPSILON);
}
```

"Alright, that worked." - said the Fairy - "But what if you bump your epsilon back down a notch?"

```
const float EPSILON = 0.0001f; // should work as well, shouldn't it?..
```

"Oh no! What's going on? It's failing again!" - exclaimed Bob.

"You see," - answered the Fairy - "relative epsilon comparison only fixes one thing: that you'd need different EPSILON values when comparing big numbers and when comparing the numbers close to 1. In your case PLANT_INTERVAL is 1.25, which is already quite close to one, so while this relative epsilon formula is a generally good idea, it does not fix your problem... oh, and besides yours does not handle corner cases very well, but let's not go there."

#### Let's fix it: Bump the precision

"Alright, I know what else I could try." - said Bob after some contemplation - "I will just bump the precision of all numbers from single (32 bit) to double (64 bit) and see what happens":

```
const double OFFSET_WIDTH = 1254.8;
const double PLANT_INTERVAL = 1.25;
double carrot_pos [NUM_CARROTS];
void plant_carrot(int idx) {
carrot_pos[idx] = OFFSET_WIDTH + idx*PLANT_INTERVAL;
}
const double EPSILON = 0.000001;
bool is_planted_ok(int idx) {
double d = carrot_pos[idx] - carrot_pos[idx - 1];
return (abs(d - PLANT_INTERVAL) <= EPSILON);
}
```

What happened is that it worked: the test passed. "Is this the solution, then?" - cautiously asked Bob.

"It may be." - answered the Fairy - "Sometimes. You have to realize, though, that by simply increasing the precision of your computations, you might as well just postpone your problem until later. Which is rarely a good thing. Besides, keep in mind that this way you use twice the memory for the data... which can affect the performance due to the increased cache misses, decreased SSE register utilization and what have you"

#### Let's fix it: Modify the formulas

"OK then, I can do it a smart way. Someone on the forums mentioned that one could take an algorithmic approach and modify the formulas, so that to avoid catastrophic cancellation and/or compensate for the rounding error":

```
const float EPSILON = FLT_EPSILON;
bool is_planted_ok(int idx) {
float cp = carrot_pos[idx - 1] + PLANT_INTERVAL;
return float_close_relative(carrot_pos[idx], cp, EPSILON);
}
```

"We have a subtraction with catastrophic cancellation inside the testing code itself, so instead of subtracting two neighboring carrot positions and comparing the result with the PLANT_INTERVAL, we will add the PLANT_INTERVAL to the previous carrot's position, and that should be close to the current carrot position!"

"Nice!" - smiled the Fairy - "And I've noticed that your epsilon is now not an arbitrary hand-picked number, but..."

"...but the smallest number such that 1.0 + EPSILON is not equal to 1.0 !" - proudly declared Bob.

"You are making a good progress. But could you find a way to avoid using epsilons altogether?.."

#### Let's fix it: An "arbitrary precision" library

"I did more research," - proclaimed Bob - "and found out that it's possible to do floating arithmetics directly in decimal format, avoiding the hardware binary representation. People do that for financial calculations (because that's the area where rounding errors translate directly in the lost money), for example there is BigDecimal class in Java. And there are libraries in other languages. So we could do something like:"

```
#include <boost/multiprecision/cpp_dec_float.hpp>
typedef boost::multiprecision::cpp_dec_float_100 decimal;
const decimal OFFSET_WIDTH("1254.8");
const decimal PLANT_INTERVAL("1.25");
decimal carrot_pos[NUM_CARROTS];
void plant_carrot(int idx) {
carrot_pos[idx] = OFFSET_WIDTH + idx*PLANT_INTERVAL;
}
bool is_planted_ok(int idx) {
return (carrot_pos[idx] - carrot_pos[idx - 1]) == PLANT_INTERVAL;
}
```

"And it works! See, no epsilons involved!"

"Well, that's an option, certainly." - said the Fairy - "Keep in mind, though, that there is a computation cost to pay, so using this can be an overkill. Besides, I see that you've included a Boost header..."

"Is there anything wrong with Boost?"

"Nothing, it's a very good, production-quality library." - quickly answered the Fairy - "Let's move on. What else could you do?"

#### Let's fix it: Fixed point computations

Soon after Bob came back: "I know what else we have not tried yet! Fixed-point arithmetic! Let's just sweep this pesky floating point under the rag and pretend it never existed!"

```
const long OFFSET_WIDTH = 125480; // distance to the start of the carrot patch (0.01 inches)
const long PLANT_INTERVAL = 125; // the planting interval (0.01 inches)
long carrot_pos[NUM_CARROTS]; // carrot positions go here
void plant_carrot( int idx) {
carrot_pos[idx] = OFFSET_WIDTH + idx*PLANT_INTERVAL;
}
bool is_planted_ok( int idx) {
return (carrot_pos[idx] - carrot_pos[idx - 1]) == PLANT_INTERVAL;
}
```

"See, it works! I am using only integers here!"

"Of course it does." - smiled the Fairy - "And it's a viable option, sometimes. Some embedded systems still may not have floating point implemented in hardware. But keep in mind, that your code may become much harder to comprehend, especially with complex computations. Any other ideas?"

#### Let's fix it: Rational numbers

"Gee," - Bob came back after a while - "I figured I could do the fraction computations myself, via rational numbers. I even wrote a little class for that":

```
class rational {
public:
rational(int nom = 0, int den = 1) {
int n = gcd10(nom, den);
_n = nom/n;
_d = den/n;
}
rational operator +(const rational& r) const { return rational(_n*r._d + r._n*_d, _d*r._d); }
rational operator -(const rational& r) const { return rational(_n*r._d - r._n*_d, _d*r._d); }
friend rational operator *(int n, const rational& rhs) { return rational(n*rhs._n, rhs._d); }
bool operator ==( const rational& r) const { return (_n == r._n) && (_d == r._d); }
private:
int _n, _d; // nominator/denominator
static int gcd10(int a, int b) { return (a%10 | b%10) ? 1 : 10*gcd10(a/10, b/10); }
};
```

"Now, I can plant with rational numbers, with none of angry carrots:"

```
const rational OFFSET_WIDTH = rational(1254) + rational(8, 10);
const rational PLANT_INTERVAL = rational(1) + rational(25, 100);
rational carrot_pos[NUM_CARROTS];
void plant_carrot( int idx) {
carrot_pos[idx] = OFFSET_WIDTH + idx*PLANT_INTERVAL;
}
bool is_planted_ok( int idx) {
return (carrot_pos[idx] - carrot_pos[idx - 1]) == PLANT_INTERVAL;
}
```

"Sure, why not." - said the Fairy - "Don't forget, though, that irrational numbers also exist. One fine day you may need a square root, or something... oh, and by the way, did you know that some programming languages have the native support for the rational numbers? Check out this Clojure REPL session:"

```
Clojure 1.3.0
user=> (def a 0.1)
user=> (def b 0.2)
user=> (def c 0.3)
user=> [a b c]
[0.1 0.2 0.3]
user=> (= c (+ a b))
false
user=> (def ar (/ 1 10))
user=> (def br (/ 2 10))
user=> (def cr (/ 3 10))
user=> [ar br cr]
[1/10 1/5 3/10]
user=> (= cr (+ ar br))
true
```

"So next time you might even want to use a different programming language. Right tool for the job, and all that. Now, as I mentioned, it's still not an ideal solution..."

#### For every answer, two more questions

At this point, the reader might have already started wondering if there is a happy end to this story.

Did Bob eventually find the solution?

Well, yes and no. He realized that floating-point is hard, and there is no an one-size-fits-all recipe to make it any easier. However, something that he did instead, was understanding and embracing this complexity.

After all, it could have been much worse without the IEEE 754 standard.

And that, in a sense, was a happy end.

#### Disclaimer

All characters appearing in this work are fictitious. Any resemblance to real persons, living or dead, is purely coincidental.

Except for the carrots. Those were real, goddamit.