2014-01-02

How to detect integer overflow in C and C++ addition and subtraction

This blog post explains how to detect integer overflow (and underflow) in C and C++ addition and subtraction, and it also gives example code.

Overflow (or underflow, we use these terms interchangeably) occurs when the result of an arithmetic operation cannot be represented as an integer of the same type (and size) as the operands. For unsigned addition, overflow indicates that the result is too large. For unsigned subtraction, overflow indicates that the result is negative. For signed addition and subtraction, overflow indicates that the result is either too small or too large.

When chaining additions, it's useful to compute the sum x + y + c, where c is the carry bit (either 0 or 1) resulting from the previous, less significant addition. Similarly, when chaining subtractions, it's useful to compute the difference x - y - c, where c is the borrow bit (either 0 or 1) resulting from the previous, less significant subtraction.

The freely available Chapter 2 (Basics) of the book Hacker's Delight has a detailed and informative subsection about overflow processing. The formulas presented below are based on formulas in that section. Please read the entire section of the book for a detailed explanation and more formulas (which are useful in other environments).

One simple observation is that signed addition overflows iff the sign of the two operands (x and y) are the same, but it's different from the sign of the sum. Based on similar observations we can devise the following formulas:

  • signed x + y + c overflows iff this is negative: ((x+y+c)^x)&((x+y+c)^y)
  • signed x + y + c overflows iff this is negative: z&(((x^z)+y+c)^~y) after z=(x^~y)&((1<<sizeof(x)*8-1)) (no temporary overflow)
  • signed x - y - c overflows iff this is negative: ((x-y-c)^x)&((x-y-c)^~y)
  • signed x - y - c overflows iff this is negative: z&(((x^z)-y-c)^y) after z=(x^~y)&((1<<sizeof(x)*8-1)) (no temporary overflow)
  • unsigned x + y + c overflows iff this is negative: (x&y)|((x|y)&~(x+y+c))
  • unsigned x - y - c overflows iff this is negative: (~x&y)|((~x|y)&(x-y-c))

Please note that none of the formulas above contain branches, so the CPU pipeline doesn't have to flushed in order to compute them. To convert the sign bit (i.e. negativity) to a bool (0 or 1), shift it down like this: (int)(((((x+y+c)^x)&((x+y+c)^y))>>(sizeof(x)*8-1))&1).

Please note that in standard C and C++ the result of addition and subtraction is undefined (!) if an overflow occurs. The GCC flags -fwrapv and -fno-strict-overflow disable this undefined behavior. But since our code can't be sure if it's compiled with these flags enabled, we must use an overflow-detection formula in which no temporary overflow occurs. Such formulas are also given above. Another option is casting the operands to the corresponding unsigned type, adding them as unsigned (which happens normally, only the least significant bits are kept, as many as possible), and then casting the result back to signed. To do so, we must add these explicit casts in x+y+c and x-y-c in the signed formulas above. These casts can get tricky if we don't know the type of the operands, because there is no overloaded generic cast in C (which e.g. casts int to unsigned and long long to unsigned long long).

See the final code on Github. It can be included as a .h file in C and C++ code. It works with GCC 4.1 and above and Clang 3.0 and above. It uses the GCC extension __typeof__ (also works in Clang) and it uses function overloading in C++ for the generic unsigned cast. In C, it uses the GCC extension __builtin_choose_expr for this cast. It also uses statement expressions in macro bodies to declare temporary variables to avoid useing the arguments more than once.

Further reading:

  • About the C11 _Generic selections (for implementing overridden functions and macros) in this blog post.
  • P99, a huge macro library for C99 (C dialects earlier than C11).
  • An article about proper overflow detection in all C and C++ arithmetic operations. Overflow detection is much harder to do correctly than what you think. The article contains many incorrect naïve implementations, and also the correct (complicated) implementations. Read it, it's worth it!

No comments: