Dude, where's my code?
From time to time, I have experienced what seems like vanishing code. I write out the source code, I think it is right, but then strange things start to happen when I compile it. In this article, I want to look at why it can appear that the compiler has thrown away my code, and how to manage the advantages and disadvantages of a powerful optimizing compiler.
Instruction set design
The modern world is full of RISC processors. A reduced instruction set typically requires the compiler to generate more instructions, with the payoff that the simpler instructions can be implemented using less time and less power in the processor. So we have learned to expect compiler-generated assembly code to be a bit wordy, using far more assembly instructions than there are lines of C source code.
Having said that, instruction set designers are always looking for instructions that are reasonably easy to implement and also allow frequently occurring code patterns to be expressed efficiently. A well-known example is count-leading-zeros. Once you have a barrel shifter, you can implement it with a modest number of additional gates, and it is a very popular instruction, found in the Arm, TriCore, PowerPC and RH850 instruction sets, not to mention x86. Counting leading zeros in C would require a loop, were it not for compiler intrinsics that generate the single assembly instruction.
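To make the contrast concrete, here is a sketch in C. The loop is my own illustration of the portable approach; __builtin_clz is a real GCC/Clang intrinsic, though its result is undefined for an input of zero, hence the guard.

#include <stdint.h>

/* Portable C: count leading zero bits one at a time. */
uint32_t clz_loop( uint32_t x )
{
    uint32_t count = 0u;
    while( ( count < 32u ) && ( ( x & 0x80000000u ) == 0u ) )
    {
        x <<= 1;
        count++;
    }
    return count;
}

/* With GCC or Clang, the intrinsic typically compiles to a single
   CLZ instruction on Arm. __builtin_clz(0) is undefined, so guard it. */
uint32_t clz_intrinsic( uint32_t x )
{
    return ( x != 0u ) ? (uint32_t)__builtin_clz( x ) : 32u;
}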
So we know that there can be cases where an algorithm can be more compactly represented in assembly code than in C code.
Example: setting negative values to zero
#include <stdint.h>

int32_t NegativeToZero( int32_t x )
{
    int32_t result;
    if( x < 0 )
    {
        result = 0;
    }
    else
    {
        result = x;
    }
    return result;
}
The assembly code generated by the free GNU Arm toolchain for the humble Arm Cortex-M processor is one instruction, plus one instruction to return. What happened to all the code? Where is the test for a negative value? Where is the conditional code? Arm has conditional instructions that can be used for branchless code, but BIC r0, r0, r0, ASR #31 is not a conditional instruction. What is going on?
The answer lies in the ASR #31. An arithmetic shift right by 31 bits turns the two’s complement sign bit into a mask of all ones (0xFFFFFFFF) for a negative number, or all zeros for a non-negative number. BIC then ANDs the value with the bitwise negation of this mask, changing any negative number to zero while leaving any non-negative number unchanged.
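For illustration, here is the same trick written back into C. This is my sketch rather than anything the compiler produced, and it assumes that right-shifting a signed value is an arithmetic shift (true for GCC and Clang, though implementation-defined in standard C).

#include <stdint.h>

/* Branchless equivalent of NegativeToZero, mirroring
   BIC r0, r0, r0, ASR #31. Assumes x >> 31 is an arithmetic shift. */
int32_t NegativeToZeroBranchless( int32_t x )
{
    int32_t mask = x >> 31;   /* 0xFFFFFFFF if x < 0, otherwise 0 */
    return x & ~mask;         /* AND with the inverted mask, like BIC */
}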
Confusion
Such optimizations provide excellent run-time performance, but they also cause some problems when profiling and debugging. I experienced this with exactly the example above. I wanted to see how often negative values were being produced and changed to zero, so I placed a breakpoint on the line result = 0; and was very frustrated to see that the breakpoint was hit for every value. In fact, I only ever observed positive values.

After some head-scratching, I switched the debugger from “source only” mode to “interleaved” mode, where you see both the source code and the assembly code. My frustration only grew when I discovered that the compiler appeared to have generated only one instruction, and that instruction did not seem like a plausible translation of my C source code.

Eventually, I puzzled out what was going on with the sign bit and the mask. I temporarily added __asm( "nop" ); next to result = 0; to force a conditional jump and got some real metrics about the frequency of negative values.
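As a sketch of that temporary instrumentation (assuming GCC-style inline assembly, where a basic asm statement is implicitly volatile, so the compiler cannot merge the branch away):

#include <stdint.h>

int32_t NegativeToZero( int32_t x )
{
    int32_t result;
    if( x < 0 )
    {
        __asm( "nop" );   /* breakpoint target: hit only when x < 0 */
        result = 0;
    }
    else
    {
        result = x;
    }
    return result;
}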
For non-embedded projects, you will typically disable optimization while debugging. In many embedded projects, disabling optimization makes the code too large for the program flash, and/or too slow to run in any realistic way. So we are forced to debug in the presence of a certain level of compiler optimization.
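If your toolchain happens to be GCC, one selective option (my suggestion, not something from the story above) is the GCC-specific optimize attribute, which lowers the optimization level for a single function while the rest of the build keeps its usual settings:

#include <stdint.h>

/* GCC-specific: compile only this function at -O0. */
__attribute__(( optimize( "O0" ) ))
int32_t NegativeToZeroDebug( int32_t x )
{
    int32_t result = x;
    if( x < 0 )
    {
        result = 0;   /* at -O0 the branch survives, so a breakpoint works */
    }
    return result;
}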
Conclusions
In the end, having very efficient code in thousands (or even millions) of final products is a benefit that far outweighs a bit of inconvenience when debugging. So such compiler optimization is something to celebrate and be grateful for. Even so, we want to minimize the confusion it can cause.
I hope that learning from my mistakes might save you a bit of time the next time you find yourself in this situation. Distilling the above into three points, we might say:
- Be aware that optimized code can be much smaller than you expect.
- If you start to suspect this kind of situation, take a look at the assembly code.
- Even if you cannot disable optimization for the entire project, you can temporarily disrupt the optimization enough to meet your profiling and debugging requirements.