Measurement fallacies
One of the most important texts on creating performant software remains Michael Abrash's Graphics Programming Black Book. Around 30 years after its publication, it still speaks clearly and accurately to the challenges of timing and optimizing software on increasingly complicated hardware, whose performance features are opaque and hard to predict.
I would argue that the central theme of the Graphics Programming Black Book is that measurement must be used to validate theories about software optimization. I do not disagree with that message in any way. On the contrary, many of my own ideas for software optimizations were quickly and ruthlessly killed off by a few measurements, while some less promising approaches were revealed by measurement to be immensely valuable.
However, I also observe some serious problems with the idea that every optimization must be guided only by measurement data, and in this article, I want to look into those problems.
Variability and accuracy of measurement
Some optimizations have a large impact that is easy to measure. In a resource-bound project, however, we may also be very interested in smaller optimizations, especially if we can combine several of them to reach a larger objective.
I worked on a successful project to reduce the overall CPU load of a largely finished embedded project by 10%. This was achieved using more than ten optimizations, some of which made quite a small contribution. In fact, some contributions were so small that we never had conclusive measurement data to prove they were worthwhile. Although it seemed like we were going in the right direction, the timing improvements for the more subtle changes could equally have been explained by ordinary timing variability. We only know that the combined optimizations met the overall performance requirement.
I am very confident that if we had accepted only optimizations whose benefit we could clearly prove in isolation, we would not have achieved the goal of a 10% improvement.
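To illustrate the variability problem, here is a minimal sketch, not taken from that project, of a repeated-timing harness in C. If the standard deviation across runs is larger than the gain expected from a small optimization, a single before-and-after comparison cannot prove that the optimization helped. The workload function is a placeholder.

    /*
     * Minimal sketch (hypothetical): time a workload repeatedly and report
     * the run-to-run spread. A small optimization whose expected gain is
     * below the standard deviation cannot be confirmed by one comparison.
     */
    #include <stdio.h>
    #include <math.h>
    #include <time.h>

    /* Placeholder for the code under test. */
    static void workload(void)
    {
        volatile unsigned long acc = 0;
        for (unsigned long i = 0; i < 1000000UL; i++)
            acc += i;
    }

    int main(void)
    {
        enum { RUNS = 30 };
        double samples[RUNS], sum = 0.0, sq = 0.0;

        for (int r = 0; r < RUNS; r++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            workload();
            clock_gettime(CLOCK_MONOTONIC, &t1);
            samples[r] = (t1.tv_sec - t0.tv_sec) * 1e3 +
                         (t1.tv_nsec - t0.tv_nsec) / 1e6;   /* milliseconds */
            sum += samples[r];
        }

        double mean = sum / RUNS;
        for (int r = 0; r < RUNS; r++)
            sq += (samples[r] - mean) * (samples[r] - mean);
        double stddev = sqrt(sq / (RUNS - 1));

        printf("mean %.3f ms, stddev %.3f ms\n", mean, stddev);
        return 0;
    }

In practice, comparing whole distributions of runs, or using a hardware cycle counter where one is available, gives more confidence than comparing single runs, but the noise floor never disappears entirely.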
Cost of measurement
Another problem with the idea of guiding all optimization by measurement is that you can only measure an optimization once it has been implemented far enough for the software to run and produce meaningful measurements. In many cases the cost of each optimization is high, so you can only implement a small number of them. The choice of which to implement and measure is typically guided by experience, judgement and guesswork, with varying degrees of success.
For years, I underestimated the benefit of GCC likely/unlikely annotations for embedded software. I had formed a belief that they were only really useful for high-end non-embedded processors with deep pipelines and layered cache and memory architectures. Because I did not expect them to provide real gains for my projects, I never measured their impact.
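For reference, these annotations are conventionally wrapped around GCC's __builtin_expect. The sketch below shows the common macro definitions and a hypothetical embedded-style function; the function and its logic are illustrative only, not from any project mentioned here.

    /* Common wrappers around GCC's __builtin_expect. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Hypothetical hot function: the annotations let the compiler lay out
     * the common path as straight-line code and move the rare error path
     * out of the way. */
    int process_sample(int sample, int *out)
    {
        if (unlikely(out == 0))      /* rare error case */
            return -1;

        if (likely(sample >= 0))     /* common case */
            *out = sample * 2;
        else
            *out = -sample * 2;

        return 0;
    }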
Local minima
A third issue with measurement-based optimization is the classic problem of a local minimum in a global search. I observed this in one project where a number of algorithms were measured and the winner was then optimized still further. This was duly selected as the code to go into production. It did not take long before someone had the idea of carefully optimizing one of the other algorithms and demonstrating that, on the real data, the result outperformed the production code by some margin.
I worked on another project where the first optimization made a very frequently called function slower. Before it could be rejected, the developer was able to point out that the function was now much smaller and had become a viable candidate for inlining. Measurement with the function inlined demonstrated a large improvement in overall performance.
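A minimal sketch of that kind of refactoring, with hypothetical names: splitting a rarely taken slow path out of a hot function can make the hot function fractionally slower on its own, yet small enough for the compiler to inline it at every call site.

    #include <stddef.h>

    struct byte_stream {
        const unsigned char *data;
        size_t pos;
        size_t len;
    };

    /* Cold path kept out of line so it does not bloat the hot function.
     * Placeholder body: a real implementation would fetch more data. */
    static void refill_slow_path(struct byte_stream *s)
    {
        s->pos = 0;
    }

    /* Hot function: after the split it is small enough to be inlined at
     * every call site, even though the extra call on the rare path makes
     * it slightly slower when measured in isolation. */
    static inline int next_byte(struct byte_stream *s)
    {
        if (s->pos == s->len)
            refill_slow_path(s);
        return s->data[s->pos++];
    }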
Conclusions
Clearly, measurement remains the guiding light for system optimization, and we must use it to validate system changes. At the same time, we cannot slavishly demand that every change be driven exclusively by measurement data. We also need engineering judgement to decide where to invest precious development and test resources.