Step by Step Optimisation

Original Author: Tony Albrecht

So, you’ve been given a task to optimise a system in a game. The original author(s) are no longer working in your studio (that’s always the case isn’t it?) and you need to lube up the code and insert a rocket. How do you go about it? Here are some step by step instructions to guide you through the optimisation process…

Grok your platform

In order to optimise anything, you need to first understand the platform you’re running on. This is more than just the hardware (although that is very important). It’s also the operating system and even the engine you’re using – anything that you can’t change is your platform. These things are the axioms that you have to work with, and you need to use them to build your optimal solution. They are also your constraints, your limits. You need to know what your platform does well and what it does badly. Then do lots of the former and as little as possible of the latter.


Get a benchmark

To optimise something you must first measure it. If you have no benchmark, how will you know whether your changes have improved performance or hindered it? An accurate benchmark can be quite problematic in a game – especially if (like most games) it is non-deterministic. If you can’t minimise frame by frame variation (pausing? Freezing the AI? A test level?) then you’ll either need a longer term average in a fairly consistent in-game region, or a plot of frame times over a period of time. An even better solution is to use Edit and Continue to compile and relink the code on the fly so you can watch the performance worm rise or dip according to your changes (although this is only good for fairly local changes). Once you have established a decent benchmark you can move on to the next step.
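A benchmark loop can be as simple as averaging timed frames over a consistent region. Here is a minimal C++ sketch of that idea – the `FrameTimer` name and structure are invented for illustration, not taken from any particular engine:

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: times a workload over many "frames" and reports an
// average, which is far more stable than any single frame time in a noisy,
// non-deterministic game.
struct FrameTimer {
    std::vector<double> samples; // milliseconds per frame

    template <typename Fn>
    void timeFrame(Fn&& frame) {
        auto t0 = std::chrono::steady_clock::now();
        frame();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    double averageMs() const {
        if (samples.empty()) return 0.0;
        double sum = 0.0;
        for (double s : samples) sum += s;
        return sum / static_cast<double>(samples.size());
    }
};
```

Note the use of `steady_clock` rather than `system_clock`: a benchmark clock must be monotonic, or a wall-clock adjustment mid-run will corrupt your numbers.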


Profile

Run your favourite profiler over the code and see if there is anything glaringly obvious that would benefit from optimisation – keep an eye out for excessive memory allocation/deallocation (if you’ve never traced through a malloc before, I suggest you try it. Then try and tell me that it’s OK to malloc/free in an inner loop), high frequency functions (small functions that are called many times in succession or, even worse, alternating with other small functions), inefficient data use, and obscure implicit constructions or conversions (Oh, the evils of objects!). By all means fix the easy, obvious stuff, but don’t delve down too far – before you shake the foundations of the codebase you need to understand what it does.
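To see why allocation in an inner loop is the kind of thing a profiler exposes, here is a hedged C++ sketch of the pattern and the usual fix – hoisting the allocation out of the loop. The function names and the trivial workload are made up purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The pattern a profiler often flags: this hits the allocator (and the
// deallocator) on every single iteration.
std::size_t sumLengthsSlow(const std::vector<std::vector<int>>& lists) {
    std::size_t total = 0;
    for (const auto& src : lists) {
        std::vector<int> scratch(src);   // allocates and frees every pass
        total += scratch.size();
    }
    return total;
}

// The fix: hoist the buffer out of the loop and reuse its capacity.
std::size_t sumLengthsFast(const std::vector<std::vector<int>>& lists) {
    std::size_t total = 0;
    std::vector<int> scratch;            // allocated (at most) a few times, reused
    for (const auto& src : lists) {
        scratch.assign(src.begin(), src.end()); // reuses capacity when it fits
        total += scratch.size();
    }
    return total;
}
```

Both functions compute the same result; only the allocation traffic differs, which is exactly the kind of thing a profiler shows you and a benchmark confirms.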


Understand the problem

So, what is the code trying to do? Or, more importantly, what problem was the programmer who wrote this code trying to solve? What was their intention? Don’t get bogged down by implementation details – try and see it holistically. What are the inputs and the outputs? What are the transformations involved? You need to understand these first before you touch anything – yes, I appreciate the urge to optimise that obviously inefficient code, but will it really make a difference in the end? If you are going to rewrite parts of this code, then you’d better be sure you know what’s going on. Don’t blindly optimise without understanding what the code is doing.


Understand the solution

Once you know what problem is being solved, you need to delve down into the code and see how it solves it. This is where you look at the structure of the code, the nitty gritty transformations of the data that comprise the guts of the algorithm. Try not to be too critical of the code or data just yet – ignore the overabundance of virtuals and the eye-stabbing templates, and just try to comprehend what is going on. Trace the data through the bowels of the code and watch what happens to it – where it goes in, where it comes out and what touches it in the middle.


Profile again

Now that you have developed some empathy for the code, profile it again. You should now understand much more of what is going on, and the profiled information will make more sense. Figure out where the performance matters and where efficiency is irrelevant. Look at not just the low level but the high level algorithm – not just the code that runs slowly, but the code that calls the slow code. Where do you need to fix it to make it go fast?
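As a sketch of fixing the caller rather than the slow function itself: suppose the profiler flags a linear search, but the real problem is a caller that performs the same search every frame. All names here are hypothetical, and the cached pointer assumes the entity list is not reallocated or reordered while the system holds it:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Entity { std::string name; float health; };

// The "slow" function the profiler flags: a linear search.
Entity* findByName(std::vector<Entity>& entities, const std::string& name) {
    for (auto& e : entities)
        if (e.name == name) return &e;
    return nullptr;
}

// The real fix is in the caller: look the entity up once and keep the
// result, instead of searching the whole list every frame.
// (Caveat: the cached pointer is only valid while the vector is stable.)
struct PlayerDamageSystem {
    Entity* player = nullptr; // cached result of the search

    void update(std::vector<Entity>& entities, float damage) {
        if (!player)                          // one search, not one per frame
            player = findByName(entities, "player");
        if (player)
            player->health -= damage;
    }
};
```

You could instead make `findByName` itself faster with a hash map, but stepping up a level and removing the repeated calls is often the bigger win – which is exactly why you profile the callers, not just the callees.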


Ask why

Now that we know what and how, we should consider why. Why was the code/data structured in that particular way? You should always assume that the coder who came before you knew what they were doing. That little nugget of crazy you’ve just stumbled across was probably put there for a reason – if you can’t figure out the reason then you probably don’t understand the code well enough. Sure, the previous coder may have been a mouth breathing moron, but in all likelihood they were about as smart as you, and you’ll make the same mistakes if you don’t learn from theirs.


Consider the alternatives

Once you know why, you should consider the alternatives – can you do the same thing in a different way? You understand the platform, you know what the code has to do, you know what the old code did and why – is there a better way? Take a step back and think of the big picture. You have the benefit of looking at a completed system. You know everything that the code has to do, whereas the original author(s) only had an idea of where they were going. You also have the benefit of a complete test suite – the running game. Given the gift of a complete overview of the system, can you do it in a better way? There are often many alternatives: Can you distribute the calculations over many frames? Can you reduce the fidelity of the calculations? (Does it really need to be done in double precision?) Can you cache some of the information? Interpolate? Use level of detail? Do less? Run it on a different thread, concurrently or deferred?
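One of those alternatives – distributing calculations over many frames – can be sketched as a time-sliced update. The names are invented and the per-agent "work" is a stand-in; a real system would run AI or logic updates against a per-frame budget:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of amortising work across frames: instead of updating every agent
// each frame, update a fixed-size slice per frame and wrap around, so each
// agent is still updated regularly, just less often.
struct SlicedUpdater {
    std::size_t cursor = 0; // where the next frame's slice starts

    // Updates at most `budget` agents per call; returns how many were touched.
    std::size_t update(std::vector<int>& agents, std::size_t budget) {
        std::size_t done = 0;
        while (done < budget && !agents.empty()) {
            agents[cursor] += 1;                  // stand-in for real work
            cursor = (cursor + 1) % agents.size();
            ++done;
        }
        return done;
    }
};
```

The trade-off is latency: each agent now reacts a few frames later, which is often imperceptible – this is the "do less per frame" lever rather than the "do it faster" one.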


Optimise

Now is the time to optimise. This doesn’t necessarily mean cracking open the assembler or getting dirty with SIMD. You have alternatives from the last step; try those that seem most appropriate. If there is no alternative, or low level optimisation is the best option, then go for it. You know, the fun stuff: streaming data, SoA formats, SIMD processing, loop unrolling, intrinsics.
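As a small illustration of one of those techniques, here is the array-of-structures versus structure-of-arrays distinction in C++. The particle types are invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: each particle's fields are interleaved in memory,
// so a loop touching only x still drags y, z and padding through the cache.
struct ParticleAoS { float x, y, z, pad; };

// Structure-of-arrays: each field is contiguous, which is friendlier to the
// cache and to SIMD when a loop only touches some of the fields.
struct ParticlesSoA {
    std::vector<float> x, y, z;
};

// This loop reads and writes a contiguous stream of floats; compilers
// auto-vectorise this shape readily, and it touches a third of the memory
// the AoS layout would.
void integrateX(ParticlesSoA& p, float vx, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i)
        p.x[i] += vx * dt;
}
```

Which layout wins depends on access patterns – if your hot loops always touch every field of a particle together, AoS can be perfectly fine; SoA pays off when loops stream over one or two fields at a time. Measure against your benchmark either way.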

But, a word of advice before you go crazy – keep it simple. Simple code is easier to optimise, easier to maintain, and just plain easier for the next poor sap that comes along to deal with.