Don’t scrimp on Hardware!

Original Author: Deano

Whilst I expect this is true of all disciplines, I’ll talk about it purely from a coder’s perspective, mostly because, as a code lead, the efficiency of the team is something I’ve spent a lot of time thinking about over the years.

There are many aspects of team efficiency, but this article is going to focus on the easiest to implement, which is simply “Open your wallets and buy the best, hell no, buy better than the best”. This makes sense in big budget AAA projects, where spending the money is often the easiest, fastest and most available way of improving efficiency NOW. It’s not necessarily the best way, but it’s one that can be implemented with almost no downtime and no other resource costs except cash (and IT infrastructure). A cash solution is rare in large projects, where communication, acquiring staff, deadlines and morale are all significant issues affecting efficiency but are also much harder to solve.

Often saving a bit of money on HW is seen to make sense for the project (a penny saved, etc.), but it’s a false economy IMHO. In fact I’d go as far as to say the best hardware is more important than the millions you’re likely to end up spending on AAA marketing, because marketing has to have something to sell, and the easiest way to make a better product is to make your most precious asset (your team) happier and more productive.

Some of course will say that, with a decent hierarchical, incremental build system, building your game shouldn’t require crazy hardware, as you shouldn’t be compiling many files at once for a typical small update. To which I say… true, and aerial porcine encounters also do happen (they do, honest!). It’s not that you can’t solve it in software, it’s just that you won’t. It’s hard to maintain, it’s expensive manpower-wise, that cost goes up the more productive the team is, and even then it only helps if you really aren’t changing a major underlying system. The reality is that the spider web of files will mean most people will be compiling a significant number of files at once for a fair portion of their working day.

So optimize and pay for the worst case (which is the only case I’ve ever encountered); your build complexity will grow exponentially as the project evolves. Now don’t get me wrong, there is much that can and should be done in software to reduce build times, but that doesn’t change the fact that good hardware == faster in almost all cases. And faster == more stuff done AND happier team members.

How MUCH?!

So I’ve convinced you to throw money at the problem, awesome! So what should you buy?

This is where I’m likely to scare the managers reading this who until now have been smugly thinking “we do that! we are awesome”. Outside my job I also (somewhere along the line, not sure where) got involved with serious hardware, the stuff you don’t buy, you build (or pay someone else to build), the stuff for which enterprises pay tens or hundreds of thousands of pounds and dollars per year in software support contracts :O

Now I’m not suggesting you spend that on software contracts, because tbh your programmers will likely love fiddling with a decent system, but I do think you need to start thinking in that league as regards hardware. Start with a rough budget of £10,000 or $15,000 per programmer every couple of years, on hardware only, and you’re approaching what I call ‘good’.

So what should I get for that kind of cash?!

I suspect I’ve not got many readers left this far down the article, but here comes the more precise side of things.

Ignoring build distribution for the moment, building any large game comes down to three main bottlenecks:

  1. Memory – how much stuff can I fit in
  2. Cores/HW threads – how many things can I build at the same time
  3. Disk – how fast can I get stuff into memory so my cores can work on it

Memory

Memory = normal usage + ~1 GiB per core + RAM Disk for temp files + cache for all source code, tools, the game, etc. at the same time!

So let’s justify that:

  • Normal usage is running your IDE, web browser, email client, etc. Currently I find 8 GiB is about right, as that’s usually enough to run your PC build, tools and debugger whilst actually building.
  • 1 GiB per core: my current rule of thumb is that, to compile happily, the compiler likes about 1 GiB of workspace. This is particularly true with large projects and heavy template use (like Boost). Some compilers are much more frugal than others, but some can be crazy wasteful, and 1 GiB per core gives you the headroom for the bad boys of the compiler world. You want it per core because we are going to make sure that every bit of the platform is ready so that no time is wasted: every core is going to be used at the same time.
  • RAM Disk: compilers are chains of operations that write out lots of temporary or rebuildable files. You know that and the compiler knows that, but most filesystems don’t have a way of saying “we are writing this, but if I lose it all it’s not really that important” (except via tmpfs-type systems, which is essentially what we are creating manually). No matter what disk architecture you go for, minimizing writes that don’t need to be persisted will maximize its performance. A RAM Disk makes this easy: you direct all the .lib, .obj, .pdb, .exe, etc. files that can be rebuilt to your temporary RAM Disk. This means 99% of your writes are now free of the disk bottleneck and running at the fastest speed possible. When you reboot your system or it crashes, the worst case is that you have to rebuild them all. However, most RAM disks have the option to write their contents out to permanent storage on shutdown, so except for crashes it appears as a normal, very fast drive.
  • Cache: the best place to read your files from is your disk cache. 99% of your source files won’t change between compiles, so they can sit happily in your system’s file cache. But you need enough to fit all the files your compile references AND all the other things you might be running in between. A common dev cycle is compile, debug, compile, debug, ad infinitum. That means you want enough cache not only for the development files but also for any non-streamed files on disk (depending on platform, files sent to the console might or might not go through the system cache, but it’s best to assume they will), plus the debugger and other things that will fill your cache during the build/debug cycle.
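
The memory formula above can be turned into a quick back-of-the-envelope calculation. What follows is only a sketch: the 8 GiB of normal usage and the 1 GiB per core are the rules of thumb from the list, whilst the RAM disk and cache sizes in the example (4 GiB each) are made-up placeholders that you would replace with measurements from your own project. The function name `estimate_ram_gib` is hypothetical too.

```python
def estimate_ram_gib(hw_threads, ram_disk_gib, cache_gib,
                     normal_usage_gib=8, per_thread_gib=1):
    """Rough RAM budget in GiB for one programmer's build machine.

    Implements: normal usage + ~1 GiB per core + RAM disk + file cache.
    ram_disk_gib and cache_gib are project specific and must be measured;
    the defaults are the article's rules of thumb, not hard requirements.
    """
    compile_workspace_gib = hw_threads * per_thread_gib
    return normal_usage_gib + compile_workspace_gib + ram_disk_gib + cache_gib


if __name__ == "__main__":
    # Placeholder figures only: 48 HW threads, 4 GiB RAM disk, 4 GiB cache.
    print(estimate_ram_gib(hw_threads=48, ram_disk_gib=4, cache_gib=4))
```

Even with small placeholder figures for the RAM disk and cache, the per-core workspace dominates the total, which is why adding cores without adding RAM gets you nowhere.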

The key takeaway is that if you don’t have enough memory to support your cores/HW threads, you’re wasting your CPU’s time, and if you don’t have enough memory for cache and RAM Disk, you are forcing your disks to overextend themselves, which also slows your development process down.
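
As a rough illustration of the RAM Disk idea, here is a small sketch of how a build script might pick where the rebuildable intermediate files go. Everything in it is an assumption for illustration: the function name `pick_intermediate_dir`, the candidate mount points (`/dev/shm` is a common tmpfs mount on Linux, `R:/` is a made-up RAM disk drive letter) and the `obj` sub-directory layout. How you then point your compiler or IDE at that directory depends entirely on your build system.

```python
import os
import tempfile
from pathlib import Path


def pick_intermediate_dir(candidates, project_name):
    """Return a directory for rebuildable build output (.obj, .lib, ...).

    Tries each candidate RAM disk root in order and uses the first one
    that exists and is writable. If none is usable, falls back to the
    normal temp directory, so the build still works, just slower.
    """
    for root in candidates:
        root = Path(root)
        if root.is_dir() and os.access(root, os.W_OK):
            out = root / project_name / "obj"
            out.mkdir(parents=True, exist_ok=True)
            return out
    fallback = Path(tempfile.gettempdir()) / project_name / "obj"
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback


if __name__ == "__main__":
    # Hypothetical RAM disk locations; adjust for your own machines.
    print(pick_intermediate_dir(["/dev/shm", "R:/"], "my_game"))
```

Only files that can be regenerated should ever be sent there; source files and anything else you cannot rebuild stay on persistent storage.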

Cores / HW Threads

Cores / HW Threads = number of files that can be built + 2

Practically you are going to be limited by how many cores and threads you can buy in a single system. The +2 is to maintain system responsiveness even under heavy load; in practice it doesn’t matter, as we generally have more files than cores.
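
Read the other way round, the formula tells you how many compile jobs to run on the machine you actually have: HW threads minus the two kept back for responsiveness. Here is a minimal sketch of that, with a hypothetical function name (`parallel_jobs`) and an optional cap for the case where there are fewer files to build than threads. It assumes your build tool accepts a job count at all; how you pass it in is tool specific.

```python
import os


def parallel_jobs(hw_threads=None, reserve=2, pending_files=None):
    """Number of compile jobs to run at once.

    hw_threads    -- HW threads available (detected if not given)
    reserve       -- threads kept free so the desktop stays responsive
    pending_files -- optional count of files waiting to be built
    """
    if hw_threads is None:
        # os.cpu_count() can return None on unusual platforms.
        hw_threads = os.cpu_count() or 1
    jobs = max(1, hw_threads - reserve)
    if pending_files is not None:
        jobs = max(1, min(jobs, pending_files))
    return jobs


if __name__ == "__main__":
    print(parallel_jobs())
```

On a 32 thread machine this gives 30 jobs, and it never drops below one job even on a dual core laptop.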

Cores and HW threads actually do your compiling and building, so the more you have, the more you can in theory get done at once. So as long as you have the disk and memory to feed them well enough, you want the most you can buy. This is where you leave the common world of desktops; it’s time to take a walk over to the server aisle of the shop.

The reason is 2P or not 2P (4P!). No, that’s not a misquote of Shakespeare: the P stands for processors, because whilst you might get a single processor with up to 12 HW threads in x86 land, that’s just not enough. Given the characteristics of iterative builds, as long as we have the RAM, we scale fairly well horizontally (with the number of cores). So the more processor sockets, the faster stuff gets done.

Currently a 2P Xeon system will get you 32 HW threads, and a 4P AMD system will give you 48 HW threads. That’s A LOT of processing power over a bog standard ‘workstation’, and it really shows in building and compiling. It may seem expensive, but it makes development much more pleasant and efficient. If a build takes too long, coders will lose their place “in the zone” (lolcats are just too damn addictive). The faster they get back to the problem, the better.

The other point in favor of server hardware is reliability. As a general rule, server systems run for months and months without an issue. That often can’t be said for ‘extreme’ desktop platforms.

There are a couple of issues:

  1. It’s not representative of PC gamer rigs. If that’s a problem, simply add another PC gamer box to your dev’s desk.
  2. Noise. Servers are designed to run in data centers, where ear protectors are standard equipment in many cases. That’s not ideal for your quiet office. There are two solutions: either keep all the machines in a separate room and use keyboard/mouse/video/audio/USB extenders to each desk, or buy special quiet server cases to sit at the coder’s desk. Of course there are always fashionable ear protectors as well…

Disks

Disk = (low write speeds (yay for RAM Disk) + high read speed to fill the disk cache) + fast boot/load + fast source control update (fast small file reads and writes)

  • If you’ve set up the RAM sizes as indicated above, normal building won’t actually need the disk much, as it will largely happen in RAM.
  • We all like things to boot and load fast, and for those first or full build situations we do want the disk to be relatively fast.
  • Source control systems can be nasty for filesystems, often checking and updating gigabytes spread across hundreds of thousands of files. Lots of IOPS and good out-of-order read/write performance help here.

Two solutions here, one for each programmer and one for a small group.

  • Per programmer it’s simple: buy a few PCI-E SSDs and RAID10 them together. This gives you your boot and local disk, fast enough to fill your RAM cache at breakneck speed. It shouldn’t need to be too big, as most things will live in the next item.
  • Amongst a small group of developers, connect a SAN to each machine via 10 Gb Ethernet or InfiniBand. A SAN is a machine on the network whose only job is to provide very fast, very safe storage. Each SAN has its own complete system of caches and software to allow out-of-order delayed writes (safely), so that the main limit is network throughput, hence the 10 Gb networks. Also, RAID replication and ERC technology mean data can survive disk failures and other problems. There are stupidly expensive enterprise SANs which cost ouch amounts of cash; luckily, that’s smoke and mirrors to a large degree. Using a system like OpenIndiana with some SSDs, a dozen spinning rusts (AKA traditional hard drives) and perhaps a DDRDrive accelerator, you can, for moderate amounts, have speed and IOPS to spare, all sitting in a nice little server connected directly to a bunch of programmers’ machines via a fast network.
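
Since the network becomes the main limit for the shared storage, it’s worth doing the arithmetic on what a link can actually move. The sketch below assumes the link is 10 gigabits per second (10GbE) and that you get the full line rate, which you won’t quite: protocol overhead, the storage itself and other traffic all eat into it, so the `efficiency` parameter is there for you to plug in a measured figure. The function name and the 10 GiB example size are made up for illustration.

```python
def transfer_seconds(size_gib, link_gbit_per_s=10.0, efficiency=1.0):
    """Ideal time in seconds to move size_gib GiB over a network link.

    link_gbit_per_s -- nominal link speed in gigabits (10**9 bits) per second
    efficiency      -- fraction of the line rate you really get (0 < e <= 1)
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    size_bits = size_gib * 2**30 * 8           # GiB -> bytes -> bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return size_bits / usable_bits_per_s


if __name__ == "__main__":
    # Example: pulling 10 GiB of source and assets from the shared storage.
    for speed in (1.0, 10.0):
        print(f"{speed:>4} Gb/s: {transfer_seconds(10, speed):.1f} s")
```

The same 10 GiB takes ten times as long over an ordinary gigabit office network, which is the whole argument for the faster link.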

In conclusion

CFOs and bank managers will hate me, but I stick by my belief that your staff, your team, are the most important part of any project, and that means buying them the best equipment so that they spend their time using their talent and not watching busy cursors and build output scrolling slowly up the screen.

There is more to hardware than what’s here, from displays to build machines, project progress displays, video conferencing, etc. This article has focused on just the smallest but most personal part of a realistic hardware capital expense for making worlds.

Today, buying a 64 GiB RAM, 48-core AMD workstation with multiple PCI-E SSDs per programmer is going to come as a shock, as we have got used to skimping on hardware, forcing the team to work with little better than gaming platforms. That’s never been true in many other performance computing sectors, and we need to realize that’s what we do: we make some of the most complex performance software in the world. Just be glad we don’t need arrays of super-compute machines to build our worlds. Well, not yet…

It’s worth noting this isn’t just a spend spend spend article (well it is, but… :P). I’ve spent years looking into this aspect of builds, and the basic approach here works for lower expense systems too. Coder systems need balance: lots of cores on their own won’t cut it, RAM and disks are equally important. So even lower down the cost scale, don’t just accept an office or gamer’s rig; we have very different performance characteristics from normal machines.

Perhaps next up is the much harder topic of software and configuration for build systems. If only that were as easy as spend spend spend!