Real Unusual I/O Slowdowns (Part 1)

Original Author: Drew Thaler

I thought it’d be fun to run a series of posts on actual problem I/O patterns that I’ve seen, heard about, or debugged in games or game-related situations. This is the kind of information that is usually held anecdotally by a few experts, not often shared, and often forgotten when the experts move on. Perfect topic for a dev blog post, don’t you think?

(The title refers to the fact that these are all real-world situations, they often cause unusual or unexpected behavior, and sometimes they’re just real unusual!)

We’ll start with a simple one.

The Problem

A big AAA console game has added a new feature that requires reading from multiple partitions on the hard disk at the same time. But as soon as they started to do this, the developers found that load times went to hell: a level which used to take 5 seconds to load now takes 20, and a level which used to take 10 seconds now takes 40!

Why did it get so slow?

Hard disk partitions and seeking

In a nutshell, the problem boils down to reading from (or writing to) multiple partitions on the same disk at the same time.

Let’s start by visualizing a hard drive. A hard drive is a platter, which is basically a flat donut shape. On this platter is a drive head used for reading and writing which moves back and forth. (Sure, real hard disks usually have multiple platters; but for the purposes of seeking, a multi-platter disk works almost exactly like a single-platter disk – so we’ll stick with the simple model.)

Hard drive platter with drive head.

A simple conceptual model of a hard drive.

Great! Now here are two partitions on this disk. We’ll call these C and D. The names are arbitrary, of course – you could think of these as Windows drive letters, or you could read them as “cache” and “data”. Let’s say C is small, and D is large.

Hard drive platter, showing two partitions C and D.

Partitions C and D.

Finally, suppose the data we want to read is on both C and D. As we load files, sometimes the data comes from C and sometimes it comes from D. Watch what happens to the drive head:

Hard drive read animation, alternating reads from C and D

Alternating reads from C and D.

Hmm, the head is sure moving around a lot, isn’t it? That’s a lot of seeking. And as I always say: seek time is wasted time.

Perhaps you’re looking at this diagram and saying: sure, but you’ve just illustrated the worst case. In the best case, the data from D will be right next to C. Right?

Well, that’s true — but this worst case is actually the most common case! Filesystems tend to store data on the outer circumference of the disk, since logical block addressing goes from outside to inside. So, consider a hard disk partitioned in exactly this way, with C on the inside and D on the outside… which matches at least one of the current consoles out there. If the drive starts out initially empty, then the first batch of data that gets written to D will wind up all the way on the outside, as far away from C as possible. And right there is your worst case: it turns out it will happen automatically on any brand-new console, fresh out of the box.

OK, so let’s see how bad it is. How much time does it take to move the drive head all the way across the platter?

In this case we’re talking about a game console, not a high-end PC. A typical drive in this console generation (which, remember, started way back in 2005, so we’re talking about hard drives from that time) is pretty basic – 5400rpm, with an average seek time of 12.5 ms.

But we don’t want the average time. The strokes you see above are not average… in fact, they’re pretty much the worst case: they are what’s called “full-stroke” seek times, going across the entire width of the platter. The cost of a full-stroke seek is generally a little less than twice the average seek.

For these console hard drives, a full-stroke seek is about 22 ms. Expensive, but not so bad if it’s just an isolated or somewhat rare event.

But what if you are hitting each partition almost equally? Reading C – D – C – D – C – D? Then you might be paying that 22 ms cost over and over again. It’ll take just over two hundred reads like this – practically nothing in a modern, complex AAA game – to add 5 seconds of load time. Five hundred reads: about 11 seconds. A thousand reads: about 22 seconds.
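That arithmetic is easy to sanity-check. Here’s a minimal back-of-the-envelope sketch (the 22 ms figure comes straight from the post; the function name is mine, purely for illustration):

```python
# Rough cost of alternating C/D reads, counting ONLY the
# full-stroke seeks. Transfer time is deliberately ignored.
# 22 ms per full-stroke seek, per the drives discussed above.
FULL_STROKE_SEEK_S = 0.022

def seek_penalty(num_alternating_reads: int) -> float:
    """Extra load time added by seeking alone, assuming every
    read forces a full-stroke seek to the other partition."""
    return num_alternating_reads * FULL_STROKE_SEEK_S

for n in (227, 500, 1000):
    print(f"{n:5d} reads -> {seek_penalty(n):.1f} s of pure seeking")
# 227 reads ->  5.0 s, 500 -> 11.0 s, 1000 -> 22.0 s
```

Note that this is seek time only — the game still has to pay for the actual data transfer on top of it.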

Ouch!

Where did this come up?

This I/O pattern occurred in the real world when a game was reading from an optical disc cache on one partition, while at the same time patching the cached data with content from a second partition.

Loading of course worked great when all the data was on one partition, but the initial attempt at the patching system caused exactly the behavior I described above.

How would you fix it?

The ultimate answer is, of course: don’t do that. 😉 But that’s not too helpful, is it? Here are some more concrete steps that you might take to fix this problem:

Try to organize your loading scheme so that you load everything from C first, and then everything from D. In the theoretical ideal case you could reduce the cost to just one full stroke seek.

If you can’t easily do that (and to be honest, chances are you can’t!) then at least keep the number of transitions down: load a batch from C, then a batch from D, then a batch from C. For every N reads you can batch together you’ll divide the number of full-stroke seeks by N. Two reads from C and two reads from D, and there will be half as many full-stroke seeks. Three and three, and there will be one third as many. And so on.

If one or more of the data streams is easy to buffer ahead (a movie, or other linear stream of data), throw more buffering at it so that you’re reading or writing less frequently. This is effectively the same as the previous one, because it batches the reads more. It’s also the approach we ultimately used with the patching system, since the patches were stored linearly.

Consider migrating data from one partition to the other for better locality. For example, you could automatically copy from D to C when the drive is otherwise idle. (If C is a cache, this would be a prefetch.)

And of course, as a last resort, you may be able to simply suffer the penalty when it occurs – but hide it by loading in the background, well ahead of time.
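The batching claim in the second step is easy to demonstrate with a toy model. This sketch (all names are mine, for illustration only) tags each read with its partition and charges one full-stroke seek whenever consecutive reads hit different partitions:

```python
# Toy model: count full-stroke seeks in a read schedule.
# A seek is charged whenever two consecutive reads hit
# different partitions.

def count_transitions(schedule):
    """Number of partition switches (i.e. full-stroke seeks)."""
    return sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)

def batched(reads_per_partition, batch_size):
    """Interleave C and D reads in groups of `batch_size`."""
    schedule = []
    remaining = reads_per_partition
    while remaining > 0:
        take = min(batch_size, remaining)
        schedule += ["C"] * take + ["D"] * take
        remaining -= take
    return schedule

print(count_transitions(batched(1000, 1)))  # C D C D ...  -> 1999
print(count_transitions(batched(1000, 2)))  # C C D D ...  ->  999
print(count_transitions(batched(1000, 4)))  # C C C C D D D D -> 499
```

Doubling the batch size roughly halves the number of full-stroke seeks, exactly as described above: at 22 ms per seek, going from fully alternating reads to batches of four cuts the seek penalty from about 44 seconds to about 11.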

Some Notes

This post is only the first one in a series. I hope this was entertaining and not too basic; problems like these may seem obvious in hindsight, but they can certainly catch you by surprise! If you’re hungry for something crazier, don’t worry, I’ve got some real doozies lined up for future posts.

Next time: An unexpected worst-possible I/O interleave.