The Power of Constraint

Original Author: Adam Rademacher

A few months ago, I took a look in the mirror and made a list of all the games I’ve made, and all the games I’ve tried to make.  Then, I made another list of all the games that I love, the games that made me yearn to make games for a living.  Where did I go wrong?!

Well, it’s not quite that dramatic, but I did have a bit of an epiphany about how to be a better designer.  Game design, much like programming and art in games, isn’t really something you can learn from a textbook or classroom.  Those are good starting points, but to be a great #gamedev in any field, you really have to run the gauntlet.  Make some tragic failures of games and learn from your mistakes.  At least, that’s what I’ve been told.

Then I started to think about it.  Design isn’t something you learn from a textbook and it’s not entirely ‘learning from your mistakes’ … it’s something else entirely.  There’s so much that the average game designer touches within a game (nearly 100% of it, generally) that it’s easy to miss the forest for the trees.  If you’ve made a game that can be described in no way nicer than crappy, why is it a bad game?  Does it feel incomplete?  Are the mechanics fundamentally flawed?  Is the level design too murky or unclear?  It could be any part of that 100% of the things you touched on that game.   What’s a designer to do?

This is the epiphany I had.  I listed out the games I’ve made (not that many) next to a list of the games that I’ve truly loved over the years (also not that many) and drew links between them, ignoring issues of polish or ‘not having enough time’ as those plague every game.  By doing this, I was able to draw a definitive line in the sand and say: This is what I need to work on.

Let me reiterate what I said to myself, only in the words I actually used.  “Wow, I suck at telling stories in games, and I want to be better at it.”

This is a familiar plea of the designer, as design is such a broad realm there’s bound to be something we are atrociously bad at.  I know what a good story is, and how to write one, but when it comes to weaving the story into gameplay mechanics, I don’t even know where to start.  So this is what I did, and I urge everyone to do something similar with their own areas of need:

  1. I decided to make a new game entirely focused on story and world building.
  2. I picked a few familiar genres that lent themselves easily to storytelling, and were dear to my heart: Point and Click adventure games, JRPGs, and VNs.
  3. I did some quick research to pick out some engines that would make it easy for me to dive in and start work.
  4. RPG Maker is an engine that I’ve had experience with, and was the quickest to get up and run with.
  5. I sat down and started making the game.

Just from this little exercise (I have about 5 minutes of game right now, and am extending it to flesh out a full plot with twists and all that goodness), I immediately became aware of something important: designing the mechanics of the game was holding me back from telling a story in the game.  I was too focused on flow and balance and didn't have the mental resources to push the story through the way I wanted to.

So what I want from you, dear reader, is to do something very similar.  Look at your games, and compare them to the games on your shelf.  Focus on the differences and develop a simple plan to constrain your development and focus on that one aspect of your design toolbox that is lacking.  Please share similar experiences in the comments, as I'm really curious how you (or your more designery friends) improve your skills in a systematic way.

🙂


Home Safety Castle – designing for people with disabilities

Original Author: Ann Cudworth

Earlier this year I was developing a scale model of a 4 sim build for a virtual world, utilizing a portable sim scenario called “Sim-on-a-stick”.

When I had finished that model, which was a prototype for the Federal Consortium of Virtual Worlds Con 2011, I described, with all the best intentions, how I would build a castle, set up home safety hazards, and work with the attendees on how to make it work as a game.  Ener stopped me in my tracks when she said, "I'm a big fan of Section 508. Will it be ADA compliant?"

Safety Castle

I have to confess, up until that point, I had never even heard of section 508, and had never considered the accessibility of my builds for people with disabilities.  This looked like a really large can of worms to me, and I had a very short deadline.  Nevertheless, when someone throws down the challenge glove, I cannot resist.  So I dug in and tried to figure out how to make my Home Safety Castle accessible to people who are hearing impaired or visually impaired.  Very quickly, I learned that this means layers: layers of sound on the visuals, layers of visuals on the scene and some way to keep them organized, functional, and loading in time to make the game play smoothly.

Here is a short walking tour of what has been done so far (it is still a work in progress).

Future Applications

Slowly it dawned on me that what this meant was a new way to think about how to design a game. In fact, thinking like this could show us how to improve the game as well, and how to deepen and enrich the experience for the player.  I started to think about other things, like red/green color blindness, and how people who have it can see more and are less distracted by the shades of colors.

I also started to think about how we saw avatars being moved by mind control, one small part of Team Orlando's efforts with Army Research.  I asked myself questions about gaming and what it means for people with disabilities when they can play a game without hands or fingers.  I began to wonder how someone like Stephen Hawking could create a virtual model of their theories in cyberspace, just by using their brainwaves.

And finally, I started to think about how we will make games in the future, and what that means for the design of them.

What if you lived in a land where people could move things with their minds, or they could see secret patterns on things, or they could hear the sound of butterfly wings? What kind of games would they play?

“O brave new world that has such people in it. Let’s start at once.” (Shakespeare-The Tempest)

Ann Cudworth / Founder- Alchemy Sims

 


Don’t scrimp on Hardware!

Original Author: Deano

Whilst this is true of all disciplines, I expect, I'll talk about it purely from a coder perspective, mostly because, as a code lead, the efficiency of the team is something I've spent a lot of time thinking about over the years.

There are many aspects of team efficiency, but this article is going to focus on the easiest to implement, which is simply: "Open your wallets and buy the best, hell no, buy better than the best". This makes sense in big budget AAA projects, where spending the money is often the easiest, fastest and most readily available way of improving efficiency NOW. It's not necessarily the best way, but it's one that can be implemented with almost no downtime and no other resource costs except cash (and IT infrastructure). A problem you can solve with cash alone is rare in large projects, where communication, acquiring staff, deadlines and morale are all significant issues affecting efficiency but are also much harder to solve.

Often saving a bit of money on hardware is seen to make sense for the project (a penny saved, etc.), but it's a false economy IMHO. In fact I'd go as far as to say the best hardware is more important than the millions you're likely to end up spending on AAA marketing, because marketing has to have something to sell, and the easiest way you can make a better product is to make your most precious asset (your team) happier and more productive.

Some of course will say that, with a decent hierarchical, incremental build system, building your game shouldn't require crazy hardware, as you shouldn't be compiling many files at once for a typical small update. To which I say: true, and also aerial porcine encounters do happen (they do, honest!). It's not that you can't solve it in software, it's just that you won't. It's hard to maintain, it's expensive man-power wise, that cost goes up the more productive the team is, and then it only helps if you really aren't changing a major underlying system. The reality is that the spider web of files means most people will be compiling a significant number of files at once for a fair portion of their working day.

So optimize and pay for the worst case (which is the only case I've ever encountered): your build complexity will grow exponentially as the project evolves. Now don't get me wrong, there is much that can and should be done in software to reduce build times, but it doesn't change the fact that good hardware == faster in almost all cases. And faster == more stuff done AND happier team members.

How MUCH?!

So I've convinced you to throw money at the problem, awesome! So then what should you buy?

This is where I'm likely to scare the managers reading this who until now have been smug, thinking "we do that! we are awesome". Outside my job I also (somewhere along the line, not sure where) got involved with serious hardware: the stuff you don't buy, you build (or pay someone else to build), the stuff for which enterprises pay tens or hundreds of thousands of pounds and dollars per year in software support contracts :O

Now I'm not suggesting you spend that on software contracts, because tbh your programmers will likely love fiddling with a decent system, but I do think you need to start thinking in that league as regards hardware. Start with a rough £10,000 or $15,000 budget per programmer, per couple of years, on hardware only, and you're approaching what I call 'good'.

So what should I get for that kind of cash?!

I suspect I've not got many readers left this far down the article, but here comes the more precise side of things.

Ignoring build distribution for the moment, building any large game comes down to three main bottlenecks:

  1. Memory – how much stuff I can fit in
  2. Cores/HW threads – how many things can I build at the same time
  3. Disk – how fast can I get stuff into memory so my cores can work on it

Memory

Memory = normal usage + ~1 GiB per core + RAM Disk for temp files + cache for all source code, tools, the game, etc. at the same time!

So let's justify that:

  • Normal usage is running the IDE, web browser, email client, etc. Currently I find 8 GiB is about right, as it's usually enough to run your PC build/tools/debugger whilst actually building.
  • 1 GiB per core: my current rule of thumb is that to compile happily the compiler likes about 1 GiB of workspace, particularly with large projects and templates (like boost); some compilers are much more frugal than others, but others can be crazy wasteful. 1 GiB per core gives you the headroom for the bad boys of the compiler world. You want it per core because we are going to make sure that every bit of the platform is ready so that no time is wasted: every core is going to be used at the same time.
  • RAM Disk: compilers are chains of operations that write out lots of temporary or rebuild-able files. You know that, the compiler knows that, but most filesystems don't have a way of saying "we are writing, but if I lose it all it's not really that important" (except via tmpfs-type systems, which is essentially what we are creating manually). No matter what disk architecture you go for, minimizing writes that don't need to be persisted will maximize its performance. A RAM disk makes this easy: you direct all your .lib, .obj, .pdb, .exe, etc. files that can be rebuilt to your temporary RAM disk, which means 99% of your writes are now free of the disk bottleneck and running at the fastest speed possible. When you reboot your system, or it crashes, worst case you have to rebuild them all; however, most RAM disks have the option to write out to permanent storage on shutdown, so except for crashes it appears as a normal, very fast drive.
  • Cache: the best place to read your files from is your disk cache. 99% of your source files won't change each compile, so they can sit in your system's file cache nicely. But you need enough to fit all the files your compile references AND all the other things you might be running in between. A common dev cycle is compile, debug, compile, debug, ad infinitum. Which means you want enough cache for not only the development files but also any non-streamed files on disk (depending on the platform, files sent to the console might go through the system cache or not, but it's best to assume they will) and the debugger and other things that will fill your cache during the build/debug cycle.

The key take-away point is that if you don't have enough memory to support your cores/HW threads, you're wasting your CPUs' time, and if you don't have enough memory for cache/RAM disk, you are forcing your disks to overextend themselves, also slowing your development process down.
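
To make that sizing rule concrete, here is a tiny back-of-the-envelope calculator (my own sketch; the constants are the rough figures from above used as placeholder assumptions, not measured numbers):

#include <cstdio>

// Rough machine-sizing sketch based on the rule of thumb above:
// RAM = normal usage + ~1 GiB per HW thread + RAM disk for temporaries
//       + enough cache to hold source, tools and game data at once.
int main()
{
    const double normalUsageGiB = 8.0;   // IDE, browser, mail, PC build/tools
    const int    hwThreads      = 32;    // e.g. a 2P Xeon box (assumed)
    const double ramDiskGiB     = 8.0;   // .obj/.pdb/.lib/.exe temporaries (assumed)
    const double fileCacheGiB   = 16.0;  // source + tools + game data (assumed)

    const double totalGiB = normalUsageGiB + hwThreads * 1.0 + ramDiskGiB + fileCacheGiB;
    std::printf("Suggested RAM: %.0f GiB\n", totalGiB);
    return 0;
}

With these assumed figures it lands on 64 GiB, which is the sort of number mentioned further down.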

Core / HW Threads

Cores / HW Threads = number of files that can be built + 2

Practically you are going to be limited by how many cores and threads you can buy in a single system. The +2 is to maintain system responsiveness even under heavy load; in practice it doesn't matter much, as we generally have more files than cores.

Cores and HW threads actually do your compiling and building, so the more you have, the more, in theory, you can get done at once. As long as you have the disk and memory to feed them well enough, you want the most you can buy. This is where you leave the common world of desktops; it's time to take a walk over to the server aisle of the shop.

The reason is 2P or not 2P (4P!). No, not a misquote of Shakespeare: the P stands for processors, because whilst you might get a single processor with up to 12 HW threads in x86 land, that's just not enough. Given the characteristics of iterative builds, as long as we have the RAM, we scale fairly well horizontally (with the number of cores). So the more processor sockets, the faster stuff gets done.

Currently a 2P Xeon system will get you 32 HW threads, and a 4P AMD system will give you 48 HW threads. That's A LOT of processing power over a bog standard 'workstation', and it really shows in building and compiling. It may seem expensive, but it makes development much more pleasant and efficient; if a build takes too long, coders will lose their place "in the zone" (lolcats is just too damn addictive). The faster they get back to the problem, the better.

The other point in favor of server hardware is reliability: as a general rule, server systems run for months and months without an issue. That often can't be said for 'extreme' desktop platforms.

There are several issues:

  1. It's not representative of PC gamer rigs; if that's a problem, simply add another PC gamer box to your devs' desks.
  2. Noise: servers are designed to run in data centers, where ear protectors are standard equipment in many cases. Not ideal for your quiet office. There are two solutions: either keep all the machines in a quiet room and use keyboard/mouse/video/audio/USB extenders to each desk, or buy special quiet server cases to sit at the coders' desks. Of course there are always fashionable ear protectors as well…

Disks

Disk = (low write speeds (yay for RAM DISK) + high read speed to fill the disk cache ) + fast boot/load + fast source control update (fast small file reads and writes)

  • If you’ve set up the RAM sizes as indicated above, normal building won’t actually need the disk much as it will be largely in RAM
  • We all like things to boot and load fast, and also for those first or full build situations we do want that to be relatively fast
  • Source control systems can be nasty for filesystems, often checking and updating gigabytes spread across hundreds of thousands of files. Lots of IOPS and out-of-order reads/writes are good.

Two solutions here: one for each programmer, and one for a small group.

  • Each programmer is simple: buy a few PCI-E SSDs and RAID10 them together. This gives you your boot and local disk, fast enough to fill your RAM cache at breakneck speed. It shouldn't need to be too big, as most things will live in the next item.
  • Amongst a small group of developers, connect a SAN via 10Gb Ethernet or InfiniBand to each machine. A SAN is a machine on the network whose only job is to provide very fast, very safe storage. Each SAN has its own complete system of caches and software to allow out-of-order delayed writes (safely), so that the main limit is network throughput, hence the 10Gb networks. Also, using RAID replication and ERC technology means data can survive disk failures and other problems. There are stupidly expensive enterprise SANs which cost ouch amounts of cash, however luckily that's all smoke and mirrors to a large degree. Using a system like OpenIndiana, with some SSDs, a dozen spinning rusts (AKA traditional hard drives) and perhaps a DDRDrive accelerator, you can, for moderate amounts, have speed and IOPS to spare, all sitting in a nice little server connected directly to a bunch of programmers' machines via a fast network.

In conclusion

CFOs and bank managers will hate me, but I stick by my belief that your staff, your team, are the most important part of any project, and that means buying them the best equipment so that they spend their time using their talent and not watching busy cursors and build output scrolling slowly up the screen.

There is more to hardware than what's here, from displays to build machines, project progress displays, video conferencing, etc. This has just been a focus on the very smallest but most personal part of a realistic hardware capital expense for making worlds.

Today, buying a 64 GiB RAM, 48-core AMD workstation with multiple PCI-E SSDs per programmer is going to come as a shock, as we have got used to skimping on hardware, forcing the team to work with little better than gaming platforms. That's never been the case in many other performance computing sectors, and we need to realize that's what we do: we make some of the most complex performance software in the world. Just be glad we don't need arrays of super-compute machines to build our worlds, well, yet…

It's worth noting this isn't just a spend spend spend article (well, it is, but… :P). I've spent years looking into this aspect of builds, and the basic approach here works for lower expense systems too. Coder systems need balance: lots of cores on their own won't cut it, RAM and disks are equally important. So even lower down the cost scale, don't just accept an office or gamer's rig; we have very different performance characteristics than normal machines.

Perhaps next up is the much harder topic of software and configuration for build systems, if only that was as easy as spend spend spend!


Sound Libraries: Creative Springboards

Original Author: Michael Taylor

Sound libraries are key tools for sound designers everywhere. While the way they are used differs between individuals, mostly they are used to sweeten or enhance original creations. Recently, while working on a project, I was caught short on a couple of obscure sound sources and resorted to downloading a selection from my favourite sound library websites. While editing these sounds, for some reason I was reminded of a developer I worked with who solely used library effects, and wouldn't entertain the idea of me even editing them.

This in turn reminded me of the numerous times in my earlier days of advertising my services on various game dev forums where pretty much all audio design discussion (by non-audio types) frequently turned to ‘just download it from a free library/website’, as well as a couple of discussions with some friends who had colleagues that would just drop a sound straight from a professional library into the game.

Such a library might be the 20th Century Fox Library, a library included with software such as Logic Studio or Sound Forge, or a downloaded boutique collection from independent vendors such as those listed at the bottom of this article.

One of the Hollywood produced sound libraries, as collated by their sound engineers over the years

Real Unusual I/O Slowdowns (Part 1)

Original Author: Drew Thaler

I thought it’d be fun to run a series of posts on actual problem I/O patterns that I’ve seen, heard about, or debugged in games or game-related situations. This is the kind of information that is usually kept anecdotally with a few experts, not often shared, and often forgotten when the experts move on. Perfect topic for a dev blog post, don’t you think?

(The title refers to the fact that these are all real-world situations, they often cause unusual or unexpected behavior, and sometimes they’re just real unusual!)

We’ll start with a simple one.

The Problem

A big AAA console game has added a new feature which means reading from multiple partitions on the hard disk at the same time. But as soon as they started to do this, the developers found that load times went to hell: a level which used to take 5 seconds to load now takes 20, and a level which used to take 10 seconds now takes 40!

Why did it get so slow?

Hard disk partitions and seeking

In a nutshell, the problem boils down to reading from (or writing to) multiple partitions on the same disk at the same time.

Let’s start by visualizing a hard drive. A hard drive is a platter, which is basically a flat donut shape. On this platter is a drive head used for reading and writing which moves back and forth. (Sure, real hard disks usually have multiple platters; but for the purposes of seeking, a multi-platter disk works almost exactly like a single-platter disk – so we’ll stick with the simple model.)

Hard drive platter with drive head.

A simple conceptual model of a hard drive.

Great! Now here are two partitions on this disk. We’ll call these C and D. The names are arbitrary, of course – you could think of these as Windows drive letters, or you could read them as “cache” and “data”. Let’s say C is small, and D is large.

Hard drive platter, showing two partitions C and D.

Partitions C and D.

Finally, suppose the data we want to read is on both C and D. As we load files, sometimes the data comes from C and sometimes it comes from D. Watch what happens to the drive head:

Hard drive read animation, alternating reads from C and D

Alternating reads from C and D.

Hmm, the head is sure moving around a lot, isn’t it? That’s a lot of seeking. And as I always say: seek time is wasted time.

Perhaps you’re looking at this diagram and saying: sure, but you’ve just illustrated the worst case. In the best case, the data from D will be right next to C. Right?

Well, that’s true — but in fact, this worst-case is actually the most common case! Filesystems tend to store data on the outer circumference of the disk, since logical block addressing goes from outside to inside. So, consider a hard disk partitioned in exactly this way, with C on the inside and D on the outside… which matches at least one of the current consoles out there. If the drive starts out initially empty, then the first batch of data that gets written to D will wind up all the way on the outside, as far away from C as possible. And right there is your worst case: it turns out it will happen automatically on any brand-new console, fresh out of the box.

OK, so let’s see how bad it is. How much time does it take to move the drive head all the way across the platter?

In this case we’re talking about a game console, not a high-end PC. A typical drive in this console generation (which, remember, started way back in 2005, so we’re talking about hard drives from that time) is pretty basic – 5400rpm, with an average seek time of 12.5 ms.

But we don’t want the average time. The strokes you see above are not average… in fact, they’re pretty much the worst case: they are what’s called “full-stroke” seek times, going across the entire width of the platter. The cost of a full-stroke seek is generally a little less than twice the average seek.

For these console hard drives, a full-stroke seek is about 22 ms. Expensive, but not so bad if it’s just an isolated or somewhat rare event.

But what if you are hitting each partition almost equally? Reading C – D – C – D – C – D? Then you might be paying that 22 ms cost over and over again. It’ll take just over two hundred reads like this – practically nothing in a modern, complex AAA game – to add 5 seconds of load time. Five hundred reads: 10 seconds. A thousand reads, just over 20 seconds.

Ouch!

Where did this come up?

This I/O pattern occurred in the real world when a game was reading from an optical disc cache on one partition, while at the same time patching the cached data with content from a second partition.

Loading of course worked great when all the data was on one partition, but the initial attempt at the patching system caused exactly the behavior I described above.

How would you fix it?

The ultimate answer is, of course: don’t do that. 😉 But that’s not too helpful, is it? Here are some more concrete steps that you might take to fix this problem:

Try to organize your loading scheme so that you load everything from C first, and then everything from D. In the theoretical ideal case you could reduce the cost to just one full stroke seek.

If you can’t easily do that (and to be honest, chances are you can’t!) then at least keep the number of transitions down: load a batch from C, then a batch from D, then a batch from C. For every N reads you can batch together you’ll divide the number of full-stroke seeks by N. Two reads from C and two reads from D, and there will be half as many full-stroke seeks. Three and three, and there will be one third as many. And so on.
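
As a sketch of that batching idea (mine, not from the original post; the Read struct and issueRead() are hypothetical placeholders), you can group pending reads by partition before issuing them, so the head crosses between C and D once per batch rather than once per file:

#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical read request: which partition it lives on and where.
struct Read {
    char      partition;   // 'C' or 'D'
    long long offset;
    long long size;
};

// Placeholder for the real I/O call.
static void issueRead(const Read& r)
{
    std::printf("read %c @%lld (%lld bytes)\n", r.partition, r.offset, r.size);
}

// Issue reads grouped by partition so the head makes as few
// transitions between C and D as possible.
static void issueBatched(std::vector<Read> reads)
{
    // stable_sort keeps the original order within each partition.
    std::stable_sort(reads.begin(), reads.end(),
                     [](const Read& a, const Read& b) { return a.partition < b.partition; });
    for (const Read& r : reads)
        issueRead(r);
}

When dependencies stop you from reordering everything, applying the same grouping within smaller windows of requests still divides the number of full-stroke seeks, as described above.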

If one or more of the data streams is easy to buffer ahead (a movie, or other linear stream of data), throw more buffering at it so that you’re reading or writing less frequently. This is effectively the same as the previous one, because it batches the reads more. It’s also the approach we ultimately used with the patching system, since the patches were stored linearly.

Consider migrating data from one partition to the other for better locality. For example, you could automatically copy from D to C when the drive is otherwise idle. (If C is a cache, this would be a prefetch.)

And of course, as a last resort, you may be able to simply suffer the penalty when it occurs – but hide it by loading in the background, well ahead of time.

Some Notes

This post is only the first one in a series. I hope this was entertaining and not too basic; problems like these may seem obvious in hindsight, but they can certainly catch you by surprise! If you’re hungry for something crazier, don’t worry, I’ve got some real doozies lined up for future posts.

Next time: An unexpected worst-possible I/O interleave.


Finding uninitialized memory and corruption

Original Author: Bernat Muñoz

A few weeks ago I was trying to find one of the bugs related to memory corruption in the game we are working on. Actually, the bug was first checked by a co-worker, but, as I'm a bit more experienced with that kind of bug, I ended up being the guy working on it. He asked me how I usually find those kinds of bugs, so instead of going through a lengthy explanation, I promised I would write it down here, so more people would benefit from it, and it would probably be better organized.

The game crashes in random code spots

This is one of the usual indicators that something is either not initialized, or being corrupted by some code writing where it shouldn't. I know it's not a very precise indicator, but a good clue is that the code is crashing somewhere it should never crash. To make it simpler: somewhere where it doesn't make sense. At all. It's one of the worst things you get when using C/C++. First, it's interesting to actually distinguish between memory corruption and memory that is not initialized, and which behaviors I expect to see from each. Take this with a grain of salt:

  • Using memory that is not initialized: data with unexpected values, even at game start-up, that don't seem entirely possible but don't actually fit into another type. For example, INF/NAN values in floating point variables after a few operations. If you're working with Visual Studio / different executable setups, a clear indicator is the game behaving differently in debug/release builds (see the short example after this list).
  • Memory corruption: data with unexpected values, usually after a few game loop iterations. A good way to identify it is to check specific portions of data in a memory viewer to see if a strange value is written over data, or if it's being changed when it shouldn't.
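
Here is a tiny illustration of the first case (my own example, not from the original article): a member the constructor forgets to initialize holds the debug-heap fill pattern in a debug build and whatever garbage was in memory in a release build, which is exactly why the two builds behave differently.

#include <cstdio>

struct Enemy {
    float health;
    float speed;
    // Bug: 'speed' is never initialized.
    Enemy() : health(100.0f) {}
};

int main()
{
    Enemy* e = new Enemy();
    // In a debug build the CRT fills fresh heap memory with a pattern (e.g. 0xCD bytes),
    // so 'speed' reads as a huge bogus float; in release it is whatever was there before.
    std::printf("health=%f speed=%f\n", e->health, e->speed);
    delete e;
    return 0;
}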

The first thing I like to do is to check if I'm able to reproduce it. Most of these bugs never reproduce, or reproduce in a spot not related to where the problem is. Even then, that can give valuable clues, like which values are in memory when it crashes, or which behavior to expect on correct/incorrect execution. These kinds of bugs are really hard to chase properly, so every bit of information you get is valuable.

I need tools!

So, can we use some tools to make the process easier? Yes, of course. What I recommend checking:

  • Enable static analysis on your compiler: it can catch lots of stupid bugs automatically, and it's a non-time-consuming way to reduce the number of problems you might find in the future. If your compiler doesn't support the feature, or you want a second opinion, I recommend the excellent cppcheck.
  • Valgrind: if you're using Linux / MacOS, it's a good option, or so I've heard. I've never used it myself, as I primarily develop on Windows. I tried to use Valgrind + WINE, as explained here, without much success. Anyway, I've heard so much good stuff about it that I recommend checking it out 🙂
  • Dr Memory: I actually tried using it on one of the most evasive bugs we had on our latest project. You can find its website here. It detects uninitialized memory, erroneous accesses, and a few more things. From the few days I used it, it seems a good tool, in combination with a few scripts / a GUI to actually check the results properly. I expect to use it again in the future 🙂
  • DUMA: I've tried to use it several times in the past, without success. The idea is simple: mark memory that you shouldn't be accessing as protected, so you get immediate errors when accessing that memory. You can get it here; if you actually get it working (or know a good replacement), I'll be glad to hear about it.
  • gFlags: a coworker (hey, Jorge!) told me about this a few weeks ago. The Windows Debugging Tools (32-bit version here) include a little application that can set up your game's heap as protected and detect accesses to parts of the heap you shouldn't touch. It's quite useful, as you can get the exception while debugging from Visual Studio, pointing directly to where you screwed up.

Desperate ideas

Another "technique" I've seen used a lot (and have used myself) is just dividing your application in half and checking if the bug still reproduces (check thoroughly, memory bugs are difficult to reproduce). Of course dividing it isn't as trivial as it might seem, but it's doable. This can go from just avoiding loading certain types of assets (for example, don't load the physics library and all related geometry) to skipping certain parts altogether (for example, if the game crashes when going from level 3 to 4, does it happen if you go directly from 2 to 4?). Look for patterns!

It's also quite handy to check memory with a tool that lets you see what's really there. Knowing a bit about debugging magic numbers helps. A lot.
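
For reference, these are the fill patterns the Microsoft debug heap and debug CRT use (listed from memory, so double-check against your CRT version):

// Common Microsoft debug fill patterns (MSVC debug CRT / Windows debug heap):
// 0xCDCDCDCD  heap memory allocated by the debug CRT but not yet written to
// 0xDDDDDDDD  heap memory freed by the debug CRT (use-after-free smell)
// 0xFDFDFDFD  "no man's land" guard bytes around debug CRT allocations
// 0xCCCCCCCC  uninitialized stack memory when run-time checks (/RTC1) are enabled
// 0xFEEEFEEE  memory freed through the Windows debug heap (HeapFree)
// 0xABABABAB  guard bytes after memory allocated with HeapAlloc
// Seeing one of these values in a pointer or float is a strong hint about
// which kind of bug you are looking at.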

Busy, busy, busy

So, I'm quite busy now; I hope this was a good starting point for those entering the exciting world of debugging memory errors </irony>.


Points, Vertices and Vectors

Original Author: Wolfgang Engel

This post covers some facts about Points, Vertices and Vectors that might be useful. This is a collection of ideas to create a short math primer for engineers that want to explore computer graphics. The resulting material will be used in future computer graphics classes. Your feedback is highly welcome!

Points

A 3D point is a location in space, in a 3D coordinate system. We can find a point P with coordinates [Px, Py, Pz] by starting from the origin at [0, 0, 0] and moving the distance Px, Py and Pz along the x, y and z axis.

Two points define a line segment between them, three points define a triangle with corners at those points, and several interconnected triangles can be used to define the surface of an object; sometimes also called mesh.

Points that are used to define geometric entities like triangles, are often called vertices. In graphics programming, vertices are an array of structures or a structure of arrays and not only describe a position but also include other data like for example color, a normal vector or texture coordinates.

The difference of two points is a vector: V = PQ

Vectors

While a point is a reference to a location, a vector is the difference between two points, which describes a direction and a distance (a length), or a displacement.

Like points, vectors can be represented by three coordinates. Those three values are retrieved by subtracting the tail of the vector from its head.

Δx = (xh – xt)

Δy = (yh – yt)

Δz = (zh – zt)

Figure 1 – Vector components Δx, Δy and Δz

Two vectors are equal if they have the same values. Thus, considering a vector as a difference of two points, there are any number of vectors with the same direction and length.

Figure 2 – Instances of one vector

The difference between points and vectors is reiterated by saying they live in different spaces: points live in Euclidean space, while vectors live in a linear (vector) space. Read more in [Farin].

The primary reason for differentiating between points and vectors is to achieve geometric constructions which are coordinate independent. Such constructions are manipulations applied to objects that produce the same result regardless of the location of the coordinate origin.

 

Scalar Multiplication, Addition and Subtraction of Vectors

A vector V can be multiplied by a scalar. Multiplying by 2 doubles the vector's components.

Similarly dividing the vector by 2 halves its components. The direction of the vector remains unchanged, only its magnitude changes.

The result of adding two vectors V and W can be obtained geometrically.

Figure 3 – Adding two vectors

Placing the tail of W at the head of V leads to the resulting vector, going from V's tail to W's head. In a similar manner, vector subtraction can be visualized.

Figure 4 – Subtracting two vectors

Similar to addition, the vector that should be subtracted -W- is negated and its tail placed at the head of V. The resulting vector runs from V's tail to the head of -W.

Alternatively, by the parallelogram law, the vector sum can be seen as the diagonal of the parallelogram formed by the two vectors.

Figure 5 – Parallelogram rule

The vectors V – W and V + W are the diagonals of the parallelogram defined by V and W. Arithmetically, vectors are added or subtracted by adding or subtracting the components of each vector.

All the vector additions and subtractions are coordinate independent operations, since vectors are defined as difference of points.

 

Homogeneous Coordinates

Representing both points and vectors with three coordinates can be confusing. Homogeneous coordinates are a useful tool to make the distinction explicit. Adding a fourth coordinate, named w, allows us to describe a direction or a vector by setting this coordinate to 0. In all other cases we have a point or location.

Dividing a homogeneous point [Px, Py, Pz, Pw] by the w component leads to the corresponding 3D point. If the w component equals zero, the point would be infinitely far away, which is then interpreted as a direction. Scaling the homogeneous coordinates by any non-zero value of w leads to coordinates that all correspond to the same 3D point. For example, the point (3, 4, 5) has homogeneous coordinates (6, 8, 10, 2) or (12, 16, 20, 4).
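
A small sketch of that divide-by-w conversion (my own helper types, not from the article):

#include <cstdio>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Convert a homogeneous point to its 3D equivalent by dividing by w.
// A w of zero would mean a direction ("point at infinity"), so it is not divided here.
static Vec3 toCartesian(const Vec4& p)
{
    if (p.w == 0.0f)          // direction, not a location
        return Vec3{ p.x, p.y, p.z };
    return Vec3{ p.x / p.w, p.y / p.w, p.z / p.w };
}

int main()
{
    Vec4 a{ 6.0f, 8.0f, 10.0f, 2.0f };   // same 3D point as (12, 16, 20, 4)
    Vec3 p = toCartesian(a);
    std::printf("(%g, %g, %g)\n", p.x, p.y, p.z);   // prints (3, 4, 5)
    return 0;
}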

The reason why this coordinate system is called "homogeneous" is because it is possible to transform functions f(x, y, z) into the form f(x/w, y/w, z/w) without disturbing the degree of the curve. This is useful in the field of projective geometry. For example, a collection of 2D homogeneous points (x/t, y/t, t) exists on an xy-plane where t is the z-coordinate, as illustrated in figure 6.

Figure 6 – 2D homogeneous coordinates can be visualized as a plane in 3D space

Figure 6 shows a triangle on the t = 1 plane, and a similar triangle much larger on a distant plane. This creates an arbitrary xy plane in three dimensions. The t- or z-coordinate of the plane is immaterial because the x- and y-coordinates are eventually scaled by t.

Homogeneous coordinates are also used to create a translation transform.

In game development, some math libraries have dedicated point and vector classes. The main distinction is made by setting the fourth channel to zero for vectors and one for points [Eberly].

 

Pythagorean Theorem

The length or magnitude of a vector can be obtained by applying the Pythagorean Theorem. The opposite -b- and adjacent -a- sides of a right-angled triangle represent orthogonal directions. The hypotenuse is the shortest path distance between those.

Figure 7 – Pythagorean Theorem

It helps to think of the Pythagorean Theorem as a tool to compare "things" moving at right angles. For example, if a is 3 and b equals 4, then c equals 5 [Azad].

The Pythagorean Theorem can also be applied to right-angled triangles chained together.

Figure 8 – Pythagorean Theorem with two triangles chained together

Replacing c² with a² + b² shows that the squared length of the combined hypotenuse is a² + b² + d²; it is now written in three orthogonal components. Instead of laying the triangles flat, we can now tilt the green one a bit and therefore consider an additional dimension.

Figure 9 – Pythagorean Theorem in 3D

Renaming the sides to x, y and z instead of a, b and d, we get the length as √(x² + y² + z²).

This works with any number of dimensions.

The Pythagorean Theorem is the basis for computing distance between two points. Consider the following two triangles:

Figure 10 – Pythagorean Theorem used for distance calculations

The distance from the tip of the blue triangle at coordinate (4, 3) to the tip of the green triangle at coordinate (8, 5) can be calculated by creating a virtual triangle between those points. Subtracting the points leads to a 2D vector.

Δx = (xhead – xtail)

Δy = (yhead – ytail)

In this case

Δx = 8 – 4 = 4

Δy = 5 – 3 = 2

In this example the distance is √(4² + 2²) = √20 ≈ 4.47. Extending the idea to three dimensions gives the well-known distance equation √(Δx² + Δy² + Δz²).

 

Unit Vectors

A unit vector has a length or magnitude of 1. This is a useful property for vector multiplications, because those consider the magnitude of a vector and the computation time can be reduced if this magnitude is one (more on this later). A unit column vector might look like this:

and

Converting a vector into a unit form is called normalizing and is achieved by dividing a vector’s components by its magnitude. Its magnitude is retrieved by applying the Pythagorean Theorem.

An example might be:
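
As a minimal C++ sketch (the Vec3 type and the sample values here are mine, not the article's):

#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// Length via the Pythagorean Theorem extended to three dimensions.
static float length(const Vec3& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Divide each component by the magnitude to get a unit vector.
static Vec3 normalize(const Vec3& v)
{
    const float len = length(v);
    return Vec3{ v.x / len, v.y / len, v.z / len };
}

int main()
{
    Vec3 v{ 3.0f, 4.0f, 0.0f };
    Vec3 n = normalize(v);
    std::printf("length=%g unit=(%g, %g, %g)\n", length(v), n.x, n.y, n.z); // 5 and (0.6, 0.8, 0)
    return 0;
}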

 

Cartesian Unit Vectors

Now that we have investigated the scalar multiplication of vectors, vector addition and subtraction and unit vectors, we can combine those to permit the algebraic manipulation of vectors (read more at [Vince][Lengyel]). A tool that helps to achieve this is called Cartesian unit vectors. The three Cartesian unit vectors i, j and k are aligned with the x-, y- and z-axes.

Any vector aligned with the x-, y- and z-axes can be defined by a scalar multiple of the unit vectors i, j and k. For example a vector 15 units long aligned with the y-axis is simply 15j. A vector 25 units long aligned with the z axis is 25k.

By employing the rules of vector addition and subtraction, we can compose a vector R by summing three Cartesian unit vectors as follows.

This is equivalent to writing R as

The magnitude of R would then be computed as

Any pair of Cartesian vectors such as R and S can be combined as follows

An example would be

 

Vector Multiplication

Vector multiplication provides some powerful ways of computing angles and surface orientations. While the multiplication of two scalars is a familiar operation, the multiplication of vectors is a multiplication of two 3D lines, which is not an easy operation to visualize. In vector analysis, there are generally two ways to multiply vectors: one results in a scalar value and the other one in a vector.

Scalar or Dot Product

Multiplying the magnitudes of two vectors |R| and |S| is a valid operation, but it ignores the orientation of the vectors, which is one of their important features. Therefore we want to include the angle between the vectors. In the case of the scalar product, this is done by projecting one vector onto the other.

Figure 11 – Projecting R on S

The projection of R on S creates the basis for the scalar product, because it takes into account their relative orientation. The length of the projection of R on S is |R| cos θ. Multiplying this projected length by the magnitude of S gives |R| |S| cos θ, which is commonly written as

R · S = |R| |S| cos θ

The following figure shows a number of dot product scenarios.

Figure 12 – Dot product

The geometric representation of the dot product is useful to imagine how it works but it doesn’t map well to computer hardware. The algebraic representation maps better to computer hardware and is calculated with the help of Cartesian components:

Various dot product terms, such as i · j, i · k and j · k, are zero because the cosine of 90 degrees is zero. Terms that pair a Cartesian unit vector with itself have an angle of zero degrees between them, and the cosine of zero degrees is one; since the Cartesian vectors are all unit vectors, each of those terms equals one.

So we end up with the familiar equation

R · S = RxSx + RySy + RzSz

An example:

Comparing the two ways of calculating the scalar product shows the same result:

Solving the geometric form for cos θ and applying the inverse cosine leads to the angle between the two vectors.

The angle returned by the scalar or dot product ranges between 0 and 180 degrees and is always the smallest angle associated with the geometry.

 

Scalar Product in Lighting Calculations

Many games utilize the Blinn-Phong lighting model (see Wikipedia; ignore the code on this page). Part of the diffuse component of this lighting model is the Lambert's Law term, published in 1760. Lambert stated that the intensity of illumination on a diffuse surface is proportional to the cosine of the angle between the surface normal vector and the light source direction.

Let's assume our light source is located in our reference space for lighting at (20, 30, 40), while our normal vector is normalized and located at (0, 11, 0). The point where the intensity of illumination is measured is located at (0, 10, 0).

Figure 13 – Lighting Calculation

The light and normal vectors are calculated by subtracting the position of the point where the intensity is measured (representing their tails) from their heads.

Instead of using the original light vector, the following scalar product normalizes the light vector first, before using it in the lighting equation.

To test whether the light vector's magnitude is one:

Plugging the unit light vector and the unit normal vector into the algebraic representation of the scalar product.

Now solving the geometrical representation for the cosine of the angle.

If the light and the normal vector are unit vectors, the result of the algebraic scalar product calculation equals the cosine of the angle. The algebraic scalar product is implemented in the dot product intrinsic available on the CPU and GPU. In other words, if the involved vectors are unit vectors, a processor can calculate the cosine of the angle faster. This is the reason why normalized vectors might be more efficient when programming computer hardware.

Following Lambert's Law, the intensity of illumination on a diffuse surface is proportional to the cosine of the angle between the surface normal and the light source direction. That means that the point at (0, 10, 0) receives about 0.4082 of the original light intensity from the light at (20, 30, 40) (attenuation is not considered in this example).
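
The same calculation as a small C++ sketch (my code, reusing the numbers from the example above): it normalizes the light vector, dots it with the unit normal and prints a Lambert term of roughly 0.408.

#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

static Vec3  sub(const Vec3& a, const Vec3& b) { return Vec3{ a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  normalize(const Vec3& v)
{
    const float len = std::sqrt(dot(v, v));
    return Vec3{ v.x / len, v.y / len, v.z / len };
}

int main()
{
    const Vec3 lightPos { 20.0f, 30.0f, 40.0f };
    const Vec3 surfacePt{  0.0f, 10.0f,  0.0f };
    const Vec3 normalTip{  0.0f, 11.0f,  0.0f };

    // Build the vectors by subtracting the surface point (their common tail).
    const Vec3 L = normalize(sub(lightPos,  surfacePt));
    const Vec3 N = normalize(sub(normalTip, surfacePt));

    // Lambert term: cosine of the angle between unit light vector and unit normal,
    // clamped so surfaces facing away from the light receive no illumination.
    const float lambert = std::fmax(0.0f, dot(N, L));
    std::printf("Lambert term = %f\n", lambert);   // ~0.408248
    return 0;
}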

Coming back to Figure 12: if the unit light vector had a y component of one or minus one (and therefore x and z components of zero), it would point in the same or the opposite direction as the normal, and the last equation would result in one or minus one. If the unit light vector had an x or z component equal to one (and therefore the other components equal to zero), those equations would result in zero.

 

The Vector Product

Like the scalar product, the vector or cross product depends on the modulus of two vectors and the angle between them, but the result of the vector product is essentially different: it is another vector, at right angles to both the original vectors.

and

For an understanding of the vector product of R and S, it helps to imagine a plane through those two vectors, as shown in figure 14.

Figure 14 – Vector Product

The angle θ is the angle between R and S. There are two possible choices for the direction of the resulting vector, each the negation of the other; the one chosen here is determined by the right-hand rule. Hold your right hand so that your forefinger points forward, your middle finger points out to the left, and your thumb points up. If you roughly align your forefinger with R, and your middle finger with S, then the cross product will point in the direction of your thumb.

Figure 15 – Right-Hand rule Vector Product

The resulting vector of the cross product is perpendicular to R and S, that is

and

The two vectors R and S can be orthogonal but do not have to be. This makes the vector product an ideal way of computing normals. A property of the vector product that will be covered later is that the magnitude of T equals the area of the parallelogram defined by R and S.

Let’s multiply two vectors together using the vector product.


There are various vector product terms, such as i × i, j × j and k × k, that equal zero because the sine of zero degrees is zero. This leaves

The other products between the unit vectors can be reasoned as:

Those results show that the commutative multiplication law is not applicable to vector products. In other words, R × S = –(S × R).

Applying those findings reduces the vector product term to

Now re-grouping the equation to bring like terms together leads to:

To achieve a visual pattern for remembering the vector product, some authors reverse the sign of the j scalar term.

Re-writing the vector product as determinants might help to memorize it as well.

A 2×2 determinant is the difference between the products of its diagonal terms. With determinants, a "recipe" for a vector product consists of the following steps:

1. Write the two vectors that should be multiplied as Cartesian vectors

2. Write the cross product of those two vectors in determinant form, if this helps to memorize the process; otherwise skip to step 3.

3. Then compute by plugging in the numbers into

A simple example of a vector product calculation is to show that the assumptions that were made above, while simplifying the vector product, hold up.

To show that there is a sign reversal when the vectors are reversed, let's calculate the cross product of i and k. The i and k terms are both zero, but the j term is –1, which makes i × k = –j. Now reversing the vector product gives k × i = j.

Which shows the expected sign reversal.

 

Deriving a Unit Normal Vector for a Triangle

Image 16 shows a triangle with vertices defined in anti-clockwise order. The side pointing towards the viewer is defined as the visible side in this scene. That means that the normal is expected to point roughly in the direction of where the viewer is located.

Figure 16 – Deriving a Unit Normal Vector

The vertices of the triangle are:

P1 (0, 2, 1)

P2 (0, 1, 4)

P3 (2, 0, 1)

The two vectors R and S are retrieved by subtracting the vertex at the tail from the vertex at the head.

Δx = (xh – xt)

Δy = (yh – yt)

Δz = (zh – zt)

Bringing the result into the Cartesian form




 

It is a common mistake to believe that if R and S are unit vectors, the cross product will also be a unit vector. The vector product equation shows that this is only true when the angle between the two vectors is 90 degrees and therefore the sine of the angle theta is 1.

Please read in [Van Verth] about CPU implementation details.
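
As a sketch of the whole derivation in code (my own helpers; I assume the edge choice R = P2 – P1 and S = P3 – P1, which matches the anti-clockwise winding), including the parallelogram/triangle area that the next section discusses:

#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return Vec3{ a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 cross(const Vec3& a, const Vec3& b)
{
    return Vec3{ a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
}
static float length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

int main()
{
    // Triangle from Figure 16.
    const Vec3 P1{ 0.0f, 2.0f, 1.0f };
    const Vec3 P2{ 0.0f, 1.0f, 4.0f };
    const Vec3 P3{ 2.0f, 0.0f, 1.0f };

    // Assumed edge choice: R = P2 - P1, S = P3 - P1.
    const Vec3 R = sub(P2, P1);            // (0, -1, 3)
    const Vec3 S = sub(P3, P1);            // (2, -2, 0)

    const Vec3  T   = cross(R, S);         // (6, 6, 2) -- not a unit vector
    const float len = length(T);           // sqrt(76), also the parallelogram area
    const Vec3  n   = { T.x / len, T.y / len, T.z / len };

    std::printf("unit normal   = (%g, %g, %g)\n", n.x, n.y, n.z);
    std::printf("triangle area = %g\n", 0.5f * len);
    return 0;
}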

Areas

The vector product might be used to determine the area of a parallelogram or a triangle (with the vertices at P1 – P3). Image 17 shows the two vectors helping to form a parallelogram and a triangle.

Figure 17 – Deriving the Area of a Parallelogram / Triangle with the Vector Product

The height h is |S| sin θ, therefore the area of the parallelogram is |R| |S| sin θ.

This equals the magnitude of the cross product vector T. Thus when we calculate the vector product of R and S, the length of the normal vector equals the area of the parallelogram formed by those vectors. The triangle forms half of the parallelogram and therefore half of the area.

area of parallelogram = |R × S| = |R| |S| sin θ

area of triangle = ½ |R × S|

or

area of triangle = ½ |R| |S| sin θ

To compute the surface area of a mesh constructed from triangles or parallelograms, the magnitudes of its non-normalized face normals can be summed up in this way.

The direction of the resulting normal (towards or away from the viewer) shows whether the vertices are ordered clockwise or counter-clockwise.

References

[Azad] Kalid Azad, "Math, Better Explained"

Step by Step Optimisation

Original Author: Tony Albrecht

So, you’ve been given a task to optimise a system in a game. The original author(s) are no longer working in your studio (that’s always the case isn’t it?) and you need to lube up the code and insert a rocket. How do you go about it? Here are some step by step instructions to guide you through the optimisation process…

Grok your platform

In order to optimise anything, you need to first understand the platform you’re running on. This is more than just the hardware (although that is very important). It’s also the operating system and even the engine you’re using – anything that you can’t change is your platform. These things are your axioms that you have to work with and you need to use them to build your optimal solution. They are also your constraints, your limits. You need to know what it does well and what it does badly. Then do lots of the first and as little as possible of the latter.

Benchmark

To optimise something you must first measure it. If you have no benchmark, how will you know if your changes have improved performance or hindered it? An accurate benchmark can be quite problematic in a game – especially if (like most games) it is non-deterministic. If you can't minimise frame by frame variation (pausing? Freeze the AI? A test level?) then you'll either need a longer term average in a fairly consistent in-game region or a plot of those times over a range of time. An even better solution is to use edit and continue to compile and relink the code on the fly so you can watch the performance worm rise or dip according to your changes (although this is only good for fairly local changes). Once you have established a decent benchmark you can move on to the next step.
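
As a tiny example of the kind of benchmark hook I mean (a sketch, not from the original article; UpdateAISystem() is a hypothetical system under test), time the system each frame and keep a rolling average so frame-to-frame noise doesn't hide the trend:

#include <chrono>
#include <cstdio>

// Wrap the system under test each frame; reports a rolling average
// so short-term variation doesn't mask the effect of your changes.
class ScopedTimer {
public:
    explicit ScopedTimer(double& avgMs, double smoothing = 0.05)
        : m_avgMs(avgMs), m_smoothing(smoothing),
          m_start(std::chrono::steady_clock::now()) {}
    ~ScopedTimer()
    {
        const auto end = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(end - m_start).count();
        m_avgMs += m_smoothing * (ms - m_avgMs);   // exponential moving average
        std::printf("system: %.3f ms (avg %.3f ms)\n", ms, m_avgMs);
    }
private:
    double& m_avgMs;
    double  m_smoothing;
    std::chrono::steady_clock::time_point m_start;
};

// Usage each frame:
//   static double s_avg = 0.0;
//   { ScopedTimer t(s_avg); UpdateAISystem(); }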

Profile

Run your favourite profiler over the code and see if there is anything glaringly obvious that would benefit from optimisation – keep an eye out for excessive memory allocation/deallocation (if you've never traced through a malloc before, I suggest you try it. Then try and tell me that it's ok to call malloc/free in an inner loop), high frequency functions (small functions that are called many times in succession or, even worse, alternating with other small functions), inefficient data use, and obscure implicit construction or conversions (Oh, the evils of objects!). By all means fix the easy obvious stuff, but don't delve down too far – before you shake the foundations of the codebase you need to understand what it does.

What

So, what is the code trying to do? Or, more importantly, what problem was the programmer who wrote this code trying to solve? What was their intention? Don’t get bogged down by implementation details – try and see it holistically. What are the inputs and the outputs? What are the transformations involved? You need to understand these first before you touch anything – yes, I appreciate the urge to optimise that obviously inefficient code, but will it really make a difference in the end? If you are going to rewrite parts of this code, then you’d better be sure you know what’s going on. Don’t blindly optimise without understanding what that code is doing.

How

Once you know what the problem being solved is, you need to delve down into it and see how it is solving it. This is where you look at the structure of the code, the nitty gritty transformations of the data that comprise the guts of the algorithm. Try not to be too critical of the code or data just yet – ignore the over abundance of virtuals and the eye-stabbing templates, and just try to comprehend what is going on. Trace the data through the bowels of the code and watch what happens to it – where it goes in, where it comes out and what touches it in the middle.

Where

Now that you have developed some empathy for the code, profile it again. You should now understand much more of what is going on and the profiled information will make more sense.  Figure out where the performance matters and where efficiency is irrelevant. Look at not just the low level but the high level algorithm – not just the code that runs slow, but the code that calls the slow code. Where do you need to fix it to make it go fast?

Why

Now that we know what and how, we should consider why. Why was the code/data structured in that particular way?  You should always assume that the coder there before you knew what they were doing. That little nugget of crazy you've just stumbled across was probably put there for a reason – if you can't figure out the reason then you probably don't understand the code enough. Sure, the previous coder may have been a mouth breathing moron, but in all likelihood they were about as smart as you and you'll make the same mistakes if you don't learn from theirs.

Alternatives

Once you know why, you should consider the alternatives – can you do the same thing in a different way? You understand the platform, you know what the code has to do, you know what the old code did and why – is there a better way? Take a step back and think of the big picture. You have the benefit of looking at a completed system. You know everything that the code has to do, whereas the original author(s) only had an idea of where they were going.  You also have benefit of a complete test suite – the running game. Given the gift of a complete overview of the system, can you do it in a better way? There are often many alternatives: Can you distribute the calculations over many frames? Can you reduce the fidelity of the calculations? (does it really need to be done in double precision?) Can you cache some of the information? Interpolate? Use level of detail? Do less? Run it on a different thread, concurrently or deferred?

Optimise

Now is the time to optimise. This doesn’t necessarily mean cracking open the assembler or getting dirty with a SIMD processor. You have alternatives from the last step, try those that seem most appropriate. If there is no alternative, or it is the best option, look at your low level optimisation. You know, the fun stuff; streaming data, SOA formats, SIMD processing, loop unrolling, intrinsics.

But, a word of advice before you go crazy – keep it simple. Simple code is easier to optimise, easier to maintain, and just plain easier for the next poor sap that comes along to deal with.

 

 


Implementing a True realloc() in C++

Original Author: Michael Tedder

Introduction

As anyone who’s programmed using C++ would know, the language only provides two operators for memory management: new and delete.  No realloc()-like operator or function is available.  One can certainly use std::vector or other similar container which wraps the management of the array, handling the dynamic resizing within itself, however the resize implementation is usually rather “dumb”.  Let’s take a quick look at what steps a typical std::vector resize implementation would perform:

  1. Allocate a new array 50% larger than the existing array size (operator new[]),
  2. Copy all of the array’s contents into the new array (operator =),
  3. Free the old array (operator delete[]),
  4. Additionally, if the type has any constructors or destructors, these are also called for every element during relocation.
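
As a rough sketch of those four steps (simplified, and not any particular standard library's actual implementation):

#include <cstddef>

// Grow an array from oldcnt to newcnt elements the way a simple
// std::vector-style resize does: allocate, copy, free.
template <typename T>
T *grow(T *oldptr, std::size_t oldcnt, std::size_t newcnt)
{
    T *newptr = new T[newcnt];          // 1. allocate the larger array
    for (std::size_t i = 0; i < oldcnt; i++)
        newptr[i] = oldptr[i];          // 2. copy the contents across (operator =)
    delete [] oldptr;                   // 3. free the old array
    return newptr;                      // 4. constructors/destructors ran via new[]/delete[]
}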

Admittedly, given that only the new[] and delete[] operators are available in C++, this is about the best that can be accomplished.  While this implementation is fine for most general use cases, game development is typically not general, and there are real-life data sets for which this approach becomes extremely inefficient and sometimes even prohibitively expensive.

Comparing std::vector To realloc()

As stated above, when resizing an array with std::vector, there must be enough available free memory to store a complete copy of the array, plus an extra 50% in order to be able to expand it.  On embedded systems with very little total memory, this can be a serious issue.  If a 512KB sized array needed to be expanded, even for only a few extra elements, a contiguous free block of 768KB would be necessary, temporarily requiring a total of 1.25MB of memory.  This could be rather difficult on a system that has, for example, only 4MB of memory available.

The copying of the data from the old array into the new one can also be a slow operation, depending on the size of the data set.  If the array is hundreds of megabytes large, the time required to copy it is far from minuscule, and can be very slow indeed if the OS has paged any portion of that memory out to the swap file.  It is also because of this copy that storing pointers directly to elements inside the array is not safe with std::vector: the array contents end up relocated elsewhere in memory after a resize.

Additionally, after the array contents have been relocated and the older array has been freed, a hole is created in the heap causing increased memory fragmentation.  This hole creation can be confirmed with the following test code, which pushes values into a std::vector until it resizes:

#include <iostream>
#include <vector>
using namespace std;

int main(void)
{
    size_t cap;
    vector<int> foo;

    // push one element into the vector to allocate it
    foo.push_back(1);
    cap = foo.capacity();
    cout << "ptr=" << &foo[0] << ", capacity=" << cap << endl;

    // continue to push elements until the capacity is exceeded, forcing a resize
    for (size_t t = foo.size(); t <= cap; t++)
        foo.push_back(t);

    cap = foo.capacity();
    cout << "ptr=" << &foo[0] << ", capacity=" << cap << endl;
    return 0;
}

Upon execution, the output shows:

ptr=0x2192010, capacity=1

ptr=0x2192030, capacity=2

The difference between the pointers shows that a hole of 32 bytes has been created at the original location of the vector's storage.  While 32 bytes by itself is not much, repeated resizes will leave ever larger holes behind as the array held by the vector moves upward in memory.

The C library has provided the realloc() function for resizing memory blocks for decades.  It allows a block previously allocated by malloc() to be expanded or shrunk in place, without relocation of the block contents, as long as there is enough free space available after the block to expand itself.  If there is not enough free space available after the block to perform the expansion, only then is the block relocated to a different location in memory.
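
For plain data with no constructors or destructors, realloc() can already be used directly; a minimal sketch, with error handling omitted:

#include <cstdlib>

void realloc_example(void)
{
    // allocate room for 100 ints
    int *values = static_cast<int *>(::malloc(100 * sizeof(int)));

    // ... use the first 100 elements ...

    // grow to 200 elements; the existing contents are preserved, and the
    // block is only moved if it cannot be expanded in place
    values = static_cast<int *>(::realloc(values, 200 * sizeof(int)));

    ::free(values);
}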

Unfortunately, realloc() cannot be used directly on arrays allocated with operator new[] in C++, as realloc() only handles low-level memory allocation and does not call any constructors or destructors the type may have.  It is possible to call constructors and destructors manually, though, and this effectively gives us the ability to create a wrapper around realloc() which works as expected with C++ arrays.

However, there is one remaining issue: the array count value itself, which is maintained internally by compiler-generated code at runtime.  Since this is the most complicated part of this article, let's start with it first.

The Magic Counter

Behind the scenes, the compiler stores a value containing the number of elements in the array which is placed directly before the actual pointer returned by operator new[] when that type has a non-trivial destructor.  This value is used to determine the number of elements to call the destructor on when the array is freed with operator delete[].[1] That last sentence is important — if the type being allocated has a trivial destructor, then the compiler does not need to call it on delete[], and hence no count is ever stored for that type.  Whether the type has a trivial constructor or not is irrelevant, because the number of constructors to call is known to the compiler as the size is directly specified by the parameter to operator new[].

Let’s break this down into a table so it’s a bit easier to understand:

Trivial Constructor    Trivial Destructor    Has Array Counter
-------------------    ------------------    -----------------
Yes                    Yes                   No
No                     Yes                   No
Yes                    No                    Yes
No                     No                    Yes

If you aren’t familiar with the differences between a trivial and a non-trivial constructor/destructor, you might want to read my previous post on the subject.

So what exactly is this array counter?  Unfortunately, there is no standard definition; it is left entirely to how the compiler implements it.  While Microsoft Visual C++ uses an int type (32 bits) on both 32-bit and 64-bit architectures, gcc and clang both use the size_t type, which changes size with the architecture.  By simply modifying this array count value, we can directly affect how many elements the compiler-generated code will destroy when operator delete[] is called.

Let’s start by first writing a class which can access the array count for those types which have a non-trivial destructor.  Given an arbitrary pointer to a memory block allocated by operator new[], we should allow the array count to be retrieved and modified.  By routing all accesses through this class, any future changes that might need to be made to our implementation will be kept here in one place.

template <typename T>
class ArrayRealloc
{
public:
// handle differences between compiler implementations
#ifdef _MSC_VER
    typedef int     CountT;
#else
    typedef size_t  CountT;
#endif

public:
    ArrayRealloc(T *ptr)
    {
        // calculate pointer to base of memory
        base = reinterpret_cast<CountT *>(ptr) - 1;
    }

    size_t count(void) const
    {
        // return count of items in array
        return (size_t)*base;
    }

    void set(size_t newcnt)
    {
        // change count of items in array
        *base = (CountT)newcnt;
    }

private:
    CountT *    base;
};

Next, let’s follow up with some simple test code which utilizes our new class:

#ifndef __MACH__
 #include <malloc.h>
#else
 #include <objc/malloc.h>
#endif
#include <cstdlib>
#include <iostream>
using namespace std;

static int val = 0;
class foocd
{
public:
    int id;
    foocd(void) : id(val++) { cout << "ctor: id=" << id << endl; }
    ~foocd(void) { cout << "dtor: id=" << id << endl; }
};

int main(void)
{
    foocd *cd = new foocd[5];
    ArrayRealloc<foocd> arr(cd);
    cout << "count=" << arr.count() << endl;
    arr.set(3);
    cout << "count=" << arr.count() << endl;
    delete []cd;
    return 0;
}

The above test code allocates five elements of foocd, then uses our ArrayRealloc class to directly access the count of elements in the array.  We then call count() to obtain the count of elements, display it, change the number of elements in the array to three, and display the count again.  Finally, the array is deleted using the standard C++ operator delete[].  If you compile and run this code, you will get the following output:

ctor: id=0

ctor: id=1

ctor: id=2

ctor: id=3

ctor: id=4

count=5

count=3

dtor: id=2

dtor: id=1

dtor: id=0

Since we changed the count to three, the compiler-generated code for operator delete[] only deletes the first three elements in the array.  While this is not a true reallocation by itself, it does show that we can successfully modify the array count value held by the compiler.

Performing the Reallocation

In order to actually perform a reallocation of a memory block, we need to be able to ask the memory manager or OS to reallocate it for us.  We will use the platform-independent C library realloc() here since it is portable, but on Windows the HeapReAlloc() function is also available.  Of course, if you have your own memory manager, you will most likely want to use that instead.  Regardless of the API you use, it is important to remember that pointers must not be passed across different memory managers: for example, do not attempt to use a pointer obtained from malloc() with HeapReAlloc(), as it won't work!

So, to be able to use realloc() on an array allocated with operator new[] and freed with operator delete[], we first need to override those operators so they allocate and free with the C library's malloc() and free():

// override operator new[] and delete[] so they use the same memory manager as realloc()
void *operator new[](size_t size) throw(bad_alloc)
{
    return ::malloc(size);
}

void operator delete[](void *addr) throw()
{
    ::free(addr);
}

Now we can revisit our ArrayRealloc class again, this time adding a resize() function to it:

template <typename T>
class ArrayRealloc
{
public:
    // ...

    T *user(void) const
    {
        // return pointer to user memory
        return reinterpret_cast<T *>(base + 1);
    }

    T *resize(size_t newcnt)
    {
        // reallocate the array to the specified size, plus the space needed for the array count
        void *newptr = ::realloc(base, sizeof(CountT) + (newcnt * sizeof(T)));

        // update our base pointer as realloc might have moved the memory block
        base = static_cast<CountT *>(newptr);

        // update the array count, and return the pointer to reallocated user memory
        set(newcnt);
        return user();
    }

    // ...
};

Note that the pointer passed to realloc() is the true base of the memory block, already adjusted by the ArrayRealloc constructor.  The size passed also accounts for the extra size required for the array count value itself, and multiplies the specified count by the size of the type so we can specify the new count in elements rather than bytes.

Generic Object Construction and Destruction

Now that we can reallocate an array and control the number of elements in it, we need to be able to construct and destruct individual objects at will.  Although we can’t call the constructor for an object directly, it is possible to construct an object at a specific memory location by using placement new.  To destruct an object, C++ allows us to simply call the destructor directly.
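
For a single object, those two operations look like this (a minimal sketch reusing the foocd class from the earlier test):

#include <new>      // placement new
#include <cstdlib>

void construct_destruct_example(void)
{
    // allocate raw storage only; no constructor runs here
    void *mem = ::malloc(sizeof(foocd));

    // construct the object in that storage using placement new
    foocd *obj = new (mem) foocd();

    // destroy it by calling the destructor directly, then release the storage
    obj->~foocd();
    ::free(mem);
}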

As was noticeable from our earlier test, when operator new[] constructs an array of objects, the constructors are called in order of increasing address.  When the objects are destroyed, operator delete[] calls the destructors in reverse order, so we need to make sure we replicate the same behaviour:

class GenericObject
{
public:
    template <typename T>
    static void construct(T *base, size_t count)
    {
        // create a pointer to the end of memory to begin construction
        T *end = base + count;

        // call the default constructor for each object using placement new, in increasing order
        while (base < end)
        {
            new (static_cast<void *>(base)) T();
            base++;
        }
    }

    template <typename T>
    static void destruct(T *base, size_t count)
    {
        // create a pointer to last object to destroy
        T *end = base + (count - 1);

        // call the destructor for each object directly, in reverse order
        while (end >= base)
        {
            end->~T();
            end--;
        }
    }
};

Putting It All Together

Now that we have all of the components necessary to perform a proper reallocation of a C++ array, the last piece is detecting whether a given type has a trivial constructor and/or destructor.

The compiler-supplied type trait functions __has_trivial_constructor() and __has_trivial_destructor() have been available since Visual C++ 2005 and gcc-4.3/clang-2, and pretty much do exactly what their names say they do.  Both of these functions return boolean values and are available at compile time, so we can use them as arguments to a template:

template <typename T, bool has_constructor = __has_trivial_constructor(T), bool has_destructor = __has_trivial_destructor(T)>
class Reallocate;
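
(As an aside: on a compiler with full C++11 <type_traits> support, the standard traits could be used instead of the intrinsics; a rough equivalent of the declaration above might look like the following.)

#include <type_traits>

template <typename T,
          bool has_constructor = std::is_trivially_default_constructible<T>::value,
          bool has_destructor  = std::is_trivially_destructible<T>::value>
class Reallocate;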

This allows us to separately specialize each of the four possibilities that were described in our table earlier.  Let’s start with the easiest specialization, where a type has both a trivial constructor and destructor.  Since this type has no array count, it can be simply reallocated with a single call to realloc():

template <typename T>
class Reallocate<T, true, true>
{
public:
    static T *reallocate(T *ptr, size_t newcnt)
    {
        // both trivial constructor and destructor, so just call realloc directly
        return static_cast<T *>(::realloc(ptr, newcnt * sizeof(T)));
    }
};

Next, let's tackle the case where a type has both a non-trivial constructor and a non-trivial destructor.  Since this case has an array count, it will need to utilize our ArrayRealloc class, as well as ensure that constructors are called when expanding and destructors are called when shrinking:

template <typename T>
class Reallocate<T, false, false>
{
public:
    static T *reallocate(T *ptr, size_t newcnt)
    {
        ArrayRealloc<T> arr(ptr);

        // do nothing if reallocating to the same size
        if (arr.count() == newcnt)
            return ptr;

        // check if we are shrinking
        if (newcnt < arr.count())
        {
            // destruct objects, then resize
            GenericObject::destruct(ptr + newcnt, arr.count() - newcnt);
            return arr.resize(newcnt);
        }
        else
        {
            // resize first
            size_t oldcnt = arr.count();
            arr.resize(newcnt);

            // then construct the new objects
            GenericObject::construct(arr.user() + oldcnt, arr.count() - oldcnt);
            return arr.user();
        }
    }
};

Now let's handle the mixed cases.  Next up is a type with a trivial destructor but a non-trivial constructor.  Although this case does not require us to call any destructors when shrinking, we must call the constructors for new elements when expanding.  Because there are no destructors to call, the compiler does not supply an array count in this case, so we will need some other way to obtain the current size of the memory block and determine how many elements exist in the array.

With Microsoft Visual C++, the _msize() function allows us to properly determine the size of a block allocated by malloc().  Unfortunately, no equivalent exists in the GNU C library or on OSX: both malloc_usable_size() on GNU and malloc_size() on OSX return the size of the allocated block plus any extra padding the memory manager may have added.  This throws off the calculation of the number of elements, and cannot be used to reliably call constructors on an expansion.

If you are using the GNU C library or OSX with malloc(), a simple solution to this problem is to store the size of the allocated block inside the memory block itself during the malloc() operation, then reference that size when performing the reallocation (a sketch of this approach follows below).  Another solution would be to use a completely different memory manager than malloc() and realloc(), but that is outside the scope of this article.  A somewhat non-solution is to avoid this single specialization entirely and never attempt to reallocate an array of a type that has a constructor but no destructor.
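
Here is a sketch of that first suggestion, with hypothetical helper names, storing the requested size in a small header in front of the block:

#include <cstdlib>
#include <cstddef>

// Allocate 'size' usable bytes and remember the request in a size_t header.
// Note: this offsets the user pointer by sizeof(size_t), so types needing
// stricter alignment than that are not catered for in this sketch.
void *sized_malloc(size_t size)
{
    size_t *block = static_cast<size_t *>(::malloc(sizeof(size_t) + size));
    *block = size;
    return block + 1;
}

// Return the size originally requested for a block from sized_malloc().
size_t sized_size(void *ptr)
{
    return *(static_cast<size_t *>(ptr) - 1);
}

// Resize a block from sized_malloc(), preserving its contents.
void *sized_realloc(void *ptr, size_t size)
{
    size_t *block = static_cast<size_t *>(ptr) - 1;
    block = static_cast<size_t *>(::realloc(block, sizeof(size_t) + size));
    *block = size;
    return block + 1;
}

// Free a block obtained from sized_malloc() or sized_realloc().
void sized_free(void *ptr)
{
    ::free(static_cast<size_t *>(ptr) - 1);
}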

In any case, I will be using malloc_size() and malloc_usable_size() in the implementation below:

template <typename T>
class Reallocate<T, false, true>
{
public:
    static T *reallocate(T *ptr, size_t newcnt)
    {
        // first determine current array count (no count is stored since the destructor is trivial)
#ifdef _MSC_VER
        size_t cnt = _msize(ptr) / sizeof(T);
#elif defined(__MACH__)
        size_t cnt = malloc_size(ptr) / sizeof(T);
#else
        size_t cnt = malloc_usable_size(ptr) / sizeof(T);
#endif

        // do nothing if reallocating to the same size
        if (cnt == newcnt)
            return ptr;

        // realloc to new size (there are no destructors to call)
        T *newptr = static_cast<T *>(::realloc(ptr, newcnt * sizeof(T)));

        // if we expanded, call constructors for those elements
        if (newcnt > cnt)
            GenericObject::construct(newptr + cnt, newcnt - cnt);

        return newptr;
    }
};

The last mixed case is when a type has a trivial constructor but a non-trivial destructor.  No constructor needs to be called on expansion, but the destructors will need to be called on a shrink operation.  Since the compiler supplies an array count for this type, we can utilize our ArrayRealloc class:

template <typename T>
class Reallocate<T, true, false>
{
public:
    static T *reallocate(T *ptr, size_t newcnt)
    {
        ArrayRealloc<T> arr(ptr);

        // do nothing if reallocating to the same size
        if (arr.count() == newcnt)
            return ptr;

        // check if we are shrinking
        if (newcnt < arr.count())
        {
            // destruct objects, then resize
            GenericObject::destruct(ptr + newcnt, arr.count() - newcnt);
            return arr.resize(newcnt);
        }
        else
        {
            // simply resize (no constructor to call)
            return arr.resize(newcnt);
        }
    }
};

Now we can wrap our template up with an easy-to-use one-line function which automatically deduces the type from the pointer passed as a parameter:

template <typename T>
T *reallocate(T *ptr, size_t newcnt)
{
    return Reallocate<T>::reallocate(ptr, newcnt);
}

And that's it: we have a complete implementation of a realloc()-like operation for C++ arrays.

Testing It

Now that we have all of the code written, it only seems appropriate to ensure it works properly with a test.  We’ll add a simple function to return the count of elements in an array first, utilizing the ArrayRealloc class:

template <typename T>
size_t count(T *ptr)
{
    ArrayRealloc<T> arr(ptr);
    return arr.count();
}

We’ll also utilize our foocd class from earlier, but this time with a different main() function that calls our new reallocate() function above:

int main(void)
{
    foocd *cd = new foocd[4];
    cout << "ptr=" << cd << ", count=" << count(cd) << endl;
    cd = reallocate(cd, 3);
    cout << "ptr=" << cd << ", count=" << count(cd) << endl;
    cd = reallocate(cd, 6);
    cout << "ptr=" << cd << ", count=" << count(cd) << endl;
    delete []cd;
    return 0;
}

When run, the following output will be displayed (of course, your pointers are likely to be different than the ones shown here):

ctor: id=0

ctor: id=1

ctor: id=2

ctor: id=3

ptr=0x20d7018, count=4

dtor: id=3

ptr=0x20d7018, count=3

ctor: id=4

ctor: id=5

ctor: id=6

ptr=0x20d7018, count=6

dtor: id=6

dtor: id=5

dtor: id=4

dtor: id=2

dtor: id=1

dtor: id=0

This test shows a C++ array being resized dynamically at runtime, properly calling the constructors and destructors for the class at each reallocation.  It can also be noted that unlike std::vector, the array was never relocated in memory as the pointer for the array did not change after any call to reallocate().

Of course the other three code paths should be tested as well, but instead of cluttering up this already long article with more test code, I will simply attach the complete code, tests and all, to the bottom of this post.  You are more than welcome to download it and test it for yourself.

Caveats

When realloc() cannot expand the memory block and performs a relocation, operator = is not called on the objects when they are moved.  If this is a problem for your code, check the section below on how this issue can be worked around.

It is also clear that this implementation assumes the array count is stored directly before the array contents in memory.  While it is entirely possible that a future version of some compiler might do something completely different (for example, storing the counts for all arrays in a separate memory location not accessible to the user), in my opinion this is unlikely to ever happen.  Storing a tiny four- or eight-byte value somewhere else means extra overhead for the compiler-generated code that maintains it, and that kind of adverse impact on the application is not in the interest of compiler designers.

Future Ideas and Possibilities

First you will most likely want to wrap this code into a container of your own to use as a replacement for std::vector.  I won’t go into that here, but if you would like me to cover this in a future post, please let me know in the comments.

Although it is not possible with the C library realloc() function, the HeapReAlloc() function available on Windows platforms accepts an optional HEAP_REALLOC_IN_PLACE_ONLY flag which ensures that a memory block is never relocated.  If there is not enough space to expand a block, HeapReAlloc() instead returns a failure code and leaves the block untouched.  Making use of this flag opens up a couple of possibilities (a minimal sketch of the in-place call follows the list):

  1. A container can be created with permanently fixed blocks, so pointers into the container's internal memory remain safe and iterators never become invalid.  When more space is needed, the block is expanded as usual.  When it can no longer expand, an additional block is allocated and the blocks are internally chained together.  As more items are added, the block on the end continues to expand until it also reaches its maximum size, and the process repeats.  Such a container could provide performance similar to std::vector, but would be less wasteful of memory and cause less fragmentation than std::list, with the added benefit that objects retain their fixed locations once they are inserted into the container.
  2. As described in the caveats above, operator = is not called on objects when using realloc().  When HeapReAlloc() returns a failure on expansion, a new block can be allocated manually and the objects assigned across with operator =.  This would effectively provide an implementation identical to std::vector.
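
As a minimal sketch of the in-place expansion call itself (Windows only, and assuming the block came from the same heap, e.g. HeapAlloc() on the process heap):

#ifdef _WIN32
#include <windows.h>

// Attempt to expand a heap block without moving it.  Returns the same pointer
// on success, or NULL if the block could not be grown in place, in which case
// the caller falls back to allocating a new block and copying with operator =.
void *expand_in_place(void *ptr, SIZE_T newsize)
{
    return ::HeapReAlloc(::GetProcessHeap(), HEAP_REALLOC_IN_PLACE_ONLY, ptr, newsize);
}
#endif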

Conclusion

It is rather unfortunate that a true realloc() operator was never designed into the C++ language itself, but by utilizing the code presented here it is possible to work around this limitation.  Common sense alone should dictate that updating a few words in memory to expand a block can be orders of magnitude faster than copying the entire block to a different location in memory.  Perhaps one day the designers of the C++ language will decide that being as efficient as possible with memory and performance is important, and provide us with a better mechanism in a future revision.

As promised, here is the .zip containing a single .cpp file which contains all of the source posted in this article, plus the extra tests.  It will compile unmodified with Microsoft Visual C++ 2005/2008/2010, gcc-4.3 or higher, clang-2 or higher, and run as expected on Windows, Linux and OSX (and most likely other platforms as well!): ArrayRealloc.zip

References

“[38.7] How do compilers use ‘over-allocation’ to remember the number of elements in an allocated array?” — C++ FAQ Lite

Ego – the Destructive Menace

Original Author: Pete Collier

Mr Ego

“Leave your ego at the door” is a well-used phrase, and so it should be.  I don't think many would argue that ego is good, especially in a team environment.  Yet all too often ego is allowed in through the studio doors.  As the very thing that defines a person, it's hard to simply tell someone to leave their ego behind.  Ultimately, then, the best way to get rid of ego is to not let it in in the first place.  An effective recruitment process is therefore crucial to building a good team, but it's easy to become complacent, especially when teams ramp up fast.  So here is my list of exactly how destructive ego can be, as an argument for being as diligent as possible with your recruitment.

  • The problem with ego is that it can be quite deceptive – quite often people who have strong-armed a decision using bluster and belligerence are given the benefit of the doubt when it works out. Here’s the rub: with a talented team they’ll always find a way of making things work. Don’t let this cloud your judgement when looking at whether things could have been done better. Relief can hide a multitude of sins and ego can often be allowed to win again and again.
  • Ego takes the focus away from the work – We’re social animals and it can be very easy to lose focus on our work and concern ourselves more with pecking order. A well established structure where everyone’s role is clear will leave less room for social ambiguity and more room for effective game development.
  • Where there is ego there are irrelevant battles – I say ‘irrelevant battles’ because there are meaningful battles to be had on what matters, yep, the game. Ego battles become about who shouts hardest, longest and loudest. This creates what I like to call the black hole phenomenon where everything else in the room is sucked into a universe of meaningless bollocks. Ego is a big distraction away from what matters.
  • Where there is ego there is time wasted – decisions in government are made slowly because politicians must also consider their positions on an issue and the political ramifications as well as (and often more than) the actual decision in hand. Game development mired with politics is doomed to failure, or at best, massive expense.
  • Ego distorts effective decision making – if ego is the dominating force in your organisation the best politicians will rise to the top, but not necessarily the best decision makers. Ego tends to breed contempt if people don’t feel decisions are being made for the right reasons. I don’t need to tell you how destructive negativity then becomes.
  • Pandering to ego leads to compromise – and it is inevitable that you'll have to pander to ego if you don't want constant battles and strife in the team. Compromise is more often than not the most socially cohesive resolution, but alas, not the one that produces bold, innovative design. Compromise leads to mediocrity. If Johnny Big Tantrum is always throwing his toys out of the pram every time things don't go his way, then however talented he is, you have a big problem.
  • Ego creates the need for damaging management mechanisms – even diligent managers who recognise that large egos are having a distorting effect are caught between a rock and a hard place.  They're forced to react with methods to reset the team balance. However, over-management can be stifling in a creative environment, especially if the perceived solution is more formalisation in an effort to get everyone's voice heard.

With ego running rife in an organisation, the final result is always the same: people problems masked by project problems. By the time people notice the difference, or indeed are willing to admit to it, it's far too late to do anything about it. I would hazard a guess that this (exacerbated by other factors) is the root cause of most failed projects and companies. So let's do our best to stamp out ego wherever we see it rearing its head. Good luck!

This article was originally posted on my blog: Ego – the Destructive Menace