GLKit to the max: OpenGL ES 2.0 for iOS – Part 1: GLKit Features

Original Author: Adam Martin

Apple uses OpenGL extensively – their 2D desktop libraries (Quartz, CoreAnimation) are OpenGL-based, and iOS has inherited that link (along with some funky tricks you can pull).

There’s a standalone library on GitHub, with the code from these articles as a demo app. It uses the ‘latest’ version of the code, so the early articles are quite different – but it’s an easy starting point.

iOS developers often don’t have an OpenGL background, and most of the ones I work with like the idea of OpenGL, but feel they don’t have the time to learn and master it. However, the combination of “new API (OpenGL ES 2)” with “Apple’s helper classes (GLKit)” makes OpenGL development on mobile surprisingly fast and easy.

A few years ago, Jeff LaMarche wrote a series of simple, easy tutorials on getting started with OpenGL ES 1. In the same spirit, I’m going to write about GL ES 2. In the land of Khronos (the OpenGL standards group), version “2” is fundamentally different to version “1” (this is a good thing: ES 1 was outdated from the start, based on 1990s OpenGL. ES 2 is a modern API, based on 2000s OpenGL).

I’ll be covering this from an iOS perspective. There are lots of GL ES tutorials out there, but for non-iOS platforms the challenges (and solutions) are different.

Quick links to all posts

iOS 5.0, iOS 6.0, and GLKit

With iOS 5.0, Apple introduced a new Framework: GLKit. GLKit’s apparent aim was to fix ES 2.0 to make it simpler and easier for non-OpenGL experts to use.

iOS 7.0…

It’s not live yet, and I’d hoped Apple might update GLKit to fix some of the holes below. I don’t know what will be in the final iOS 7 release, but I get the impression GLKit hasn’t changed yet. If so, everything that follows applies equally to iOS 7.

Summary of Apple’s GLKit and where it helps/hinders

Porting GL ES 1.x code


Apple provided a good, easy-to-use set of classes to make it trivial to port GL ES 1.x code. The documentation is very poor, though.

More importantly: it prevents you from using Shaders, which are one of the easiest and most powerful parts of OpenGL (once you get up and running). So, we’ll be ignoring GL ES 1.x

Classes: anything with the text “Effect” in the class name (GLKBaseEffect, GLKEffectProperty, GLKNamedEffect, etc)

Vector math


Every desktop OpenGL implementation assumes you have access to a good Vector Math library.

Until GLKit, Apple’s iOS was the exception: you had to make your own “vector” classes, write all the dot product code yourself, etc. (see, for example, Jeff LaMarche’s first tutorial). Not any more :). Apple’s design here is good, and closely follows the conventions of their other frameworks (it works the same way as CGRect, CGPoint etc. from Quartz/CoreGraphics).

Structs: GLKVector2, GLKVector3, GLKVector4



Quaternions have a bad rep. Many people find them incomprehensibly hard to understand/code, and yet … they are essential once you start “rotating 3D objects”.

Apple’s implementation of Quaternions is excellent: you don’t need to understand the mathematics, just use the pre-built methods

Matrix math


Like Vector-math, Matrix math is tricky and time consuming to build for yourself and debug.

Apple’s done all of it, with a good set of methods.

Structs: GLKMatrix4, GLKMatrix3
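For a flavour of what you’re not writing: a hand-rolled, column-major 4×4 multiply (plain C, illustrative names; GLKit’s `GLKMatrix4Multiply` does the same job, in OpenGL’s column-major convention):

```c
/* Column-major 4x4 matrices, matching OpenGL's convention:
   element (row, col) lives at m[col * 4 + row]. */
typedef struct { float m[16]; } Mat4;

static Mat4 mat4_identity(void)
{
    Mat4 r = {{0}};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0f;
    return r;
}

static Mat4 mat4_mul(Mat4 a, Mat4 b)  /* returns a * b */
{
    Mat4 r;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a.m[k * 4 + row] * b.m[col * 4 + k];
            r.m[col * 4 + row] = s;
        }
    return r;
}
```

This is the building block of the MVP chain discussed below: conceptually `mvp = mat4_mul(projection, mat4_mul(view, model))` – which is exactly what you’d do with `GLKMatrix4Multiply`.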

OpenGL Projection

Partial (almost full)

OpenGL uses four dimensions to deal with three-dimensional rendering. That could get difficult, fast. Skipping the reasons for it, OpenGL used to be hardcoded to a set of special matrices (M, V, and P – model, view, and projection).

GL ES 2 threw away the hard-coded matrices, and says “do it all yourself” (which, as we’ll see later, actually makes things easier in the long run). This is a pain … except Apple’s done it for us. Don’t go writing your own MVP stack code – use Apple’s.

Structs: GLKMatrixStack

Texture loading

Partial (poor)

Before GLKit, you had to write long and complex methods, using CoreAnimation and Quartz, to convert “file.png” and upload it to the graphics chip as an “OpenGL texture”.

That code was hard to debug, and most iOS programmers aren’t familiar with CA/Quartz. Apple wrote a proper Objective-C texturing system that does the work of Quartz and y-flipping for you. For most “simple” code, this is perfect.

…but: they screwed up in a few places, with some major bugs. When it works, it’s fine – and it only needs two lines of code! – so we’ll use it in the early articles, but we’ll throw it away and write a “fixed” version for the later articles.

Classes: GLKTextureInfo, GLKTextureLoader

Mixing UIKit with OpenGL

Partial (OK)

See post: “Part 2”

There’s a lot of FUD on the web that says this is “impossible” or “slow” or … etc.

Don’t believe it. There are bugs in Apple’s CALayer / Quartz / CoreAnimation classes that make them slow *independent* of whether you’re running OpenGL. It’s just that the things people want to do when using UIKit with OpenGL are usually the ones that show up the bugs in UIKit/Quartz.

We’ll cover the main gotchas, and look at how to avoid or improve them. But for the most part: it works automagically. (there’s a reason for this: UIKit is implemented on top of OpenGL, so it’s already integrated to a high standard. It’s just that Apple hid the integration points)

Shaders (vertex, fragment)


See post: Part 3

GLKit pretends that shaders don’t exist. The most important feature of OpenGL ES 2.0 – and Apple ignored it. Sad, but true. We’ll fix that.

Multithreading, context-switching


OpenGL supports multi-threading, background loading, all sorts of funky stuff.

Although it’s not strictly part of GLKit, Apple has re-used their old EAGLContext class to provide access to all of this, probably because it worked fine in the first place. To be clear: if you’re used to EAGLContext, it’s still used everywhere by GLKit.

Classes: EAGLContext

Multi-pass rendering


See post: “Part 2”

You can make a cube appear on screen, textured, using single-pass rendering.

Just about everything else you ever want to do … needs more than one pass.

Apple provides no support for this, so you have to write it yourself (and there’s a surprisingly large amount of boilerplate you need to write here).

3D models, animation, data formats

Partial (very little)

See post: Part 3
See post: Part 4b

GLKit does one great thing with 3D data formats: it provides a Vector class that all iOS apps/libraries/source can use, and be fully compatible with each other.

But it provides zero support for the rest: meshes, 3D models, VAOs, VBOs, etc.

Error handling and state management


See post: Part 4

When you have a bug in your code, GL does nothing. Nor does Apple. Sooner or later you’ll find yourself weeping in frustration.

Performance analysis

Partial (some)

Apple makes a lot of noise about how good Instruments is with OpenGL.

This is true – it’s good. But Apple also blocks you from accessing the hardware manufacturers’ own performance tools, which may be better.

If you already know Instruments inside-out (e.g. you know about the invisible RHS-panel…), you’ll be right at home.

Next steps

If you know nothing at all about OpenGL, I recommend skim-reading the first 8 of Jeff LaMarche’s posts on GL ES 1.

NOTE: a *lot* of the detail in Jeff’s posts is now unnecessary (or superseded). But it all helps with understanding the newer/cleaner GL ES 2 (and GLKit) versions. If you get stuck or struggle, skim that post and move on to the next. Each of his posts works well standalone.

Then head on to Part 2 – Drawing and Draw calls.

Cross-posted to my blog.

Code Build Optimisation Part 4a – fatal error LNK1170

Original Author: Andy Firth

If you previously followed Code Build Opt Part 4 then you have yourself a fantastic project that incrementally links and everyone is happy… there is a gremlin in this machine however.

Before we continue I should note that this primarily affects Microsoft’s link.exe (and thus lib.exe), but it has also been seen in other commandline archive apps that one might use during development of current-gen games (ar.exe, for instance).

fatal error LNK1170: line in command file contains 131071 or more characters

What this refers to is not always obvious, so let’s first explain what looks like the core problem (it isn’t, but keep reading).

Link.exe builds exes and libs, taking a bunch of parameters and outputting a binary (exe or lib). Often there are a multitude of parameters, and the commandline gets too long for a standard command interpreter to allow (10kb in Windows cmd.exe, for instance). To combat this, many commandline apps allow the use of a response file: a file containing all the parameters. So instead of

Link.exe  /nologo /MACHINE:X64 /Wx  /INCREMENTAL 1.obj 2.obj 3.obj … 5000.obj /out:foo.exe

you would have

Link.exe @response_file /out:foo.exe

In Visual Studio if you enable “Use Library Dependency Inputs” (required for Incremental linking with libs) then VS will drop your library objects into a response file and call link.exe as above.

Now consider that no one calls their cpp files 1.cpp, 2.cpp … 5000.cpp; in fact, most of us have something like


This will then compile to the object


So that link with 5000 files in it generates a response file with text in excess of 400k. With this example you would hit LNK1170.

On opening the response file you’d see all your very long paths and assume that reducing their size is the only option. This is what I did for quite a while, jumping through many hoops and doing so every few months to avoid the problem. From googling and talking to other engineers, everyone did the same.

The problem seems to stem from the assumption that a response file needs to replace a commandline 1:1. Everything in the response file is appended onto the final commandline, and your link will (internally) look the same as without the response file. Response files have typically only been a single line in my experience; all the applications I know about that generate them follow this (seemingly as a rule). Even Visual Studio will generate one very long 400kb+ line in our example above.

This assumption – that a response file is a single line of parameters – is not true.

From my investigations with 3 linkers and 2 archivers the basic method of resolving a response file is

const int32 max_line_size = 128 << 10;
char line[max_line_size];
int32 count;

do {
    count = read_line(line, max_line_size);

    if (count == max_line_size)
        error("fatal error LNK1170: line in command file contains %d or more characters", max_line_size);

    if (count > 0)
        append(link_params, line);
} while (count > 0);

Older and 32-bit versions of link.exe use a 16kb limit but throw the same error. Other archivers (ar, for instance) fail the link claiming a missing file or bad parameter even when you can see that the file exists. These are silently hitting the problem above and simply clipping the characters beyond their internal line limit, which is way worse. For ar (GNU ar 2.17) this limit is 8kb.

The real fix is quite trivial if you generate your own response files: simply add a newline after each appended parameter.
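In code, the fix might look something like this sketch (`write_response_file` is an illustrative helper, not part of any Microsoft tool):

```c
#include <stdio.h>

/* Write a response file with one parameter per line. Because each line is
   short, the linker's per-line limit (131071 characters for 64-bit
   link.exe) is never hit, no matter how many object files you pass. */
static int write_response_file(const char *path,
                               const char *params[], int count)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    for (int i = 0; i < count; ++i)
        fprintf(f, "%s\n", params[i]); /* newline after every parameter */
    return fclose(f);
}
```

The linker still sees exactly the same set of parameters; only the line structure of the file changes.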

Sadly it seems that if you are currently using the Visual Studio option “Use Library Dependency Inputs” then you are out of luck: there doesn’t seem to be a way to make VS insert the newline. There are a few options though:

  1. Generate and manually add a response file to your vcproj; this should be relatively easy to do, but it would need updating each time a new file was added to your project.
  2. Switch your library’s build step over to generating the response file as the “lib” step instead of actually calling lib, and make that step output the response file with newlines.
  3. Write a stub version of the linker that converts the response files to the newline format and calls the real linker.
  4. Specific to Visual Studio, you can look into the “pre-link” step; I haven’t investigated this as we do not use msbuild.

Hopefully this helps anyone else who hits this problem in the future.


The main loop in Devil’s Attorney on Android

Original Author: Johan Knutzen


Android OS is filled with undocumented behavior and quirks that are learned the hard way: by discovery and by experience. Once you start playing around with the NDK, OpenGL and Android lifecycle events in multiple threads, things can become really hairy.

In this post, I’m going to list a number of things that I do not want to forget about my old buddy Android OS. Partly because I don’t want other people to make the same mistakes, but also because there are not a lot of logical threads to remember them by.

I will reference how we solved Android-specific main loop issues in Devil’s Attorney, which is available on Google Play here.


A main loop is where you process user input, update the state of the game, and where you issue your draw calls to OpenGL before you swap the front buffer with the back buffer. When porting a cross-platform game like Devil’s Attorney to Android, most of the platform specific code lies in how the main loop is set up and how OS events are handled.

On Android this is typically done by either using a GLSurfaceView, which has the following drawbacks:

  • Rendering thread in Java, which forces you to use Java thread locking mechanisms
  • No low level control of the EGL context, everything handled by this do-everything class

or by using NativeActivity, which has the following drawbacks:

  • Just a NDK layer around a regular Android Activity
  • Suffers from the disadvantage of not being a pure Java Activity, which makes Android SDK specific tasks awkward, such as playing video, playing streamed audio or starting other activities.

GLSurfaceView delivers its callbacks on the main thread, requiring you to do awkward thread locking with the rendering thread at certain points. This may be all right in Java, but in conjunction with native calls through JNI it can become complex really fast. This also holds true for handling touch events and other kinds of input to the game, and for a game running at 60hz you want to minimize usage of Java.

NativeActivity seems like a blessing at first, because it advertises that you can program everything natively. This might seem true, but it is really not. Very few things in the Android SDK have native counterparts in the Android NDK, so whenever you need to start another Activity or stream audio or play video, or set up layouts; this needs to be done using JNI.

Using JNI to set up even simple things in C/C++ can become very messy; an example is the creation of an AudioTrack instance using JNI in C:

AudioTrackClass = (*env)->FindClass(env, "android/media/AudioTrack");
  AudioTrackInit  = (*env)->GetMethodID(env, AudioTrackClass, "<init>", "(IIIIII)V");
  jobject track   = (*env)->NewObject(env, AudioTrackClass, AudioTrackInit,
                                      streamType, sampleRate, channelConfig,
                                      audioFormat, bufferSize, mode);

Compare this to:

AudioTrack track = new AudioTrack(STREAM_MUSIC, sampleRate, channelConfig,
                                  audioFormat, bufferSize, MODE_STREAM);

I doubt that the JNI way of looking up strings for methods and binding calls with Java is any more efficient than just doing it in Java. Memory references are still owned and garbage collected by Dalvik, the only difference is that you get an extra reference that you manually need to release.

The main loop in Devil’s Attorney

In Devil’s Attorney, among other things we do the following using the Android SDK in Java:

  • Play the intro video
  • Stream audio tracks
  • Start a separate APK Expansion Files Downloader Activity
  • Query the device for resolution, type of device and version of Android OS
  • Handle lifecycle events
  • Show the Splash Screen
  • Handle user input

While we have the above requirements for the Java part of the application, we also need a main loop thread running the native code of the game. Instead of going with either GLSurfaceView or NativeActivity, we decided to roll our own solution, which satisfies the following:

  • No recurring calls to and from the Java part of the game and the native part during each frame
  • User input events and lifecycle messages are queued and double buffered to the native part of the application and consumed each frame
  • Ease of using parts of the Android SDK in conjunction with the native part of the game

The Native Part of Devil’s Attorney

The only vital thing that the game needs from Android OS in order to set up an EGL context in native code is to satisfy the parameters to the following function:

eglCreateWindowSurface(display, config, window, 0);
      where display <- can be deduced in native code,
            config  <- can be deduced in native code,
            window  <- platform specific reference, needed from Android OS

The window reference is platform specific, and can be acquired in Java by instantiating something called a Surface. This is the only object you need in order to create an EGL context. If you pass this object via JNI to native code, you can deduce the window reference by calling an Android NDK library function in the following way:

void Java_com_senri_da_DevilsAttorneyActivity_nativeSetSurface(JNIEnv* jenv,
                                                               jobject obj,
                                                               jobject surface)
{
    ANativeWindow* window = ANativeWindow_fromSurface(jenv, surface);
}

With the Java part decoupled, the native part holds the actual main loop of the game and runs in a natively created thread using pthreads. The main loop does roughly this:

while (running) {
    processMessagesAndUserInput();
    update();
    draw();
    swapBuffers();
}

The processMessagesAndUserInput method locks a semaphore, copies data containing messages and user input, and acts as a barrier for the following events:

  • EGL Context Creation
  • OpenGL state saving and the destruction of the EGL Context
  • Pausing of the game
  • Resuming of the game
  • Hardware button events
  • Touch Events

update(), draw() and swapBuffers() do the bulk of the work and can be found in your run-of-the-mill main loop.
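A minimal sketch of how such a double-buffered, lock-guarded message queue might look (plain C with pthreads; the names and the fixed-size buffers are my assumptions, not the game’s actual code):

```c
#include <pthread.h>
#include <string.h>

#define MAX_MESSAGES 64

typedef struct { int type; } Message;

typedef struct {
    pthread_mutex_t lock;
    Message pending[MAX_MESSAGES];  /* written by producer threads */
    int pending_count;
    Message current[MAX_MESSAGES];  /* read by the main loop, lock-free */
    int current_count;
} MessageQueue;

static void queue_init(MessageQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->pending_count = q->current_count = 0;
}

/* Called from the Java-facing threads (touch events, lifecycle, ...). */
static void queue_post(MessageQueue *q, Message m)
{
    pthread_mutex_lock(&q->lock);
    if (q->pending_count < MAX_MESSAGES)
        q->pending[q->pending_count++] = m;
    pthread_mutex_unlock(&q->lock);
}

/* Called once per frame by the main loop; afterwards current[0..count)
   can be consumed without holding the lock until the next swap. */
static void queue_swap(MessageQueue *q)
{
    pthread_mutex_lock(&q->lock);
    memcpy(q->current, q->pending, q->pending_count * sizeof(Message));
    q->current_count = q->pending_count;
    q->pending_count = 0;
    pthread_mutex_unlock(&q->lock);
}
```

The main loop only holds the lock for the duration of the copy, which is what lets events act as a per-frame barrier without stalling the producers.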

The Java part of Devil’s Attorney

In the Android Activity on the Java side of things, we have the following:

  • JNI methods to play videos and streaming audio
  • Code that identifies the resolution of the screen and creates a SurfaceView, which is later passed to the native main loop via a message
  • Callbacks from the SurfaceHolder, most importantly surfaceDestroyed
  • Callbacks for touch events and other kinds of user input
  • The ability to start the APK Expansion Files Downloader Activity
  • Handling of all kinds of lifecycle events, minimizing the number of messages that need to be passed to the native main loop
  • Presentation of the splash screen

Android lifecycle events and compatibility issues

Android lifecycle events can become very complicated when using an Activity in conjunction with a Surface.  Our game supports 2563 (!) different kinds of devices, and heavy testing on multiple devices running different flavors of Android required us to be very careful when setting up the main loop. This includes the following assumptions, caveats and countermeasures:

  • Surfaces can be destroyed at any time
  • Activities can be destroyed at any time
  • The EGL Context is never preserved in any way by Dalvik when the app is suspended (although some devices claim that they do).
  • Mismatches between pixel format queries for the Surface and EGL Context parameters
  • A complete restart of the application with the native parts still alive. This occurs when Dalvik starts killing everything, except for the native pthread. This is fairly common on Gingerbread devices with low RAM.
  • Special considerations when constructing the SurfaceView. We use a deprecated layout called AbsoluteLayout, which proved to have the least problems across devices.
  • Post OpenGL context creation issues, such as those on high resolution devices with weak GPUs; these require recreating the Surface as well as the EGL Context.
  • Multiple safety checks between API calls querying and setting the size of the frame buffer/SurfaceView, because some devices lie in different ways depending on OS version and manufacturer. This is especially prevalent on Android Gingerbread and Honeycomb devices that have softkeys.
  • Saving configuration information if a device crashes during initialization, so that when the user restarts the game it goes into safe mode. This is not something that is common, but if it affects 1 user in 1000 we need to fix it. This is a nightmare that can’t be avoided when so many users of Android customize their ROMs with faulty graphics drivers and experimental features.
  • A meticulous locking sequence of threads when surfaces are destroyed. Destruction of surfaces needs to be thread locked so that the caller of surfaceDestroyed is frozen until the entire EGL Context (textures, states, framegrab of the game) is preserved to RAM/Disk.
  • Very conservative usage of extensions and OpenGL capabilities. Trusting OpenGL extension queries on old roughed up Android devices is not a safe bet.


There might be a lot of better ways of setting up main loops on Android, but this solution proved to be the best for us. We don’t get a lot of compatibility support mail and the game seems to run fine on most devices. Supporting every single device on Android remains a challenge, but most of the issues lie in the main loop.

An introduction to: Go[lang]

Original Author: Dan-Speed

I think it’s valuable for game developers to know of a broad range of programming techniques and tools, and programming languages are one of our fundamental tools that strongly dictate how we phrase answers to the problems that we’re solving. With this in mind, I’d like to introduce Go, a language created by Google mainly for creating web services.

What is Go?

  • Strongly typed (with dynamic casting), compiled language, garbage collected
  • Made by Google for their servers, based on pragmatically solving problems
  • Really good at concurrent stuff, pretty fast
  • Designed to be productive (similar to Python, “batteries included” standard library)
  • Structural interfaces, not inheritance
  • Public/private by capitalization
  • Built in unit testing, benchmarking, profiling, building, documentation server
  • Compiles fast, cross platform


I think that the Go tour on the official website is a really cool way to have a look at this language without having to install anything. Be sure to pay closest attention to goroutines and channels, since they’re the fundamental building blocks.

Good introductions / tutorials:

  • Golang tour: 
  • Advanced Concurrency Patterns: 

    Performance

    Talking about performance is always a bit strange… I’ve worked for a long time under the belief that programmer time is typically more valuable than using a potentially faster language, but this depends on the math of your scaling. Go is one of the fastest languages available for writing web servers, with the added advantage that it offers not only IO-based async/concurrency, but also an able solution for using multiple cores. You can almost certainly find things that outperform it in particular situations, but I tend to think it’s a pretty safe high-performance option all round, without the worry, as with something like Ruby or Python, that you’ll be running 30 times more nodes to compensate for your language choice, or that you’ll be refactoring because of function call overhead on the hot path.

    Using CSP as a concurrency paradigm also means that it’s much easier to scale out in the cloud – communication will work between many machines where mutexes won’t.
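As a hedged sketch of that CSP style (my own illustrative example, not from any library): a goroutine produces results over a channel and the caller consumes them, with no shared state behind a mutex.

```go
package main

import "fmt"

// squareAll fans work out to a goroutine and collects the results over a
// channel. squareAll is an illustrative name, not a standard library API.
func squareAll(nums []int) []int {
	out := make(chan int)
	go func() {
		for _, n := range nums {
			out <- n * n // communicate, don't share
		}
		close(out) // lets the consumer's range loop terminate
	}()
	var res []int
	for v := range out {
		res = append(res, v)
	}
	return res
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3})) // [1 4 9]
}
```

The same shape scales from one machine to many: the channel send is the communication point, and nothing here depends on shared memory.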


    I wish that Go had a full-featured editor with refactoring and debugging support on all operating systems.

    There are some very nice features, like being able to import a profiling module that will enable a web endpoint for collecting and downloading profiles, and built-in unit test and benchmark support (unfortunately no built-in XUnit output format). For debugging, though, you’ll need GDB on a Linux platform to get anything, which is really off-putting if you’re primarily based on Windows and Visual Studio.

    Standard Library

    You can go a very long way with Go’s standard library. It’s really only a few lines of code to run a web server. It has html templating built in, JSON support, image libraries + binary encoding support. Websockets are available as a supplementary library.

    When you go further with web servers, I’ve found that the Revel framework is quite nice for fast compile-as-you-save iteration (it’s based on the Java “Play” framework), along with better-featured routing and variable binding and a lot more features as Revel approaches a 1.0 version. There are others.


    Go is designed to compile to a single, statically linked, standalone binary executable. This makes it very easy to deploy, at the cost of a bit of extra size. It does mean that individual libraries cannot be individually updated.

    Cross platform compilation isn’t the easiest of things, and it falls apart if you have any native code. This isn’t the end of the world, as long as you have CI nodes running on VMs.

    Version Control

    Go has a remote package retrieval mechanism built in (“go get”), but it relies on Git/Mercurial etc. being installed on your machine. If you’re going to rely on any third-party libraries, you should expect to install these VCSes on at least one machine to get new versions.

    While I think that it is possible to just get 3rd party packages direct at compile time at the latest version, in my opinion, it’s pretty dangerous to not have reproducible builds, which is problematic since there is no way currently to ask for a particular known-good revision. For my projects, I’m trying to get and check in every dependency.

    There is currently quite a lot of ongoing discussion (and a number of community built solutions) for trying to deal with this issue, but no apparent wider consensus yet.

    C interop

    Go supports C based packages, which puts it in a similar position to Python and a number of equivalent languages.


    There are no real exceptions in Go – you’re supposed to use multi-value returns to allow error codes and handle them explicitly. This can make things quite verbose.
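A minimal sketch of that (value, error) convention, with an illustrative `divide` function of my own (not a standard library API):

```go
package main

import (
	"errors"
	"fmt"
)

// divide returns a result and an error; the caller must check the error
// explicitly rather than relying on an exception propagating.
func divide(a, b float64) (float64, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	q, err := divide(10, 4)
	if err != nil { // the explicit check that makes Go code verbose
		fmt.Println("error:", err)
		return
	}
	fmt.Println(q) // 2.5
}
```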

    No RAII (ish) – Go uses defer (a bit like a function-based Scope Guard) to execute cleanup. This makes more sense when you understand that lifetime managing objects are generally sitting in a for{select {}} scope responding to messages. Once they exit that loop, they’re being shut down. This said, it might be possible to use runtime.SetFinalizer() for RAII with the understanding that cleanup will then be tied to when the GC runs on an object.

    Unicode from the start – Go is based around UTF8, which is a blessing compared to the sheer confusion that I’ve seen in Python 2.7 and other places about encodings.

    Closures – I find that closures are excessively handy for being able to encapsulate and bind things into a function signature that something else expects.

    Dynamic casting – while Go is statically typed, it has a strong system for dynamic casting and reflection. This yields one of the nicest implementations of JSON marshalling/unmarshalling that I’ve seen, allowing it to fill in fields on typed structures from a given JSON input, both validating it and allowing you to use normal field access notation rather than doing a lot of casting and map lookups.

    You support an interface in Go just by implementing the required function signatures on a structure, without needing to know about the interface at any time. To a certain degree, this is great: if you implement io.Reader, your object can now be used in a myriad of ways. This works fairly nicely with objects being passed around (A implements B, so can be passed to C(B)) but not quite so well with function signatures (if you return something that implements an interface, that’s not accepted if the required signature wants the interface returned).

    No generics – while everything implements the empty interface, if you want to implement a particular container other than the given slices (vector/array) or maps, you’re not going to be able to get it to be reusable without it returning an interface that someone has to try and type assert back to what they put in. This is possibly worst with arrays of interfaces, where you’ll have to cast each item individually. You have to get used to this sort of thing though, since in order to use the sorting library, you’ll need to cast them to a type that implements Len, Swap and a compare function.


    There are some things that they don’t mention much in the tutorials that I feel obliged to point out. Your mileage may vary.

    • Channels can’t return tuples, so if you need errors and values as possible returns, you may need to pack them into a struct.
    • If something is sending you values across a channel, and you’re consuming them in a range loop, be careful about exiting the loop early. Leaving the loop without fully consuming it is likely to leave a goroutine somewhere blocked while trying to send, and goroutines aren’t cleaned up in this kind of case. If this is likely you want to make sure that there is a way to Close the object that is sending prematurely.
    • The standard pattern in Go is to loop processing messages using for{select{}} – be aware that ‘break’ works on select as well as for, so if you want to exit the loop, you need to use a labelled break, exit condition or return (making sure all cleanup is handled by deferred statements)
    • Many tutorials tend to show a ‘quit’ channel that you send to and forget about. This can be a little dangerous and simplistic – if your deferred functions deadlock, or you didn’t break out of your loop properly, you’ll just leak your goroutine and any associated memory with it. With more complicated objects, I like to have a very carefully tested ‘quitblocker’ that can block until exit has completed, and doesn’t have the vulnerability of potentially trying to send a close message to a channel that’s no longer being listened to on the service loop.
    • map lookups return two values, always: value, wasPresent := myMap[key]. If !wasPresent, value is the type’s zero value. Single assignment of just the value is valid, but infrequently safe/correct.
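That last bullet in code – the second value is the only reliable way to tell “absent” from “present with the zero value”:

```go
package main

import "fmt"

func main() {
	scores := map[string]int{"alice": 0}

	v, ok := scores["alice"]
	fmt.Println(v, ok) // 0 true  — present, value happens to be zero

	v, ok = scores["bob"]
	fmt.Println(v, ok) // 0 false — absent, v is just int's zero value
}
```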

    My Experience with it

    I’ve been porting some of our internal infrastructure to Go from Python. Where we previously used a number of processes (and multiprocessing) in Python, we’re now using far fewer due to the increased ease of concurrency and more competent implementation of threading. I’ve also been able to add an http server to the mix, allowing me to interrogate the status & history live from a website, and potentially orchestrate the process.

    I’ve found the libraries to be well featured, with one of the few things that I miss being a way to shut down a running http server by doing something other than exiting the program.

    This isn’t a language for use in UIs or game clients, but it is an extremely competent language for making servers, infrastructure or parallel/distributed processing systems.

Finding nearby stuff

Original Author: Niklas Frykholm

A problem that crops up quite often in game programming is the need to find stuff that is “close to” other stuff. For example, you may want to find all enemies close to the player, all treasures close to a goblin, etc.

Most recently, I ran into this problem when I added support for merging navigation meshes to the Bitsquid engine.

To merge meshes, I need to find all the vertices that are “sufficiently close” (within some tolerance limit) in the meshes that I’m merging. These vertices should be considered “the same” in the merged mesh and need to be treated specially.

The naive algorithm for finding these coinciding vertices is to just do a double loop:

foreach (v1 in vertices)
   foreach (v2 in vertices)
      if (distance(v1, v2) < tolerance)
         merge(v1, v2)

But since this algorithm is O(n^2) in the number of vertices, it is often prohibitively expensive.

To improve the performance you need some kind of data structure for accelerating spatial queries. There are lots of different possibilities. Real-Time Collision Detection by Christer Ericson has good coverage of them.

One of the simplest approaches is to use some kind of grid bucket system. You divide the world into a grid and for each grid cell you store a list of the vertices that fall in that cell. To check for “close” vertices you look through the list of vertices in your cell:


Depending on what your definition of “close” is (how big the search radius is compared to the grid size), you may need to search more cells. Note that if the search starts close enough to a grid corner you always have to search at least four cells, no matter how small the search radius is. (That is for a two-dimensional grid; a three-dimensional grid requires you to search eight cells.)


If you know what the search radius is going to be (for example, in my vertex merging case I always use the same merge distance) you can adapt the grid so that you never have to search more than four cells, by setting the grid size to be equal to the search diameter.


With a larger grid size you can sometimes get by with searching just one or two cells, depending on how close to a grid line the search starts, but there will always be some positions where you will have to look at four cells.

The fact that you at most have to look at four cells can be used to make an interesting variant of the algorithm. Instead of checking the four neighboring cells in the lookup, you can store the item in all four neighboring cells. That way you will only have to check a single cell when you do the lookup. This approach will make lookups four times faster, insertions four times slower and use four times as much memory. It can be a good optimization if you have a high ratio of lookups to insertions.

For my particular case (vertex merging) I only have one lookup per insertion, so this would not be a net win.

Designing the grid can be tricky.

How large should it be? You may need to do a pre-pass over your data to find the range, which isn’t possible if you don’t have all the data up front. (Letting the grid coordinates wrap around can solve this in some cases.)

How big should the cells be? If you make them too small, the grid will consume too much memory. If you make them too big, you may end up with lots of points that you need to check in each cell, which will make the algorithm slow.

What if the data is laid out on a diagonal? In this case most grid cells will be empty, wasting memory:


Most of these concerns can be fixed by not storing the grid in a traditional matrix, but instead using a hash that maps from grid coordinates to cell data:

struct GridCoordinate {
   int x;
   int y;
};

HashMap<GridCoordinate, CellData> grid;

// To insert an item
GridCoordinate gc;
gc.x = (int)floor(p.x / cell_size);
gc.y = (int)floor(p.y / cell_size);
grid[gc] = cell_data;

This way, you will only use memory for the cells that actually contain data and lookups are still O(1). You also don’t have to care about how big the grid is or what happens if data ends up outside the grid. In fact, you only have one parameter left to worry about, the cell_size.

As mentioned above, a good heuristic is to use:

float cell_size = 1.0 * search_diameter;

That way you have to investigate exactly four cells in each lookup.
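With cell_size equal to the search diameter, the four candidate cells can be computed directly: the query interval [p - radius, p + radius] is exactly one cell width long per axis, so it overlaps at most the 2x2 block of cells starting at the cell containing (p.x - radius, p.y - radius). A minimal sketch (the function name is mine, not from the article):

```cpp
#include <array>
#include <cmath>

struct GridCoordinate {
    int x;
    int y;
};

// The four cells a lookup at (px, py) with the given search radius must
// examine, assuming cell_size == 2 * radius (i.e. the search diameter).
std::array<GridCoordinate, 4> cells_to_check(float px, float py,
                                             float radius, float cell_size)
{
    int cx = (int)std::floor((px - radius) / cell_size);
    int cy = (int)std::floor((py - radius) / cell_size);
    return {{ {cx, cy}, {cx + 1, cy}, {cx, cy + 1}, {cx + 1, cy + 1} }};
}
```

For example, a lookup at (1.9, 0.1) with radius 0.5 and cell_size 1.0 must examine cells (1,-1), (2,-1), (1,0) and (2,0).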

If the data is sparse compared to the search radius you can use a bigger cell size, which means that you don’t always have to check all four neighbors. But note that the average number of cells you need to search in each lookup goes down slowly, while the cell area (and correspondingly the average number of items in each cell) grows quickly. So only use this optimization when you know that the data is sparse.

Multiplier   Avg. cells to search   Cell area
      1.0                   4.00        1.00
      1.5                   2.78        2.25
      2.0                   2.25        4.00
      3.0                   1.78        9.00
      5.0                   1.44       25.00
     10.0                   1.21      100.00
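The “Avg. cells to search” column can be derived with a simple observation: with a cell size of m search diameters, the query interval straddles a grid line along each axis with probability 1/m, so the expected number of cells per axis is 1 + 1/m, and the 2D expectation is (1 + 1/m)². A quick check of the table values (my derivation, not from the original post):

```cpp
#include <cmath>

// Expected number of cells a 2D lookup must examine when the cell size is
// `m` times the search diameter: each axis independently straddles a grid
// line with probability 1/m, giving (1 + 1/m)^2 cells on average.
double avg_cells_to_search(double m)
{
    return (1.0 + 1.0 / m) * (1.0 + 1.0 / m);
}
```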

The final piece of this puzzle is what CellData should look like. It might be tempting to do something like:

typedef Vector<VertexId> CellData;

However, this would be highly inefficient. In many cases cells will only contain a few items and allocating a Vector for them is total overkill. Using a Vector will mean tons of pointer chasing, cache misses and data copying.

For me, a general rule of thumb when writing high performance C++ code is to avoid collections stored inside other collections. If you store collections in collections it is very easy to create innocuous looking code that does tons of data copying behind your back.

So what can you do instead? If you have a good MultiHashMap implementation, you can use that. Otherwise, you can do something like this:

struct CellData {
   VertexId id;
   unsigned next;
};

Array<CellData> items;

Here, items contains linked lists of the items in each cell, stored in a contiguous array (which is the only sane way to store linked lists). The HashMap gives the first item in a cell. Then you follow the next references in the items array to find subsequent items in the same cell until next == UINT_MAX, which marks the end of the list.
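Putting the pieces together, the pattern might look like this. This is a sketch using std::unordered_map in place of the engine’s HashMap; SpatialHash and its member names are mine:

```cpp
#include <climits>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

typedef unsigned VertexId;

struct GridCoordinate {
    int x, y;
    bool operator==(const GridCoordinate &o) const { return x == o.x && y == o.y; }
};

// std::unordered_map needs a hash function for the key type.
struct GridCoordinateHash {
    std::size_t operator()(const GridCoordinate &gc) const {
        return std::hash<long long>()(((long long)gc.x << 32) ^ (unsigned)gc.y);
    }
};

// One item in a cell's linked list; next == UINT_MAX marks the end.
struct CellData {
    VertexId id;
    unsigned next;
};

struct SpatialHash {
    // cell coordinates -> index of the first item in that cell
    std::unordered_map<GridCoordinate, unsigned, GridCoordinateHash> first;
    // all items, with the per-cell linked lists threaded through `next`
    std::vector<CellData> items;

    void insert(GridCoordinate gc, VertexId id) {
        unsigned head = UINT_MAX;
        auto it = first.find(gc);
        if (it != first.end())
            head = it->second;
        CellData cd = { id, head };            // new item becomes the list head
        items.push_back(cd);
        first[gc] = (unsigned)(items.size() - 1);
    }

    // Call f(id) for every item stored in cell gc.
    template <class F>
    void for_each_in_cell(GridCoordinate gc, F f) const {
        auto it = first.find(gc);
        unsigned i = (it == first.end()) ? UINT_MAX : it->second;
        while (i != UINT_MAX) {
            f(items[i].id);
            i = items[i].next;
        }
    }
};
```

Note that insert prepends to a cell’s list, so iteration visits that cell’s items in reverse insertion order, which doesn’t matter for distance queries.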

This basic pattern: grid coordinates -> hash map -> linked list in array is my standard go-to solution for making spatial queries. It is straightforward, easy to understand and uses memory reasonably while providing O(1) lookups.

This has also been posted to The Bitsquid blog.

Simplex Update #1: Extreme modularization with CMake and Git

Original Author: Francisco Tufró

This is the first update on my Simplex Engine Journey. The first topic I want to talk about is modularization.


One of the most important aspects of my engine’s design is modularization. Engines are usually big in terms of classes, dependencies and systems; that’s why a modular approach made sense for me.

From my previous experience working on Moai SDK, a monolithic git repository that includes all the different systems (modular or not) and third-party libs becomes really huge, really soon.

That’s why I wanted to solve that problem for this engine.

There are two different problems to solve:

  1. How can I sort the code in a suitable way to work with modules?
  2. How can I build the modules consistently, and include only the necessary ones in the final build of a program that uses the engine?

This article will walk you through how I’ve solved these problems and share some of the knowledge I gained working on Moai SDK and other personal projects.

Git for Extreme modularization

While working on Moai SDK, I never felt really happy about the monolithic git repository.

It was a huge git repo that took hours to download completely; for me, that’s EXTREMELY anti-agile (Rule #5).

The solution that came to my mind was something I proposed at Zipline that didn’t make it because of the $$$ factor (which I don’t care about for my project): having one git repository per module and one ring repository to rule them all (i.e. a central git repository using submodules).

The pros for having a single repository per module include:

  • Single history of git commits for each module. (This is pretty interesting, since you avoid having a lot of noise when searching the git log for something specific).
  • Easier to find files, since modules generally include between one and ten source files.
  • Bundled third-party libraries (if a library is ONLY used by this module, it can be bundled inside the repository and added to its build script).

The cons are:

  • If you have a third-party lib that is used by more than one module, you need to be careful with versions; having a common third-party module might be the way to solve this.
  • More complex directory structure: instead of having all the code in a single place, you have it split among many folders. (This shouldn’t be a huge issue though, since modern IDEs and text editors let you search a directory tree pretty easily.)

Please, comment below if you want to discuss the pros and cons of this!

Directory structure

In order to have a unified way to build and sort the code among different modules, I’ve thought about the following directory structure:

  • simplex-[module-name]/src (Actual source code for the module)
  • simplex-[module-name]/include (include files for the module’s classes)
  • simplex-[module-name]/tests (unit tests for the module’s classes)
  • simplex-[module-name]/samples (samples that show the module’s functionality)
  • simplex-[module-name]/docs (documentation on how to use the module or other stuff)
  • simplex-[module-name]/third-party (third-party libraries that are used by this module alone, if I want to include them instead of using them from the system)

Having this standard structure allows me to create a build system that knows what will be found inside each module.

So, when I want to create a module, I just create the repository on BitBucket and run my create-module script:

  # This script is used to create a module for simplex engine
  # use:
  # ./bin/create-module [module-name]
  # NOTE: the script will add the 'simplex-' prefix to the module name
  # so you should not add it or you will end up having a simplex-simplex- prefix.
  NAME="simplex-$1"
  # We have one repository per module, so after creating the repository in
  # BitBucket, I add it as a submodule of my main repository.
  # In this way, we can rebuild the whole directory structure for the engine
  # using git submodule init, instead of having to download each module by hand.
  # This is also useful if you want to avoid using some module and customize what
  # gets built into your final application. Just remove or add the modules you want
  # to your root repository.
  # ($REPO_BASE is a placeholder for your git host's base URL, e.g. the
  # BitBucket account the module repositories live in.)
  git submodule add $REPO_BASE/$NAME.git modules/$NAME
  cd modules/$NAME
  git checkout -b master
  # In order to work consistently with the module, we need to create
  # a standard directory structure.
  DIRECTORIES="src include tests samples docs third-party"
  for DIRECTORY in $DIRECTORIES; do
    mkdir $DIRECTORY
    # Git doesn't include directories if they don't have content inside.
    # So we create a file called .gitkeep in order to create the directories
    # in the remote repository.
    touch $DIRECTORY/.gitkeep
  done
  # Just to have something show up on BitBucket, we create a minimal README.md.
  echo -e "Module $NAME\n===" > README.md
  # This is an empty CMakeLists.txt for the module.
  # All the specifics for creating the library and building its
  # tests should go here.
  echo -e "cmake_minimum_required(VERSION 2.8.5)\nproject($NAME)" > CMakeLists.txt
  # Now we need to push these default changes as an initial commit.
  # After this we can start working on the module.
  git add .
  git commit -m "[Module] Initial commit for $NAME" -a
  git push origin master
  cd ..
  cd ..

After running the script (for example ./bin/create-module core) you’ll end up with the module created (simplex-core), the basic directory structure, a README.md, and the CMakeLists.txt, which will be used to build the module.

This is how I solved the first problem: “How can I sort the code in a suitable way to work with modules?“.

Building modules with CMake

Having solved the core problem of the module’s directory structure, I had to figure out how I was going to compile all that stuff and put it together.

The approach I found most appealing was building each module as a static library, and linking the necessary ones into the final application.

Since the engine is being built in C++ and needs to be cross-platform (Rule #2), I decided to use a single build system: CMake.

I had the opportunity to learn about using CMake as a cross platform build system (which I really enjoyed) while working on Moai SDK.

CMake is a cross-platform build system, and a really well-thought-out one from my point of view. It allows you to write build scripts that are really well organized, and simple. And it’s really well-suited for modularized code.

For it to work, you need to create a file called CMakeLists.txt, which includes the directives on how to build your piece of software.

I was sure I was not going to use another build system, so I included one CMakeLists.txt per module, in order to be able to write all the build instructions for a specific module inside its own repository. Another approach could’ve been having a different directory structure with all the CMakeLists.txt files, like in Moai SDK, but that made it more difficult to have a CMakeLists.txt inside each project’s repository, something that I think is a must for my engine.

The main pattern I followed for creating the modules is that each module builds a static library with its name (simplex-core, simplex-math, simplex-graphics).

This static library includes all the code for the namespace Simplex::Module. If a third-party lib is included in the repository, it will be added to this static lib as well.

You need to be very careful with third-party libs: including the same library with two different versions as a static lib can bring problems. My proposed solution, as I stated previously, is to have a common third-party libs module that includes the dependencies shared among modules.

Since I’m a TDD addict, I decided to include all the mechanics for unit-testing directly in the build system. For this I create a specific executable in each module, and then tell CMake that the executable is a test suite. I’m using the Google C++ Testing Framework for writing unit-tests; I’ll write more about this in the next article.

So, each module includes a CMakeLists.txt that looks something like this:

cmake_minimum_required(VERSION 2.8.5)
  # src/ includes all the necessary cpp files for this module, so what we do is
  # to use CMake's GLOB instruction to find all the .cpp files and load them into
  # the $CODE variable.
  file ( GLOB CODE src/*.cpp )
  # To use the module in our app we create a static library to be linked against.
  # This library includes all the source files in our src/ directory, so we use
  # the $CODE variable as the list of source files to include.
  add_library ( simplex-core STATIC ${CODE} )
  # tests/ includes all the unit-tests for the module.
  # Again, we use CMake's GLOB instruction to get all of them
  # into the $TESTS_CODE variable.
  file ( GLOB TESTS_CODE tests/*.cpp )
  # In order to run the tests, we need to create an executable file (our test runner).
  # You'll understand more about this in the next article, which will focus on testing; for now
  # it's enough to understand that we include all the tests from $TESTS_CODE in a single executable.
  add_executable ( SimplexCoreTests ${TESTS_CODE} )
  # Then we need to link the test executable to the modules that are used.
  # In this case, we're linking against simplex-test (the library needed to run the tests) and simplex-core,
  # the module we're testing.
  # This is usually the case: you link only to the test library and the module you're testing, but there may be
  # exceptions where mocking is not an option.
  target_link_libraries ( SimplexCoreTests simplex-test simplex-core )
  # Finally, in order to be able to run "make test" and run all the tests for the different modules,
  # we have to add the executable file we just created to CMake's test list.
  # We do that using add_test.
  add_test ( SimplexCoreTests SimplexCoreTests )

Careful and non-caffeine-influenced readers will have noted that I’ve not set the include path for the module; there is a reason for this.

Each module has some information on how to build itself, but there is a piece missing: the main CMakeLists.txt

I’ve created a generic CMakeLists.txt (outside of the modules structure) in the main simplex-engine repository. It basically iterates through all the modules, adding their include dirs to the include path and loading each module’s CMakeLists.txt into the mix.

I’ll omit some CMake and platform-specific stuff (like finding OpenGL, Cg, and Mac OS X flags) and focus on the custom part; you can mail me and ask for the whole file if you want:

  # Every module has a CMakeLists.txt in its root. As we saw, those files provide the necessary
  # directions to build each module.
  # To be able to iterate on the modules' names we use the GLOB instruction again.
  # This will generate a list of all the modules present in the "modules/" directory.
  file ( GLOB MODULES ${CMAKE_SOURCE_DIR}/modules/* )
  # Each module has an include directory where all the headers go.
  # To make it easier to find header files, I decided to make all the headers 
  # available to all modules globally.
  # You can achieve this using include_directories in the root CMakeLists.txt
  # before loading the modules.
  foreach ( MODULE ${MODULES} )
    include_directories ( "${MODULE}/include/" )
  endforeach ()
  # After adding the include paths, the only remaining thing is to 
  # include each CMakeLists.txt for each module.
  # add_subdirectory searches for a CMakeLists.txt file in the given
  # directory, so that's what we use to include each module's build script.
  foreach ( MODULE ${MODULES} )
    add_subdirectory ( ${MODULE} )
  endforeach ()

Note that this main script doesn’t know anything about which modules are being built. It just iterates through the contents of the modules/ directory and includes all its subdirectories.

This way of adding modules provides a level of abstraction that doesn’t care about what you’re building; it just builds what you want.

Also note that doing a custom build (with some modules and without others) is just a matter of removing modules from the directory tree.

This whole CMake configuration solved the second problem I was facing: How to build the modules consistently and be able to include the necessary ones in the final build of a program that uses the engine?

You may ask here, “Where is the actual final program?”, well.. it’s just another module. 🙂


Using CMake and Git submodules you can create a really comfortable structure to create a heavily modularized C++ project. The use of good conventions (module directory tree, for example) allows you to avoid complex build scripts, and makes it easy to find and organize code.

In the next article, I’ll talk about how I implemented my testing environment on top of this structure.

I hope you enjoyed the read! Feel free to contact me and comment below!

Till next time!

Management Lessons from Dumbledore

Original Author: Ted Spence

I’ve been reading the Harry Potter series of books to my daughter as a bedtime story. It’s been over a year now, and we’re still only on book six! But while I read a few pages to her in the evenings, it strikes me that Dumbledore is almost certainly the type of manager I’d like to be. I may not quite have Dumbledore’s bohemian charm, but I can definitely study his techniques.

Dumbledore: Daft or Dangerous?

Dumbledore was a complicated character: funny, serious, forgetful, scheming, inspiring, and aloof. I notice how, over time, all of these attributes combine to make him a compelling person and the kind of leader you want to see in an organization.

A Vision, Rather Than Details

In book four, Dumbledore stands up before the crowd and announces the Triwizard Tournament. He talks about the value of international cooperation, of the need for Hogwarts students to collaborate with wizards in other countries. Yet at the same time, he doesn’t “pair students up” or assign designated buddies. Students from different schools are brought into the same area and placed into the same social events – but with no assigned duties.

This is interesting. Most management lessons enforce detail-oriented activity. You’re always being challenged to maintain task lists and project backlogs, and to break big things down into smaller ones. But this kind of activity works well for people who don’t understand the vision, and for people who need guidance to focus on their work. Dumbledore was clearly encouraging people to learn to take the initiative, to hear his ideas and find their own ways of implementing them.

The risk of this approach is that many people won’t take that initiative. In fact, there were lots of students who disagreed with Dumbledore’s methods and goals. Yet the free, open strategy he pursued made it clear which students were part of the team and which weren’t.

Turn a Blind Eye To Risks (to a degree)

Time Turners

If there’s one thing you can say about Dumbledore – he tolerated a lot of inappropriate behavior from students. He let people try things that were in no way safe or sensible. Consider the third book, when Hermione Granger was given the “Time-Turner,” a device so phenomenally powerful the Ministry of Magic kept it locked up.

This is clearly crazy.

Anyone reading the books will remember this example offhand. Some fans consider this a huge plot hole – “Why didn’t Dumbledore use the Time-Turner to save lives during Voldemort’s attacks?” But plot points aside, this part of the story shows that Dumbledore is demonstrating exceptional trust in a student, based upon her incredible promise and talent.

Consider it that way: if you want to have the best team, if you really want your team to dazzle the world, you need to be willing to support and encourage them. Often, this support will require bending the rules.

Dumbledore has clearly spent ages discovering how to tell the difference between safe and unsafe risks; and he often praises people for succeeding in risky endeavors publicly while criticizing them privately. This is entirely appropriate when you view yourself as a coach or a mentor. When Buckbeak was about to be killed and only Hermione could save him, Dumbledore didn’t explain the task at all – he gave her a goal, and knew that she had learned well the risks of the device she was entrusted with. This is a regular pattern. Dumbledore cultivates a broad group of people and gives them each the encouragement (and corrections) that they need.

Foster an Attitude of Curiosity

All the students at Hogwarts knew that Dumbledore was prone to giving odd, curious, interesting speeches. He always revealed some interesting hint, for example, about the Philosopher’s stone, or about classes, or about the various mysteries at school. He would reveal information through the press, through speeches at dinners and at formal occasions, and through unusual requests of the students and staff.

At the same time, Dumbledore always had a kind word for someone who was interested in learning more. He kept information available for students, and answered their questions honestly (if sometimes incompletely). He could have easily crushed an inquisitive young mind with a critical remark, but he didn’t.

Many of you will notice that this attitude helped to separate those curious minds from the uncurious ones. Dumbledore knew that Malfoy didn’t care about the Philosopher’s stone, but when he discovered Harry with the Mirror of Erised, he helped Harry learn about it, knowledge that would eventually help him to keep the stone safe from Voldemort.

Recognize People – When They Need It

Gryffindor Wins the House Cup!

I always found it quite curious how Dumbledore granted so many extra points to Gryffindor at the end of book one. There was absolutely no need to give Harry, Ron, and Hermione extra awards; they already knew how amazing it was that they had defeated Quirrell/Voldemort. Why make them win the house cup as well?

Yet Dumbledore used the opportunity to reward someone else – Neville – by granting him the final points that won Gryffindor the house cup.

One could argue that he was just evening the scales after Snape docked points from Gryffindor inappropriately. But you could also look at this as a way of taking the raw potential in Neville Longbottom – who was still a forlorn, shy character in book one – and setting him on the path to greatness by the end of the series.

Dumbledore did something that is very hard as a manager: he knew which of his people needed the boost, and who would react best to it. He delivered an unexpected surprise to Neville, and I think that surprise made all the difference. Over the years, Neville regularly discovered a deep well of inner strength when others thought him weak.

Maintain Variety in Middle Management

Dumbledore provided a fascinating culture at Hogwarts. No one could accuse Professor Snape and Hagrid of having identical management styles. Students always had a range of mentors and teams to choose from, and each of them provided interesting contrasts.

For example, let’s consider Minerva McGonagall, the epitome of a tough-but-fair teacher. She brooked no dissent, but never spoke a cruel word to anyone. In her classroom, misbehavior was never tolerated, but when students sought out advice in between classes they knew they could rely on her to balance out the strict tendencies of Argus Filch and Severus Snape.

On the contrasting side were the loopy teachers like Hagrid, Trelawney, and Professor Binns (the ghost who never paid attention to students’ boredom). Students with a horrible Potions test to study for could count on easy work in some areas, and their daily lives could balance between the two opposites and avoid burnout.

Think about your team: they need some sensible managers, some strict; they need some lenient guidance, and some slack work. Anyone who is placed in the hotseat every single day quickly discovers that they need time off.

Create Opportunities for People to Surprise You

No manager hires only already proven winners (except maybe the New York Yankees). Dumbledore, interestingly, stocks his school with rejects, outcasts, weirdos, and kooks. He grants “prefect” status to students who clearly aren’t capable of filling the roles, like Malfoy or Ron Weasley. He hires way beyond the boundaries of those who are considered capable of acting as teachers, like Remus Lupin (a werewolf who taught Defense against the Dark Arts), Firenze (a centaur who taught Divination), and others.

Yet, although some of these people clearly fail at their tasks, others excel in unexpected ways. In business perfect qualifications often do not translate into success. I have hired many people with fantastic credentials who have turned out to be unproductive; whereas some junior staff right out of college have arrived already capable of taking on senior roles. The art of succeeding in picking people is to constantly challenge everyone to deliver one step beyond their capabilities; to constantly reward them when they succeed, and help them when they fail.

Keep People Guessing

The instant Dumbledore risked becoming too predictable, he changed. He was at once aloof, yet constantly plotting; he was forgetful, but always knew where to pay attention. He was willing to share Every Flavor Beans with Harry, even knowing that he risked getting an “Earwax”; and he was fiercely angry when his anger could shock Petunia Dursley into taking better care of Harry.

Business may move smoothly when it is predictable, but as managers we know that things have to change too. If you spend one year telling everyone to focus on customer satisfaction, chances are they’ll make great progress and eventually end up with diminishing returns. Change their focus! Challenge everyone with a different task each month.

At the same time, change the attitude of the office. One week hold a party; the next, reorganize the group – not as a punishment – but as a way to allow people to experience different parts of the firm. You’ll be surprised how often a tiny little change, one that may seem pointless at the time, can unleash creativity and help defeat dark wizards.