Building an Engine Plugin System

Original Author: Niklas Frykholm

A plugin system is a useful way to allow developers to extend the capabilities of a game engine. Of course, an engine can also be extended by directly modifying the source code, but there are several drawbacks with that approach:

  • Changing the code requires you to recompile the engine. Anyone who wants to modify the engine must have the full source code, access to all the libraries and the build environment set up correctly.

  • Every time you pull changes from upstream you will have to merge your changes with the incoming patches. Over time, this adds up to a significant chunk of work.

  • Since you work directly in the source code, instead of against a published API, refactoring of engine systems might force you to rewrite your code from scratch.

  • There is no easy way to share the modifications you have made with other people.

A plugin system solves all these issues. Plugins can be distributed as compiled DLLs. They are easily shared and you can install them by just putting them in the engine’s plugin folder. Since the plugins use an explicit API, they will continue to work with new versions of the engine (unless backwards compatibility is explicitly broken).

Of course, the plugin API can never cover everything, so there will always be things you can do by modifying the engine that you can’t do through the plugin API. Nevertheless, it is a good complement.

A Tale of Two APIs

When building a plugin system, there are actually two APIs that you need to think about.

The first, and most obvious one, is the API that the plugin exposes to the engine: a set of exported functions that the engine will call at predefined times. For a very basic system, it could look something like this:

__declspec(dllexport) void init();
__declspec(dllexport) void update(float dt);
__declspec(dllexport) void shutdown();

The other API, which usually is a lot more complicated, is the API that the engine exposes to the plugin.

In order to be useful, the plugin will want to call on the engine to do stuff. This can be things like spawning a unit, playing a sound, rendering some meshes, etc. The engine needs to provide some way for plugins to call on these services.

There are a number of ways of doing this. One common solution is to put all the shared functionality in a common DLL and then link both the engine application and the plugin against this DLL.

[Diagram: the engine and its plugins both linking against a shared engine DLL]

The drawback of this approach is that the more functionality that the plugins need access to, the more must go in the shared DLL. Eventually you end up with most of the engine in the shared DLL, which is pretty far from the clean and simple APIs that we strive for.

This creates a very strong coupling between the engine and the plugins. Every time we want to modify something in the engine, we will probably have to modify the shared DLL and thus likely break all of the plugins.

As anyone who has read my previous articles knows, I really don’t like these kinds of strong couplings. They prevent you from rewriting and refactoring your systems and thus eventually cause your code to stagnate.

Another approach is to let the engine’s scripting language (in our case Lua) work as the engine’s API. With this approach, any time a plugin wanted the engine to do something it would use a Lua call.

For lots of applications I think this can be a really good solution, but in our case it doesn’t seem like a perfect fit. First, the plugins will need access to a lot of stuff that is more “low level” than what you can access from Lua, and I’m not too keen on exposing all of the engine’s innards to Lua. Second, since both the plugins and the engine are written in C++, marshalling all the calls between them through Lua seems both overly complicated and inefficient.

I prefer to have an interface that is minimalistic, data-oriented and C-based (because of C++ ABI compatibility issues and also because of… well… C++).

Interface Querying

Instead of linking the plugin against a DLL that provides the engine API, we can send the engine API to the plugin when we initialize it. Something like this (a simplified example):

plugin_api.h:

typedef struct EngineApi
{
    void (*spawn_unit)(World *world, const char *name, float pos[3]);
    ...
} EngineApi;

plugin.h

#include "plugin_api.h"

__declspec(dllexport) void init(EngineApi *api);
__declspec(dllexport) void update(float dt);
__declspec(dllexport) void shutdown();

This is pretty good. The plugin developer does not need to link against anything, just include the header file plugin_api.h, and then she can call the functions in the EngineApi struct to tell the engine to do stuff.

The only thing that is missing is versioning support.

At some point in the future we will probably want to modify the EngineApi. Perhaps we discover that we want to add a rotation argument to spawn_unit(), or something else.

We can achieve this by introducing versioning in the system. Instead of sending the engine API directly to the plugin, we send the plugin a function that lets it query for a specific version of the engine API.

With this approach, we can also break the API up into smaller submodules that can be queried for individually. This gives us a cleaner organization.

plugin_api.h

#define WORLD_API_ID    0
#define LUA_API_ID      1

typedef struct World World;

typedef struct WorldApi_v0 {
    void (*spawn_unit)(World *world, const char *name, float pos[3]);
    ...
} WorldApi_v0;

typedef struct WorldApi_v1 {
    void (*spawn_unit)(World *world, const char *name, float pos[3], float rot[4]);
    ...
} WorldApi_v1;

typedef struct lua_State lua_State;
typedef int (*lua_CFunction) (lua_State *L);

typedef struct LuaApi_v0 {
    void (*add_module_function)(const char *module, const char *name, lua_CFunction f);
    ...
} LuaApi_v0;

typedef void *(*GetApiFunction)(unsigned api, unsigned version);

When the engine initializes the plugin, it passes along get_engine_api(), which the plugin can use to get hold of the different engine APIs.

The plugin will typically set up the APIs in the init() function:

static WorldApi_v1 *_world_api = nullptr;
static LuaApi_v0 *_lua_api = nullptr;

void init(GetApiFunction get_engine_api)
{
    _world_api = (WorldApi_v1 *)get_engine_api(WORLD_API_ID, 1);
    _lua_api = (LuaApi_v0 *)get_engine_api(LUA_API_ID, 0);
}

Later, the plugin can use these APIs:

_world_api->spawn_unit(world, "player", pos);

If we need to make a breaking change to an API, we can just introduce a new version of that API. As long as get_engine_api() can still return the old API version when requested for it, all existing plugins will continue to work.
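To make that concrete, here is a hedged sketch of what this could look like on the engine side; spawn_unit_with_rotation() is a hypothetical internal engine function, not part of the article's API. The old v0 entry simply forwards to the new implementation with a default rotation:

/* Hypothetical internal engine function matching the v1 signature. */
static void spawn_unit_with_rotation(World *world, const char *name, float pos[3], float rot[4]);

/* v0 compatibility shim: forwards to the new implementation with an identity rotation,
   so plugins built against WorldApi_v0 keep working unchanged. */
static void spawn_unit_v0(World *world, const char *name, float pos[3])
{
    float identity_rot[4] = {0.0f, 0.0f, 0.0f, 1.0f};
    spawn_unit_with_rotation(world, name, pos, identity_rot);
}

void *get_engine_api(unsigned api, unsigned version)
{
    if (api == WORLD_API_ID && version == 0) {
        static WorldApi_v0 world_api_v0;
        world_api_v0.spawn_unit = spawn_unit_v0;
        return &world_api_v0;
    }
    if (api == WORLD_API_ID && version == 1) {
        static WorldApi_v1 world_api_v1;
        world_api_v1.spawn_unit = spawn_unit_with_rotation;
        return &world_api_v1;
    }
    return 0;
}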

With this querying system in place for the engine, it makes sense to use the same approach for the plugin as well. I.e. instead of exposing individual functions init(), update(), etc, the plugin just exposes a single function get_plugin_api() which the engine can use in the same way to query APIs from the plugin.

plugin_api.h

#define PLUGIN_API_ID 2

typedef struct PluginApi_v0
{
    void (*init)(GetApiFunction get_engine_api);
    ...
} PluginApi_v0;

plugin.c

__declspec(dllexport) void *get_plugin_api(unsigned api, unsigned version);

Since we now have versioning on the plugin API as well, this means we can modify it (add new required functions, etc) without breaking existing plugins.
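For example (a sketch, with PluginApi_v1 as a hypothetical future revision rather than anything defined in this article), the engine could first ask for the newest plugin API it knows about and fall back to older versions:

/* Sketch: prefer a hypothetical PluginApi_v1 if the plugin provides it. */
PluginApi_v1 *plugin_v1 = (PluginApi_v1 *)get_plugin_api(PLUGIN_API_ID, 1);
if (plugin_v1) {
    plugin_v1->init(get_engine_api);
} else {
    /* Older plugin: fall back to the v0 interface. */
    PluginApi_v0 *plugin_v0 = (PluginApi_v0 *)get_plugin_api(PLUGIN_API_ID, 0);
    if (plugin_v0)
        plugin_v0->init(get_engine_api);
}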

Putting It All Together

Putting all this together, here is a complete (but very small) example of a plugin that exposes a new function to the Lua layer of the engine:

plugin_api.h

#define PLUGIN_API_ID       0
#define LUA_API_ID          1

typedef void *(*GetApiFunction)(unsigned api, unsigned version);

typedef struct PluginApi_v0
{
    void (*init)(GetApiFunction get_engine_api);
} PluginApi_v0;

typedef struct lua_State lua_State;
typedef int (*lua_CFunction) (lua_State *L);

typedef struct LuaApi_v0
{
    void (*add_module_function)(const char *module, const char *name, lua_CFunction f);
    double (*to_number)(lua_State *L, int idx);
    void (*push_number)(lua_State *L, double number);
} LuaApi_v0;

plugin.c

#include "plugin_api.h"

LuaApi_v0 *_lua;

static int test(lua_State *L)
{
    double a = _lua->to_number(L, 1);
    double b = _lua->to_number(L, 2);
    _lua->push_number(L, a + b);
    return 1;
}

static void init(GetApiFunction get_engine_api)
{
    _lua = get_engine_api(LUA_API_ID, 0);

    if (_lua)
        _lua->add_module_function("Plugin", "test", test);
}

__declspec(dllexport) void *get_plugin_api(unsigned api, unsigned version)
{
    if (api == PLUGIN_API_ID && version == 0) {
        static PluginApi_v0 api;
        api.init = init;
        return &api;
    }
    return 0;
}

engine.c

// Initialized elsewhere.
LuaEnvironment *_env = 0;

void add_module_function(const char *module, const char *name, lua_CFunction f)
{
    _env->add_module_function(module, name, f);
}

void *get_engine_api(unsigned api, unsigned version)
{
    if (api == LUA_API_ID && version == 0 && _env) {
        static LuaApi_v0 lua;
        lua.add_module_function = add_module_function;
        lua.to_number = lua_tonumber;
        lua.push_number = lua_pushnumber;
        return &lua;
    }
    return 0;
}

void load_plugin(const char *path)
{
    HMODULE plugin_module = LoadLibrary(path);
    if (!plugin_module) return;
    GetApiFunction get_plugin_api = (GetApiFunction)GetProcAddress(plugin_module, "get_plugin_api");
    if (!get_plugin_api) return;
    PluginApi_v0 *plugin = (PluginApi_v0 *)get_plugin_api(PLUGIN_API_ID, 0);
    if (!plugin) return;
    plugin->init(get_engine_api);
}

This has also been posted to The Bitsquid blog.

Multi-Threaded Programming 3: Locking, Lock-Free, Wait-Free

Original Author: Joseph Simons

Now I want to cover the basic terminology used when talking about concurrent algorithms. This will be helpful so that you are familiar with the available techniques you have when you have multiple threads operating together. The term concurrent itself refers to an algorithm that can scale from being synchronous on a single thread to utilizing multiple threads at the same time. The terms I am going to cover also exist independently of any architecture, though the actual implementation will vary due to the underlying hardware differences. I am going to cover them in order of increasing difficulty to implement, so if you are interested in implementing an algorithm of your own, I would suggest starting with the simplest implementation and moving on only as needed.


Locking

In the previous post, we ended with an example of a mutex lock. This is the easiest and most straightforward method available in order to restrict access to a particular area of code for only a single thread at a time. However, as you may have surmised, it will also have the worst performance when dealing with many threads all accessing the same area at once. This is because the threads are restricted to operating serially, so you basically have given up on the performance benefits of using multiple threads for the area where you lock. But if it is an area that is accessed infrequently, or there is only a small chance that multiple threads will be in the locked area at the same time, it can easily be a “good enough” solution.

Locking is a good first step when you are dealing with data that can be read and modified by multiple threads. As we saw with the explanation of caches, when you modify data on one thread that change will quickly propagate to other threads that then read that data. However, the trickiness lies in the fact that these modifications can occur at any point within your algorithm. So by forcing the algorithm to operate in a serial manner we prevent any unexpected changes that can occur and disrupt what we expect to happen.

While debugging mutex locks can be fairly simple (as far as threading problems go), there are several ways that things can go wrong. If you are using locks and find that your program has stopped responding, simply pause it and examine it in the debugger. Chances are that one of the following three things has occurred.

The first pitfall you might have fallen into is what is referred to as a deadlock. This occurs when your program gets into a state where a lock is obtained but never released. Your program halts because no further progress can be made: all the threads are waiting on something that is never going to happen.

One cause of this is not acquiring locks in the same order in different places in your code. If you acquire Lock A then Lock B in one place, and Lock B then Lock A in another, you can get into a situation where one thread holds Lock A and is trying to acquire Lock B, while another thread holds Lock B and is trying to acquire Lock A. Neither thread can proceed, so neither lock is ever released. If you always acquire the locks in the Lock A then Lock B order, this problem can never occur.
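To make the pattern concrete, here is a minimal C++ sketch (not code from the original post) of the inconsistent ordering described above, along with the usual fix:

#include <mutex>

std::mutex lockA;
std::mutex lockB;

void thread1()
{
    std::lock_guard<std::mutex> a(lockA);
    std::lock_guard<std::mutex> b(lockB);  // blocks forever if thread2 already holds B and wants A
    // ... work on shared data ...
}

void thread2()
{
    // Inconsistent ordering: B then A. If thread1 holds A and thread2 holds B,
    // neither can make progress.
    std::lock_guard<std::mutex> b(lockB);
    std::lock_guard<std::mutex> a(lockA);
    // ... work on shared data ...
}

// Fix: always acquire in the same order (A then B) in both functions,
// or acquire both together with std::lock(lockA, lockB).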

[Diagram: two threads deadlocking by acquiring Lock A and Lock B in opposite orders]

 

Another cause is simply using the same lock in multiple places, where code ends up locking it a second time without having unlocked it first. You can easily design a re-entrant mutex that handles this sort of use case if desired, but make sure that this behavior is actually what you want.
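If you do want that behaviour, a minimal sketch (assuming C++11's standard library) using std::recursive_mutex looks like this:

#include <mutex>

std::recursive_mutex m;

void inner()
{
    // Fine: the same thread may lock a recursive_mutex it already owns.
    std::lock_guard<std::recursive_mutex> lock(m);
    // ...
}

void outer()
{
    std::lock_guard<std::recursive_mutex> lock(m);
    inner();  // with a plain std::mutex this nested lock would deadlock
}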

[Diagram: a single thread deadlocking by locking the same non-re-entrant mutex twice]

 

The final case that I will cover is likely the hardest to ascertain. When you have multiple threads of different priorities, you need to be careful when it comes to sharing locks. If you have a low priority, a medium priority, and a high priority thread, and the low and high priority threads both execute in the same critical section, you can run into priority inversion: the low priority thread takes the lock, the high priority thread then has to wait for it, and the medium priority thread keeps pre-empting the low priority thread so that the lock is never released. Be aware that Windows applies random priority boosting to starved threads, which can help avoid this situation for you.

[Diagram: priority inversion, with a medium priority thread starving the low priority thread that holds the lock needed by the high priority thread]

 

Lock-Free

When using atomic operations, we are also forcing a linear execution on our program, but in a much smaller scope. However, they do not fall prey to the things that can go wrong when using locks. Therefore, if we are not using any locks, then our program will always be able to advance. If our algorithm is guaranteed to never wait (even if it has to repeat some of its work), then it is considered a lock-free algorithm. Another way of thinking about it is that if you were to arbitrarily pause any thread that is in your algorithm, it would not affect any of the other threads in there.

However, when dealing with lock-free algorithms, even though you do not need to worry about deadlocks as described above, you have a whole new set of problems. The main issue when writing code that doesn’t force linear execution for large sections is that almost anything can happen at anytime. When thinking through the code, you have to take into account that any other thread can come along and perform operations, such as changing a variable, adding / removing a node, or reorganizing the data. Because of this, it is common with lock-free algorithms to continuously test the assumptions before performing an action. If the assumed state is incorrect, then lock-free algorithms typically will loop and repeat their work until they can safely proceed. So, if an area of code has low contention, then using locks might actually lead to faster execution.
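A typical example of this test-and-retry pattern is the push operation of a lock-free stack built on compare-and-swap. This is a minimal sketch for illustration, not code from the original post:

#include <atomic>

struct Node
{
    int   value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

void push(Node* node)
{
    // Read the current head, link our node in front of it, then try to publish.
    // If another thread changed head in the meantime, the compare-and-swap fails,
    // oldHead is reloaded with the new value, and we simply retry.
    Node* oldHead = head.load();
    do
    {
        node->next = oldHead;
    } while(!head.compare_exchange_weak(oldHead, node));
}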

However, just testing the state isn’t without its caveats. One particularly nasty thing that can happen is known as the ABA problem. The idea here is that even if you are making sure that you are operating on the same address, by testing it with a Compare-And-Swap atomic operation, it can still fail: if the address was recycled in some way (such as being deleted then allocated again, or popped then pushed again), then even though it is at the same location, it is not the same data. This can lead to crashes or lost data, and it is very difficult to track down because it can be extremely rare.

[Diagram: the ABA problem, where a recycled address passes a compare-and-swap check even though the data has changed]

 

Wait-Free

Now, if you have an algorithm that will never repeat work and will always advance, then you have a wait-free algorithm. Because of this property, wait-free algorithms always finish in a fixed number of steps. They can be exceedingly difficult to implement and it is rare to actually see them in use; while technically any algorithm can be made wait-free, this usually comes at a significant cost (at least in most circumstances). This means that for games, where performance is typically the highest goal and the number of threads is low, lock-free algorithms are generally the best option for highly threaded, time-critical areas.

 

Final Considerations

Never underestimate the power of a “good enough” solution. Because multi-threaded programming carries with it a high chance of hard-to-detect bugs, a tiny bit of performance traded for a more stable codebase can definitely be worth it. Therefore, it is actually good practice to start with using simple locks to protect shared data. Then, when you are certain that those locks are a bottleneck, investigate lock-free solutions. Also, because writing good lock-free algorithms can be difficult, it is usually a good idea to keep an eye out for libraries that already implement them. These will help reduce possible implementation errors, since simple tests might fail to catch all the problems that can occur, especially since the issues can be very dependent on rare, timing-specific situations.

It is also a very good idea to redesign your code so that you minimize the areas where multiple threads can interact with each other. This gives you the benefits of parallelism without many of the issues that can occur (such as all those described earlier). We will look at ways to do this later in the series, but I wanted the reader to first be familiar with some of the difficulties that can come with multi-threaded programming. That way, if you encounter a situation where you have to worry about multiple threads executing on the same data, you have a good grasp of the problem space going into it.

 

Next Time

For the next post, we are going to look at a basic locking queue implementation. For the post after that, we will then see a lock-free queue implementation where I will cover why we need to make the changes that we did in order to convert it. For the following post, we will see if we can optimize the lock-free queue further, and some pitfalls to watch out for while doing so. Because of the difficulties with wait-free algorithms and the fact they typically don’t have much real-world benefit, we will not be covering those further.

 

Further Reading

If you want to read some more about locking, lock-free, and wait-free algorithms, there is plenty of good material available online covering each of these topics in more depth.

Custom Vector Allocation

Original Author: Thomas Young

(Also posted to my blog, as one of a series of posts about vectors and vector-based containers.)

A few posts back I talked about the idea of ‘rolling your own’ STL-style vector class, based on my experiences with this at PathEngine. In two follow-ups I talked about the general approach and also some specific performance tweaks that actually helped in practice for our vector use cases. I haven’t talked about custom memory allocation yet, however. This is something that’s been cited in a number of places as a key reason for switching away from std::vector, so I’ll come back now and look at the approach we took for this (which is pretty simple, but non-standard, and also pre-C++11), and assess some of the implications of using this kind of non-standard approach.

I approach this from the point of view of a custom vector implementation, but I’ll be talking about some issues with memory customisation that also apply more generally.

Why custom allocation?

In many situations it’s fine for vectors (and other containers) to just use the same default memory allocation method as the rest of your code, and this is definitely the simplest approach.

(The example vector code I posted previously used malloc() and free(), but works equally well with global operator new and delete.)

But vectors can do a lot of memory allocation, and memory allocation can be expensive, and it’s not uncommon for memory allocation operations to turn up in profiling as the most significant cost of vector based code. Custom memory allocation approaches can help resolve this.

Other good reasons for hooking into and customising allocations include the need to avoid memory fragmentation, or to track memory statistics.

For these reasons generalised memory customisation is an important customer requirement for our SDK code in general, and then by extension for the vector containers used by this code.

Custom allocation in std::vector

The STL provides a mechanism for hooking into the container allocation calls (such as vector buffer allocations) through allocators, with vector constructors accepting an allocator argument for this purpose.

I won’t attempt a general introduction to STL allocators, but there’s a load of material about this on the web. See, for example, this article on Dr Dobbs, which includes some example use cases for allocators. (Bear in mind that this is pre C++11, however. I didn’t see any similarly targeted overview posts for using allocators post C++11.)

A non-standard approach

We actually added the possibility to customise memory allocation in our vectors some time after switching to a custom vector implementation. (This was around mid-2012. Before that PathEngine’s memory customisation hooks worked by overriding global new and delete, and required dll linkage if you wanted to manage PathEngine memory allocations separately from allocations in the main game code.)

We’ve generally tried to keep our custom vector as similar as possible to std::vector, in order to avoid issues with unexpected behaviour (since a lot of people know how std::vector works), and to ensure that code can be easily switched between std::vector and our custom vector. When it came to memory allocation, however, we chose a significantly different (and definitely non-standard) approach, because in practice a lot of vector code doesn’t actually use allocators (or else just sets allocators in a constructor), because we already had a custom vector class in place, and because I just don’t like STL allocators!

Other game developers

A lot of other game developers have a similar opinion of STL allocators, and for many this is actually then also a key factor in a decision to switch to custom container classes.

For example, issues with the design of STL allocators are quoted as one of the main reasons for the creation of the EASTL, a set of STL replacement classes, by Electronic Arts. From the EASTL paper:

Among game developers the most fundamental weakness is the std allocator design, and it is this weakness that was the largest contributing factor to the creation of EASTL.

And I’ve heard similar things from other developers. For example, in this blog post about the Bitsquid approach to allocators Niklas Frykholm says:

If it weren’t for the allocator interface I could almost use STL. Almost.

Let’s have a look at some of the reasons for this distaste!

Problems with STL allocators

We’ll look at the situation prior to C++11, first of all, and the historical basis for switching to an alternative mechanism.

A lot of problems with STL allocators come out of confusion in the initial design. According to Alexander Stepanov (primary designer and implementer of the STL) the custom allocator mechanism was invented to deal with a specific issue with Intel memory architecture. (Do you remember near and far pointers? If not, consider yourself lucky I guess!) From this interview with Alexander:

Question: How did allocators come into STL? What do you think of them?

Answer: I invented allocators to deal with Intel’s memory architecture. They are not such a bad ideas in theory – having a layer that encapsulates all memory stuff: pointers, references, ptrdiff_t, size_t. Unfortunately they cannot work in practice.

And it seems like this original design intention was also only partially executed. From the wikipedia entry for allocators:

They were originally intended as a means to make the library more flexible and independent of the underlying memory model, allowing programmers to utilize custom pointer and reference types with the library. However, in the process of adopting STL into the C++ standard, the C++ standardization committee realized that a complete abstraction of the memory model would incur unacceptable performance penalties. To remedy this, the requirements of allocators were made more restrictive. As a result, the level of customization provided by allocators is more limited than was originally envisioned by Stepanov.

and, further down:

While Stepanov had originally intended allocators to completely encapsulate the memory model, the standards committee realized that this approach would lead to unacceptable efficiency degradations. To remedy this, additional wording was added to the allocator requirements. In particular, container implementations may assume that the allocator’s type definitions for pointers and related integral types are equivalent to those provided by the default allocator, and that all instances of a given allocator type always compare equal, effectively contradicting the original design goals for allocators and limiting the usefulness of allocators that carry state.

Some of the key problems with STL allocators (historically) are then:

  • Unnecessary complexity, with some boiler plate stuff required for features that are not actually used
  • A limitation that allocators cannot have internal state (‘all instances of a given allocator type are required to be interchangeable and always compare equal to each other’)
  • The fact the allocator type is included in container type (with changes to allocator type changing the type of the container)

There are some changes to this situation with C++11, as we’ll see below, but this certainly helps explain why a lot of people have chosen to avoid the STL allocator mechanism, historically!

Virtual allocator interface

So we decided to avoid STL allocators, and use a non-standard approach.

The approach we use is based on a virtual allocator interface, and avoids the need to specify allocator type as a template parameter.

This is quite similar to the setup for allocators in the BitSquid engine, as described by Niklas here (as linked above, it’s probably worth reading that post if you didn’t see this already, as I’ll try to avoid repeating the various points he discussed there).

A basic allocator interface can then be defined as follows:

class iAllocator
{
public:
    virtual ~iAllocator() {}
    virtual void* allocate(tUnsigned32 size) = 0;
    virtual void deallocate(void* ptr) = 0;

// helper
    template <class T> void
    allocate_Array(tUnsigned32 arraySize, T*& result)
    {
        result = static_cast<T*>(allocate(sizeof(T) * arraySize));
    }
};

The allocate_Array() method is there for convenience; concrete allocator objects just need to implement allocate() and deallocate().

We can store a pointer to iAllocator in our vector, and replace the direct calls to malloc() and free() with virtual function calls, as follows:

    static T*
    allocate(size_type size)
    {
        T* allocated;
        _allocator->allocate_Array(size, allocated);
        return allocated;
    }
    void
    reallocate(size_type newCapacity)
    {
        T* newData;
        _allocator->allocate_Array(newCapacity, newData);
        copyRange(_data, _data + _size, newData);
        deleteRange(_data, _data + _size);
        _allocator->deallocate(_data);
        _data = newData;
        _capacity = newCapacity;
    }

These virtual function calls potentially add some overhead to allocation and deallocation. It’s worth being quite careful about this kind of virtual function call overhead, but in practice it seems that the overhead is not significant here. Virtual function call overhead is often all about cache misses and, perhaps because there are often just a small number of actual allocator instances active, with allocations tending to be grouped by allocator, this just isn’t such an issue here.

We use a simple raw pointer for the allocator reference. Maybe a smart pointer type could be used (for better modern C++ style and to increase safety), but we usually want to control allocator lifetime quite explicitly, so we’re basically just careful about this.

Allocators can be passed in to each vector constructor, or if omitted will default to a ‘global allocator’ (which adds a bit of extra linkage to our vector header):

    cVector(size_type size, const T& fillWith,
        iAllocator& allocator = GlobalAllocator()
        )
    {
        _data = 0;
        _allocator = &allocator;
        _size = size;
        _capacity = size;
        if(size)
        {
            _allocator->allocate_Array(_capacity, _data);
            constructRange(_data, _data + size, fillWith);
        }
    }

Here’s an example concrete allocator implementation:

class cMallocAllocator : public iAllocator
{
public:
    void*
    allocate(tUnsigned32 size)
    {
        assert(size);
        return malloc(static_cast<size_t>(size));
    }
    void
    deallocate(void* ptr)
    {
        free(ptr);
    }
};

(Note that you normally can call malloc() with zero size, but this is something that we disallow for PathEngine allocators.)

And this can be passed in to vector construction as follows:

    cMallocAllocator allocator;
    cVector<int> v(10, 0, allocator);

Swapping vectors

That’s pretty much it, but there’s one tricky case to look out for.

Specifically, what should happen in our vector swap() method? Let’s take a small diversion to see why there might be a problem.

Consider some code that takes a non-const reference to vector, and ‘swaps a vector out’ as a way of returning a set of values in the vector without the need to heap allocate the vector object itself:

class cVectorBuilder
{
    cVector<int> _v;
public:
    //.... construction and other building methods
    void takeResult(cVector<int>& result); // swaps _v into result
};

So this code doesn’t care about allocators, and just wants to work with a vector of a given type. And maybe there is some other code that uses this, as follows:

void BuildData(/*some input params*/, cVector<int>& result)
{
    //.... construct a cVectorBuilder and call a bunch of build methods
    builder.takeResult(result);
}

Now there’s no indication that there’s going to be a swap() involved, but the result vector will end up using the global allocator, and this can potentially cause some surprises in the calling code:

    cVector<int> v(someSpecialAllocator);
    BuildData(/*input params*/, v);
    // lost our allocator assignment!
    // v now uses the global allocator

Nobody’s really doing anything wrong here (although this isn’t really the modern C++ way to do things). This is really a fundamental problem arising from the possibility to swap vectors with different allocators, and there are other situations where this can come up.

You can find some discussion about the possibilities for implementing vector swap with ‘unequal allocators’ here. We basically choose option 1, which is to simply declare it illegal to call swap with vectors with different allocators. So we just add an assert in our vector swap method that the two allocator pointers are equal.

In our case this works out fine, since this doesn’t happen so much in practice, because cases where this does happen are caught directly by the assertion, and because it’s generally straightforward to modify the relevant code paths to resolve the issue.
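In code, the swap() method then looks something like this simplified sketch (using the member names from the constructor shown earlier, and assuming <utility> for std::swap):

    void
    swap(cVector& rhs)
    {
        // Option 1 from the discussion linked above: swapping vectors with
        // different allocators is simply not allowed.
        assert(_allocator == rhs._allocator);
        std::swap(_data, rhs._data);
        std::swap(_size, rhs._size);
        std::swap(_capacity, rhs._capacity);
    }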

Comparison with std::vector: is this necessary/better?

Ok, so I’ve outlined the approach we take for custom allocation in our vector class.

This all works out quite nicely for us. It’s straightforward to implement and to use, and consistent with the custom allocators we use more generally in PathEngine. And we already had our custom vector in place when we came to implement this, so this wasn’t part of the decision about whether or not to switch to a custom vector implementation. But it’s interesting, nevertheless, to compare this approach with the standard allocator mechanism provided by std::vector.

My original ‘roll-your-own vector’ blog post was quite controversial. There were a lot of responses strongly against the idea of implementing a custom vector, but a lot of other responses (often from the game development industry side) saying something like ‘yes, we do that, but we do some detail differently’, and I know that this kind of customisation is not uncommon in the industry.

These two different viewpoints make it worthwhile to explore this question in a bit more detail, then, I think.

I already discussed the potential pitfalls of switching to a custom vector implementation in the original ‘roll-your-own vector’ blog post, so let’s look at the potential benefits of switching to a custom allocator mechanism.

Broadly speaking, this comes down to three key points:

  • Interface complexity
  • Stateful allocator support
  • Possibilities for further customisation and memory optimisation

Interface complexity

If we look at an example allocator implementation for each setup we can see that there’s a significant difference in the amount of code required. The following code is taken from my previous post, and was used to fill allocated memory with non zero values, to check for zero initialisation:

// STL allocator version
template <class T>
class cNonZeroedAllocator
{
public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;
    typedef value_type& reference;
    typedef const value_type& const_reference;
    typedef typename std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template <class tTarget>
    struct rebind
    {
        typedef cNonZeroedAllocator<tTarget> other;
    };

    cNonZeroedAllocator() {}
    ~cNonZeroedAllocator() {}
    template <class T2>
    cNonZeroedAllocator(cNonZeroedAllocator<T2> const&)
    {
    }
    pointer
    address(reference ref)
    {
        return &ref;
    }
    const_pointer
    address(const_reference ref)
    {
        return &ref;
    }
    pointer
    allocate(size_type count, const void* = 0)
    {
        size_type byteSize = count * sizeof(T);
        void* result = malloc(byteSize);
        signed char* asCharPtr;
        asCharPtr = reinterpret_cast<signed char*>(result);
        for(size_type i = 0; i != byteSize; ++i)
        {
            asCharPtr[i] = -1;
        }
        return reinterpret_cast<pointer>(result);
    }
    void deallocate(pointer ptr, size_type)
    {
        free(ptr);
    }

    size_type
    max_size() const
    {
        return 0xffffffffUL / sizeof(T);
    }
    void
    construct(pointer ptr, const T& t)
    {
        new(ptr) T(t);
    }
    void
    destroy(pointer ptr)
    {
        ptr->~T();
    }
    template <class T2> bool
    operator==(cNonZeroedAllocator<T2> const&) const
    {
        return true;
    }
    template <class T2> bool
    operator!=(cNonZeroedAllocator<T2> const&) const
    {
        return false;
    }
};

But with our custom allocator interface this can now be implemented as follows:

// custom allocator version
class cNonZeroedAllocator : public iAllocator
{
public:
    void*
    allocate(tUnsigned32 size)
    {
        void* result = malloc(static_cast<size_t>(size));
        signed char* asCharPtr;
        asCharPtr = reinterpret_cast<signed char*>(result);
        for(tUnsigned32 i = 0; i != size; ++i)
        {
            asCharPtr[i] = -1;
        }
        return result;
    }
    void
    deallocate(void* ptr)
    {
        free(ptr);
    }
};

As we saw previously, a lot of the stuff in the STL allocator relates to some obsolete design decisions, and is unlikely to actually be used in practice. The custom allocator interface also completely abstracts out the concept of constructed object type, and works only in terms of actual memory sizes and pointers, which seems more natural, whilst still doing everything we need for the allocator use cases in PathEngine.

For me this is one advantage of the custom allocation setup, then, although probably not something that would by itself justify switching to a custom vector.

If you use allocators that depend on customisation of the other parts of the STL allocator interface (other than for data alignment) please let me know in the comments thread. I’m quite interested to hear about this! (There’s some discussion about data alignment customisation below.)

Stateful allocator requirement

Stateful allocator support is a specific customer requirement for PathEngine.

Clients need to be able to set custom allocation hooks and have all allocations made by the SDK (including vector buffer allocations) routed to custom client-side allocation code. Furthermore, multiple allocation hooks can be supplied, with the actual allocation strategy selected depending on the actual local execution context.

It’s not feasible to supply allocation context to all of our vector based code as a template parameter, and so we need our vector objects to support stateful allocators.

Stateful allocators with the virtual allocator interface

Stateful allocators are straightforward with our custom allocator setup. Vectors can be assigned different concrete allocator implementations and these concrete allocator implementations can include internal state, without code that works on the vectors needing to know anything about these details.
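As a concrete illustration (a sketch, not actual PathEngine code), a stateful allocator might simply wrap malloc() and keep per-instance allocation statistics:

#include <stdlib.h>

// Sketch only: an allocator with internal, per-instance state.
class cTrackingAllocator : public iAllocator
{
    tUnsigned32 _totalBytesRequested;  // internal state, distinct per instance
    tUnsigned32 _liveAllocations;
public:
    cTrackingAllocator() : _totalBytesRequested(0), _liveAllocations(0) {}
    void* allocate(tUnsigned32 size)
    {
        _totalBytesRequested += size;
        ++_liveAllocations;
        return malloc(static_cast<size_t>(size));
    }
    void deallocate(void* ptr)
    {
        if(ptr)
            --_liveAllocations;
        free(ptr);
    }
    tUnsigned32 totalBytesRequested() const { return _totalBytesRequested; }
    tUnsigned32 liveAllocations() const { return _liveAllocations; }
};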

Stateful allocators with the STL

As discussed earlier, internal allocator state is something that was specifically forbidden by the original STL allocator specification. This is something that has been revisited in C++11, however, and stateful allocators are now explicitly supported, but it also looks like it’s possible to use stateful allocators in practice with many pre-C++11 compile environments.

The reasons for disallowing stateful allocators relate to two specific problem situations:

  • Splicing nodes between linked lists with different allocation strategies
  • Swapping vectors with different allocation strategies

C++11 addresses these issues with allocator traits, which spell out how allocators should be propagated on container copy, move and swap. This stackoverflow answer discusses what happens, specifically, with C++11, in the vector swap case.

With PathEngine we want to be able to support clients with different compilation environments, and it’s an advantage not to require C++11 support. But according to this stackoverflow answer, you can also actually get away with using stateful allocators in most cases, without explicit C++11 support, as long as you avoid these problem cases.

Since we already prohibit the vector problem case (swap with unequal allocators), that means that we probably can actually implement our stateful allocator requirement with std::vector and STL allocators in practice, without requiring C++11 support.

There’s just one proviso, with or without C++11 support, due to allowances for legacy compiler behaviour in allocator traits. Specifically, it doesn’t look like we can get the same assertion behaviour in vector swap. If propagate_on_container_swap::value is set to false for either allocator then the result is ‘undefined behaviour’, so this could just swap the allocators silently, and we’d have to be quite careful about these kinds of problem cases!

Building on stateful allocators to address other issues

If you can use stateful allocators with the STL then this changes things a bit. A lot of things become possible just by adding suitable internal state to standard STL allocator implementations. But you can also now use this allocator internal state as a kind of bootstrap to work around other issues with STL allocators.

The trick is to wrap up the same kind of virtual allocator interface setup we use in PathEngine in an STL allocator wrapper class. You could do this (for example) by putting a pointer to our iAllocator interface inside an STL allocator class (as internal state), and then forwarding the actual allocation and deallocation calls as virtual function calls through this pointer.
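A minimal sketch of that wrapper, assuming a C++11 standard library (pre-C++11 you would need to add back the boilerplate members shown earlier); cForwardingAllocator is a hypothetical name, and iAllocator and tUnsigned32 are the PathEngine-style types described above:

#include <cstddef>
#include <vector>

// Forwards all allocation and deallocation to a virtual iAllocator backend.
template <class T>
class cForwardingAllocator
{
    iAllocator* _backend;   // internal state: which allocator to forward to
public:
    typedef T value_type;

    explicit cForwardingAllocator(iAllocator& backend) : _backend(&backend) {}
    template <class U>
    cForwardingAllocator(const cForwardingAllocator<U>& rhs) : _backend(rhs.backend()) {}

    T* allocate(std::size_t n)
    {
        return static_cast<T*>(_backend->allocate(static_cast<tUnsigned32>(n * sizeof(T))));
    }
    void deallocate(T* ptr, std::size_t)
    {
        _backend->deallocate(ptr);
    }
    iAllocator* backend() const { return _backend; }
};

template <class T, class U>
bool operator==(const cForwardingAllocator<T>& a, const cForwardingAllocator<U>& b)
{
    return a.backend() == b.backend();
}
template <class T, class U>
bool operator!=(const cForwardingAllocator<T>& a, const cForwardingAllocator<U>& b)
{
    return !(a == b);
}

// usage (hypothetical):
//   cMallocAllocator backing;
//   cForwardingAllocator<int> alloc(backing);
//   std::vector<int, cForwardingAllocator<int> > v(alloc);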

So, at the cost of another layer of complexity (which can be mostly hidden from the main application code), it should now be possible to:

  • remove unnecessary boiler plate from concrete allocator implementations (since these now just implement iAllocator), and
  • use different concrete allocator types without changing the actual vector type.

Although I’m still not keen on STL allocators, and prefer the direct simplicity of our custom allocator setup as opposed to covering up the mess of the STL allocator interface in this way, I have to admit that this does effectively remove two of the key benefits of our custom allocator setup. Let’s move on to the third point, then!

(Refer to this presentation about Bloomberg allocators in the context of the C++11 allocator changes.)

Memory optimisation

The other potential benefit of custom allocation over STL allocators is basically the possibility to mess around with the allocation interface.

With STL allocators we’re restricted to using the allocate() and deallocate() methods exactly as defined in the original allocator specification. But with our custom allocator we’re basically free to mess with these method definitions (in consultation with our clients!), or to add additional methods, and generally change the interface to better suit our clients’ needs.

There is some discussion of this issue in this proposal for improving STL allocators, which talks about ways in which the memory allocation interface provided by STL allocators can be sub-optimal.

Some customisations implemented in the Bitsquid allocators are:

  • an ‘align’ parameter for the allocation method, and
  • a query for the size of allocated blocks

PathEngine allocators don’t include either of these customisations, although this is stuff that we can add quite easily if required by our clients. Our allocator does include the following extra methods:

    virtual void*
    expand(
        void* oldPtr,
        tUnsigned32 oldSize,
        tUnsigned32 oldSize_Used,
        tUnsigned32 newSize
        ) = 0;

// helper
    template <class T> void
    expand_Array(
        T*& ptr,
        tUnsigned32 oldArraySize,
        tUnsigned32 oldArraySize_Used,
        tUnsigned32 newArraySize
        )
    {
        ptr = static_cast<T*>(expand(
            ptr,
            sizeof(T) * oldArraySize,
            sizeof(T) * oldArraySize_Used,
            sizeof(T) * newArraySize
            ));
    }

What this does, essentially, is to provide a way for concrete allocator classes to use the realloc() system call, or similar memory allocation functionality in a custom heap, if this is desired.

As before, the expand_Array() method is there for convenience, and concrete classes only need to implement the expand() method. This takes a pointer to an existing memory block, and can either add space to the end of this existing block (if possible), or allocate a larger block somewhere else and move existing data to that new location (based on the oldSize_Used parameter).

Implementing expand()

A couple of example implementations for expand() are as follows:

// in cMallocAllocator, using realloc()
    void*
    expand(
        void* oldPtr,
        tUnsigned32 oldSize,
        tUnsigned32 oldSize_Used,
        tUnsigned32 newSize
        )
    {
        assert(oldPtr);
        assert(oldSize);
        assert(oldSize_Used <= oldSize);
        assert(newSize > oldSize);
        return realloc(oldPtr, static_cast<size_t>(newSize));
    }

// as allocate and move
    void*
    expand(
        void* oldPtr,
        tUnsigned32 oldSize,
        tUnsigned32 oldSize_Used,
        tUnsigned32 newSize
        )
    {
        assert(oldPtr);
        assert(oldSize);
        assert(oldSize_Used <= oldSize);
        assert(newSize > oldSize);
        void* newPtr = allocate(newSize);
        memcpy(newPtr, oldPtr, static_cast<size_t>(oldSize_Used));
        deallocate(oldPtr);
        return newPtr;
    }

So this can either call through directly to something like realloc(), or emulate realloc() with a sequence of allocation, memory copy and deallocation operations.
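On the vector side, the change is then mostly a matter of replacing the allocate/copy/deallocate sequence in reallocate() with a call to expand_Array(). A simplified sketch, using the member names from the earlier code (note that because expand() moves data with a raw memcpy, this shortcut is only appropriate for element types that can safely be moved bitwise):

    void
    reallocate(size_type newCapacity)
    {
        // Let the allocator try to grow the buffer in place; if it can't,
        // expand() falls back to allocate + memcpy + deallocate internally.
        _allocator->expand_Array(_data, _capacity, _size, newCapacity);
        _capacity = newCapacity;
    }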

Benchmarking with realloc()

With this expand() method included in our allocator it’s pretty straightforward to update our custom vector to use realloc(), and it’s easy to see how this can potentially optimise memory use, but does this actually make a difference in practice?

I tried some benchmarking and it turns out that this depends very much on the actual memory heap implementation in use.

I tested this first of all with the following simple benchmark:

template <class tVector> static void
PushBackBenchmark(tVector& target)
{
    const int pattern[] = {0,1,2,3,4,5,6,7};
    const int patternLength = sizeof(pattern) / sizeof(*pattern);
    const int iterations = 10000000;
    tSigned32 patternI = 0;
    for(tSigned32 i = 0; i != iterations; ++i)
    {
        target.push_back(pattern[patternI]);
        ++patternI;
        if(patternI == patternLength)
        {
            patternI = 0;
        }
    }
}

(Wrapped up in some code for timing over a bunch of iterations, with result checking to avoid the push_back being optimised out.)

This is obviously very far from a real usage situation, but the results were quite interesting:

OS        Container type              Time
Linux     std::vector                 0.0579 seconds
Linux     cVector without realloc     0.0280 seconds
Linux     cVector with realloc        0.0236 seconds
Windows   std::vector                 0.0583 seconds
Windows   cVector without realloc     0.0367 seconds
Windows   cVector with realloc        0.0367 seconds

So the first thing that stands out from these results is that using realloc() doesn’t make any significant difference on Windows. I double checked this, and while expand() is definitely avoiding memory copies a significant proportion of the time, this is either not significant in the timings, or the memory copy savings are being outweighed by some extra costs in the realloc() call. Maybe realloc() is implemented badly on Windows, or maybe the Windows memory heap is optimised for more common allocation scenarios at the expense of realloc(), I don’t know. A quick Google search shows that other people have seen similar issues.

Apart from that it looks like realloc() can make a significant performance difference, on some platforms (or depending on the memory heap being used). I did some extra testing, and it looks like we’re getting diminishing returns after some of the other performance tweaks we made in our custom vector, specifically the tweaks to increase capacity after the first push_back, and the capacity multiplier tweak. With these tweaks backed out:

OS        Container type                          Time
Linux     cVector without realloc, no tweaks      0.0532 seconds
Linux     cVector with realloc, no tweaks         0.0235 seconds

So, for this specific benchmark, using realloc() is very significant, and even avoids the need for those other performance tweaks.

Slightly more involved benchmark

The benchmark above is really basic, however, and certainly isn’t a good general benchmark for vector memory use. In fact, with realloc(), there is only actually ever one single allocation made, which is then naturally free to expand through the available memory space!

A similar benchmark is discussed in this stackoverflow question, and in that case the benefits seemed to reduce significantly with more than one vector in use. I hacked the benchmark a bit to see what this does for us:

template <class tVector> static void
PushBackBenchmark_TwoVectors(tVector& target1, tVector& target2)
{
    const int pattern[] = {0,1,2,3,4,5,6,7};
    const int patternLength = sizeof(pattern) / sizeof(*pattern);
    const int iterations = 10000000;
    tSigned32 patternI = 0;
    for(tSigned32 i = 0; i != iterations; ++i)
    {
        target1.push_back(pattern[patternI]);
        target2.push_back(pattern[patternI]);
        ++patternI;
        if(patternI == patternLength)
        {
            patternI = 0;
        }
    }
}

template <class tVector> static void
PushBackBenchmark_ThreeVectors(tVector& target1, tVector& target2, tVector& target3)
{
    const int pattern[] = {0,1,2,3,4,5,6,7};
    const int patternLength = sizeof(pattern) / sizeof(*pattern);
    const int iterations = 10000000;
    tSigned32 patternI = 0;
    for(tSigned32 i = 0; i != iterations; ++i)
    {
        target1.push_back(pattern[patternI]);
        target2.push_back(pattern[patternI]);
        target3.push_back(pattern[patternI]);
        ++patternI;
        if(patternI == patternLength)
        {
            patternI = 0;
        }
    }
}

With PushBackBenchmark_TwoVectors():

OS        Container type              Time
Linux     std::vector                 0.0860 seconds
Linux     cVector without realloc     0.0721 seconds
Linux     cVector with realloc        0.0495 seconds

With PushBackBenchmark_ThreeVectors():

OS        Container type              Time
Linux     std::vector                 0.1291 seconds
Linux     cVector without realloc     0.0856 seconds
Linux     cVector with realloc        0.0618 seconds

That’s kind of unexpected.

If we think about what’s going to happen with the vector buffer allocations in this benchmark, on the assumption of sequential allocations into a simple contiguous memory region, it seems like the separate vector allocations in the modified benchmark versions should actually prevent each other from expanding. And I expected that to reduce the benefits of using realloc. But the speedup is actually a lot more significant for these benchmark versions.

I stepped through the benchmark and the vector buffer allocations are being placed sequentially in a single contiguous memory region, and do initially prevent each other from expanding, but after a while the ‘hole’ at the start of the memory region gets large enough to be reused, and then reallocation becomes possible, and somehow turns out to be an even more significant benefit. Maybe these benchmark versions pushed the memory use into a new segment and incurred some kind of segment setup costs?

With virtual memory and different layers of memory allocation in modern operating systems, and different approaches to heap implementations, it all works out as quite a complicated issue, but it does seem fairly clear, at least, that using realloc() is something that can potentially make a significant difference to vector performance, in at least some cases!

Realloc() in PathEngine

Those are all still very arbitrary benchmarks, and it’s interesting to see how much this actually makes a difference for some real use cases. So I had a look at what difference the realloc() support makes for the vector use in PathEngine.

I tried our standard set of SDK benchmarks (with common queries in some ‘normal’ situations), both with and without realloc() support, and compared the timings for these two cases. It turns out that for this set of benchmarks, using realloc() doesn’t make a significant difference to the benchmark timings. There are some slight improvements in some timings, but nothing very noticeable.

The queries in these benchmarks have already had quite a lot of attention for performance optimisation, of course, and there are a bunch of other performance optimisations already in the SDK that are designed to avoid the need for vector capacity increases in these situations (reuse of vectors for runtime queries, for example). Nevertheless, if we’re asking whether custom allocation with realloc() is ‘necessary or better’ in the specific case of PathEngine vector use (and these specific benchmarks) the answer appears to be that no this doesn’t really seem to make any concrete difference!

Memory customisation and STL allocators

As I’ve said above, this kind of customisation of the allocator interface (to add stuff like realloc() support) is something that we can’t do with the standard allocator setup (even with C++11).

For completeness it’s worth noting the approach suggested by Alexandrescu in this article where he shows how you can effectively shoehorn stuff like realloc() calls into STL allocators.

But this does still depend on using some custom container code to detect special allocator types, and won’t work with std::vector.

Conclusion

This has ended up a lot longer than I originally intended so I’ll go ahead and wrap up here!

To conclude:

  • It’s not so hard to implement your own allocator setup, and integrate this with a custom vector (I hope this post gives you a good idea about what can be involved in this)
  • There are ways to do similar things with the STL, however, and overall this wouldn’t really work out as a strong argument for switching to a custom vector in our case
  • A custom allocator setup will let you do some funky things with memory allocation, if your memory heap will dance the dance, but it’s not always clear that this will translate into actual concrete performance benefits

A couple of things I haven’t talked about:

Memory fragmentation: custom memory interfaces can also be important for avoiding memory fragmentation, and this can be an important issue. We don’t have a system in place for actually measuring memory fragmentation, though, and I’d be interested to hear how other people in the industry actually quantify or benchmark this.

Memory relocation: the concept of ‘relocatable allocators’ is quite interesting, I think, although this has more significant implications for higher level vector based code, and requires moving further away from standard vector usage. This is something I’ll maybe talk about in more depth later on.

** Comments: Please check the comment thread on the original post before commenting. **

GDC 2014 Report

Original Author: Jaewon Jung

It’s that time again. Personally I think GDC this year was more interesting than the past few. Next-gen consoles have arrived, mobile gaming is still really big, and quality virtual reality is becoming a reality.

PowerVR Ray Tracing Hardware and Mobile Graphics

ImgTec announced its upcoming mobile GPU, PowerVR GR6500, which features dedicated ray tracing units on top of its rasterization capabilities. It has a Ray Tracing Unit (RTU) which uses fixed-function math to perform ray tracing intersection queries, and a scene hierarchy generator to speed up dynamic object updates. You can program it by writing ray shaders against the OpenRL API. It makes a hybrid approach feasible, where ray tracing is used only for the stuff it’s good at, like shadowing, reflection, and refraction, in addition to regular rasterization. Unity showcased its editor using this technology for interactive real-time lightmap previews. It’s interesting that this kind of hardware innovation came from a mobile chip company rather than a traditional GPU powerhouse like NVIDIA or AMD. But, if you think about it, it makes sense. Arithmetic performance of the latest mobile GPUs has already reached that of previous-generation consoles like the Xbox 360, but their bandwidth remains quite limited compared to those consoles because of the power constraints of mobile devices. Dedicating a small portion of die space to ray tracing, which can achieve the effects mentioned above with less bandwidth than rasterization, can be a win to circumvent such a constraint. As they say, constraints breed creativity. The chip is not yet in production and it remains to be seen whether other GPU manufacturers will follow suit, but it is certainly an interesting time for graphics coders.

There were also a few talks, from PowerVR, Qualcomm and Epic Games, aimed at enabling console-level graphics on mobile. Mobile GPUs usually do tile-based rendering (NVIDIA's Tegra being the exception), but there seem to be considerable differences even among the tile-based GPUs from different vendors, which graphics coders should be aware of in order to achieve the full potential of each piece of hardware.

VR is the next big thing

The Kickstarter-born Oculus Rift has recently become an object of attention, with great reviews from everywhere, its second hardware revision coming up, and a few great gamedev talents, including the venerable John Carmack, joining the company. (In related news, another big name joined Oculus Rift, or more precisely Facebook.) But at this GDC, the one that revealed an ace up its sleeve was Sony. Its aptly named Project Morpheus seems to have the same overall goal as the Rift, but built around the PlayStation 4 ecosystem, including its camera and Move controller. According to SCE Worldwide Studios President Shuhei Yoshida, it has been in the works since 2010. The current prototype features a head-mounted display with 1080p resolution and a 90 degree field of view, and seems to use the PlayStation Camera to track head orientation and movement. It's still unclear what the final spec will look like and when the final product will be released. VR is a totally new medium whose scope extends beyond the domain of video games, as the Sony presentation emphasized, and it'll be very interesting for us game developers, both as players and as creators, to see how all of this unfolds in the coming years.

War of Game Engines

Epic Games introduced a $19 monthly subscription model (with a 5 percent revenue royalty) for Unreal Engine 4. Right after this announcement, Crytek tried to undercut it by announcing a $9.90 monthly subscription, without any royalty, for its CryEngine. Ostensibly the CryEngine deal sounds much better if we naively assume both have similar capabilities, but there is a big caveat: the Crytek deal doesn't provide full source code access like the Unreal one does. Epic Games even allows you to keep the source code after the subscription expires. In other words, as a programmer, you can dig into its latest source code and try to learn all its tricks and techniques for just twenty bucks, if you have the time and passion for it. What a bold move. Unity also didn't sit on its hands. It announced the latest incarnation of its engine, Unity 5, with impressive features like physically-based shading and WebGL deployment.

DirectX is not dead

It's been a while since DirectX 11 came out, and it had started to look as though Microsoft had lost interest in pushing it further, but at this GDC Microsoft announced DirectX 12. On the OpenGL side there was also the announcement of OpenGL ES 3.1.

Misc

There was a math tutorial, and also a Code Clinic session: How to Write Code the Compiler Can Actually Optimize. The speaker also showed how zip archiving can be abused to measure the information density of some of the data used in your code.

(I posted this article to my personal blog, too.)


Principled Design

Original Author: Dan-Speed

What makes good software?

Perhaps it avoids temporal coupling and gives informative error messages. Perhaps it's well documented where it needs to be.

These are obvious candidates, but even here there is room for debate between the individuals who work on the software.

One property of software that I like to see is a consistency of vision, which tends to be easy enough to achieve with a one man team over a short period, but is more challenging to maintain over a longer period or with more people. By “vision”, I mean a high level consistency of understanding of what the software is meant to solve (and what it’s not meant to solve) and the general way in which it sets about that. Over a longer period of time, it’s very easy to forget what properties of the software were important when it was created, and therefore how its fitness of purpose should be evaluated.

These sorts of high level concepts are easily lost from documentation but are important for new people working on a piece of software to understand. I’m a big fan of giving a software library a mission statement,  but I’ve found that a list of principles is very useful in a number of ways, including fostering debate.

What principles are important to me?

Here's the exercise: make a list of principles you feel are important to how you build something you're working on.

Here are some guidelines for making this list:

  • Keep it short and sweet (~10 items)
  • Prefer ideas that force people to go out of their way over things that are expected
  • Avoid being specific to a particular solution (though this may sometimes be called for)
  • Don’t worry about order
  • Ensure that each principle is clearly communicated, but brief
  • You may add explanations about the reason for adopting a principle afterwards
  • Principles can be changed after they are agreed, but doing so ought to mean that you need to address the parts where your software violates them, or consider if you need to start again.
  • Pick only principles that can be evaluated against an implementation; if you can’t then you still lack concreteness or clarity
  • If your organization already has principles for software construction you need to make sure your principles are not contradicting them
  • Put it somewhere it’s easily found for reference

The goal of this list is to:

  • Foster debate within the team or with stakeholders about what is important
  • Help constrain the shape of the solution, without dictating the implementation
  • Provide an explanation and motivation for inconvenient choices
  • Provide a checklist for evaluating new features/changes
  • Help you to realize when a new feature or change would compromise the original principles and therefore lead to an inconsistent and confusing design

A quick bit of reasoning about the principles:

It’s important to be brief because the point is to communicate things to other people quickly and the principles need to be easy to remember. You want to keep the list short, so that you have a reasonable chance of sticking to it, and describe each item at a higher-level, so that the items provide value to a large area of your software, rather than small pockets. Principles are not rules, they can be broken, but there should always be a conversation about doing so.

If everyone in your company believes in XP, TDD etc, then having a principle that your software is unit tested doesn’t require anyone to go out of their way. However, if all of your technology is generally tied to a very particular tech stack and environment but you need this new software to be usable, for instance by a build system, outside of that environment, that’s a good principle to put down.

You should pick a solution because it achieves your goals and fits with your principles, not because you picked it up-front and put a principle around it. Principles should provoke a discussion about when an existing solution doesn’t fit any more rather than enforcing blind continuation. Note that the principle itself trumps the explanation, so an explanation could be a specific example, but that should not limit the principle.

Avoid things like “The software should be well written”, or “The game design should be meaningful to players”. These ideas are subjective and open to interpretation, as well as being almost impossible to evaluate.

A nice thing is that this can be applied at almost any level, to almost anything. You can make some company-wide architectural principles, personal principles or principles for a library.

Some example principles & explanations

  • Modular – the libraries should not require each other. Requirements will change or be different for new projects, new components will develop. This should be embraced without requiring us to discard the whole set of tools.
  • Standard Formats – the tools should use standard formats, allowing tools to be written in other languages or scripts.
  • Local testing – parts of the solution that are vendor specific must be testable locally through the use of interfaces & dependency injection
  • Minimal Setup – the software should be easy to deploy
  • Iteration time is important
  • Familiar – The solution should be familiar to game designers using it, and quick to teach programmers to develop with
  • HTTP First – REST/JSON based network APIs are preferred over custom formats and transports.
  • Push – deployments should be pushed, not pulled at random times by cron jobs / hup / agents.
  • Backwards compatibility –  we achieve backwards compatibility through careful versioning.
  • Free of Game logic – our back-end services will not be polluted by game-specific logic, but will use generic feature flags to achieve the desired behaviors

Software development hot air

This just provides a tool for communication, discussion and a bit of periodic reflection on what you find important. These things are achievable in other ways, but I've found that this is a lightweight way to help you mold your solution to a set of concrete underlying ideas that might otherwise be lost. The constraints provided give other people a map to follow in understanding the path that you took, and a chance to follow in your footsteps (or express their differences).


Principles are a purer motivation than practices. Practices can become rote. You can sprint, groom the backlog, do stand-ups, and still not be Agile. By keeping principles close to your heart, you, your team, your code-base and your practices can be in a continual cycle of self-evaluation and evolution without it being haphazard. Challenge yourself to live up to your principles.

Zero Initialisation for Classes

Original Author: Thomas Young

(Also posted as part of a series of posts about vectors and vector-based containers.)

This is a response to comments on a previous post, roll your own vector, and has also been rewritten and updated fairly significantly since first posted.

In roll your own vector I talked about a change we made to the initialisation semantics for PathEngine's custom vector class. In my first follow-up post I looked more closely at possibilities for replacing resize() with reserve() (which can avoid the initialisation issue in many cases), but so far I've been concentrating pretty much exclusively on zero initialisation for built-in types. In this post I come back to look at the issue of initialisation semantics for class element types.

Placement new subtleties

At its root, the changed initialisation semantics for our vector all come down to a single (quite subtle) change in the way we write one of the placement new expressions.

It’s all about the placement new call for element default construction. This is required when elements need to be initialised, but no value is provided for initialisation by the calling code, for example in a call to vector resize() with no fill value argument.

As shown in my previous post, the standard way to implement this placement new is with the following syntax:

       new((void*)begin) T();

but we chose to replace this with the following, subtly different placement new syntax:

       new((void*)begin) T;

So we left out a pair of brackets.

Note that this missing pair of brackets is what I’m talking about when I refer to ‘changed initialisation semantics’. (Our custom vector class does not omit initialisation completely!)

What those brackets do

So what do those brackets do, and what happens when we remove them?

Well, this is all about ‘zero initialisation’.

In certain cases the memory for the object of type T being constructed will get zero initialised in the first version of the placement new call (‘new((void*)begin) T()’), but not in the second version (‘new((void*)begin) T’).

You can find these two initialisation types documented on cppreference.com and in this stackoverflow answer, as well as in the related links.

This makes a difference during element construction for built-in types (as we saw with the buffer initialisation overhead in my previous post), but also for certain kinds of classes and structs, and this is what I'll be looking at in this post.

Initialisation of built in types

It’s quite well known that initialisation for built-in types works differently for global variables (which are usually created as part of the program’s address space) and local variables (which are allocated on the program stack).

If we start with the following:

#include <cassert>

int main(int argc, char* argv[])
{
    int i;
    assert(i == 0);
    return 0;
}

This runs through quite happily with the debug build, but if I turn assertions on in the release build then this assertion gets triggered. That’s not really surprising. This kind of uninitialised local variable is a well known gotcha and I think most people with a reasonable amount of experience in C++ have come across something like this.

But the point is that the local variable initialisation here is using ‘default initialisation’, as opposed to ‘zero initialisation’.

And if we change i from a local to a global variable the situation changes:

#include <cassert>

int i;

int main(int argc, char* argv[])
{
    assert(i == 0);
    return 0;
}

This time the variable gets zero initialised, and the program runs through without assertion in both release and debug builds.

The reason for this is that global variables can be initialised in the linked binary for your program, at no cost (or else very cheaply at program startup), but local variables get instantiated on the program stack and initialising these explicitly to zero would add a bit of extra run time overhead to your program.

Since uninitialised data is a big potential source of error, many other (more modern) languages choose to always initialise data, but this inevitably adds some overhead, and part of the appeal of C++ is that it lets us get ‘close to the metal’ and avoid this kind of overhead.

Zero initialisation and ‘value’ classes

What’s less well known (I think) is that this can also apply to classes, in certain cases. This is something you’ll come across most commonly, I think, in the form of classes that are written to act as a kind of ‘value type’, and to behave in a similar way to the C++ built in types.

More specifically, it's all about classes where internal state is not initialised during class construction, and for which you could choose to omit the class default constructor.

In PathEngine we have a number of classes like this. One example looks something like this:

class cMeshElement
{
public:
    enum eType
    {
        FACE,
        EDGE,
        VERTEX,
    };

    //.. class methods

private:
    eType _type;
    tSigned32 _index;
};

Default construction of value classes

What should happen on default construction of a cMeshElement instance?

The safest thing to do would be to initialise _type and _index to some fixed, deterministic values, to eliminate the possibility of program execution being dependent on uninitialised data.

In PathEngine, however, we may need to set up some fairly large buffers with elements of this type. We don’t want to limit ourselves to only ever building these buffers through a purely iterator based paradigm (as discussed in my previous post), and sometimes want to just create big uninitialised vectors of cMeshElement type directly, without buffer initialisation overhead, so we leave the data members in this class uninitialised.
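To give a sense of the usage pattern in question, here's a minimal sketch (cVector stands in for our custom vector class here, and the buffer size is made up):

cVector<cMeshElement> elements;
// With default initialisation semantics this resize() just sets up raw,
// uninitialised element memory; nothing is written to _type or _index here.
elements.resize(1000000);
// ... each element is then filled in explicitly before it is ever read.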

Empty default constructor or no default constructor?

So we don’t want to do anything on default construction.

There are two ways this can be implemented in our value type class. We can omit the class default constructor completely, or we can add an empty default constructor.

Omitting the constructor seems nice, insofar as it avoids a bit of apparently unnecessary and extraneous code, but it turns out there's some unexpected complexity in the rules for C++ object construction with respect to this choice, and to whether an object is being constructed with 'zero initialisation' or 'default initialisation'.

Note that what the two terms refer to are actually two different sets of object construction semantics, with each defining a set of rules for what happens to memory during construction (depending on the exact construction situation), and ‘zero initialisation’ does not always result in an actual zero initialisation step.

We can test what happens in the context of our custom vector, and ‘value type’ elements, with the following code:

#include <cstdlib>
#include <iostream>
#include <new>

class cInitialisationReporter
{
    int i;
public:
    ~cInitialisationReporter()
    {
        std::cout << "cInitialisationReporter::i is " << i << '\n';
    }
};

class cInitialisationReporter2
{
    int i;
public:
    cInitialisationReporter2() {}
    ~cInitialisationReporter2()
    {
        std::cout << "cInitialisationReporter2::i is " << i << '\n';
    }
};

template <class T> void
SetMemAndPlacementConstruct_ZeroInitialisation()
{
    T* allocated = static_cast<T*>(malloc(sizeof(T)));
    signed char* asCharPtr = reinterpret_cast<signed char*>(allocated);
    for(int i = 0; i != sizeof(T); ++i)
    {
        asCharPtr[i] = -1;
    }
    new((void*)allocated) T();
    allocated->~T();
}

template <class T> void
SetMemAndPlacementConstruct_DefaultInitialisation()
{
    T* allocated = static_cast<T*>(malloc(sizeof(T)));
    signed char* asCharPtr = reinterpret_cast<signed char*>(allocated);
    for(int i = 0; i != sizeof(T); ++i)
    {
        asCharPtr[i] = -1;
    }
    new((void*)allocated) T;
    allocated->~T();
}

int main(int argc, char* argv[])
{
    SetMemAndPlacementConstruct_ZeroInitialisation<cInitialisationReporter>();
    SetMemAndPlacementConstruct_ZeroInitialisation<cInitialisationReporter2>();
    SetMemAndPlacementConstruct_DefaultInitialisation<cInitialisationReporter>();
    SetMemAndPlacementConstruct_DefaultInitialisation<cInitialisationReporter2>();
    return 0;
}

This gives the following results:

cInitialisationReporter::i is 0
cInitialisationReporter2::i is -1
cInitialisationReporter::i is -1
cInitialisationReporter2::i is -1

In short:

  • If our vector uses the 'zero initialisation' form (placement new with brackets), and the value type has its default constructor omitted, then the compiler adds code to zero the element memory on construction.
  • If our vector uses the 'zero initialisation' form (placement new with brackets), and the value type has an empty default constructor, then the compiler leaves the element memory uninitialised on construction.
  • If the vector uses the 'default initialisation' form (placement new without brackets), then the compiler leaves the element memory uninitialised regardless of whether or not there is a default constructor.

Zero initialisation in std::vector

The std::vector implementations I've looked at all perform 'zero initialisation' as well (and I assume this is actually required by the standard). We can test this by supplying the following custom allocator:

template <class T>
class cNonZeroedAllocator
{
public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;
    typedef value_type& reference;
    typedef const value_type& const_reference;
    typedef typename std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template <class tTarget>
    struct rebind
    {
        typedef cNonZeroedAllocator<tTarget> other;
    };

    cNonZeroedAllocator() {}
    ~cNonZeroedAllocator() {}
    template <class T2>
    cNonZeroedAllocator(cNonZeroedAllocator<T2> const&)
    {
    }

    pointer
    address(reference ref)
    {
        return &ref;
    }
    const_pointer
    address(const_reference ref)
    {
        return &ref;
    }

    pointer
    allocate(size_type count, const void* = 0)
    {
        size_type byteSize = count * sizeof(T);
        void* result = malloc(byteSize);
        signed char* asCharPtr = reinterpret_cast<signed char*>(result);
        for(size_type i = 0; i != byteSize; ++i)
        {
            asCharPtr[i] = -1;
        }
        return reinterpret_cast<pointer>(result);
    }
    void deallocate(pointer ptr, size_type)
    {
        free(ptr);
    }

    size_type
    max_size() const
    {
        return 0xffffffffUL / sizeof(T);
    }

    void
    construct(pointer ptr, const T& t)
    {
        new(ptr) T(t);
    }
    void
    destroy(pointer ptr)
    {
        ptr->~T();
    }

    template <class T2> bool
    operator==(cNonZeroedAllocator<T2> const&) const
    {
        return true;
    }
    template <class T2> bool
    operator!=(cNonZeroedAllocator<T2> const&) const
    {
        return false;
    }
};

Oh, by the way, did I mention that I don’t like STL allocators? (Not yet, I will in my next post!) This is a bog standard STL allocator with the allocate method hacked to set all the bytes in the allocated memory block to non-zero values. The important bit is the implementation of the allocate and deallocate methods. The rest is just boilerplate.

To apply this in our test code:

int main(int argc, char* argv[])
{
    std::vector<cInitialisationReporter,
        cNonZeroedAllocator<cInitialisationReporter> > v1;
    v1.resize(1);
    std::vector<cInitialisationReporter2,
        cNonZeroedAllocator<cInitialisationReporter2> > v2;
    v2.resize(1);
    return 0;
}

And this gives:

cInitialisationReporter::i is 0
cInitialisationReporter2::i is -1

Class with no default constructor + std::vector = initialisation overhead

So if I implement a 'value class' without a default constructor, and then construct a std::vector with elements of this type, I get initialisation overhead. And this accounts for part of the speedups we saw when switching to a custom vector implementation (together with the corresponding issue for built-in types).

But there’s a clear workaround for this issue, now, based on the above. To use std::vector, but avoid initialisation overhead for value type elements, we just need to make sure that each of our value type classes has an empty default constructor.
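To make that concrete, here's what the workaround looks like when applied to the cMeshElement example from earlier (a sketch; the only new code is the empty constructor):

class cMeshElement
{
public:
    enum eType
    {
        FACE,
        EDGE,
        VERTEX,
    };

    // Empty, user-provided default constructor: _type and _index are
    // deliberately left uninitialised, and because the constructor is
    // user-provided, element construction in std::vector no longer zeroes
    // the memory first.
    cMeshElement() {}

    //.. class methods

private:
    eType _type;
    tSigned32 _index;
};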

Extending to a wrapper for working around zero initialisation for built-in types

In the comments (commenting on the original version of this post!) Marek Knápek suggests using the following wrapper to avoid zero initialisation, in the context of built-in types:

template<typename T>
// assuming T is int, short, long, std::uint64_t, ...
// TODO: add static assert
class MyInt{
public:
    MyInt()
    // m_int is "garbage-initialized" here
    {}
public:
    T m_int;
};

And sure enough, this works (because of the empty default constructor in the wrapper class). But I really don’t like using this kind of wrapper in practice, as I think that this complicates (and slightly obfuscates!) each vector definition.

Using default initialisation semantics for our custom vector avoids the need for this kind of workaround. And, more generally, if we take each of the possible construction semantics on their merits (ignoring the fact that one of these is the behaviour of the standard vector implementation), I prefer ‘default initialisation’ semantics, since:

  • these semantics seem more consistent and avoid surprises based on whether or not an empty default constructor is included in a class, and
  • value type classes shouldn’t depend on zero initialisation, anyway (since they may be instantiated as local variables)
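To make the difference concrete, here's a minimal sketch (with hypothetical helper names, not PathEngine's actual code) of the element construction loop a custom vector's resize() might run under each set of semantics:

#include <new>

// Default initialisation semantics: no brackets on the placement new, so
// built-in types and 'value classes' are left uninitialised.
template <class T>
void ConstructRange_DefaultInitialisation(T* begin, T* end)
{
    for(T* p = begin; p != end; ++p)
        new((void*)p) T;
}

// Zero initialisation semantics (effectively what std::vector's resize()
// does): types without a user-provided default constructor get their
// memory zeroed before use.
template <class T>
void ConstructRange_ZeroInitialisation(T* begin, T* end)
{
    for(T* p = begin; p != end; ++p)
        new((void*)p) T();
}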

Type specialisation

One thing to be aware of, with this workaround, is that it looks like there can be implications for type specialisation, depending on your compiler version.

As I understand the type traits system in C++11, and the is_trivially_default_constructible type trait, this should return the same value for both cInitialisationReporter and cInitialisationReporter2, but when I try the following (with clang 3.2.1):

cout
    << "is_trivially_default_constructible<cInitialisationReporter>: "
    << is_trivially_default_constructible<cInitialisationReporter>::value
    << '\n';
cout
    << "is_trivially_default_constructible<cInitialisationReporter2>: "
    << is_trivially_default_constructible<cInitialisationReporter2>::value
    << '\n';

I get:

error: no template named 'is_trivially_default_constructible' in namespace 'std'; did you mean 'has_trivial_default_constructor'?

and then when I try with ‘has_trivial_default_constructor’:

cout
    << "has_trivial_default_constructor<cInitialisationReporter>: "
    << has_trivial_default_constructor<cInitialisationReporter>::value
    << '\n';
cout
    << "has_trivial_default_constructor<cInitialisationReporter2>: "
    << has_trivial_default_constructor<cInitialisationReporter2>::value
    << '\n';

I get:

has_trivial_default_constructor<cInitialisationReporter>: 1
has_trivial_default_constructor<cInitialisationReporter2>: 0

This doesn’t matter for PathEngine since we still use an ‘old school’ type specialisation setup (to support older compilers), but could be something to look out for, nevertheless.
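For readers wondering what an 'old school' setup like that might look like, the general idea is a hand-rolled trait template that individual types are opted in to explicitly (a sketch with hypothetical names, not PathEngine's actual code):

// Conservative default: assume a type needs real construction and
// element-by-element copy semantics.
template <class T>
struct cIsSimpleType
{
    enum { value = 0 };
};

// Individual value types are then explicitly opted in to the fast path
// (uninitialised construction, memcpy-style copying).
template <>
struct cIsSimpleType<int>
{
    enum { value = 1 };
};

template <>
struct cIsSimpleType<cMeshElement>
{
    enum { value = 1 };
};

Because the specialisations are written by hand, it doesn't matter whether a class happens to have an empty user-provided constructor, which is presumably why the discrepancy shown above isn't a problem in practice.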

Conclusion

The overhead for zero initialisation in std::vector is something that has been an issue for us historically but it turns out that for std::vector of value type classes, zero initialisation can be avoided, without resorting to a custom vector implementation.

It's interesting to see the implications of this kind of implementation detail. Watch out for how you implement 'value type' classes if they're going to be used as elements in large buffers and maximum performance is desired!

Video game research sucks. Here’s how you can help.

Original Author: Joost van Dreunen

At moments when an industry undergoes fundamental change, market research is at its most valuable. Retail game sales are down. Digital sales are way up. Mobile and tablet have everyone excited, and equally clueless. Today more people than ever before play games. The industry is now a mosaic of new platforms (both hardware and software-based), and offers a slew of new distribution methods and revenue models. For the first time, the games industry is truly a global market, as even markets that were historically inaccessible to publishers have now become viable. Opportunities abound, but where do we begin?

Despite these massive changes, companies that traditionally provide publishers with market insight have been acting like it’s business as usual. Retail sales continue to be the focal point among the major firms, leading them to the persistent observation that the industry is in decline. As an industry analyst myself, it’s embarrassing to see how companies that get paid to help others understand gaming markets have so far failed to innovate or even update their own model. Same shit, different day.

Renewal and innovation in the games industry is a good thing: it allows new companies to enter the market and forces existing publishers, developers and distributors to adapt. The shift toward online, free-to-play, downloadable content and mobile gaming has allowed the games industry to finally become a mainstream form of entertainment. No longer dominated by a single homogenous demographic, the market today is more diverse and offers more opportunity for a wider range of interactive entertainment. It is a clear sign that games are maturing. As we, collectively as an industry, enter this next stage, each of us deals with these macro-level changes in their own way. And, just like platform holders, publishers and developers, the research that helps guide decisions and makes sense of what’s happening in the market needs to change, too.

What’s the problem?

The questions that we confront today are very different from a few years ago. To answer them, researchers have to overcome at least three key challenges: access, fragmentation and expertise.

Getting access to key industry data is much more difficult. Unlike the retail-based games market, where retailers share sales figures, allowing an assessment of title performance and competitive information, the digital era has few leading companies that provide access. Facebook, Apple and Steam are all very important hubs in the games industry, and so far they have shown little interest in providing market transparency. Their argument is fairly simple: there is no advantage in it for them. In fact, their digital data sets are a key asset in their respective strategies, and offering them up would merely bolster their competition.

The games industry value chain is more fragmented than ever before. Previously, there existed only three console giants and a (much smaller) PC games market. Today there exists a host of digital platforms like downloadable content, free-to-play browser-based MMOs, social network-based games, and mobile games across a host of technical standards and markets. The challenge facing video game market research today is not the absence of data, but rather the abundance of specialized, partial data sets. There are plenty of solutions that offer daily traffic estimates for social games, or approximate downloads for the various mobile app stores. Others rely on traditional survey data, casting a wide net a few times a year across a range of topics in an effort to map out the gaming landscape. Relying on a single methodology or on a specialized data set makes it difficult to describe the games industry, as it presents an incomplete picture of user behavior and preferences. For example, overall traffic for a particular title or genre may show a month-over-month increase. Without understanding why this is the case, however, this information provides only partial insight. The reliance on a single data source or, worse, a skewed data source, greatly reduces one’s ability to describe the market as a whole.

In an industry that is increasingly information-driven, knowing how to interpret that information is crucial. As entertainment markets go, the games industry is very complex. Sure enough, most of us are aware that it is a hit-driven business, but what does that mean in practical terms? Well, for market researchers it means that you cannot just hire anyone off the street to provide insight and analyses. Rather, it takes years to fully understand how the games business works. Because of this, there are few true experts on the topic. The quality of your market research is inversely related to whatever number ‘some guy’ heard from ‘a friend’. Sure enough, there exists a host of small consulting outfits, manned by people who previously held a senior position at some well-known publisher. But understanding the industry mechanics, from financial analysis to consumer insight, is a skill only mastered over time. Drawing insight from the available data generated by digital games is as much an art as it is a science, and demands a dedicated effort and expertise.

How do we solve this?

One speaker made a plea at GDC this year to eliminate social injustice from the games industry by making games geared to minorities, and received a standing ovation. Market research helps to develop, publish and make available interactive experiences that resonate with all gamers out there, not just a subset. Better still, knowing the market keeps companies in business even when they have a bad run. To operate alone in a volatile industry like interactive entertainment means you increase your risk. Ain't nobody got time for that.

The only way for the games industry to get the research it so desperately needs is through building relationships and working together. Four years ago, I started asking companies to help me compile a dataset on digital games: in exchange for market insight, we ask that companies share their data. This has proven very effective: today we have access to the monthly spending of over four million paying online gamers from 50 different publishers across 450 different titles, worldwide. Having an open conversation with publishers, developers, service providers and investors has allowed us to build a comprehensive methodology that provides critical market information, title-specific revenue estimates, and deep consumer insight. To build in a redundancy and ensure we’re not misreading any trends, we combine this quantitative dataset with regular qualitative research studies. This gives us multiple touch points in describing the market, assessing market share and figuring out what truly drives the market.

  • Case in point 1: The emergence of new business models like free-to-play demands expertise. By sharing data, even if it is anonymised, we're able to get a better sense of what works well in the market, and how we can improve our games. This helps reduce the cost of figuring out how to best serve customers and, ideally, reduces the number of garbage free-to-play games aimed solely at nickel-and-diming users, leaving more money to develop richer experiences.
  • Case in point 2: Different markets have different preferences. So, too, when it comes to how people like to pay for their games. Having a complete picture of the different payment options that users look for when they're ready to send you money is a key piece of successful digital publishing. Just because your game has only a modest audience base does not mean your efforts are not sustainable.
  • Case in point 3: The cost of user acquisition on Facebook, Android and iOS has increased tremendously in 2013. By sharing this information anonymously, we were able to provide contributors with detailed insight on when to spend, and when not to spend, their marketing budgets.

And so to repeat my earlier point: at moments when an industry undergoes fundamental change, market research is at its most valuable. Working together helps build better games. So if you, too, feel that video games research sucks, shoot me a line.

joost at superdataresearch dot com

Unity3D Resumable Downloads

Original Author: Gabor Szauer

The Problem

I keep working with mobile games built using Unity, most of which have a target executable size of 50MB. These games, however, tend to have more than 50MB worth of assets, so they pack content up into asset bundles and download them later using Unity's WWW class. At this point these games all run into the same problem: on a flaky mobile network, downloading a 2MB file can be a challenge. Often the download is disrupted, or worse yet the player walks into an underground tunnel (or something like that).

The Solution

Abandon the built-in WWW class for downloading, and only use it afterwards to load the finished file up as an asset bundle. It sounds simple, but there are a few issues we will have to deal with. First, if you want a download to be resumable, you have to make sure that the file you finish downloading is the same file you started downloading. You also need to know how many bytes you are trying to download; if this information isn't available you can't really resume your download. Then there is the issue of coming up with a simple interface for doing a complex task. I'll try to tackle all these issues in this article.

Why HttpWebRequest?

The short answer is that Unity's WWW class does not support resuming a partial download, while HttpWebRequest does. There is also the option of using third party libraries for this functionality, but most of them are built on top of the same methods we are going to use, so they offer little added value.

Birds Eye View

We're going to create a class to handle the loading of an asset for us. We'll call this class RemoteFile. Ideally you should be able to create a remote file, retrieve information from it and trigger a download if needed; that, in a nutshell, is the target API.

Simple but powerful. Our class is going to be responsible for downloading files as well as comparing them against local files to see if an update is needed. Everything loads in the background, without blocking the application. The full class source is in the example project linked at the end of this article, but its member variables are worth going through here.

What are those variables?

  • mRemoteFile
    • String that stores the url of the remote file to be downloaded
  • mLocalFile
    • String that stores the URI of the local copy that the remote file is to be downloaded to
  • mRemoteLastModified
    • Timestamp of when the remote file was last modified
    • Used to check if the file needs to be re-downloaded
  • mLocalLastModified
    • Timestamp of when the local file was last modified
    • Used to check if the file needs to be re-downloaded
  • mRemoteFileSize
    • Size (in bytes) of the remote file
    • Needed for the AddRange method
  • mAsynchResponse
    • HttpWebResponse response token
    • Used to signal when download finishes
    • Used to get file stream that gets written to disk
  • mBundle
    • Convenient access to the local copy of the file (If it’s an assetbundle)

Implementation Details

Let's go through every method and accessor in this class one at a time.

LocalBundle

LocalBundle is a convenient accessor for the mBundle variable

IsOutdated

IsOutdated is an accessor that compares the last modified time of the server file to the last modified time of the local file. If the server was modified after the local file, our local copy is outdated and needs to be re-downloaded. If the server does not support returning this data (more on that in the constructor) then IsOutdated will always be true.

Constructor

The constructor takes two strings: the URL of the remote file to load, and the path of the local file where the remote file should be saved. The constructor is responsible for getting the last modified times for the remote and local files; it also gets the size in bytes of the remote file. We get information about the remote file through an HttpWebRequest with its method set to HEAD. Setting the method to HEAD ensures that we only get header data, not the whole file.

Setting the request's method to HEAD ensures we don't get the full file, only header data. Because this is such a small amount of data, we retrieve it with a blocking call. Beware: not all servers support the HEAD method.

The only returned data that we will use from here is the file's size and last modified time. Much like HEAD support itself, there is no guarantee that every server will provide this information. If it's not available, the current time will be returned for the last modified time and zero will be returned for the file size.

Not all hosts support the HEAD command. If your host does not, it will serve up the full file and defeat the purpose of this function. Some hosts, like Dropbox, will support the HEAD command but will serve up invalid last modified times. If this happens, IsOutdated will always be true and your file will always be downloaded, defeating the purpose of this class.

It is your responsibility as the programmer to run this code against your production server and make sure that all data is being returned as expected (HEAD is supported, and the correct last modified time gets returned).
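For reference, this is roughly the shape of the HTTP exchange the class relies on. The header names (Content-Length, Last-Modified, Range) are standard HTTP; the values below are made up purely for illustration:

HEAD request (headers only, no body is returned):
    Content-Length: 2097152                        <- becomes mRemoteFileSize
    Last-Modified: Tue, 11 Mar 2014 08:30:00 GMT   <- becomes mRemoteLastModified

Resumed download (request header produced by AddRange):
    Range: bytes=1048576-2097151                   <- bytes already on disk .. remote size - 1

A server that honours the Range header answers with 206 Partial Content and sends only the missing bytes.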

Download

The Download coroutine will download the desired file using an asynchronous HttpWebRequest. BeginGetResponse takes two arguments: a callback, which in our case is the AsynchCallback method, and a user state object, which in our case is the request itself. Once the download is complete we just write the resulting stream to disk.

If the local file is larger than the remote file, we panic and delete the local file. If the remote file was modified more recently than the local file, the local copy is assumed to be outdated and is deleted.

A few things happen here. First, a timeout must be set; if the timeout is not set, the download will just hang. We then provide the request with a range of bytes to download. The documentation claims that, given a single argument (the size of the local file), AddRange should download the entire file from that starting point onwards; in my experience this isn't the case, and if a start and an end byte are not both provided the download just hangs. The method of the request is set to POST, because the previous calls set some body data and we can no longer send a GET request. We next call BeginGetResponse, which will trigger the AsynchCallback method. Until AsynchCallback is called, mAsynchResponse will be null, so we can yield on this.

Finally we write the downloaded file stream to disk and do some cleanup. It's important to remember that this is a coroutine; if you call it from another class, be sure to call it as such. While IsOutdated is public, you don't need to worry about checking it: the Download function will early-out and not download if the file is already up to date.

AsynchCallback

AsynchCallback gets called once the response requested by BeginGetResponse is ready. The request object that was passed into BeginGetResponse is forwarded to the callback; from there we can get the HttpWebResponse and let the Download function finish executing.

LoadLocalBundle

LoadLocalBundle is a utility function to load the asset bundle we downloaded from disk. The only thing to note here is that in order to load the file from disk with the WWW class we must add file:// to the beginning.

UnloadLocalBundle

This is the counterpart to LoadLocalBundle. Managing memory is important; you do not want to have an asset bundle hanging around if it is not absolutely necessary.

Testing

The last thing to do is to test the code. I made a quick utility class for this; you can find it in the example project.

My example project is available on github

What next?

It may seem like the hard part is out of the way, but really this was the easy part. It’s up to you to come up with an efficient strategy for managing asset bundles. The real work is in making the system that manages bundles and the memory associated with them. To keep memory sane, download a bundle, load it into memory, instantiate any needed game objects and unload the bundle.

Challenges with projects and kids

Original Author: Colleen Delzer

10 years ago my husband and I decided we wanted to make games. We had tons of time to learn anything we needed to even with both of us having jobs.

I look back and wish we didn’t take it for granted. We made lots of practice projects that never made it past a level or two. Everything was very casual, work on an art piece one week, work on something else a month later.

Fast forward 8 years and we still didn't have anything finished, just a project we had worked on for a few years. We were on a roll with our first game, Nono and the Cursed Treasure, when we found out we were going to have a child. We tried to get as much stuff done as possible before she was born, but being pregnant, with nausea and fatigue, made it hard to accomplish much… but I could have tried harder. Our first daughter was born, and it was nearly impossible to do much with a baby, and we really didn't schedule ourselves like we should have.

When she turned a year old and was able to walk, it got a lot better and we could start really pacing ourselves. Soon after that, we unexpectedly found out we were going to have another baby. It was extremely daunting to think of the lack of time we were going to have. Would we have to just stop thinking about the possibility of ever finishing the game we had already spent too much time on? Being pregnant and having a toddler was already hard, and it got a lot more challenging when our second was born, topped off with our almost-two-year-old getting into the tantrum stage and becoming extremely jealous of the newborn.

Our second daughter is now 7 months old and we are still able to create our projects. It can be challenging at times, but we manage. Having a schedule between us has helped immensely. We alternate days to work on things while the other looks after the kids, and we keep one day for family day.

Our first game is just about finished, and we also have a new project in the works that is going well. We would never have thought it was possible, but our lives evolve to manage new challenges and we get used to the changes. Things take a lot more time, and we have learned to take advantage of every second of free time we have. We think of all the things we could get done with the free time that we once had, but we couldn't imagine our lives without the girls.

We have learned not to waste the time we have when we have it, because you never know what will happen in the future, and believe me, it goes faster than you think! At the same time, make sure to take some time to spend with the people you care about, and don't worry too much. Think positive, even though it can be hard sometimes!

Keyboard shortcuts info in debug builds – IMGUI style

Original Author: Hubert Rutkowski

I want to share a little discovery, a trick that eases the creation and development of games. First, a short motivational and historical introduction.

Case study

I don't know if you have this too, but when I'm creating a game, I often add a lot of keyboard shortcuts to it, even a few dozen: player avatar, enemies, map, camera, physics, rules, etc. Shortcuts use normal letters, digits, F keys, special keys (tab, home, backspace), and their combinations with ctrl, shift and alt. Sometimes it's hard to figure out what is available and what does what.

I especially had a problem with that in Ninja Cat, where shortcut availability depended on the compilation mode (CONFIG:: variables; you can treat them like #defines): debug+editor had all of them, debug without editor had part of that, editor without debug had yet another part, and so on. It was very useful, because sometimes having too many shortcuts and special keys was bad, for example during testing, when pressing some keys would change gameplay.

But I didn't want it to be a binary choice of either all or none. Sometimes the ability to intervene meant being able to catch a bug on the spot (perhaps a Heisenbug?) and do something about it. So it was nice having it at such a granular level (debug/release, editor, deploy, profile). The disadvantage was that I often forgot what the actual shortcuts were and what I could do at a given moment. I was pressing shortcuts and they didn't work, or I was forgetting about some magic functionality, hidden in not so visible places in the code.

Solution

OK, now I'll explain how to fix this situation. The easiest approach is to just display a list of all the shortcuts upon pressing some key like F1 or H(elp). But nobody would be able to create such a list by hand and maintain it over time. Also, looking at a huge list of shortcuts when only, let's say, a third of them are available is not really that helpful. The solution? Do it in an automated way.

I've discovered something similar to Immediate Mode GUI (IMGUI).

The basic shortcut-handling code is simple and obvious. Now let's add something:

if (CONFIG::debug)
{
    if (Game.info_keyboard_shortcuts_show)
    {
        Game.info_keyboard_shortcuts_list += "SHIFT+D = kill all enemies\n";
    }

    if (FlxG.keys.SHIFT && FlxG.keys.justPressed("D"))
    ...
    // code from above
}

// in singleton Game:
public static var info_keyboard_shortcuts_show : Boolean = false;
public static var info_keyboard_shortcuts_list : String;

Because the code is inside an if corresponding to the compilation mode, its body will be executed only in that one specific mode. In this way, only the available shortcuts will be added to the string and displayed.

Implementing the rest is trivial:

// once a frame, call this function, which checks whether you want to see the shortcuts list.
// In such a case, display the list, collected in a simple string over the course of a frame

static public function ShowKeyboardShortcuts():void
{
    info_keyboard_shortcuts_show = FlxG.keys.pressed("H");

    if (info_keyboard_shortcuts_show)
    {
        var shortcuts_list : FlxText = new FlxText(200, 10, 350, info_keyboard_shortcuts_list);
        shortcuts_list.shadow = 0x000001;
        shortcuts_list.draw();

        info_keyboard_shortcuts_list = "";
    }
}

That’s how it looks in practice:


The text is displayed while holding H. So far I've added only a few shortcuts.

Additions

1. The compilation modes can be combined, as well as mixed with runtime modes, e.g.:

if (CONFIG::debug && CONFIG::profile &&
    Game.is_profiling_state && Camera.is_some_other_state)
{
    // some very specific shortcut
}

2. A friend suggested, in the discussion where I first posted about this technique (warning: Polish language), how to make it even faster and less verbose. This:

if (CONFIG::debug)
{
    if (Game.info_keyboard_shortcuts_show)
    {
        Game.info_keyboard_shortcuts_list += "SHIFT+D = kill all enemies\n";
    }

    if (FlxG.keys.SHIFT && FlxG.keys.justPressed("D"))
    ...
}

should be quite easy to reduce to:

IF_DEBUG_KEY( SHIFT+'D', "kill all enemies" )
{
    ... code ...
}

Where IF_DEBUG_KEY does everything inside: testing the compilation mode, checking whether to display the shortcut info, testing whether the key combination is pressed, and removing the code in release builds.
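The original code here is ActionScript, but since the trick is language agnostic, here's roughly how the same IF_DEBUG_KEY idea might look as a C++ macro. This is only a sketch with hypothetical names: DebugKeyPressed and the Game members stand in for whatever your engine actually provides.

#include <string>

// Hypothetical globals mirroring the Game singleton from the article
// (definitions omitted in this sketch).
struct Game
{
    static bool        showShortcutHelp;   // true while the help key is held
    static std::string shortcutHelpList;   // rebuilt every frame
};

// Hypothetical input query; a real engine would check its keyboard state here.
bool DebugKeyPressed(const char* combo);

#ifdef DEBUG_BUILD
// When the help overlay is visible, append "combo = description" to the list
// (the comma operator is used purely for that side effect), then gate the
// body on whether the combo is actually pressed this frame.
#define IF_DEBUG_KEY(combo, description)                                     \
    if ((Game::showShortcutHelp &&                                           \
            (Game::shortcutHelpList += combo " = " description "\n", true)), \
        DebugKeyPressed(combo))
#else
// Release builds: the body still has to compile, but never runs and is
// stripped out by the optimizer.
#define IF_DEBUG_KEY(combo, description) if (false)
#endif

// Usage:
// IF_DEBUG_KEY("SHIFT+D", "kill all enemies")
// {
//     KillAllEnemies();
// }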

Summary

Little work, simple and flexible code, works everywhere (it's language agnostic), and it fixes a specific problem. Maintainability is high, because the code handling each shortcut sits right next to the code registering its help text. I recommend adding it to your bag of tricks / current game 🙂