Zero Initialisation for Classes

Original Author: Thomas Young

(Also posted as part of a series of posts about vectors and vector based containers.)

This is a response to comments on a previous post, roll your own vector, and has also been rewritten and updated fairly significantly since first posted.

In roll your own vector I talked about a change we made to the initialisation semantics for PathEngine’s custom vector class. In my first followup post I looked more closely at possibilities for replacing resize() with reserve() (which can avoid the initialisation issue in many cases), but so far I’ve been concentrating pretty much exclusively on zero initialisation for built-in types. In this post I come back to look at the issue of initialisation semantics for class element types.

Placement new subtleties

At its root, the changed initialisation semantics for our vector all come down to a single (quite subtle) change in the way we write one of the placement new expressions.

It’s all about the placement new call for element default construction. This is required when elements need to be initialised, but no value is provided for initialisation by the calling code, for example in a call to vector resize() with no fill value argument.

As shown in my previous post, the standard way to implement this placement new is with the following syntax:

       new((void*)begin) T();

but we chose to replace this with the following, subtly different placement new syntax:

       new((void*)begin) T;

So we left out a pair of brackets.

Note that this missing pair of brackets is what I’m talking about when I refer to ‘changed initialisation semantics’. (Our custom vector class does not omit initialisation completely!)

What those brackets do

So what do those brackets do, and what happens when we remove them?

Well, this is all about ‘zero initialisation’.

In certain cases the memory for the object of type T being constructed will get zero initialised in the first version of the placement new call (‘new((void*)begin) T()’), but not in the second version (‘new((void*)begin) T’).

You can find these two initialisation types documented on cppreference.com, in this stackoverflow answer, and in the related links.

This makes a difference during element construction for built-in types (as we saw with the buffer initialisation overhead in my previous post), but also for certain classes and structs, and that’s what I’ll be looking at in this post.

Initialisation of built in types

It’s quite well known that initialisation for built-in types works differently for global variables (which are usually created as part of the program’s address space) and local variables (which are allocated on the program stack).

If we start with the following:

#include <cassert>

int
main(int argc, char* argv[])
{
    int i;
    assert(i == 0);
    return 0;
}

This runs through quite happily with the debug build, but if I turn assertions on in the release build then this assertion gets triggered. That’s not really surprising. This kind of uninitialised local variable is a well known gotcha and I think most people with a reasonable amount of experience in C++ have come across something like this.

But the point is that the local variable initialisation here is using ‘default initialisation’, as opposed to ‘zero initialisation’.

And if we change i from a local to a global variable the situation changes:

#include <cassert>

int i;

int
main(int argc, char* argv[])
{
    assert(i == 0);
    return 0;
}

This time the variable gets zero initialised, and the program runs through without assertion in both release and debug builds.

The reason for this is that global variables can be initialised in the linked binary for your program, at no cost (or else very cheaply at program startup), but local variables get instantiated on the program stack and initialising these explicitly to zero would add a bit of extra run time overhead to your program.

Since uninitialised data is a big potential source of error, many other (more modern) languages choose to always initialise data, but this inevitably adds some overhead, and part of the appeal of C++ is that it lets us get ‘close to the metal’ and avoid this kind of overhead.

Zero initialisation and ‘value’ classes

What’s less well known (I think) is that this can also apply to classes, in certain cases. This is something you’ll come across most commonly, I think, in the form of classes that are written to act as a kind of ‘value type’, and to behave in a similar way to the C++ built in types.

More specifically, it’s all about classes where internal state is not initialised during class construction, and for which you could choose to omit the class default constructor.

In PathEngine we have a number of classes like this. One example looks something like this:

class cMeshElement
{
public:
    enum eType
    {
        FACE,
        EDGE,
        VERTEX,
    };

//.. class methods

private:
    eType _type;
    tSigned32 _index;
};

Default construction of value classes

What should happen on default construction of a cMeshElement instance?

The safest thing to do will be to initialise _type and _index to some fixed, deterministic values, to eliminate the possibility of program execution being dependent on uninitialised data.

In PathEngine, however, we may need to set up some fairly large buffers with elements of this type. We don’t want to limit ourselves to only ever building these buffers through a purely iterator based paradigm (as discussed in my previous post), and sometimes want to just create big uninitialised vectors of cMeshElement type directly, without buffer initialisation overhead, so we leave the data members in this class uninitialised.

Empty default constructor or no default constructor?

So we don’t want to do anything on default construction.

There are two ways this can be implemented in our value type class. We can omit the class default constructor completely, or we can add an empty default constructor.

Omitting the constructor seems nice, insofar as it avoids a bit of apparently unnecessary and extraneous code, but it turns out there’s some unexpected complexity in the rules for C++ object construction with respect to this choice, and to whether an object is being constructed with ‘zero initialisation’ or ‘default initialisation’.

Note that what the two terms refer to are actually two different sets of object construction semantics, with each defining a set of rules for what happens to memory during construction (depending on the exact construction situation), and ‘zero initialisation’ does not always result in an actual zero initialisation step.

We can test what happens in the context of our custom vector, and ‘value type’ elements, with the following code:

#include <iostream>
#include <cstdlib>
#include <new>

class cInitialisationReporter
{
    int i;
public:
    ~cInitialisationReporter()
    {
        std::cout << "cInitialisationReporter::i is " << i << '\n';
    }
};

class cInitialisationReporter2
{
    int i;
public:
    cInitialisationReporter2() {}
    ~cInitialisationReporter2()
    {
        std::cout << "cInitialisationReporter2::i is " << i << '\n';
    }
};

template <class T> void
SetMemAndPlacementConstruct_ZeroInitialisation()
{
    T* allocated = static_cast<T*>(malloc(sizeof(T)));
    signed char* asCharPtr = reinterpret_cast<signed char*>(allocated);
    for(int i = 0; i != sizeof(T); ++i)
    {
        asCharPtr[i] = -1;
    }
    new((void*)allocated) T();
    allocated->~T();
    free(allocated);
}

template <class T> void
SetMemAndPlacementConstruct_DefaultInitialisation()
{
    T* allocated = static_cast<T*>(malloc(sizeof(T)));
    signed char* asCharPtr = reinterpret_cast<signed char*>(allocated);
    for(int i = 0; i != sizeof(T); ++i)
    {
        asCharPtr[i] = -1;
    }
    new((void*)allocated) T;
    allocated->~T();
    free(allocated);
}

int
main(int argc, char* argv[])
{
    SetMemAndPlacementConstruct_ZeroInitialisation<cInitialisationReporter>();
    SetMemAndPlacementConstruct_ZeroInitialisation<cInitialisationReporter2>();
    SetMemAndPlacementConstruct_DefaultInitialisation<cInitialisationReporter>();
    SetMemAndPlacementConstruct_DefaultInitialisation<cInitialisationReporter2>();
    return 0;
}

This gives the following results:

cInitialisationReporter::i is 0
cInitialisationReporter2::i is -1
cInitialisationReporter::i is -1
cInitialisationReporter2::i is -1

In short:

  • If our vector uses the ‘zero initialisation’ form (placement new with brackets), and the value type has its default constructor omitted, then the compiler will add code to zero the element memory on construction.
  • If our vector uses the ‘zero initialisation’ form (placement new with brackets), and the value type has an empty default constructor, then the compiler will leave the element memory uninitialised on construction.
  • If the vector uses the ‘default initialisation’ form (placement new without brackets), then the compiler will leave the element memory uninitialised regardless of whether or not there is a default constructor.

Zero initialisation in std::vector

The std::vector implementations I’ve looked at also all perform ‘zero initialisation’ (and I assume this is then actually required by the standard). We can test this by supplying the following custom allocator:

#include <cstddef>
#include <cstdlib>
#include <new>

template <class T>
class cNonZeroedAllocator
{
public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;
    typedef value_type& reference;
    typedef const value_type& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template <class tTarget>
    struct rebind
    {
        typedef cNonZeroedAllocator<tTarget> other;
    };

    cNonZeroedAllocator() {}
    ~cNonZeroedAllocator() {}
    template <class T2>
    cNonZeroedAllocator(cNonZeroedAllocator<T2> const&)
    {
    }

    pointer
    address(reference ref)
    {
        return &ref;
    }
    const_pointer
    address(const_reference ref)
    {
        return &ref;
    }

    pointer
    allocate(size_type count, const void* = 0)
    {
        size_type byteSize = count * sizeof(T);
        void* result = malloc(byteSize);
        signed char* asCharPtr = reinterpret_cast<signed char*>(result);
        for(size_type i = 0; i != byteSize; ++i)
        {
            asCharPtr[i] = -1;
        }
        return reinterpret_cast<pointer>(result);
    }
    void deallocate(pointer ptr, size_type)
    {
        free(ptr);
    }

    size_type
    max_size() const
    {
        return 0xffffffffUL / sizeof(T);
    }

    void
    construct(pointer ptr, const T& t)
    {
        new(ptr) T(t);
    }
    void
    destroy(pointer ptr)
    {
        ptr->~T();
    }

    template <class T2> bool
    operator==(cNonZeroedAllocator<T2> const&) const
    {
        return true;
    }
    template <class T2> bool
    operator!=(cNonZeroedAllocator<T2> const&) const
    {
        return false;
    }
};

Oh, by the way, did I mention that I don’t like STL allocators? (Not yet, I will in my next post!) This is a bog standard STL allocator with the allocate method hacked to set all the bytes in the allocated memory block to non-zero values. The important bit is the implementation of the allocate and deallocate methods. The rest is just boilerplate.

To apply this in our test code:

int
main(int argc, char* argv[])
{
    std::vector<cInitialisationReporter,
        cNonZeroedAllocator<cInitialisationReporter> > v1;
    v1.resize(1);
    std::vector<cInitialisationReporter2,
        cNonZeroedAllocator<cInitialisationReporter2> > v2;
    v2.resize(1);
    return 0;
}

And this gives:

cInitialisationReporter::i is 0
cInitialisationReporter2::i is -1

Class with no default constructor + std::vector = initialisation overhead

So if I implement a ‘value class’ without a default constructor, and then construct a std::vector with elements of this type, I get initialisation overhead. And this accounts for part of the speedups we saw when switching to a custom vector implementation (together with the corresponding issue for built-in types).

But there’s a clear workaround for this issue, now, based on the above. To use std::vector, but avoid initialisation overhead for value type elements, we just need to make sure that each of our value type classes has an empty default constructor.
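Applied to the cMeshElement class from earlier, the workaround is a one line addition (shown here as a sketch, not PathEngine’s actual source):

class cMeshElement
{
public:
    enum eType
    {
        FACE,
        EDGE,
        VERTEX,
    };

    // empty, user provided default constructor:
    // stops std::vector's zero initialisation from touching _type and _index
    cMeshElement() {}

//.. class methods

private:
    eType _type;
    tSigned32 _index;
};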

Extending to a wrapper for working around zero initialisation for built-in types

In the comments (commenting on the original version of this post!) Marek Knápek suggests using the following wrapper to avoid zero initialisation, in the context of built-in types:

template<typename T>
// assuming T is int, short, long, std::uint64_t, ...
// TODO: add static assert
class MyInt{
public:
    MyInt()
    // m_int is "garbage-initialized" here
    {}
public:
    T m_int;
};

And sure enough, this works (because of the empty default constructor in the wrapper class). But I really don’t like using this kind of wrapper in practice, as I think that this complicates (and slightly obfuscates!) each vector definition.

Using default initialisation semantics for our custom vector avoids the need for this kind of workaround. And, more generally, if we take each of the possible construction semantics on their merits (ignoring the fact that one of these is the behaviour of the standard vector implementation), I prefer ‘default initialisation’ semantics, since:

  • these semantics seem more consistent and avoid surprises based on whether or not an empty default constructor is included in a class, and
  • value type classes shouldn’t depend on zero initialisation, anyway (since they may be instantiated as local variables)
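For reference, here is a minimal sketch of what ‘default initialisation’ construction looks like in a resize style helper (a simplified illustration, not PathEngine’s actual vector code):

#include <new>

// construct the elements in [begin, end) with default initialisation semantics
template <class T> void
constructRange_DefaultInitialisation(T* begin, T* end)
{
    for(; begin != end; ++begin)
    {
        new((void*)begin) T; // no brackets: built in members are left uninitialised
    }
}

With ‘zero initialisation’ semantics the only difference is the trailing pair of brackets on the placement new expression.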

Type specialisation

One thing to be aware of, with this workaround, is that it looks like there can be implications for type specialisation, depending on your compiler version.

As I understand the type traits system in C++11, and the is_trivially_default_constructible trait, this should return the same value for both cInitialisationReporter and cInitialisationReporter2, but when I try the following (with clang 3.2.1):

cout
    << "is_trivially_default_constructible<cInitialisationReporter>: "
    << is_trivially_default_constructible<cInitialisationReporter>::value
    << '\n';
cout
    << "is_trivially_default_constructible<cInitialisationReporter2>: "
    << is_trivially_default_constructible<cInitialisationReporter2>::value
    << '\n';

I get:

error: no template named 'is_trivially_default_constructible' in namespace 'std'; did you mean 'has_trivial_default_constructor'?

and then when I try with ‘has_trivial_default_constructor’:

cout
    << "has_trivial_default_constructor<cInitialisationReporter>: "
    << has_trivial_default_constructor<cInitialisationReporter>::value
    << '\n';
cout
    << "has_trivial_default_constructor<cInitialisationReporter2>: "
    << has_trivial_default_constructor<cInitialisationReporter2>::value
    << '\n';

I get:

has_trivial_default_constructor<cInitialisationReporter>: 1
has_trivial_default_constructor<cInitialisationReporter2>: 0

This doesn’t matter for PathEngine since we still use an ‘old school’ type specialisation setup (to support older compilers), but could be something to look out for, nevertheless.

Conclusion

The overhead for zero initialisation in std::vector is something that has been an issue for us historically, but it turns out that for a std::vector of value type classes this zero initialisation can be avoided without resorting to a custom vector implementation.

It’s interesting to see the implications of this kind of implementation detail. Watch out how you implement ‘value type’ classes if they’re going to be used as elements in large buffers, and maximum performance is desired!

Video game research sucks. Here’s how you can help.

Original Author: Joost van Dreunen

At moments that an industry undergoes a fundamental change, market research is most valuable. Retail game sales are down. Digital sales are way up. Mobile and tablet have everyone excited, and equally clueless. Today more people than ever before play games. The industry is now a mosaic of new platforms (both hardware and software-based), and offers a slew of new distribution methods and revenue models. For the first time, the games industry is truly a global market, as even markets that historically were inaccessible to publishers have now become viable. Opportunities abound, but where do we begin?

Despite these massive changes, companies that traditionally provide publishers with market insight have been acting like it’s business as usual. Retail sales continue to be the focal point among the major firms, leading them to the persistent observation that the industry is in decline. As an industry analyst myself, it’s embarrassing to see how companies that get paid to help others understand gaming markets have so far failed to innovate or even update their own model. Same shit, different day.

Renewal and innovation in the games industry is a good thing: it allows new companies to enter the market and forces existing publishers, developers and distributors to adapt. The shift toward online, free-to-play, downloadable content and mobile gaming has allowed the games industry to finally become a mainstream form of entertainment. No longer dominated by a single homogenous demographic, the market today is more diverse and offers more opportunity for a wider range of interactive entertainment. It is a clear sign that games are maturing. As we, collectively as an industry, enter this next stage, each of us deals with these macro-level changes in their own way. And, just like platform holders, publishers and developers, the research that helps guide decisions and makes sense of what’s happening in the market needs to change, too.

What’s the problem?

The questions that we confront today are very different from a few years ago. To answer them, researchers have to overcome at least three key challenges: access, fragmentation and expertise.

Getting access to key industry data is much more difficult. Unlike the retail-based games market, where retailers share sales figures allowing an assessment of title performance and competitive information, in the digital era few of the leading companies provide access. Facebook, Apple and Steam are all very important hubs in the games industry, and so far they have shown little interest in providing market transparency. Their argument is fairly simple: there is no advantage in it for them. In fact, their digital data sets are a key asset in their respective strategies, and offering them up would merely bolster their competition.

The games industry value chain is more fragmented than ever before. Previously, there existed only three console giants and a (much smaller) PC games market. Today there exists a host of digital platforms like downloadable content, free-to-play browser-based MMOs, social network-based games, and mobile games across a host of technical standards and markets. The challenge facing video game market research today is not the absence of data, but rather the abundance of specialized, partial data sets. There are plenty of solutions that offer daily traffic estimates for social games, or approximate downloads for the various mobile app stores. Others rely on traditional survey data, casting a wide net a few times a year across a range of topics in an effort to map out the gaming landscape. Relying on a single methodology or on a specialized data set makes it difficult to describe the games industry, as it presents an incomplete picture of user behavior and preferences. For example, overall traffic for a particular title or genre may show a month-over-month increase. Without understanding why this is the case, however, this information provides only partial insight. The reliance on a single data source or, worse, a skewed data source, greatly reduces one’s ability to describe the market as a whole.

In an industry that is increasingly information-driven, knowing how to interpret that information is crucial. As entertainment markets go, the games industry is very complex. Sure enough, most of us are aware that it is a hit-driven business, but what does that mean in practical terms? Well, for market researchers it means that you cannot just hire anyone off the street to provide insight and analyses. Rather, it takes years to fully understand how the games business works. Because of this, there are few true experts on the topic. The quality of your market research is inversely related to whatever number ‘some guy’ heard from ‘a friend’. Sure enough, there exists a host of small consulting outfits, manned by people who previously held a senior position at some well-known publisher. But understanding the industry mechanics, from financial analysis to consumer insight, is a skill only mastered over time. Drawing insight from the available data generated by digital games is as much an art as it is a science, and demands a dedicated effort and expertise.

How do we solve this?

A plea was made at GDC this year to eliminate social injustice from the games industry by making games geared to minorities, and it received a standing ovation. Market research helps to develop, publish and make available interactive experiences that resonate with all gamers out there, not just a subset. Better still, knowing the market keeps companies in business even when they have a bad run. To operate alone in a volatile industry like interactive entertainment means you increase the risk. Ain’t nobody got time for that.

The only way for the games industry to get the research it so desperately needs is through building relationships and working together. Four years ago, I started asking companies to help me compile a dataset on digital games: in exchange for market insight, we ask that companies share their data. This has proven very effective: today we have access to the monthly spending of over four million paying online gamers from 50 different publishers across 450 different titles, worldwide. Having an open conversation with publishers, developers, service providers and investors has allowed us to build a comprehensive methodology that provides critical market information, title-specific revenue estimates, and deep consumer insight. To build in a redundancy and ensure we’re not misreading any trends, we combine this quantitative dataset with regular qualitative research studies. This gives us multiple touch points in describing the market, assessing market share and figuring out what truly drives the market.

  • Case in point 1: The emergence of new business models like free-to-play demands expertise. By sharing data, even if it is anonymously, we’re able to get a better sense of what works well in the market, and how we can improve our games. This helps reduce the cost of figuring out how to best serve customers and, ideally, reduce the amount of garbage free-to-play games aimed solely at nickel-and-diming users, leaving more money to develop richer experiences.
  • Case in point 2: Different markets have different preferences. So, too, when it comes to how people like to pay for their games. Having a complete picture of the different payment options that users look for when they’re ready to send you money is a key piece to successful digital publishing. Just because your game only has a modest audience base, does not mean your efforts are not sustainable.
  • Case in point 3: The cost of user acquisition on Facebook, Android and iOS has increased tremendously in 2013. By sharing this information anonymously, we were able to provide contributors with detailed insight on when to spend, and when not to spend, their marketing budgets.

And so to repeat my earlier point: at moments that an industry undergoes a fundamental change, market research is most valuable. Working together helps build better games. So if you, too, feel that video games research sucks, shoot me a line.

joost at superdataresearch dot com

Unity3D Resumable Downloads

Original Author: Gabor Szauer

The Problem

I keep working with mobile games built using Unity, most of which have a target executable size of 50MB. These games, however, tend to have more than 50MB worth of assets, so they pack content up into asset bundles and download them later using Unity’s WWW class. At this point these games all run into the same problem. On a flaky mobile network downloading a 2MB file can be a challenge; often the download is disrupted, or worse yet the player walks into an underground tunnel (or something like that).

The Solution

Abandon the built-in WWW class for downloading: use HttpWebRequest to pull the file down to disk, then load it up as an asset bundle from the local copy. It sounds simple but there are a few issues we will have to deal with. First, if you want a download to be resumable you have to make sure that the file you finish downloading is the same file you started downloading. You also need to know how many bytes you are trying to download; if this information isn’t available you can’t really resume your download. Then there is the issue of coming up with a simple interface for doing a complex task. I’ll try to tackle all these issues in this article.

Why HttpWebRequest?

The short answer is that Unity’s WWW class does not support resuming a partial download, while HttpWebRequest does. There is also the option of using third party libraries for this functionality, but most of them are built on top of the same methods we are going to use, so they offer very little added value.

Bird’s Eye View

We’re going to create a class to handle the loading of an asset for us. We’ll call this class RemoteFile. Ideally you should be able to create a remote file, retrieve information from it and trigger a download if needed. This is the target API:
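The original code snippet isn’t reproduced in this copy, so here is a hedged guess at the kind of usage the article is aiming for. The RemoteFile name and its methods come from the article; the URL, local path and surrounding MonoBehaviour are assumptions for illustration.

// inside a MonoBehaviour, so StartCoroutine is available
IEnumerator DownloadAndLoad()
{
    RemoteFile remoteFile = new RemoteFile(
        "http://example.com/assets.unity3d",                  // hypothetical bundle URL
        Application.persistentDataPath + "/assets.unity3d");  // local path to resume into

    // Download resumes a partial file if one exists, and early-outs if up to date
    yield return StartCoroutine(remoteFile.Download());

    yield return StartCoroutine(remoteFile.LoadLocalBundle());
    AssetBundle bundle = remoteFile.LocalBundle;

    // ... instantiate what you need from the bundle, then:
    remoteFile.UnloadLocalBundle();
}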

Simple but powerful. Our class is going to be responsible for downloading files as well as comparing against local files to see if an update is needed. Everything loads in the background, without blocking the application. The actual class will look like:
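Again, the original listing is missing here, so the following is a sketch of what the class might look like, pieced together from the variable list and method descriptions that follow; the exact signatures are assumptions.

using System;
using System.Collections;
using System.IO;
using System.Net;
using UnityEngine;

public class RemoteFile
{
    protected string mRemoteFile;               // URL of the remote file
    protected string mLocalFile;                // path the remote file is downloaded to
    protected DateTime mRemoteLastModified;     // when the remote file last changed
    protected DateTime mLocalLastModified;      // when the local copy last changed
    protected long mRemoteFileSize;             // size of the remote file in bytes
    protected HttpWebResponse mAsynchResponse;  // set by the async callback when the response is ready
    protected AssetBundle mBundle;              // the loaded asset bundle, if any

    public AssetBundle LocalBundle { get { return mBundle; } }

    // Outdated when the server copy was modified after our local copy
    public bool IsOutdated { get { return mRemoteLastModified > mLocalLastModified; } }

    public RemoteFile(string remoteFile, string localFile) { /* see 'Constructor' below */ }
    public IEnumerator Download() { yield break; }           // see 'Download'
    protected void AsynchCallback(IAsyncResult result) { }   // see 'AsynchCallback'
    public IEnumerator LoadLocalBundle() { yield break; }    // see 'LoadLocalBundle'
    public void UnloadLocalBundle() { }                      // see 'UnloadLocalBundle'
}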

What are those variables?

  • mRemoteFile
    • String that stores the url of the remote file to be downloaded
  • mLocalFile
    • String that stores the URI of the local copy that the remote file is to be downloaded to
  • mRemoteLastModified
    • Timestamp of when the remote file was last modified
    • Used to check if the file needs to be re-downloaded
  • mLocalLastModified
    • Timestamp of when the local file was last modified
    • Used to check if the file needs to be re-downloaded
  • mRemoteFileSize
    • Size (in bytes) of the remote file
    • Needed for the AddRange method
  • mAsynchResponse
    • HttpWebResponse response token
    • Used to signal when download finishes
    • Used to get file stream that gets written to disk
  • mBundle
    • Convenient access to the local copy of the file (If it’s an assetbundle)

Implementation Details

Let’s go through every method and accessor in this class one at a time.

LocalBundle

LocalBundle is a convenient accessor for the mBundle variable.

IsOutdated

IsOutdated is an accessor that compares the last modified time of the server file to the last modified time of the local file. If the server was modified after the local file, our local copy is outdated and needs to be re-downloaded. If the server does not support returning this data (more on that in the constructor) then IsOutdated will always be true.

Constructor

The Constructor takes two strings: the URL of the remote file to load, and the path of the local file where the remote file should be saved. The Constructor is responsible for getting the last modified times for the remote and local file; it also gets the size in bytes of the remote file. We get information about the remote file through an HttpWebRequest with its method set to HEAD. Setting the method to HEAD ensures that we only get header data, not the whole file.
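A sketch of what that HEAD request might look like, as a member of the RemoteFile class sketched earlier (this is my reconstruction, not the article’s original listing):

public RemoteFile(string remoteFile, string localFile)
{
    mRemoteFile = remoteFile;
    mLocalFile = localFile;

    // Headers only: ask the server for the size and last modified time
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(mRemoteFile);
    request.Method = "HEAD";
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        mRemoteFileSize = response.ContentLength;    // may be -1 (or 0) if the server doesn't report it
        mRemoteLastModified = response.LastModified; // falls back to the current time on some servers
    }

    // Local file info, if we already have a (possibly partial) copy
    mLocalLastModified = File.Exists(mLocalFile)
        ? File.GetLastWriteTime(mLocalFile)
        : DateTime.MinValue; // no local copy: always counts as outdated
}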

Setting the request’s method to HEAD ensures we don’t get the full file, only header data. Because this is such a small amount of data we retrieve it with a blocking call. Beware: not all servers support the HEAD method.

The only return data that we will use from here is the file’s size and last modified time. Much like the HEAD method itself, there is no guarantee that every server will support providing this information. If not available, the current time will be returned for the last modified time and zero will be returned for the file size.

Not all hosts support the HEAD command. If your host does not, it will serve up the full file and defeat the purpose of this function. Some hosts like Dropbox will support the HEAD command but will serve up invalid last modified times. If this happens IsOutdated will always be true and your file will always be downloaded, defeating the purpose of this class.

It is your responsibility as the programmer to run this code against your production server and make sure that all data is being returned as expected (HEAD is supported, and the correct last modified time gets returned).

Download

The Download coroutine will download the desired file using an asynchronous HttpWebRequest. BeginGetResponse takes two arguments: a callback, which in our case is the AsynchCallback method, and a user object, which will be the request itself. Once the download is complete we just write the result of the stream to disk.
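A sketch of the coroutine described here, as a member of the RemoteFile class sketched earlier (my reconstruction of the described flow; the byte range handling, timeout value and buffer size are assumptions):

public IEnumerator Download()
{
    long localSize = 0;
    if (File.Exists(mLocalFile))
    {
        localSize = new FileInfo(mLocalFile).Length;

        // Early out: the local copy is complete and not outdated
        if (localSize == mRemoteFileSize && !IsOutdated)
            yield break;

        // Panic if the local file is larger than the remote one, or if the remote
        // file changed after we last wrote the local copy: delete and start over
        if (localSize > mRemoteFileSize || mRemoteLastModified > mLocalLastModified)
        {
            File.Delete(mLocalFile);
            localSize = 0;
        }
    }

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(mRemoteFile);
    request.Timeout = 30000; // without a timeout the download can hang
    if (mRemoteFileSize > 0)
    {
        // resume from the end of the local copy; both a start and an end byte are given
        request.AddRange((int)localSize, (int)mRemoteFileSize - 1);
    }
    // The article switches the method to POST at this point; an empty body keeps the request valid
    request.Method = "POST";
    request.ContentLength = 0;

    mAsynchResponse = null;
    request.BeginGetResponse(new AsyncCallback(AsynchCallback), request);

    // AsynchCallback fills in mAsynchResponse once the response is ready
    while (mAsynchResponse == null)
        yield return null;

    // Append the downloaded bytes to the local file
    using (Stream source = mAsynchResponse.GetResponseStream())
    using (FileStream target = new FileStream(mLocalFile, FileMode.Append, FileAccess.Write))
    {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            target.Write(buffer, 0, read);
            yield return null; // stay responsive while writing
        }
    }
    mAsynchResponse.Close();
    mAsynchResponse = null;
    mLocalLastModified = File.GetLastWriteTime(mLocalFile);
}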

If the local file is larger than the remote file we panic and delete the local file. If the remote file was modified more recently than the local file, the local copy is assumed outdated and is deleted.

A few things happen here. First, a timeout must be set; if it is not, the download will just hang. We provide the request with a range of bytes to download. The documentation claims that, given a single argument (the size of the local file), it should download the entire file from that starting point; in my experience this isn’t the case, and if a start and end byte are not provided the download just hangs. The method of the request is set to POST, because the previous calls set some body data and we can no longer send a GET request. We then call BeginGetResponse, which will trigger the AsynchCallback method. Until AsynchCallback is called, mAsynchResponse will be null, so we can yield on this.

Finally we write the downloaded file stream to disk and do some cleanup. It’s important to remember that this is a coroutine; if called from another class, be sure to call it as such. While IsOutdated is public, you don’t need to worry about checking it: the Download function will early out and not download if the file is already up to date.

AsynchCallback

AsynchCallback gets called when the response started by BeginGetResponse is ready. The request object was passed into BeginGetResponse as the user object, so it gets forwarded to the AsynchCallback callback. From there we can get the HttpWebResponse and let the Download function finish executing.
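A sketch of that callback (again my reconstruction, following the description above):

protected void AsynchCallback(IAsyncResult result)
{
    // The request was passed to BeginGetResponse as the user object
    HttpWebRequest request = (HttpWebRequest)result.AsyncState;

    // Storing the response is what lets the Download coroutine stop yielding
    mAsynchResponse = (HttpWebResponse)request.EndGetResponse(result);
}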

LoadLocalBundle

LoadLocalBundle is a utility function to load the asset bundle we downloaded from disk. The only thing to note here is that in order to load the file from disk with the WWW class we must add file:// to the beginning.

UnloadLocalBundle

This is the counter to LoadLocalBundle. Managing memory is important, you do not want to have an asset bundle hanging around if it is not absolutely necessary.
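Sketches of the two bundle helpers, assuming the pre-Unity 5 WWW / AssetBundle API that the article is written against:

public IEnumerator LoadLocalBundle()
{
    // The WWW class needs the file:// prefix to read from local disk
    WWW www = new WWW("file://" + mLocalFile);
    yield return www;
    mBundle = www.assetBundle;
}

public void UnloadLocalBundle()
{
    if (mBundle != null)
    {
        // false keeps assets that are already instantiated alive
        mBundle.Unload(false);
        mBundle = null;
    }
}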

Testing

The last thing to do is to test the code; I made a quick utility class for this.

My example project is available on github

What next?

It may seem like the hard part is out of the way, but really this was the easy part. It’s up to you to come up with an efficient strategy for managing asset bundles. The real work is in making the system that manages bundles and the memory associated with them. To keep memory sane, download a bundle, load it into memory, instantiate any needed game objects and unload the bundle.

Challenges with projects and kids

Original Author: Colleen Delzer

10 years ago my husband and I decided we wanted to make games. We had tons of time to learn anything we needed to even with both of us having jobs.

I look back and wish we didn’t take it for granted. We made lots of practice projects that never made it past a level or two. Everything was very casual, work on an art piece one week, work on something else a month later.

Fast forward 8 years and we still didn’t have anything finished, just a project we worked on for a few years. We were on a roll with our first game Nono and the Cursed Treasure when we found out we were going to have a child. We tried to get as much stuff done as possible before she was born, but being pregnant with nausea and fatigue made it hard to accomplish much…but I could have tried harder. Our first daughter was born and it was nearly impossible to do much with a baby, and we really didn’t schedule ourselves like we should have.

When she turned a year old and was able to walk, it got a lot better and we could start really pacing ourselves. Soon after that we found out we were unexpectedly going to have another baby. It was extremely daunting to think of the lack of time we were going to have. Would we have to just stop thinking about the possibility of even finishing the game we had already spent too much time on? Being pregnant and having a toddler was already hard, and it got a lot more challenging when our second was born; top it off with our almost 2 year old getting into the tantrum stage and being extremely jealous of the newborn.

Our second daughter is now 7 months old and we are still able to create our projects. It can be challenging at times, but we manage. Having a schedule between us has helped immensely. We alternate days to work on things while the other looks after the kids, and we keep one day for family day.

Our first game is just about finished and we also have a new project in the works that is going well. We would never have thought it was possible, but our lives evolve to manage new challenges and we get used to the changes. Things take a lot more time and we learn to take advantage of every second of free time we have. We think of all the things we could get done with the free time that we once had, but we couldn’t imagine our lives without the girls.

We have learned to not waste time we have when we have it because you never know what will happen in the future and believe me it goes faster than you think! At the same time, make sure to take some time to spend it with people you care about and not worry too much. Think positive even though it can be hard sometimes!

Keyboard shortcuts info in debug builds – IMGUI style

Original Author: Hubert Rutkowski

I want to share a little discovery, a trick that eases the creation and development of games. First, a short motivational and historical introduction.

Case study

I don’t know if you have it too, but when I’m creating a game, I often add a lot of keyboard shortcuts to it – even a few dozen. Player avatar, enemies, map, camera, physics, rules etc. Shortcuts use normal letters, digits, F keys, special keys (tab, home, backspace), and their combinations with ctrl, shift, alt. Sometimes it’s hard to figure out what is available and what does what.

I especially had this problem in Ninja Cat, where shortcut availability depended on compilation mode (CONFIG:: variables, which you can treat like #defines) – debug+editor had all of them, debug without editor had part of that, editor without debug had yet another part, etc. This was very useful, because sometimes having too many shortcuts and special keys was bad – e.g. during testing, when pressing some keys would change gameplay.

But I didn’t want to have it as a binary choice – either all, or none. Sometimes the ability to intervene meant being able to catch a bug on the spot (perhaps a Heisenbug?) and do something about it. So it was nice having it at such a granular level (debug/release, editor, deploy, profile). The disadvantage was that I often forgot what the actual shortcuts were and what I could do at any given moment. I was pressing shortcuts that didn’t work, or forgetting about some magic functionality hidden in not so visible places in the code.

Solution

OK, now I’ll explain how to fix this situation. The easiest thing is to just display a list of all shortcuts upon pressing some key like F1 or H(elp). But nobody would be able to create such a list by hand and maintain it over time. Also, looking at a huge list of shortcuts when only, let’s say, a third of them are available is not really that helpful. The solution? Do it in an automated way.

I’ve discovered something similar to Immediate Mode GUI (IMGUI).
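The original snippet isn’t reproduced in this copy, but judging from the ‘// code from above’ comment further down, it was presumably a plain shortcut check along these lines (Flixel-style ActionScript; the exact body is assumed):

if (CONFIG::debug)
{
    if (FlxG.keys.SHIFT && FlxG.keys.justPressed("D"))
    {
        KillAllEnemies(); // whatever this particular debug shortcut does
    }
}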

Simple and obvious. Now let’s add something:

if (CONFIG::debug)
{
    if (Game.info_keyboard_shortcuts_show)
    {
        Game.info_keyboard_shortcuts_list += "SHIFT+D = kill all enemies\n";
    }

    if (FlxG.keys.SHIFT && FlxG.keys.justPressed("D"))
    ...
    // code from above
}

// in singleton Game:
public static var info_keyboard_shortcuts_show : Boolean = false;
public static var info_keyboard_shortcuts_list : String = "";

Because the code is inside an if corresponding to the compilation mode, its body will be executed only in that one specific mode. In this way, only the available shortcuts will be added to the string and displayed.

Implementing the rest is trivial:

// once a frame call this function which checks whether you want to see the shortcuts list.
// In such a case, display the list, collected in a simple string over the course of a frame

static public function ShowKeyboardShortcuts():void
{
    info_keyboard_shortcuts_show = FlxG.keys.pressed("H");

    if (info_keyboard_shortcuts_show)
    {
        var shortcuts_list : FlxText = new FlxText(200, 10, 350, info_keyboard_shortcuts_list);
        shortcuts_list.shadow = 0x000001;
        shortcuts_list.draw();

        info_keyboard_shortcuts_list = "";
    }
}

That’s how it looks in practice:

[Screenshot: the keyboard shortcut info overlay shown in-game]

The text is displayed while holding H. So far I’ve added only a few shortcuts.

Additions

1. The compilation modes can be combined, as well as mixed with runtime modes, e.g.:

if (CONFIG::debug && CONFIG::profile &&
    Game.is_profiling_state && Camera.is_some_other_state)
{
    // some very specific shortcut
}

2. My friend suggested (on the forum where I first posted about this technique; warning: Polish language) how to make it even faster and less verbose. This:

if (CONFIG::debug)
{
    if (Game.info_keyboard_shortcuts_show)
    {
        Game.info_keyboard_shortcuts_list += "SHIFT+D = kill all enemies\n";
    }

    if (FlxG.keys.SHIFT && FlxG.keys.justPressed("D"))
    ...
}

should be quite easy to reduce to:

IF_DEBUG_KEY( SHIFT+'D', "kill all enemies" )
{
   ... code ...
}

Where IF_DEBUG_KEY does everything inside: testing the compilation mode, checking whether to display the shortcut info, testing whether the key combination is pressed, and removing the code in release builds.
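ActionScript 3 has no preprocessor macros, so IF_DEBUG_KEY can’t literally strip code the way a C macro would; a hedged approximation (the helper name is assumed, not from the original post) is a function that registers the description and tests the key combination in one call:

// e.g. in the Game singleton: register the description for the help overlay
// (when it is being shown) and return whether the shortcut fired this frame
public static function DebugKey(description:String, pressed:Boolean):Boolean
{
    if (info_keyboard_shortcuts_show)
    {
        info_keyboard_shortcuts_list += description + "\n";
    }
    return pressed;
}

// usage, still wrapped in the compilation-mode check:
if (CONFIG::debug)
{
    if (Game.DebugKey("SHIFT+D = kill all enemies", FlxG.keys.SHIFT && FlxG.keys.justPressed("D")))
    {
        // ... code ...
    }
}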

Summary

Little work, simple and flexible code, works everywhere (language agnostic), and it fixes a specific problem. Maintainability is high, because the code handling a shortcut sits right next to the code registering its info text. I recommend adding it to your bag of tricks / current game 🙂

Company growth and the development approach of Fragments of Him

Original Author: Tino van der Kraan

This blog is intended as a write-up of recent events and activities of our game development company SassyBot Studio. As a result, the contents of the blog reflect personal approaches and insights that we like to share and should not be taken as industry facts. We hope that by sharing our thoughts and experiences of past and future events it may help other start-up indie devs with struggles and questions of their own. We embrace contact and encourage you to share your adventures and lessons with us either here or through Twitter @SassyBotStudio.

Office space for digital media

As of this month, you can find SassyBot Studio in an office space of its very own. Well, that’s not technically true as we share it with a few other folks. Regardless, we believe that having a space for our projects and operations is invaluable.

It could be argued that, because the nature of our business is largely digital, game developers don’t really need office space. That would be true if virtual collaboration with others were as rich, synchronous, and seamless as it is in the real world. The biggest bonus of having a dedicated space for work is most definitely the noticeable focus and productivity increase compared to working from home. Working in a different location outside of home gets you away from the everyday domestic distractions. It’s possible to find these work places at a library, university, a diner, or coffee shop. The real benefit of an office space over public work places is obviously the ability to customize the work space, determine your own work hours, and the peer pressure of showing up and putting in the necessary hours.

Of course, there are also downsides such as rent, insurance, furniture, equipment, and other matters that are mandatory or recommendable to a practice such as ours. Even though an office space comes with an initial time investment and financial cost, we think it is definitely worth the sacrifice as productivity has increased and collaboration is now a lot easier.

Fragments of Him update

Progress on our upcoming title Fragments of Him has been picking up over the last month, although other important matters, such as external work and the recent Casual Connect conference in Amsterdam, have taken a chunk out of our precious development time. An outline of the game’s design now exists and it is enough to block out the premise, characters, and mechanics for the game.

Based on the instructions and descriptions in the design, we have started blocking out environments that provide the player with context in which the game will be taking place. The basic gameplay systems are currently in place, such as player navigation and an art pipeline that allows for fast iterations.

Motion capture

The characters in Fragments of Him are going to be more dynamic than those that were seen in the prototype that is still up on Kongregate. Even though the story in the prototype could be told in a compelling way with static characters, we believe that lifelike animation can greatly emphasize the message, if acted out properly.

We want to have a lot of short, realistic animations in the game. In order to quickly iterate through motions until we have the right one, we think that a motion capture system could be greatly beneficial. Hand animating believable and lifelike character motions requires a substantial amount of time and expertise to pull off. Cleaning up animation keyframes from motion capture data takes less skill and it will probably be faster to get to the desired result. For these reasons, we are close to making the decision of creating a motion capture rig of our own.

The software we intend on using to create a motion capture setup that fits within our budget is called iPi Motion Capture (http://ipisoft.com/) and it supports consumer motion cameras such as Kinect and PS Eye as input devices. What we have seen up till now has made us very enthusiastic about the ease of use and overall possibilities using this technology.

Fragments of Him development approach

As we are creating a larger game than previously attempted, we have been doing quite a bit of high-level thinking on how we can best approach this project. It’s important that we make the best use of our available time and resources by catching problems early on when they do not yet have a considerable financial impact on the end result.

The high level approach we currently use for making Fragments of Him can be seen as a sequence of phases which we have labelled as follows:

  1. Horizontal slice phase
  2. Vertical slice phase
  3. Decor phase
  4. Polish phase

During production, we use the term horizontal slice to indicate the bare minimum that we need in order to have the game playable from A to Z. In this phase, we try to put the most important and rough objects into the game as soon as possible. This is also known as the MVP (Minimum Viable Product). The horizontal slice is made up of critical elements that are necessary for the game to function.

We are going to approach the horizontal slice of Fragments of Him in this order:

  1. Environment blockout/whitebox
  2. Key interaction objects and functionality
  3. Critical interface elements
  4. Narrative system with placeholder scripts
  5. Main characters in crude forms
  6. Audio placeholders
    • Music
    • Sound effects
    • Foley

A great foley example is ‘Attack of the Clones DVD – Episode II: Foley‘.

Good audio is critical to the experience of a narrative-heavy game such as Fragments of Him and, although this approach appears to place it at a low priority, we hope to move on to including it very rapidly. We expect to use free assets from classical, royalty-free, libraries before moving on to working on a finalized score for the final release.

The result from this phase can be used to figure out if the core value of the game can already be experienced. Usually, the first result will not create the experience that the design has in mind. From this point onwards, a lot of iterations need to take place that will nudge the game towards where it needs to be. When the core design is present and representative of the game’s intended experience, we can include playtesters in the iteration cycle who will be able to tell us whether or not the design and game are working as intended. While that happens, we can direct our attention towards creating a vertical slice.

Vertical slice phase

A vertical slice represents a short segment of the game to a standard that is representative of the final product. On a graphical level, this can set the standard as to how the rest of the game should look and feel. Additionally, this is useful for promotional purposes as it can give people a taste of what they can expect in the full game. If you are looking for funding for your game, then creating a vertical slice of your game can be used for this. With Fragments of Him, we intend to create a vertical slice for promotion in order to raise awareness and muster support. We’ll try to make this vertical slice available to as many conferences, game journalists, and let’s play content creators as possible.

To get an impression of what is required for a vertical slice, you could imagine the work required for the phases below applied to approximately a 10% segment of the final game. It is basically getting a taste for a piece of the cake.

Décor phase

The décor phase can be seen as the phase where non-essential assets and functionality get added to the game. A lovely analogy for this is the following:

Imagine actors rehearsing for a theater play. These actors picture the environments and props around them as they practice to get the entire piece acted out and presented properly. In many ways, that can be seen as the horizontal slice. The décor phase is where the set dressing, lights, music, audio, and character costumes are pulled out to complete the set for the grand performance.

[Screenshots: the same scene before and after the décor phase]

Essentially, the elements that get added in the décor phase of our game development process provide the mood, atmosphere, flow, and ambience that support the core of the game. The elements required for this phase are not critical for playing the game, but they do add tremendous value to the intended presentation. Some of the elements you can think of adding in this process are:

  • Scene dressing
  • Complete set of rigged character models
  • Textures
  • Additional, non-essential, animations
  • Scene lighting
  • Particles and effects
  • Additional, non-essential, functionality
  • Interface graphics
  • Complete audio set for music, sound effects, and foley

Polish phase

When we get to the point where all the elements of the game are put together, we will go into the polish phase. In this phase, we will not add any more assets or elements unless they will add considerable value to the game. Usually, this phase can take the longest to finish. Most of the work that is done in this phase is not really visible to the player and takes place behind the scenes. The purpose of this phase is to improve game performance by optimizing game assets and code, as well as to track down and minimize the occurrence of game-breaking and experience-hindering bugs. This process aims to make the game look as great as our resources allow. Some of the activities we do in this process are:

  • Clean up and perform optimization of:
    • Meshes
    • Textures
    • Animations
    • Game systems
    • Interface
  • Tweak scene light parameters and light bake settings
  • Quality assurance of the game’s performance and experience

Conclusion

This last month has been pretty busy as we moved into SassyBot’s first office. Due to all this, the development of Fragments of Him has taken a slight backseat, although we have made progress. We also decided that motion capture will very likely be the way we get animations into the game, and have researched cost-effective ways of doing this. Lastly, we have planned the development approach of Fragments of Him by dividing it up into the horizontal slice, vertical slice, décor, and polish phases. Thank you for reading all the way to the end, and we hope you let us know what you think, either here or through Twitter (@SassyBotStudio).

Musical Dimensions: What Music Reveals About Your Game

Original Author: Chris Polus

The effort it takes to make a game is divided between several individuals, even more so as members of indie teams are often spread all over the world. Somebody does visuals, somebody game design, programming, 3D models and so on. But no matter what role I had in an indie team, be it composer, sound designer, team lead or producer: the more I knew about how other departments worked, the better I could do my job. Normally, composers are only needed for a short period of time. Thus, composers join and then leave indie teams again after they deliver their part of the puzzle. But what do composers actually work with, and how can you make the most of their time?

That’s why I’d like to open the box and give you an overview of what it is composers work with, and what dimensions they have at their disposal in terms of music. Next time you run into one on your project, you can maybe even direct him or her better to get the best out of this team member and increase the quality of your game overall. The better you can direct a musician, and the better you can phrase what you want, the easier, more efficient and more enjoyable the collaboration will be for both sides.

First, why is there music at all?

I find this question very intriguing. Why is it so common to have music in games? Some games want to be as realistic as possible with graphics and sound and yet, there’s no music that accompanies our actions in the real world. No sad piano that starts playing when our dog dies, no string clusters and glissandi when we hear strange noises in our house and we suspect zombies. (Ha! This would be truly creepy if you suddenly heard a sinister music while sitting on your couch with the TV off). So why is there music in games? Is it too boring otherwise? Is the story not interesting enough? Is the ambience too quiet so we need something to listen to while we gather resources, jump our way to the next platform, or complete quests?

Maybe. Games can feel empty if there isn’t any music that fills the background. But in my opinion, there’s a far more important point: I think music is the best and most effective way of transporting emotions, in a way nothing else can. No sound, no people crying on-screen, no heartbreaking pictures can shake you up in the way music can. And easily so! Music has the power to significantly amplify whatever the game’s story requires the audience to feel. Thus, composers are able to influence the audience to the greatest extent. Music can add that extra tension to action packed car chase sequences, that narrow feel to horror first person shooters, or suggest an infinite landscape for adventurers to explore when, in reality, there is only a handful of hills and valleys.

Conclusion: One of the most powerful aspects of music is to transport emotions. Use this power in your game to leverage the player’s experience.

People have expectations, feed those expectations

Everybody has a long experience of playing games, watching movies, or traveling the world. We have a bag of personal experience of how things sound. Let’s think of a Western game for example. The hot ambience in Westerns is excellently transported by strummed guitars. Bells symbolize high noon, when there’s a shootout. Listen to a sample of Red Dead Redemption.


It’s because we have seen so many Westerns on TV that we can sort of guess that this music could be from a Western rather than a science fiction game. Maybe we can’t pin it down to a specific instrument or chord, but we have this gut feeling that’s based on our lifelong experience.

Composing music for games is about playing with those clichés to satisfy people’s expectations. At least up to a certain extent. To make it very clear: It would be very weird having electronic dance music in a Western style game, because this is simply not what people expect!

What music can tell us about location

Now that we know music can evoke emotions, and that we expect to hear certain music for certain kinds of genres, let’s pin this down further. For this example, let’s go from the Wild West all the way to Asia. The Asian culture is different from other cultures, obviously. Particularly interesting here is that Asian cultures developed their own kinds of instruments, such as the Erhu (make sure to listen to the sound bites on the Wikipedia pages). These instruments have a distinct sound and a special way they are played. As soon as you play a few notes with one of those instruments you instantly get this Asian feel in the music. Combine this with traditional Japanese scales and you immediately “know” this game has to play somewhere in Asia. Goal achieved.

Check out this sample from “Blossoms”, a free casual iOS game I worked on as the composer. It mixes Asian instruments with an orchestra, but still retains the Asian feel.

Other cultures developed their own instruments. A popular Middle Eastern instrument for example is the Armenian Duduk which evokes this wide open feel of an infinite desert. Here, too, make sure to listen to the sound bites on the Wikipedia page to get the feeling.

As you can see, different cultures have their specific instruments. We know the sound of these instruments from movies, documentaries, or personal travels. We associate their sound with a geographical region. If a game takes place, say, in the desert, I could try adding a Duduk to my composition to help make the setting more believable, because that’s the instrument you find in these parts of the world.

Conclusion: Music can give the audience a sense of geographical location. Help players immerse in your game by injecting the right location to the music.

What music can tell us about time

Similarly to our geographical sense there is a sense of time. If I wanted to compose music for a game that plays in the Middle Ages, I wouldn’t use a modern synthesizer as my go-to instrument. I’d look around to find videos that show what music people made in that time. And, apparently, there are specific instruments and scales that were used in the Middle Ages. Here’s an example of a medieval band called Flauto Dolce.


Did you get this feeling of kings, castles, jesters and feasts like I did? I would try and load the instruments from the video in my sampler (if I had them) and then try to mimic their playing style to get my first melodies down.

On the other hand, if I was composing for a modern, action-oriented Tron-style game, I’d look for cool and wobbly synthesizer sounds as this would feel more like a computer world with its technical, digital sounds to me. As a last example, think about the sixties era. It had its particular style of pop and rock songs one could recognize and associate with this time.

Conclusion: Music can give the audience an orientation in time. Use this to make your game feel modern, or ancient, or anything in between.

What music can tell us about the action

The pace of a game can be greatly manipulated with music. Action-packed sequences benefit from fast rhythmical patterns, whether played on percussion, rhythmical strings, horns or any other instrument. Here’s a combat example composed by Danny Baranowski.


Compare this to the next track of the same album, which is meditative and calm.


Depending on the rhythm of the various instruments, activities in a game can be perceived as fast, slow, forward-moving and so on. Fast patterns are the perfect companion for getting an extra kick out of a car chase or a battle sequence; the same goes for forward-moving patterns like this one by Josh Whelchel.


Here, too, music just amplifies that feeling of velocity or danger. And when music changes from peaceful to fast moving, or even dangerous, you know something is about to happen in the game. ATTACK!

Conclusion: Music can give the audience a sense of the speed of the action. Use it to increase tension or make players feel relaxed, depending on the current gameplay.

What music can tell us about size

Some instruments and musical techniques suggest a physical size. High pitches tend to sound like small things, while slowly played, low-pitched instruments such as double basses or tubas make you think of impressively large ones. This doesn’t only apply to objects: the same effect works for spaces like landscapes. Music can create an incredibly open feel of limitless freedom, like in this stunning video by Michel Fletcher (go to 2m 20s).

A voice and a long reverb give a strong sense of grandeur. Likewise, music can create narrow spaces to support a claustrophobic, horror kind of feel with the help of sound effects. The pitch is a big factor in this equation.

Conclusion: Music can give the audience a sense of size of things or spaces. Use it to underline the huge monster that’s about to appear or the claustrophobic hallway the player is supposed to cross.

WOW, music does all that stuff!

All those aforementioned elements are like building blocks. Depending on what effects need to be achieved in a game, let’s say an epic battle, a calm resource gathering phase, a vast and infinite landscape for adventurers to explore, or a narrow, claustrophobic house filled with monsters, a composer can combine those building blocks in different ways.

  • emotional blocks (happy, sad, dangerous, horrified)
  • geographical blocks (Asia, Middle East, Ireland)
  • time blocks (future, sixties, Middle Ages)
  • action blocks (fast, slow, forward moving)
  • space blocks (tiny, narrow, limitless)

Of course, these general directions, once mastered, can be explored creatively. Maybe you have a game that’s set in the future, but you deliberately use rusty-looking visuals and medieval instruments to stun your audience with an unusual combination. Or you use medieval instruments to play modern melodies as an accent in an otherwise electronic soundtrack. There are so many creative possibilities for using clichés, bending them, torturing them, and creating completely new combinations of unheard-of music. And this is what it’s all about: knowing the rules, then exploring and bending them in creative ways.

With this introduction I hope to have given you an insight into the craft of composing music for games. You’re aware of the power music has to make the audience feel the way your story writer intended and you can communicate more efficiently what effect you want. Have fun and good luck with your game projects!

Vessel Post Mortem: Part 3

Original Author: Tony Albrecht

This is the final part of the Vessel porting trilogy. The previous parts can be found here.


After the Christmas break (where I was forced into yet another family vacation) I dove straight back into the code, hoping to get the final issue fixed and the game submitted as soon as possible. There was still time to submit and if I passed Sony’s submission process first time then we could still have the game on the PSN store by the end of January. I fixed the remaining problem with the start up error reporting and added in a little more error checking just to be sure. I was pretty paranoid about the submission process as I didn’t want to delay the game anymore by failing – I carefully checked everything twice, crossed my fingers and hit the ‘submit’ button to Sony America (SCEA) on January 9.

(Image: M. Arkwright sprays lava fluros with the liquid gun)

In parallel to the code submission process there is a ‘Metadata’ submission process. This covers the PlayStation Store assets and the game’s presence there: all the text, images, trailers, prices and so on, each of which has very specific requirements that must be met in order to pass submission. James (our QA guy) managed most of this, which involved communicating with Strange Loop’s art guy Milenko (who was incredibly responsive – I’m not sure he ever sleeps), asking for different resolution images and screenshots, and sourcing the different language versions of the game text. It took us a few submissions of the metadata to get it all right, but the turnaround was pretty quick and a continuous dialog with Sony helped a lot.

The code submission process consists of uploading the game in a special package format plus some extra documents that describe the trophies and other bits and pieces. We had to submit to both SCEA and Sony Europe (SCEE) so we could have the game released in both those regions. We hadn’t submitted to SCEE at the same time as SCEA as we were still waiting on some publisher registration details to come through, so all I could do was wait for that and for the response from SCEA on our initial submission while I busied myself with some other work.

On January 18th I received the first fail from Sony. As fails go, it was a pretty good one. There were three “Must Fix” bugs: one was due to my entering the wrong data in a field when submitting the game package, and two were due to saving games failing when there was no disk space left. They’d also mentioned some slowdown in some of the levels – I’d expected that, and as it wasn’t a critical bug, I ignored it. The save game bugs proved troublesome – Ben had written all of the save game code and with him gone I now had to learn how Sony wanted you to do it, how Ben had done it and how I could make it work correctly. It took me a few days to find and fix the problems and by this time the details that SCEE required had come through so I resubmitted to SCEE and SCEA at the same time (January 24)

I was quietly confident that it’d all pass this time. I’d thoroughly tested the save game code now and it all worked. What could go wrong? I seriously considered heading out and buying a bottle of Chimay Blue as a pre-emptive celebratory reward.

(Image: relaxing with a Chimay Blue)

JazzMonster

I eventually tracked the crash bugs down to a simple stupid error. There was a section of code that was responsible for limiting the amount of fluid, fluros and seeds in a given section. When the frame rate dropped below 60fps, this code would kick in and drop the number of drops/fluros or seeds to the minimum value deemed suitable for that level. During the final stages of development we had created a FINAL_RELEASE mode which turned off all of the unnecessary debugging code. Unfortunately, this erroneously included some code which updated the time values that the limiting code used, so the fluid, fluros and seeds were never being reduced. Ever. This meant that the game would have been running much slower than it should have, and was prone to crashes whenever certain hard coded limits were exceeded. I’ve never been so relieved to fail anything.

Something I’d like to focus on here is the level of performance that Vessel was submitted with; Ben and I spent a huge amount of our time just trying to squeeze as much performance out of this game as possible on the platform we had. For the most part, I’m pretty happy with the results. Yes, you can abuse the game and make it slow down quite easily – if you try to fill a room with fluros and water and seeds then you’ll most likely see some slowdown – but during normal play, the game maintains a 60fps simulation at a 30fps frame rate (rendering runs at 30fps, so there are two simulation passes per visible frame). However, there are a few levels where the sheer number of fluros and fluid required to solve the puzzles means that the frame rate will drop to 20fps. Given another month I reckon I could have fixed that too, but given how far over time and budget we were, that wasn’t an option. So, while I’m happy with how much we improved the frame rate of the game, I’m still a little disappointed that we didn’t hit a full 60fps simulation everywhere.

Sure, the PlayStation 3 hardware did slow down the porting of the game initially, but given the complexity of the fluid simulation required I doubt that there will ever be an Xbox 360 version released. The SPUs allowed us to perform at least 80% of the physics simulation completely in parallel – with no impact on the main processor at all. There is no way you’d get the same level of performance on the X360.

(Image: PlayStation 3 controller)

With the FINAL_RELEASE bug fixed and the game resubmitted to SCEE and SCEA (which passed on the 20th and 22nd respectively) I was finally free! The metadata was all sorted and we’re now expecting the release on March 11 in the SCEA territories and March 12 in the SCEE regions. We’re all hoping it does well enough to cover our costs (as all devs with newborn progeny are).

What went right

Having a QA guy in house was invaluable. An impartial, non-technical third party to throw our fixes at kept us honest. As a developer you get very close to the game and like to assume that your fixes have made a positive difference and you’re moving forwards. That’s not always the case.

Honest communication with the Publisher/client. I tried to keep John at Strange Loop Games constantly up to date with the progress (or lack thereof) of the game. He was incredibly understanding with the continuous delays, and while it was hard to deliver bad news, at least it reduced the surprises.

Having good, experienced coders on the job. Bringing Ben onboard was a good choice, even though we had to let him go in the end. There is no way I could have done this without him. Working from the same physical office was also beneficial. I worked remotely for over 5 years leading up to this port, and I think we’d have delivered even later if we’d worked separately.

What went wrong

Work estimation. I fundamentally underestimated the amount of work required. I should have applied Yossarian’s Circular Estimation Conjecture and multiplied it by PI – that would have been closer to the mark. The big killer was the misunderstanding about which threads needed to run at 60fps. If it had been just the physics I’d underestimated, we probably would have been a shade over a month late (maybe two), but with both the physics and game threads needing to execute in under 16.6ms, we really had our work cut out for us. The amount of time taken for submission shouldn’t be forgotten either; 4 to 8 weeks after the initial submission should see your game on the store.

I’d also recommend to anyone doing a console game that they get as much TRC compliance done as soon as possible in the development of the game. Get the load/save stuff working, trophies, error reporting, quitting, controller connection, file error reporting and the like in place and functioning sooner rather than later.

In Closing

The porting of this game was a trial for me. There was a week or two around September/October where I was really doubting that we could deliver a game that ran at a playable frame rate. I’m proud of what we delivered in the end, and if you’d taken away the financial and time pressures I would have thoroughly enjoyed the challenge. I’ve learnt a lot from this experience, and I’m looking forward to the next one.

Just not right now, OK? Maybe a bit later in the year.

Vessel Post Mortem: Part 2

Original Author: Tony Albrecht

In the last episode, our intrepid heroes had discovered that rather than having almost finished the PlayStation 3 port of Vessel, they were barely half way there. Read on to find out what happens next…

(This is part 2 of a 3 part series on the porting of Vessel from PC to PS3. Part 1 can be found here)

With the realisation that both the game and render threads needed to run at 60fps, we had to reassess where we focussed our optimisation efforts. The game thread was doing a lot of different jobs – Lua scripting, playing audio, AI, animating sprites – all sorts of things. There was a lot to understand there, and on top of that, both the physics and render threads still needed more work. But performance wasn’t the only concern – there was TRC compliance, game play bugs, new bugs introduced by our changes and, as we eventually discovered, the need for a PC build (more on that later).

Vessel Screenshot 4

Now, Vessel was ‘mostly’ bug free on PC when we started – we did find a few bugs in the Lua and in the game code, and the few compiler and shader compiler bugs added to that – but there were also subtle differences in the way the two platforms dealt with numbers, which meant that weird things sometimes happened. For instance, a door on one of the levels would get stuck and your character would have to bump into it in order for it to open. This was due to a variation in position values at around the fifth decimal place causing the jamming on PS3. Additionally, there was some code used in a number of places that did something like the following:

x = MIN( y/z, 1.0);

What this did on PC was catch the case where z tended to zero and clamp x to 1.0 (MIN(infinity, 1.0) = 1.0). Unfortunately, on PS3, y/0.0 was QNAN and MIN(QNAN, 1.0) was QNAN. So the clamping never happened, resulting in much weirdness in hard-to-reproduce places (it only occurred when z was zero), which made it a bugger to find.
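
For illustration only (this is a sketch, not necessarily the fix that went into Vessel), one way to make the clamp behave the same on both platforms is to handle the z == 0 case before dividing:

    // Sketch of a platform-agnostic clamp; guarding the divide means we never
    // feed a QNAN into the comparison.
    float clampedRatio(float y, float z)
    {
        if (z == 0.0f)
            return 1.0f;                  // matches the PC behaviour: MIN(infinity, 1.0) == 1.0
        float x = y / z;
        return (x < 1.0f) ? x : 1.0f;     // equivalent to MIN(x, 1.0), with no QNAN involved
    }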

This meant that we weren’t just optimising and adding in new PS3 functionality, we were also debugging a large foreign codebase. I hadn’t expected that we’d need to change Vessel much – it was a shipped game after all and so I had assumed that it was pretty much bug free.

I had also assumed that for the most part we wouldn’t need to change any assets significantly. Things like platform-specific images for controllers and buttons were just texture changes and so weren’t a problem. Text was fine too. Unfortunately, problems like the sticking door mentioned above meant that we needed to go into some of the levels and jiggle bits around. Later in the life of the port we had to tweak fluid flows, emission rates, drains and other fluid-based effects to help with performance.

All of this meant that we needed to have the editor up and running, and the editor was built using the same engine as the game, so we needed to maintain two builds – one for PS3 and one for PC, solely for the editor. This was done crudely, by wrapping most of our PS3 changes in preprocessor conditionals and leaving the PC code inside the #else branch. This did slow us down a bit and also made our code that much uglier, but it had the side effect of allowing us to easily test whether a bug was present in the PC build or was specific to the PS3.
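
The pattern was roughly this (an illustrative sketch; the macro and function names here are invented, not taken from the Vessel codebase):

    #include <cstdio>

    void reportBuildFlavour()
    {
    #ifdef BUILD_PS3
        // PS3-specific changes were wrapped up like this...
        std::printf("PS3 build\n");
    #else
        // ...with the original PC code left in the #else, so the editor build kept working.
        std::printf("PC / editor build\n");
    #endif
    }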

By the end of September we had fixed many audio issues and managed to get LuaJIT (an optimised version of Lua) working again (it was mostly working but was causing some repeatable problems in some places). Ben also profiled the Lua memory usage (article here) and tweaked its garbage collection to be more CPU friendly (it was occasionally peaking at 13ms in a call). So, slowly, we were scraping back time spent on the game thread. It was still a serious problem, but it was getting better.

Vessel Screenshot 3

Back on the physics system, more and more of the CPU code was being ported to the SPUs – in many ways this was getting easier, as I was building a lot of infrastructure which made porting more convenient, and many of the patterns I used for one system translated well to others. I also started reducing the amount of memory used by parts of the physics: for example, dropping the size of the WaterDrop struct from 144 bytes to 128 bytes meant better cache usage and simpler processing on SPU (nicely aligned structs meant that I could assume alignment for SIMD processing).
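
As a rough illustration of the idea (the field names below are invented, not Vessel’s actual layout), the goal is a power-of-two sized, 16-byte-aligned struct that packs cleanly into cache lines and SPU DMA transfers:

    #include <cstdint>

    // Hypothetical 128-byte drop struct; 16-byte alignment lets SIMD code assume aligned loads.
    struct alignas(16) ExampleWaterDrop
    {
        float         position[4];       // x, y, z + padding      (16 bytes)
        float         velocity[4];       // dx, dy, dz + padding   (16 bytes)
        float         pressure[4];       //                        (16 bytes)
        float         neighbourData[16]; //                        (64 bytes)
        std::uint32_t flags[4];          //                        (16 bytes)
    };

    static_assert(sizeof(ExampleWaterDrop) == 128, "keep the struct DMA- and SIMD-friendly");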

Some optimisations were straightforward. For example, I changed the way that we processed drops on fluros skeletons. Instead of iterating over the entire drop list and checking each drop to see if it was part of skeleton ‘N’ I changed it to pass over the drop list once, partitioning the drops into buckets of drops per skeleton. Then we could just process a given skeleton, touching only those drops on that skeleton. Obvious in hindsight, and unnecessary on the PC build, but it takes time to get to the point where you really start to understand the code and can begin to change the way it works (rather than optimise a function to perform the same algorithm, just faster).
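
A sketch of that bucketing pass (types and names invented, not the actual Vessel code):

    #include <cstddef>
    #include <vector>

    struct DropRef { int skeletonId; int dropIndex; };

    // One pass over the drop list groups drop indices by skeleton, so that
    // processing skeleton N only touches the drops that belong to it.
    std::vector<std::vector<int> > bucketDropsBySkeleton(const std::vector<DropRef>& drops,
                                                         int skeletonCount)
    {
        std::vector<std::vector<int> > buckets(skeletonCount);
        for (std::size_t i = 0; i < drops.size(); ++i)
            buckets[drops[i].skeletonId].push_back(drops[i].dropIndex);
        return buckets;
    }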

Around this time the render thread became the new low-hanging fruit, so we started working on that – unrolling loops, prefetching, minimising data copying. Tweaking, poking, prodding and breaking, testing, measuring, then swearing or cheering and checking in.

The end of September rolled around and we still had no real end in sight. I kept talking to Strange Loop, trying to keep them up to date while we furiously worked to finish this game. As we’d already blown the initial quote, our contract specified that we would have to reduce our daily rate for the remainder of the project, to be recouped on sales. So not only did we have significant time pressure, we now had financial pressure on top of that.

October

October was very productive – it saw over 150 bug fixes and optimisations checked in for the month. As the game was improving in performance and becoming more playable, James, our QA guy, was able to play more and more of it. We hadn’t actually managed to do a complete play-through of the game by this stage, so we focussed heavily on removing all blocking bugs and getting the game into a functionally shippable state (assuming that we would continue to make performance improvements).

Second week in, after discussions with Strange Loop Games we decided to postpone the release of the game to January. This meant submitting the game to Sony in Europe (SCEE) and in America (SCEA) by the end of the year, a mere 4 months after our initial estimate. There was no way I was going to let this date slip again.

Screenshot 2

Ben was still working on optimising the Lua execution and audio (which at this stage was still taking 4 or 5 ms per frame, with peaks way higher than that). I’d thought that, since Vessel used FMOD, it would all just work when I turned it on. Unfortunately, the audio was a complete mess on the PS3. While profiling the game thread in detail we discovered that it was taking up huge amounts of time, spiking and causing frame stutters as well as just plain behaving badly. It took weeks of intermittent work plus lots of discussions with the FMOD boys to figure out what the problems were. Turns out, many of the sounds were just set up incorrectly for the PS3 build – effects and layers that were disabled on PC were left on for PS3. Some sounds would loop infinitely and multiple sounds would play at the same time. All these things took time to discover, time to investigate a solution and time to fix. Something we had very little of.

To make things worse, Vessel performed no culling for the rendering, AI or audio. At all. The entire level was submitted to the renderer which culled it and then passed off the visible bits to the SPU renderer. Also, Lua scripts (which are slow at the best of times) were being called even when the things they were modifying weren’t in view. Audio was also being triggered by objects that were a long way from the player, relying on distance attenuation to reduce volume. All of this meant that there was a lot of work going on under the hood for no visible or audible impact.

We were still stumbling across weird little issues. For example, the animated textures in the game were never animating. They were loading correctly, but never changing. Ben tracked this down to an endian swap function which was broken (the PC and PS3 have different byte ordering for their numbers and so the same data when loaded from file must be modified depending on platform). This endian swap was only ever called in one place and only by this particular animated material.
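
For anyone unfamiliar with the problem, a minimal 32-bit byte swap looks like this (a generic example, not the broken function from Vessel):

    #include <cstdint>

    // The PC is little-endian and the PS3's PPU is big-endian, so multi-byte values
    // loaded from a PC-authored file must have their bytes reversed on PS3.
    std::uint32_t swapEndian32(std::uint32_t v)
    {
        return ((v & 0x000000FFu) << 24) |
               ((v & 0x0000FF00u) << 8)  |
               ((v & 0x00FF0000u) >> 8)  |
               ((v & 0xFF000000u) >> 24);
    }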

The fix is often easy. Finding what to fix is hard.

November

Even though we had a tight timeline, we also had other client obligations which saw us each lose 2 weeks in November. This helped financially (well, it would have if they had paid on time) but put even more time pressure on us. Regardless, we were confident that we’d make the end of December deadline. Plenty of time.

We finally managed to get a complete play-through of the game in November, only to realise that the endgame didn’t work. Fix, patch, hack, test.

It was during the last week of November that Ben had a breakthrough. He bit the bullet and implemented a simple container-based culling system for the game thread. All objects in a loaded level were placed within a rectangular container, and there were an arbitrary number of these per level. This meant that we could quickly check what should be rendered or processed (Lua or audio) by looking at the container in view and those next to it. Conceptually simple, but the implementation required that we modify the editor (which we’d not done before) and the renderer, then visit each level in the editor, manually add the new containers and re-export everything.
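
Conceptually it amounts to something like this (a sketch with invented names, not Ben’s actual implementation):

    #include <cstddef>
    #include <vector>

    struct Rect { float minX, minY, maxX, maxY; };

    inline bool overlaps(const Rect& a, const Rect& b)
    {
        return a.minX <= b.maxX && b.minX <= a.maxX &&
               a.minY <= b.maxY && b.minY <= a.maxY;
    }

    // Each level is divided into rectangular containers; per frame we only render,
    // script and play audio for containers overlapping the view (plus a margin).
    std::vector<int> activeContainers(const std::vector<Rect>& containers,
                                      const Rect& viewPlusMargin)
    {
        std::vector<int> active;
        for (std::size_t i = 0; i < containers.size(); ++i)
            if (overlaps(containers[i], viewPlusMargin))
                active.push_back(static_cast<int>(i));
        return active;
    }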

This fix made a massive difference to performance. Audio was faster as it wasn’t playing as many sounds. There was less Lua processing happening as there were fewer things active. And rendering was faster as there was less being sent to the render thread. So it worked, and it worked well. In fact, we had to limit the frame rate to 30fps as some places were now rendering at 60fps on all threads!


The container fix was instrumental in getting the game thread to a more playable speed, but it also introduced a lot of little bugs that took a month to fix. And still, it was too slow.

December

With only three weeks until the Christmas break we figured we needed to start cutting back on the content in some of the levels in order to get the frame rate up. We picked a few of the worst and closely examined them to try and figure out how we could speed them up. We found the following:

Trees were very costly (as were their roots). So we cut some of them down.

Drains were expensive, so we removed some of them.

Areas with lots of water were obviously expensive, so we reduced the amount of water in some areas, cutting back on the flow or adding drains to reduce the amount of water in pools.

We tweaked the object limiter code to ensure that there were always enough fluros to finish a given level, yet not so many that it ran too slowly. The same went for the number of drops and seeds.

All of the above helped, and the game was now running pretty well. There were still areas where it would slow down, and you could (and still can) slow down some areas intentionally by spraying lots of water and generating lots of fluros, but there was no time left for further optimisations. I’d already built 13 different SPU tasks to speed the physics up and one or two for the render thread – it was getting very hard to speed things up more, and at this stage it was risky to make any extra significant changes. This would just have to do.

James was now noticing some play specific bugs. Some fluros weren’t moving correctly – jumping too high and destroying themselves on the scenery or just behaving badly. Which is fine in most cases, but this was happening with a key fluro that you needed to use to complete the level. We had to modify the level to make it work again.

In addition to the optimisations that we were still implementing, and the bug fixes for the odd things that were happening, we also had to make sure that the game was TRC compliant. James trawled through that massive missive, highlighting things that we needed to check – the ways that save/load games functioned, how the game shut down, what would happen in the case of no hard drive space left, the order that trophies unlock, so many things. And so little time.

Screenshot 5

And, on top of that, the financial pressure on the company – the length of time that Vessel was taking to port, the reduced pay rate we were getting for the overage, and the fact that there was very little work lined up for the new year – meant that I had to notify Ben that we were going to have to let him go.

It was one of the hardest things I’ve ever had to do. I felt like I’d betrayed him – he is an awesome coder and he’d become a good friend. And just before Xmas FFS. To make things worse, the day I had to do it he was working from home and I had to let him know over Skype. It was just awful.

He called me back within a couple of hours to tell me he had just managed to land a new job. In just two hours. I told you he was good.

So, back to the code. With a week and a bit to go we mainly focussed on the remaining TRC issues. The game was running as well as we were going to get it to in the time we had, and I was satisfied with that. We weren’t hitting the full frame rate on all the levels or in all cases, but it would just have to do. TRC violations were disappearing and it looked like we might just make it.

In the last week (on the Wednesday) I realised that the file format we were using needed to be DRM-enabled in order to be TRC compliant, so I spent 17 hours straight trying to fix it. I did my first all-nighter in a long time and managed to solve the problem at 5am. I pushed through the next day, fixing, tweaking, hacking and testing while full of sugar and caffeine, and then dragged myself home to sleep on the Thursday night.

Submission Time

The final Friday arrived. We performed some clean ups and readied the code for submission. We were going to make it! On one last pass over the TRC checklist I realised that we’d missed something. We needed to be able to report an error if any of the required startup files were missing, but the way that the engine was designed, there was no way to display anything until those critical files were loaded. We had a few hours so I quickly hacked together an emergency renderer that I could use in the case of a critical failure. I gave myself 2 hours to do it and it was working within 30 minutes. Awesome! But, upon further testing, the loading screens refused to work. I still have no idea why – the code I added was never executed and yet still affected the loading screens. The game itself played fine, but those bloody loading screens didn’t bloody work. I wasn’t willing to submit with what I knew would be a TRC failure so, once again, we’d missed our deadline.

We’d have to delay submission until next year.

To be continued…

Data Structures for Entity Systems – Contiguous memory

Original Author: Adam Martin

This year I’m working on two different projects that need an Entity System (ES). One of them is a non-game app written natively on iOS + Android. The other is an FPS in Unity3D.

There are problems with the ES options already out there, so I’m writing a new library – Aliqua.org – to fix them, and I’m already using it in an app that’s in alpha-testing.

I’ll be blogging experiences, challenges, and ideas as I go.


Background: focus on ES theory, or ES practice?

If you’re new to ES’s, you should read source code + articles from the ES wiki.

My posts focussed on theory: I wanted to inspire developers, and get people using an ES effectively. I was fighting institutionalised mistakes – e.g. the over-use of OOP in ES development – and I wrote provocatively to try and shock people out of their habits.

But I avoided telling people “how” to implement their ES. At the extreme, I feared it would end up specifying a complete Game Engine:

…OK. Fine. 7 years later, ES’s are widely understood and used well. It’s time to look at the practice: how do you make a “good” ES?

NB: I’ve always assumed that well-resourced teams – e.g. AAA studios – need no help writing a good ES. That’s why I focussed on theory: once you grok it, implementation concerns are no different from writing any game-engine code. These posts are aimed at non-AAA teams: those who lack the money (or expertise) to make an optimized ES first time around.

For my new ES library, I’m starting with the basics: Data Structures, and how you store your ES data in memory.

Where you see something that can be done better – please comment!

Aside on Terminology: “Processors, née Systems”

ES “Systems” should be batch-processing algorithms: you give them an array/stream of homogeneous data, and they repeat one algorithm on each row/item. Calling them “Processors” instead of “Systems” reduces confusion.

Why care about Data Structures?

There is a tension at the heart of Entity Systems:

  • In an ES game, we design our code to be Functional: independent, data-oriented, highly efficient for streaming, batching, and multi-threaded execution. Individual Processors should be largely independent, and easy to split out onto different CPU cores.
  • With the “Entity” (ID) itself, we tie those Functional chunks together into big, messy, inter-dependent, cross-functional … well, pretty much: BLOBs. And we expect everything to Just Work.

If our code/data were purely independent, we’d have many options for writing high-performance code in easy ways.

If our data were purely chunked, fixed at compile-time, we’d have tools that could auto-generate great code.

But combining the two, and muddling it around at runtime, poses tricky problems. For instance:

  1. Debugging: we’ve gone from clean, separable code you can hold in your head … to amorphous chunks that keep swelling and contracting from frame-to-frame. Ugh.
  2. Performance: we pretend that ES’s are fast, cache-efficient, streamable … but at runtime they’re the opposite: re-assembled every frame from their constituent parts, scattered all over memory
  3. Determinism: BLOBs are infamously difficult to reason about. How big? What’s in them? What’s the access cost? … we probably don’t know.

With a little care, ES’s handle these challenges well. Today I’m focussing on performance. Let’s look at the core need here:

  • Each frame, we must:
    1. Iterate over all the Processors
    2. For each Processor:
      1. Establish what subset of Entity/Component blobs it needs (e.g. “everything that has both a Position and a Velocity”)
      2. Select that from the global Entity/Component pool
      3. Send the data to the CPU, along with the code for the Processor itself

The easiest way to implement selection is to use Maps (aka Associative Arrays, aka Dictionaries). Each Processor asks for “all Components that meet [some criteria]”, and you jump around in memory, looking them up and putting them into a List, which you hand to the Processor.
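
As a concrete (if naive) sketch of that approach, with invented component types:

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    using EntityID = std::uint32_t;
    struct Position { float x, y; };
    struct Velocity { float dx, dy; };

    // One Map per ComponentType, keyed by EntityID; a Processor's working set is
    // rebuilt each frame by jumping around these maps.
    std::vector<EntityID> entitiesWithPositionAndVelocity(
        const std::unordered_map<EntityID, Position>& positions,
        const std::unordered_map<EntityID, Velocity>& velocities)
    {
        std::vector<EntityID> selected;
        for (const auto& entry : positions)
            if (velocities.count(entry.first))    // "has both a Position and a Velocity"
                selected.push_back(entry.first);
        return selected;                          // handed to the Processor for this frame
    }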

But Maps scatter their data randomly across RAM, by design. And the words “jump around in memory” should have every game-developer whimpering: performance will be bad, very bad.

NB: my original ES articles not only use Maps, but give complete source implementations using them. To recap: even in 2011, Android phones could run realtime 30 FPS games using this. It’s slow – but fast enough for simple games

Volume of data in an ES game

We need some figures as a reference point. There’s not enough detailed analysis of ES’s in particular, so a while back I wrote an analysis of Components needed to make a Bomberman clone.

…that’s effectively a high-end mobile game / mid-tier desktop game.

Reaching back to 2003, we also have the slides from Scott’s GDC talk on Dungeon Siege.

…that’s effectively a (slightly old) AAA desktop game.

From that, we can predict:

  • Number of Component-types: 50 for AA, 150 for AAA
  • Number of unique assemblages (sets of Component-types on an Entity): 1k for AA, 10k for AAA
  • Number of Entities at runtime: 20k for AA, 100k for AAA
  • Size of each Component in bytes: 64bits * 10-50 primitives = 100-500 bytes

How do OS’s process data, fast?

In a modern game the sheer volume of data can slow even a modern computer to a crawl – unless you co-operate with the OS and hardware. This is true of all games. The CPU and RAM both run at a multiple of the bus speed – reading and writing memory is massively slow compared to the CPU’s execution speed.

OS’s reduce this problem by pre-emptively reading chunks of memory and caching them on-board the CPU (or near enough). If the CPU is processing M1, it probably wants M2 next. You transfer M2 … Mn in parallel, and if the CPU asks for them next, it doesn’t have to wait.

Similarly, RAM hardware reads whole rows of data at once, and can transfer it faster than if you asked for each individual byte.

Net effect: Contiguous memory is King

If you store your data contiguously in RAM, it’ll be fast onto the Bus, the CPU will pre-fetch it, and it’ll remain in cache long enough for the CPU(s) to use it with no extra delays.

NB: this is independent of the programming-language you’re using. In C/C++ you can directly control the data flow, and manually optimize CPU-caching – but whatever language you use, it’ll be compiled down to something similar. Careful selection and use of data-structures will improve CPU/cache performance in almost all languages

But this requires that your CPU reads and writes that data in increasing order: M1, M2, M3, …, M(n).

With Data Structures, we’ll prioritize meeting these targets:

  1. All data will be as contiguous in RAM as possible; it might not be tightly-packed, but it will always be “in order”
  2. All EntitySystem Processors will process their data – every frame (tick) – in the order it sits in RAM
    • NOTE: a huge advantage of ES’s (when used correctly) is that they don’t care what order you process your gameobjects. This simplifies our performance problems
  3. Keep the structures simple and easy to use/debug
  4. Type-safety, compile-time checks, and auto-complete FTW.

The problem in detail: What goes wrong?

When talking about ES’s we often say that they allow or support contiguous data-access. What’s the problem? Isn’t that what we want?

NB: I’ll focus on C as the reference language because it’s the closest to raw hardware. This makes it easier to describe what’s happening, and to understand the nuances. However, these techniques should also be possible directly in your language of choice. e.g. Java’s ByteBuffer, Objective-C’s built-in C, etc.

Usually you see examples like a simple “Renderer” Processor:

  • Reads all Position components
    • (Position: { float: x, float y })
  • Each tick, draws a 10px x 10px black square at the Position of each Component

We can store all Position components in a tightly-packed Array:

(Diagram: a tightly-packed array of Position components)

This is the most efficient way a computer can store / process them – everything contiguous, no wasted space. It also gives us the smallest possible memory footprint, and lets the RAM + Bus + CPU perform at top speed. It probably runs as fast or faster than any other engine architecture.
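
In code, this simple case really is simple (illustrative names):

    #include <vector>

    struct Position { float x, y; };

    // One Processor, one tightly-packed homogeneous array: the CPU streams
    // through contiguous Positions, in order, with no indirection.
    void renderProcessor(const std::vector<Position>& positions)
    {
        for (const Position& p : positions)
        {
            // draw a 10px x 10px black square at (p.x, p.y) - drawing call omitted
            (void)p;
        }
    }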

But … in reality, that simple case is rare.

The hard case: One Processor reads/writes multiple Component-types

To see why, think about how we’d update the Positions. Perhaps a simple “Movement” Processor:

  • Reads all Position components and all Velocity components
    • (Position: { float: x, float y })
    • (Velocity: { float: dx, float dy })
  • Each tick, scales Velocity.dx by frame-time, and adds it to Position.x (and repeats for .dy / .y)
  • Writes the results directly to the Position components

“Houston, we have a problem”

This is no longer possible with a single, purely homogeneous array. There are many ways we can go from here, but none of them are as trivial or efficient as the tight-packed array we had before.

Depending on our Data Structure, we may be able to make a semi-homogeneous array: one that alternates “Position, Velocity, Position, Velocity, …” – or even an array-of-structs, with a struct that wraps: “{ Position, Velocity }”.

…or maybe not. This is where most of our effort will go.
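
For reference, the array-of-structs variant of the Movement Processor would look something like this (illustrative only): it is fast for this one Processor, but it fixes the Position + Velocity pairing at compile time.

    #include <vector>

    struct Position { float x, y; };
    struct Velocity { float dx, dy; };
    struct PosVel   { Position pos; Velocity vel; };   // interleaved per Entity

    void movementProcessor(std::vector<PosVel>& rows, float frameTime)
    {
        for (PosVel& r : rows)
        {
            r.pos.x += r.vel.dx * frameTime;   // scale Velocity by frame-time, add to Position
            r.pos.y += r.vel.dy * frameTime;
        }
    }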

The third scenario: Cross-referencing

There’s one more case we need to consider. Some games (for instance) let you pick up items and store them in an inventory. ARGH!

…this gives us an association not between Components (which we could handle by putting them on the same Entity), but between Entities.

To act on this, one of our Processors will be iterating across contiguous memory and will suddenly (unpredictably) need to read/write the data for a different Entity (and probably a different ComponentType) elsewhere.

This is slow and problematic, but it only happens thousands of times per second … while the other cases happen millions of times (they have to read EVERYTHING, multiple times – once per Processor). We’ll optimize the main cases first, and I’ll leave this one for a later post.

Iterating towards a solution…

So … our common-but-difficult case is: Processors reading multiple Components in parallel. We need a good DS to handle this.

Iteration 1: a BigArray per ComponentType

The most obvious way forward is to store the EntityID of each row in our Arrays, so that we can match rows from different Arrays.

If we have a lot of spare memory, instead of “tightly-packing” our data into Arrays, we can use the array-index itself as the EntityID. This works because our EntityID’s are defined as integers – the same as an array-index.

(Diagram: one BigArray per ComponentType, indexed by EntityID)

Usage algorithm:

  • For iterating, we send the whole Array at once
  • When a Processor needs to access N Components, we send it N * big-arrays
  • For random access, we can directly jump to the memory location
    • The Memory location is: (base address of Array) + (Component-size * EntityID)
    • The base-address can easily be kept/cached with the CPU while iterating
    • Bonus: Random access isn’t especially random; with some work, we could optimize it further
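
In code, the random-access rule above boils down to this (a sketch; names invented):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    using EntityID = std::uint32_t;
    struct Position { float x, y; };

    // One BigArray per ComponentType; the array index IS the EntityID,
    // so unused IDs simply leave gaps.
    struct PositionBigArray
    {
        std::vector<Position> rows;

        Position& lookup(EntityID id)   // (base address of Array) + (Component-size * EntityID)
        {
            return rows[id];
        }

        const Position* data() const { return rows.data(); }   // the whole array sent to a Processor
        std::size_t     size() const { return rows.size(); }
    };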

Problem 1: Blows the cache

This approach works for our “simple” scenario (1 Component / Processor). It seems to work for our “complex” case (multiple Components / Processor) – but in practice it fails.

We iterate through the Position array, and at each line we now have enough info to fetch the related row from the Velocity array. If both arrays are small enough to fit inside the CPU’s L1 cache (or at least the L2), then we’ll be OK.

  • Each instance is 500 bytes
  • Each BigArray has 20k entries
  • Total: 10 megabytes per BigArray

This quickly overwhelms the caches (even an L3 cache would struggle to hold a single BigArray, let alone multiple). The net effect depends a lot on both the algorithm (does it read both arrays on every row? every 10th row?) and the platform (how does the OS handle RAM reads when the CPU cache is overloaded?).

We can optimize this per-platform, but I’d prefer to avoid the situation.

Problem 2: memory usage

Our typeArrays will need to be approximately 10 megabytes each:

For 1 Component type: 20,000 Entities * 50 variables * 8 bytes each = 8 MB

…and that’s not so bad. Smaller components will give smaller typeArrays, helping a bit. And with a maximum of 50 unique ComponentTypes, we’ve got an upper bound of 500 MB for our entire ES. On a modern desktop, that’s bearable.

But if we’re doing mobile (Apple devices in 2014 still ship with 512 MB RAM), we’re way too big. Or if we’re doing dynamic textures and/or geometry, we’ll lose a lot of RAM to them, and be in trouble even on desktop.

Problem 3: streaming cost

This is tied to RAM usage, but sometimes it presents a bottleneck before you run out of memory.

The data has to be streamed from RAM to the CPU. If the data is purely contiguous (for each component-type, it is!), this will be “fast”, but … 500 MB data / frame? DDR3 peaks around 10 Gigabytes / second, i.e.:

Peak frame rate: 20 FPS … divided by the number of Processors

1 FPS sound good? No? Oh.

Summary: works for small games

If you can reduce your entity count by a factor of 10 (or even better: 100), this approach works fine.

  • Memory usage was only slightly too big; a factor of 10 reduction and we’re fine
  • CPU caching algorithms are often “good enough” to handle this for small datasets

The current build of Aliqua is using this approach. Not because I like it, but because it’s extremely quick and easy to implement. You can get surprisingly far with this approach – MyEarth runs at 60 FPS on an iPad, with almost no detectable overhead from the ES.

Iteration 2: the Mega-Array of Doom

Even on a small game, we often want to burst up to 100,000+ Entities. There are many things we could do to reduce RAM usage, but our biggest problem is the de-contiguous data (multiple independent Arrays). We shot ourselves in the foot. If we can fix that, our code will scale better.

(Diagram: components interleaved into a single mega-array)

In an ideal world, the CPU wants us to interleave the components for each Entity. i.e. all the Components for a single Entity are adjacent in memory. When a Processor needs to “read from the Velocity and write to the Position”, it has both of them immediately to hand.

Problem 1: Interleaving only works for one set at a time

If we interleave “all Position’s with all Velocity’s”, we can’t interleave either of them with anything else. The Velocity’s are probably being generated by a different Processor – e.g. a Physics Engine – from yet another ComponentType.

(Diagram: the mega-array)

So, ultimately, the mega-array only lets us optimize one Processor – all the rest will find their data scattered semi-randomly across the mega-array.

NB: this may be acceptable for your game; I’ve seen cases where one or two Processors accounted for most of the CPU time. The authors optimized the DS for one Processor (and/or had a duplicate copy for the other Processor), and got enough speed boost not to worry about the rest

Summary: didn’t really help?

The Mega Array is too big, and it’s too interconnected. In a lot of ways, our “lots of smaller arrays – one per ComponentType” was a closer fit. Our Processors are mostly independent of one another, so our ideal Data Structure will probably consist of multiple independent structures.

Perhaps there’s a halfway-house?

Iteration 3: Add internal structure to our MegaArray

When you use an Entity System in a real game, and start debugging, you notice something interesting. Most people start with an EntityID counter that increases by 1 each time a new Entity is created. A side-effect is that the layout of components on entities becomes a “map” of your source code, showing how it executed, and in what order.

e.g. With the Iteration-1 BigArrays, my Position’s array might look like this:

(Diagram: the Position BigArray, one slot per EntityID)

  1. First entity was an on-screen “loading” message, that needed a position
  2. BLANK (next entity holds info to say if loading is finished yet, which never renders, so has no position)
  3. BLANK (next entity is the metadata for the texture I’m loading in background; again: no position)
  4. Fourth entity is a 3d object which I’ll apply the texture to. I create this once the texture has finished loading, so that I can remove the “loading” message and display the object instead
  5. …etc

If the EntityID’s were generated randomly, I couldn’t say which Component was which simply by looking at the Array like this. Most ES’s generate ID’s sequentially because it’s fast, it’s easy to debug (and because “lastID++;” is quick to type ;)). But do they need to? Nope.

If we generate ID’s intelligently, we could impose some structure on our MegaArray, and simplify the problems…

  1. Whenever a new Entity is created, the caller gives a “hint” of the Component Types that entity is likely to acquire at some time during this run of the app
  2. Each time a new unique hint is presented, the EntitySystem pre-reserves a block of EntityID’s for “this and all future entities using the same hint”
  3. If a range runs out, no problem: we add a new range to the end of the MegaArray, with the same spec, and duplicate the range in the mini-table.
  4. Per frame, per Processor: we send a set of ranges within the MegaArray that are needed. The gaps will slow-down the RAM-to-CPU transfer a little – but not much

(Diagram: the MegaArray with hint-based internal structure)
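
A sketch of the ID-allocation side of this (all names invented); the mini-table here maps each hint to its currently open range:

    #include <cstdint>
    #include <map>
    #include <string>

    using EntityID = std::uint32_t;

    // Entities created with the same "hint" (the set of ComponentTypes they expect to
    // use) get IDs from the same pre-reserved, contiguous block, so their data ends up
    // adjacent in the MegaArray.
    class HintedIdAllocator
    {
    public:
        EntityID create(const std::string& hint)
        {
            Range& r = ranges_[hint];
            if (r.next == r.end)                 // no block yet, or the block is exhausted:
            {                                    // reserve a fresh block at the end
                r.next = nextFreeId_;
                r.end  = nextFreeId_ + kBlockSize;
                nextFreeId_ = r.end;
            }
            return r.next++;
        }

    private:
        struct Range { EntityID next; EntityID end; };

        static const EntityID kBlockSize = 1024; // arbitrary block size for the sketch

        EntityID nextFreeId_ = 0;
        std::map<std::string, Range> ranges_;    // the "mini-table" of hint -> open range
    };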

Problem 1: Heterogeneity

Problem 1 from the MegaArray approach has been improved, but not completely solved.

When a new Entity is created that intends to have Position, Velocity, and Physics … do we include it as “Pos, Vel”, “Pos, Phys” … or create a new template, and append it at end of our MegaArray?

If we include it as a new template, and insist that templates are authoritative (i.e. the range for “Pos, Vel” templates only includes Entities with those Components, and no others) … we’ll rapidly fragment our mini-table. Every time an Entity gains or loses a Component, it will cause a split in the mini-table range.

Alternatively, if we define templates as indicative (i.e. the range for “Pos, Vel” contains things that are usually, but not always Pos + Vel combos), we’ll need some additional info to remember precisely which entities in that range really do have Pos + Vel.

Problem 2: Heterogeneity and Fragmentation from gaining/losing Components

When an Entity loses a Component, or gains one, it will mess up our mini-table of ranges. The approach suggested above will still work, but the mini-table will tend to get more and more fragmented over time. Eventually every range is only one item long, and at that point we’ll be wasting a lot of bus-time and CPU-cache simply tracking which Entity is where.

NB: As far as I remember, it’s impossible to escape Fragmentation when working with dynamic data-structures – it’s a fundamental side effect of mutable data. So long as our fragmentation problems are “small” I’ll be happy.

Problem 3: Heterogeneity and Finding the Components within the Array

If we know that “Entity 4” starts at byte-offset “2048”, and might have a Position and Velocity, that’s great.

But where do we find the Position? And the Velocity?

They’re at “some” offset from 2048 … but unless we know all the Components stored for Entity 4 … and what order they were appended / replaced … we have no idea which. Raw array-data is typeless by nature…

Iteration 4: More explicit structure; more indexing tables

We add a table holding “where does each Entity start”, and tables for each Component stating “offset for that Component within each Entity”. Conveniently, this also gives us a small, efficient index of “which Entities have Component (whatever)”:

(Diagram: the MegaArray plus per-Component index tables)

Problem 1: non-contiguous data!

To iterate over our gameobjects, we now need:

  • One big mega-array (contiguous)
  • N x mini arrays (probably scattered around memory)

Back to square one? Not quite – the mini-arrays are tiny. If we assume a limit of 128,000 entities, and at most 8kb of data for all Components on an Entity, our tables will be:

[ID: 17bits][Offset: 13 bits] = 30 bits per Component

…so that each mini-array is 1-40 kB in size. That’s small enough that several could fit in the cache at once.
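
Using the numbers above, those tables might be declared something like this (a sketch; names invented):

    #include <cstdint>
    #include <vector>

    // One entry per (Entity, Component) pair: which Entity the row belongs to, and
    // where that Component sits inside the Entity's block in the mega-array.
    struct ComponentIndexEntry
    {
        std::uint32_t entityId : 17;   // up to 131,072 Entities
        std::uint32_t offset   : 13;   // byte offset within the Entity's block (< 8 kB)
    };                                 // 30 bits used, padded to 32

    struct MegaArrayIndex
    {
        std::vector<std::uint32_t>       entityStart;    // byte offset of each Entity in the mega-array
        std::vector<ComponentIndexEntry> positionIndex;  // doubles as "which Entities have a Position"
        std::vector<ComponentIndexEntry> velocityIndex;  // ...one small table per ComponentType
    };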

Good enough? Maybe…

At this point, our iterations are quite good, but we’re seeing some recurring problems:

  • Re-allocation of arrays when Components are added/removed (I’ve not covered this above – if you’re not familiar with the problem, google “C dynamic array”)
  • Fragmentation (affects every iteration after Iteration 1, which doesn’t get any worse simply because it’s already as bad as it could be)
  • Cross-referencing (which I skipped)

I’ve also omitted history-tracking – none of the DS’s above facilitate snapshots or deltas of game state. This doesn’t matter for e.g. rendering – but for e.g. network code it becomes extremely important.

There’s also an elephant in the room: multi-threaded access to the ES. Some ES’s, and ES-related engines (*cough*Unity*cough*), simply give-up on MT. But the basis of an ES – independent, stateless, batch-oriented programming – is perfect for multi threading. So there must be a good way of getting there…

…which gives me a whole bunch of things to look at in future posts :).

PS … thanks to:

Writing these things takes ages. So much to say, so hard to keep it concise. I inflicted early drafts of this on a lot of people, and I wanted to say “thanks, guys” :). In no particular order (and sorry in advance if final version cut bits you thought should be in there, or vice versa): TCE’ers (especially Dog, Simon Cooke, doihaveto, archangelmorph, Hypercube, et al), ADB’ers (Amir Ebrahimi, Yggy King, Joseph Simons, Alex Darby, Thomas Young, etc). Final edit – and any stupid mistakes – are mine, but those people helped a lot with improving, simplifying, and explaining what I was trying to say.