Company growth and the development approach of Fragments of Him

Original Author: Tino van der Kraan

This blog is intended as a write-up of recent events and activities at our game development company, SassyBot Studio. As a result, the contents of the blog reflect personal approaches and insights that we like to share, and they should not be taken as industry facts. We hope that sharing our thoughts and experiences of past and future events may help other start-up indie devs with struggles and questions of their own. We embrace contact and encourage you to share your adventures and lessons with us, either here or through Twitter @SassyBotStudio.

Office space for digital media

As of this month, you can find SassyBot Studio in an office space of its very own. Well, that’s not technically true as we share it with a few other folks. Regardless, we believe that having a space for our projects and operations is invaluable.

It could be argued that, because the nature of our business is largely digital, game developers don’t really need office space. That would be true if virtual collaboration were as rich, synchronous, and seamless as working together in the real world. The biggest bonus of having a dedicated space for work is most definitely the noticeable increase in focus and productivity compared to working from home. Working at a location outside the home gets you away from everyday domestic distractions. It’s possible to find such workplaces at a library, a university, a diner, or a coffee shop. The real benefit of an office space over public workplaces is the ability to customize the workspace, determine your own work hours, and the peer pressure of showing up and putting in the necessary hours.

Of course, there are also downsides such as rent, insurance, furniture, equipment, and other matters that are mandatory or advisable for a practice such as ours. Even though an office space comes with an initial time investment and financial cost, we think it is definitely worth the sacrifice: productivity has increased and collaboration is now a lot easier.

Fragments of Him update

Progress on our upcoming title Fragments of Him has been picking up over the last month, although other important matters, such as external work and the recent Casual Connect conference in Amsterdam, have taken a chunk out of our precious development time. An outline of the game’s design now exists, and it is enough to block out the premise, characters, and mechanics of the game.

Based on the instructions and descriptions in the design, we have started blocking out environments that give the player the context in which the game takes place. The basic gameplay systems are now in place, such as player navigation and an art pipeline that allows for fast iteration.

Motion capture

The characters in Fragments of Him are going to be more dynamic than those seen in the prototype that is still up on Kongregate. Even though the story in the prototype could be told in a compelling way with static characters, we believe that lifelike animation, if acted out properly, can greatly emphasize the message.

We want to have a lot of short, realistic animations in the game. In order to iterate quickly through motions until we have the right one, we think a motion capture system could be greatly beneficial. Hand-animating believable, lifelike character motion requires a substantial amount of time and expertise to pull off. Cleaning up keyframes from motion capture data takes less skill, and it will probably be faster to get to the desired result. For these reasons, we are close to deciding to build a motion capture rig of our own.

The software we intend to use to create a motion capture setup that fits within our budget is iPi Motion Capture (http://ipisoft.com/), which supports consumer motion cameras such as the Kinect and PS Eye as input devices. What we have seen so far has made us very enthusiastic about the ease of use and overall possibilities of this technology.

Fragments of Him development approach

As we are creating a larger game than previously attempted, we have been doing quite a bit of high-level thinking about how best to approach this project. It’s important that we make the best use of our available time and resources by catching problems early, when they do not yet have a considerable financial impact on the end result.

The high-level approach we currently use for making Fragments of Him can be seen as a sequence of phases, which we have labelled as follows:

  1. Horizontal slice phase
  2. Vertical slice phase
  3. Decor phase
  4. Polish phase

During production, we use the term horizontal slice to indicate the bare minimum we need in order to have the game playable from A to Z. In this phase, we try to put the most important objects into the game, in rough form, as soon as possible. This is also known as the MVP (Minimum Viable Product). The horizontal slice is made up of the critical elements that are necessary for the game to function.

We are going to approach the horizontal slice of Fragments of Him in this order:

  1. Environment blockout/whitebox
  2. Key interaction objects and functionality
  3. Critical interface elements
  4. Narrative system with placeholder scripts
  5. Main characters in crude forms
  6. Audio placeholders
    • Music
    • Sound effects
    • Foley

A great foley example is ‘Attack of the Clones DVD – Episode II: Foley’.

Good audio is critical to the experience of a narrative-heavy game such as Fragments of Him and, although this approach appears to give it a low priority, we hope to move on to including it very rapidly. We expect to use free assets from classical, royalty-free libraries before moving on to a finalized score for the final release.

The result of this phase can be used to figure out whether the core value of the game can already be experienced. Usually, the first result will not create the experience that the design has in mind. From this point onwards, a lot of iterations need to take place to nudge the game towards where it needs to be. When the core design is present and representative of the game’s intended experience, we can include playtesters in the iteration cycle, who will be able to tell us whether or not the design and game are working as intended. While that happens, we can direct our attention towards creating a vertical slice.

Vertical slice phase

A vertical slice represents a short segment of the game at a standard that is representative of the final product. On a graphical level, this sets the standard for how the rest of the game should look and feel. Additionally, it is useful for promotional purposes, as it can give people a taste of what they can expect from the full game. If you are looking for funding for your game, a vertical slice can serve that purpose as well. With Fragments of Him, we intend to create a vertical slice for promotion, in order to raise awareness and muster support. We’ll try to make this vertical slice available to as many conferences, game journalists, and Let’s Play content creators as possible.

To get an impression of what is required for a vertical slice, imagine the work required for the phases below applied to roughly a 10% segment of the final game. It is basically getting a taste of a piece of the cake.

Décor phase

The décor phase can be seen as the phase where non-essential assets and functionality get added to the game. A lovely analogy is this:

Imagine actors rehearsing for a theater play. The actors picture the environments and props around them as they practice getting the entire piece acted out and presented properly. In many ways, that can be seen as the horizontal slice. The décor phase is where the set dressing, lights, music, audio, and character costumes are brought out to complete the set for the grand performance.

Before décor

After décor

Essentially, the elements added in the décor phase of our game development process accommodate the mood, atmosphere, flow, and ambience that support the core of the game. The elements in this phase are not critical for playing the game, but they add tremendous value to the intended presentation. Some of the elements you can think of adding in this process are:

  • Scene dressing
  • Complete set of rigged character models
  • Textures
  • Additional, non-essential, animations
  • Scene lighting
  • Particles and effects
  • Additional, non-essential, functionality
  • Interface graphics
  • Complete audio set for music, sound effects, and foley

Polish phase

When we get to the point where all the elements of the game are put together, we will go into the polish phase. In this phase, we will not add any more assets or elements unless they add considerable value to the game. Usually, this phase takes the longest to finish. Most of the work done in this phase is not really visible to the player and takes place behind the scenes. The purpose of this phase is to improve game performance by optimizing game assets and code, as well as to track down and minimize the occurrence of game-breaking and experience-hindering bugs. This process aims to make the game look as great as our resources allow. Some of the activities we do in this process are:

  • Clean up and perform optimization of:
    • Meshes
    • Textures
    • Animations
    • Game systems
    • Interface
  • Tweak scene light parameters and light bake settings
  • Quality assurance of the game’s performance and experience

Conclusion

This last month has been pretty busy as we moved into SassyBot’s first office. Because of this, the development of Fragments of Him has taken a slight backseat, although we have still made progress. We also decided that motion capture is very likely the way we will get animations into the game, and we have researched cost-effective ways of doing this. Lastly, we have planned the development approach of Fragments of Him by dividing it into the horizontal slice, vertical slice, décor, and polish phases. Thank you for reading all the way to the end, and we hope you let us know what you think, either here or through @SassyBotStudio.

Musical Dimensions: What Music Reveals About Your Game

Original Author: Chris Polus

The effort it takes to make a game is divided between several individuals, even more so when members of indie teams are spread all over the world. Somebody does visuals, somebody game design, programming, 3D models and so on. But no matter what role I had in an indie team, be it composer, sound designer, team lead or producer, the more I knew about how other departments worked, the better I could do my job. Normally, composers are only needed for a short period of time. Thus, composers join and then leave indie teams again after they have delivered their part of the puzzle. But what do composers actually work with, and how can you make the most of their time?

That’s why I’d like to open the box and give you an overview of what composers work with and what dimensions they have at their disposal in music. Next time you run into one on your project, you may even be able to direct him or her better, get the best out of this team member, and increase the overall quality of your game. The better you can direct a musician, and the better you can phrase what you want, the easier, more efficient and more enjoyable the collaboration will be for both sides.

First, why is there music at all?

I find this question very intriguing. Why is it so common to have music in games? Some games want to be as realistic as possible with graphics and sound, and yet there’s no music that accompanies our actions in the real world. No sad piano that starts playing when our dog dies, no string clusters and glissandi when we hear strange noises in our house and suspect zombies. (Ha! It would be truly creepy if you suddenly heard sinister music while sitting on your couch with the TV off.) So why is there music in games? Is it too boring otherwise? Is the story not interesting enough? Is the ambience too quiet, so we need something to listen to while we gather resources, jump our way to the next platform, or complete quests?

Maybe. Games can feel empty if there isn’t any music filling the background. But in my opinion, there’s a far more important reason: I think music is the best and most effective way of conveying emotions, in a way nothing else can. No sound effect, no people crying on-screen, no heartbreaking pictures can shake you up the way music can. And easily so! Music has the power to significantly amplify whatever the game’s story requires the audience to feel. Thus, composers are able to influence the audience to a great extent. Music can add that extra tension to action-packed car chase sequences, give that narrow feel to horror first-person shooters, or suggest an infinite landscape for adventurers to explore when, in reality, there is only a handful of hills and valleys.

Conclusion: One of the most powerful aspects of music is its ability to convey emotions. Use this power in your game to elevate the player’s experience.

People have expectations, feed those expectations

Everybody has years of experience playing games, watching movies, or traveling the world. We carry a bag of personal experience of how things sound. Let’s take a Western game, for example. The hot ambience in Westerns is excellently conveyed by strummed guitars. Bells symbolize high noon, when there’s a shootout. Listen to a sample from Red Dead Redemption.

It’s because we have seen so many Westerns on TV that we can sort of guess that this music is from a Western rather than a science-fiction game. Maybe we can’t pin it down to a specific instrument or chord, but we have a gut feeling that’s based on our lifelong experience.

Composing music for games is about playing with those clichés to satisfy people’s expectations, at least up to a certain extent. To make it very clear: it would be very weird to have electronic dance music in a Western-style game, because that is simply not what people expect!

What music can tell us about location

Now that we know music can evoke emotions, and that we expect to hear certain music for certain genres, let’s pin this down further. For this example, let’s go from the Wild West all the way to Asia. Asian culture is different from other cultures, obviously. Particularly interesting here is that Asian cultures developed different kinds of instruments. There is, for instance, the Erhu, a bowed Chinese string instrument (make sure to listen to the sound bites on the Wikipedia page). These instruments have a distinct sound and a special way of being played. As soon as you play a few notes on one of these instruments you instantly get that Asian feel in the music. Combine this with traditional Asian scales and you immediately “know” this game has to take place somewhere in Asia. Goal achieved.

Check out this sample from “Blossoms”, a free casual iOS game I worked on as the composer. It mixes Asian instruments with an orchestra, but still retains the Asian feel.

Other cultures developed their own instruments. A popular Middle Eastern instrument, for example, is the Armenian Duduk, which evokes the wide-open feel of an infinite desert. Here, too, make sure to listen to the sound bites on the Wikipedia page to get the feeling.

As you can see, different cultures have their own specific instruments. We know the sound of these instruments from movies, documentaries, or personal travels, and we associate their sound with a geographical region. If a game takes place, say, in the desert, I could try adding a Duduk to my composition to help make the setting more believable, because that’s the instrument you find in those parts of the world.

Conclusion: Music can give the audience a sense of geographical location. Help players immerse themselves in your game by injecting the right location into the music.

What music can tell us about time

Similar to our geographical sense, we have a sense of time. If I wanted to compose music for a game set in the Middle Ages, I wouldn’t use a modern synthesizer as my go-to instrument. I’d look around for videos that show what music people made at that time. And, apparently, there are specific instruments and scales that were used in the Middle Ages. Here’s an example from a medieval band called Flauto Dolce.

Did you get that feeling of kings, castles, jesters and feasts like I did? I would try to load the instruments from the video into my sampler (if I had them) and then mimic their playing style to get my first melodies down.

On the other hand, if I were composing for a modern, action-oriented Tron-style game, I’d look for cool, wobbly synthesizer sounds, as that would feel more like a computer world with its technical, digital sounds to me. As a last example, think about the sixties. That era had its particular style of pop and rock songs that one can recognize and associate with the time.

Conclusion: Music can give the audience an orientation in time. Use this to make your game feel modern, or ancient, or anything in between.

What music can tell us about the action

The pace of a game can be greatly manipulated with music. Action-packed sequences benefit from fast rhythmical patterns. This could be percussion or rhythmical strings, horns or any other instrument. Here’s a combat example from a game scored by Danny Baranowski.

Compare this to the next track on the same album, which is meditative and calm.

Depending on the rhythm of the various instruments, activities in a game can be perceived as fast, slow, forward-moving and so on. Fast patterns are the perfect companion for getting an extra kick out of a car chase or a battle sequence; an example of a forward-moving pattern is this one from a game scored by Josh Whelchel.

Here, too, music just amplifies that feeling of velocity or danger. And when music changes from peaceful to fast moving, or even dangerous, you know something is about to happen in the game. ATTACK!

Conclusion: Music can give the audience a sense of the speed of the action. Use it to increase tension or make players feel relaxed, depending on the current gameplay.

What music can tell us about size

Some instruments and musical techniques suggest a physical size. High pitches tend to sound like small things. Slowly played, low-pitched instruments like double basses or tubas make you think of impressively large things. And this doesn’t only go for objects: the same effect can be applied to spaces such as landscapes. Music can create an incredibly open feel and limitless freedom, as in this stunning video by Michel Fletcher (go to 2m 20s).

A voice and a long reverb give a strong sense of grandeur. Likewise, music can create narrow spaces to support a claustrophobic, horror kind of feel, with the help of sound effects. Pitch is a big factor in this equation.

Conclusion: Music can give the audience a sense of size of things or spaces. Use it to underline the huge monster that’s about to appear or the claustrophobic hallway the player is supposed to cross.

WOW, music does all that stuff!

All those aforementioned elements are like building blocks. Depending on the effect that needs to be achieved in a game (say, an epic battle, a calm resource-gathering phase, a vast and infinite landscape for adventurers to explore, or a narrow, claustrophobic house filled with monsters), a composer can combine those building blocks in different ways:

  • emotional blocks (happy, sad, dangerous, horrified)
  • geographical blocks (Asia, Middle East, Ireland)
  • time blocks (future, sixties, Middle Ages)
  • action blocks (fast, slow, forward moving)
  • space blocks (tiny, narrow, limitless)

Of course, these general directions, once mastered, can be explored creatively. Maybe you have a game that’s set in the future, but you deliberately use rusty-looking visuals and medieval instruments to stun your audience with an unusual combination. Or you use medieval instruments to play modern melodies, as an accent in an otherwise electronic soundtrack. There are so many creative possibilities: using clichés, bending them, torturing them, creating completely new combinations of unheard-of music. And this is what it’s all about: knowing the rules, then exploring and bending them in creative ways.

With this introduction I hope to have given you an insight into the craft of composing music for games. You’re now aware of the power music has to make the audience feel the way your story writer intended, and you can communicate more efficiently what effect you want. Have fun and good luck with your game projects!

Vessel Post Mortem: Part 3

Original Author: Tony Albrecht

This is the final part of the Vessel Porting Trilogy. The previous parts can be found here.

After the Christmas break (where I was forced into yet another family vacation) I dove straight back into the code, hoping to get the final issue fixed and the game submitted as soon as possible. There was still time to submit, and if we passed Sony’s submission process the first time, we could still have the game on the PSN store by the end of January. I fixed the remaining problem with the start-up error reporting and added a little more error checking just to be sure. I was pretty paranoid about the submission process as I didn’t want to delay the game any further by failing, so I carefully checked everything twice, crossed my fingers and hit the ‘submit’ button to Sony America (SCEA) on January 9.

[Image: M. Arkwright sprays lava fluros with the liquid gun]

In parallel with the code submission process there is a ‘metadata’ submission process. This covers the PlayStation Store assets and the game’s presence there: all the text, images, trailers, prices and so on, each with very specific requirements that must be met in order to pass submission. James (our QA guy) managed most of this, which involved communicating with Strange Loop’s art guy Milenko (who was incredibly responsive – I’m not sure he ever sleeps), asking for different resolution images and screenshots, and sourcing the different language versions of the game text. It took us a few submissions to get the metadata right, but the turnaround was pretty quick and a continuous dialogue with Sony helped a lot.

The code submission process consists of uploading the game in a special package format, plus some extra documents that describe the trophies and other bits and pieces. We had to submit to both SCEA and Sony Europe (SCEE) so the game could be released in both regions. We hadn’t submitted to SCEE at the same time as SCEA because we were still waiting on some publisher registration details to come through, so all I could do was wait for those, and for the response from SCEA on our initial submission, while I busied myself with some other work.

On January 18th I received the first fail from Sony. As fails go, it was a pretty good one. There were three “Must Fix” bugs: one was due to my entering the wrong data in a field when submitting the game package, and two were due to saving games failing when there was no disk space left. They’d also mentioned some slowdown in some of the levels – I’d expected that, and as it wasn’t a critical bug, I ignored it. The save game bugs proved troublesome – Ben had written all of the save game code, and with him gone I now had to learn how Sony wanted it done, how Ben had done it, and how I could make it work correctly. It took me a few days to find and fix the problems, and by then the details that SCEE required had come through, so I resubmitted to SCEE and SCEA at the same time (January 24).

I was quietly confident that it’d all pass this time. I’d thoroughly tested the save game code now and it all worked. What could go wrong? I seriously considered heading out and buying a bottle of Chimay Blue as a pre-emptive celebratory reward.

It was not to be: the game failed again, this time with crash bugs. I eventually tracked them down to a simple, stupid error. There was a section of code responsible for limiting the amount of fluid, fluros and seeds in a given section. When the frame rate dropped below 60fps, this code would kick in and drop the number of drops, fluros or seeds to the minimum value deemed suitable for that level. During the final stages of development we had created a FINAL_RELEASE mode which turned off all of the unnecessary debugging code. Unfortunately, this erroneously included some code which updated the time values that the limiting code used, so the fluid, fluros and seeds were never being reduced. Ever. This meant that the game had been running much slower than it should have, and was prone to crashes whenever certain hard-coded limits were exceeded. I’ve never been so relieved to fail anything.
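
To illustrate the shape of that bug, here is a hypothetical reconstruction (not Vessel’s actual code): the limiter reads a frame-time value that is only ever written by code mistakenly treated as debug-only, so FINAL_RELEASE builds strip the update and the limiter never fires.

    static float g_lastFrameMs = 0.0f;        // never updated in FINAL_RELEASE builds

    void UpdateFrameStats(float frameMs)
    {
    #ifndef FINAL_RELEASE
        g_lastFrameMs = frameMs;              // erroneously compiled out with the debug code
    #endif
    }

    void LimitFluidObjects(int* dropCount, int minDropsForLevel)
    {
        // With g_lastFrameMs stuck at 0 the limiter never triggers, so drops,
        // fluros and seeds are never reduced and hard-coded limits get blown past.
        if (g_lastFrameMs > 1000.0f / 60.0f && *dropCount > minDropsForLevel)
            *dropCount = minDropsForLevel;
    }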

Something I’d like to focus on here is the level of performance that Vessel was submitted with. Ben and I spent a huge amount of our time just trying to squeeze as much performance out of this game as possible on the platform we had. For the most part, I’m pretty happy with the results. Yes, you can abuse the game and make it slow down quite easily – if you try to fill a room with fluros and water and seeds then you’ll most likely see some slowdown – but during normal play the game maintains a 60fps simulation at a 30fps frame rate (the graphics run at 30fps, so there are two simulation passes per visible frame). However, there are a few levels where the sheer number of fluros and fluid required to solve the puzzles means that the frame rate will drop to 20fps. Given another month I reckon I could have fixed that too, but given how far over time and budget we were, that wasn’t an option. So, while I’m happy with how much we improved the frame rate of the game, I’m still a little disappointed that we didn’t hit a full 60fps simulation everywhere.
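
As a minimal sketch of that update scheme (the function names are illustrative, not Vessel’s), the frame loop effectively does this:

    void StepSimulation(float dt);            // physics/fluid update (hypothetical name)
    void RenderAndPresent();                  // draw one frame (hypothetical name)

    const float kSimStep = 1.0f / 60.0f;      // fixed 60Hz simulation timestep

    void GameFrame()
    {
        StepSimulation(kSimStep);             // simulation pass 1
        StepSimulation(kSimStep);             // simulation pass 2
        RenderAndPresent();                   // one visible frame, i.e. 30fps
    }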

Sure, the PlayStation 3 hardware did slow down the porting of the game initially, but given the complexity of the fluid simulation required I doubt that there will ever be an Xbox 360 version released. The SPUs allowed us to perform at least 80% of the physics simulation completely in parallel – with no impact on the main processor at all. There is no way you’d get the same level of performance on the X360.

With the FINAL_RELEASE bug fixed and the game resubmitted to SCEE and SCEA (which passed on the 20th and 22nd respectively), I was finally free! The metadata was all sorted, and we’re now expecting the release on March 11 in the SCEA territories and March 12 in the SCEE regions. We’re all hoping it does well enough to cover our costs (as all devs with newborn progeny do).

What went right

Having a QA guy in house was invaluable. An impartial, non-technical third party to throw our fixes at kept us honest. As a developer you get very close to the game and like to assume that your fixes have made a positive difference and you’re moving forwards. That’s not always the case.

Honest communication with the Publisher/client. I tried to keep John at Strange Loop Games constantly up to date with the progress (or lack thereof) of the game. He was incredibly understanding with the continuous delays, and while it was hard to deliver bad news, at least it reduced the surprises.

Having good, experienced coders on the job. Bringing Ben onboard was a good choice, even though we had to let him go in the end. There is no way I could have done this without him. Working from the same physical office was also beneficial. I worked remotely for over 5 years leading up to this port, and I think we’d have delivered even later if we’d worked separately.

What went wrong

Work estimation. I fundamentally underestimated the amount of work required. I should have applied Yossarian’s Circular Estimation Conjecture and multiplied my estimate by pi – that would have been closer to the mark. The big killer was the misunderstanding over which threads needed to run at 60fps. If it had been just the physics I’d underestimated, we probably would have been a shade over a month late (maybe two), but with both the physics and game threads needing to execute in under 16.6ms, we really had our work cut out for us. The amount of time taken for submission shouldn’t be forgotten either; allow 4 to 8 weeks after the initial submission before your game appears on the store.

I’d also recommend to anyone doing a console game that they get as much TRC compliance done as soon as possible in the development of the game. Get the load/save stuff working, trophies, error reporting, quitting, controller connection, file error reporting and the like in place and functioning sooner rather than later.

In Closing

The porting of this game was a trial for me. There was a week or two around September/October where I really doubted that we could deliver a game that ran at a playable frame rate. I’m proud of what we delivered in the end, and if you took away the financial and time pressures I would have thoroughly enjoyed the challenge. I’ve learnt a lot from this experience, and I’m looking forward to the next one.

Just not right now, OK? Maybe a bit later in the year.

Vessel Post Mortem: Part 2

Original Author: Tony Albrecht

In the last episode, our intrepid heroes had discovered that rather than having almost finished the PlayStation 3 port of Vessel, they were barely half way there. Read on to find out what happens next…

(This is part 2 of a 3 part series on the porting of Vessel from PC to PS3. Part 1 can be found here)

With the realisation that both the game and render threads needed to run at 60fps, we had to reassess where we focussed our optimisation efforts. The game thread was doing a lot of different jobs – Lua scripting, playing audio, AI, animating sprites – all sorts of things. There was a lot to understand there, and on top of that, both the physics and render threads still needed more work. But performance wasn’t the only concern – there was TRC compliance, gameplay bugs, new bugs introduced by our changes and, as we eventually discovered, the need for a PC build (more on that later).

Now, Vessel was ‘mostly’ bug free on PC when we started – we did find a few bugs in the Lua and in the game code, and the few compiler and shader-compiler bugs added to that – but there were also subtle differences in the way the two platforms dealt with numbers, which meant that weird things sometimes happened. For instance, a door on one of the levels would get stuck and your character would have to bump into it in order for it to open. This was due to a variation in position values around the fifth decimal place causing the jamming on PS3. Additionally, there was some code used in a number of places that did something like the following:

x = MIN( y/z, 1.0);

What this did on PC was catch the case where z tended to zero and clamp x to 1.0 (MIN(infinity, 1.0) = 1.0). Unfortunately, on PS3, y/0.0 was QNAN, and MIN(QNAN, 1.0) was QNAN. So the clamping never happened, resulting in much weirdness in hard-to-reproduce places (it only occurred when z was zero), and it was therefore a bugger to find.
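
One way to make that clamp behave the same on both platforms is to guard the division explicitly; a minimal sketch (illustrative code, not the actual fix used in Vessel):

    // MIN(y/z, 1.0) breaks on PS3 when z == 0, because y/0.0f is QNAN and
    // MIN(QNAN, 1.0f) stays QNAN. Clamping the zero-divisor case by hand
    // restores the behaviour the PC build relied on.
    static inline float SafeRatioClampedToOne(float y, float z)
    {
        if (z == 0.0f)
            return 1.0f;                      // the "z tends to zero" case
        float x = y / z;
        return (x < 1.0f) ? x : 1.0f;
    }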

This meant that we weren’t just optimising and adding in new PS3 functionality, we were also debugging a large, foreign codebase. I hadn’t expected that we’d need to change Vessel much – it was a shipped game, after all, and so I had assumed that it was pretty much bug free.

I had also assumed that, for the most part, we wouldn’t need to change any assets significantly. Things like platform-specific images for controllers and buttons were just texture changes and so weren’t a problem. Text was fine too. Unfortunately, problems like the sticking door mentioned above meant that we needed to go into some of the levels and jiggle bits around. Later in the life of the port we had to tweak fluid flows, emission rates, drains and other fluid-based effects to help with performance.

All of this meant that we needed to have the editor up and running, and the editor was built on the same engine as the game, so we needed to maintain two builds – one for PS3 and one for PC, solely for the editor. This was done crudely by wrapping most of our PS3 changes in a #define and leaving the PC code inside the #else conditional. This did slow us down a bit and also made our code that much uglier, but it had the side effect of allowing us to easily test whether a bug was present in the PC build or was specific to the PS3.
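
The pattern was roughly the following (the define and function names are made up for illustration): PS3 code lives in the #ifdef branch, and the original PC code stays in the #else so the editor build keeps compiling.

    void SubmitSpuRenderJobs();               // hypothetical PS3-only path
    void SubmitGlDrawCalls();                 // hypothetical PC path, used by the editor

    void FlushRenderCommands()
    {
    #ifdef VESSEL_PS3_BUILD                   // hypothetical define name
        SubmitSpuRenderJobs();                // PS3-specific changes sit inside the #define
    #else
        SubmitGlDrawCalls();                  // PC code left in place for the editor build
    #endif
    }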

By the end of September we had fixed many audio issues and managed to get LuaJIT (an optimised version of Lua) working again (it had mostly been working, but was causing some repeatable problems in a few places). Ben also profiled the Lua memory usage (article here) and tweaked its garbage collection to be more CPU friendly (it was occasionally peaking at 13ms in a single call). So, slowly, we were scraping back time spent on the game thread. It was still a serious problem, but it was getting better.

Back on the physics system, more and more of the CPU code was being ported to the SPUs – in many ways this was getting easier, as I was building a lot of infrastructure which made porting more convenient. Many of the patterns I used for one system translated well to others. I also started reducing the amount of memory used by parts of the physics: dropping the size of the WaterDrop struct from 144 bytes to 128 bytes meant better cache usage and simpler processing on the SPU (nicely aligned structs meant that I could assume alignment for SIMD processing).

Some optimisations were straightforward. For example, I changed the way that we processed drops on fluro skeletons. Instead of iterating over the entire drop list and checking each drop to see if it was part of skeleton ‘N’, I changed it to pass over the drop list once, partitioning the drops into buckets of drops per skeleton. Then we could process a given skeleton touching only the drops on that skeleton, as in the sketch below. Obvious in hindsight, and unnecessary on the PC build, but it takes time to get to the point where you really start to understand the code and can begin to change the way it works (rather than just optimising a function to perform the same algorithm faster).
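
A sketch of that single-pass partition (the types and names are illustrative, not Vessel’s): walk the drop list once and bucket each drop by the skeleton it belongs to, so that updating a skeleton only touches its own bucket.

    #include <vector>

    struct WaterDrop { int skeletonIndex; /* position, velocity, ... */ };

    void BucketDropsBySkeleton(const std::vector<WaterDrop>& drops,
                               int skeletonCount,
                               std::vector<std::vector<const WaterDrop*> >& buckets)
    {
        buckets.assign(skeletonCount, std::vector<const WaterDrop*>());
        for (size_t i = 0; i < drops.size(); ++i)          // one pass over all drops
        {
            const int s = drops[i].skeletonIndex;
            if (s >= 0 && s < skeletonCount)
                buckets[s].push_back(&drops[i]);
        }
        // Processing skeleton N now only touches buckets[N].
    }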

At this time the render thread also reached the status of low-hanging fruit, so we started working on that – unrolling loops, prefetching, minimising data copying. Tweaking, poking, prodding and breaking, testing, measuring, then swearing or cheering and checking in.

The end of September rolled around and we still had no real end in sight. I kept talking to Strange Loop, trying to keep them up to date while we furiously worked to finish the game. As we’d already blown the initial quote, our contract specified that we would have to reduce our daily rate for the remainder of the project, to be recouped on sales. So not only did we have significant time pressure, we now had financial pressure on top of that.

October

October was very productive – it saw over 150 bug fixes and optimisations checked in for the month. As the game improved in performance and became more playable, James, our QA guy, was able to play more and more of it. We hadn’t actually managed to do a complete play-through of the game by this stage, so we focussed heavily on removing all blocking bugs and getting the game into a functionally shippable state (assuming that we would continue to make performance improvements).

Second week in, after discussions with Strange Loop Games we decided to postpone the release of the game to January. This meant submitting the game to Sony in Europe (SCEE) and in America (SCEA) by the end of the year, a mere 4 months after our initial estimate. There was no way I was going to let this date slip again.

Ben was still working on optimising the Lua execution and the audio (which at this stage was still taking 4 or 5 ms per frame, with peaks way higher than that). I’d thought that, as Vessel used FMOD, it would all just work when I turned it on. Unfortunately, the audio was a complete mess on the PS3. While profiling the game thread in detail we discovered that it was taking up huge amounts of time, spiking and causing frame stutters, as well as just plain behaving badly. It took weeks of intermittent work, plus lots of discussions with the FMOD boys, to figure out what the problems were. It turned out many of the sounds were just set up incorrectly for the PS3 build – effects and layers that were disabled on PC were left on for PS3. Some sounds would loop infinitely, and multiple sounds would play at the same time. All these things took time to discover, time to investigate a solution and time to fix. Something we had very little of.

To make things worse, Vessel performed no culling for rendering, AI or audio. At all. The entire level was submitted to the renderer, which culled it and then passed the visible bits off to the SPU renderer. Also, Lua scripts (which are slow at the best of times) were being called even when the things they were modifying weren’t in view. Audio was being triggered by objects a long way from the player, relying on distance attenuation to reduce the volume. All of this meant that there was a lot of work going on under the hood for no visible or audible impact.

We were still stumbling across weird little issues. For example, the animated textures in the game were never animating. They were loading correctly, but never changing. Ben tracked this down to a broken endian-swap function (the PC and PS3 have different byte ordering for their numbers, so the same data, when loaded from file, must be modified depending on the platform). This endian swap was only ever called in one place, and only by this particular animated material.

The fix is often easy. Finding what to fix is hard.
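
For reference, an endian swap of the kind that was broken here is only a few lines; this is a generic sketch, not the original function (the PS3’s PPU is big-endian while PC data is little-endian, so multi-byte values loaded from a PC-authored file need their bytes reversed):

    #include <stdint.h>

    static inline uint32_t SwapEndian32(uint32_t v)
    {
        return ((v & 0x000000FFu) << 24) |
               ((v & 0x0000FF00u) <<  8) |
               ((v & 0x00FF0000u) >>  8) |
               ((v & 0xFF000000u) >> 24);
    }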

November

Even though we had a tight timeline, we also had other client obligations which saw us each lose 2 weeks in November. This helped financially (well, it would have if they had paid on time) but put even more time pressure on us. Regardless, we were confident that we’d make the end of December deadline. Plenty of time.

We finally managed to get a complete play-through of the game in November. Only to realise that the endgame didn’t work. Fix, patch, hack, test.

It was during the last week of November that Ben had a breakthrough. He bit the bullet and implemented a simple container-based culling system for the game thread. All objects in a loaded level were placed within a rectangular container, and there could be an arbitrary number of these per level. This meant that we could quickly check what should be rendered or processed (Lua or audio) by looking at the container in view and those next to it. Conceptually simple, but the implementation required that we modify the editor (which we’d not done before) and the renderer, then visit each level in the editor, manually add the new containers and re-export everything.

This fix made a massive difference to performance. Audio was faster, as it wasn’t playing as many sounds. There was less Lua processing happening, as fewer things were active. And rendering was faster, as less was being sent to the render thread. So it worked, and it worked well. In fact, we had to limit the frame rate to 30fps, as some places were now rendering at 60fps on all threads!
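
A minimal sketch of that kind of container culling, with hypothetical types (the real system was built into Vessel’s editor and engine):

    #include <vector>

    struct Rect { float minX, minY, maxX, maxY; };
    struct GameObject;                        // sprites, Lua hooks, audio emitters, ...

    struct Container
    {
        Rect bounds;                          // hand-placed rectangle from the editor
        std::vector<GameObject*> objects;     // everything that lives inside it
        std::vector<Container*>  neighbours;  // adjacent containers
    };

    static bool Inside(const Rect& r, float x, float y)
    {
        return x >= r.minX && x <= r.maxX && y >= r.minY && y <= r.maxY;
    }

    // Gather only the objects worth rendering, scripting or playing this frame:
    // those in the container the camera is in, plus its neighbours.
    void GatherActiveObjects(const std::vector<Container>& containers,
                             float camX, float camY,
                             std::vector<GameObject*>& out)
    {
        out.clear();
        for (size_t i = 0; i < containers.size(); ++i)
        {
            const Container& c = containers[i];
            if (!Inside(c.bounds, camX, camY))
                continue;
            out.insert(out.end(), c.objects.begin(), c.objects.end());
            for (size_t n = 0; n < c.neighbours.size(); ++n)
                out.insert(out.end(), c.neighbours[n]->objects.begin(),
                                      c.neighbours[n]->objects.end());
        }
    }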

The container fix was instrumental in getting the game thread to a more playable speed, but it also introduced a lot of little bugs that took a month to fix. And still, it was too slow.

December

With only three weeks until the Christmas break, we figured we needed to start cutting back on the content in some of the levels in order to get the frame rate up. We picked a few of the worst and examined them closely to try to figure out how we could speed them up. We found the following:

Trees were very costly (as were their roots). So we cut some of them down.

Drains were expensive, so we removed some of them.

Areas with lots of water were obviously expensive, so we reduced the amount of water in some areas, cutting back on the flow or adding drains to reduce the amount of water in pools.

We tweaked the object limiter code to ensure that there were always enough fluros to finish a given level, but not so many that it ran too slowly. Same with the number of drops and seeds.

All of the above helped, and the game was now running pretty well. There were still areas where it would slow down, and you could (and still can) slow some areas down intentionally by spraying lots of water and generating lots of fluros, but there was no time left for further optimisations. I’d already built 13 different SPU tasks to speed up the physics, and one or two for the render thread – it was getting very hard to speed things up further, and at this stage it was risky to make any more significant changes. This would just have to do.

James was now noticing some gameplay-specific bugs. Some fluros weren’t moving correctly – jumping too high and destroying themselves on the scenery, or just behaving badly. That is fine in most cases, but it was happening with a key fluro that you needed to use to complete the level. We had to modify the level to make it work again.

In addition to the optimisations that we were still implementing, and the bug fixes for the odd things that kept happening, we also had to make sure that the game was TRC compliant. James trawled through that massive missive, highlighting things that we needed to check: the way save/load games functioned, how the game shut down, what would happen if there was no hard drive space left, the order in which trophies unlock, so many things. And so little time.

And, on top of that, the financial pressure on the company (from the length of time that Vessel was taking to port, the reduced rate we were getting for the overage, and the fact that there was very little work lined up for the new year) meant that I had to notify Ben that we were going to have to let him go.

It was one of the hardest things I’ve ever had to do. I felt like I’d betrayed him – he is an awesome coder and he’d become a good friend. And just before Xmas FFS. To make things worse, the day I had to do it he was working from home and I had to let him know over Skype. It was just awful.

He called me back within a couple of hours to tell me he had just managed to land a new job. In just two hours. I told you he was good.

So, back to the code. With a week and a bit to go, we mainly focussed on the remaining TRC issues. The game was running as well as we were going to get it in the time we had, and I was satisfied with that. We weren’t hitting the full frame rate on all the levels or in all cases, but it would just have to do. TRC violations were disappearing and it looked like we might just make it.

In the last week (on the Wednesday) I realised that the file format we were using needed to be DRM-enabled in order to be TRC compliant, so I spent 17 hours straight trying to fix it. I did my first all-nighter in a long time and managed to solve the problem at 5am. I pushed through the next day, fixing, tweaking, hacking and testing while full of sugar and caffeine, and then dragged myself home to sleep on the Thursday night.

Submission Time

The final Friday arrived. We performed some clean-ups and readied the code for submission. We were going to make it! On one last pass over the TRC checklist I realised that we’d missed something: we needed to be able to report an error if any of the required start-up files were missing, but the way the engine was designed, there was no way to display anything until those critical files were loaded. We had a few hours, so I quickly hacked together an emergency renderer that I could use in the case of a critical failure. I gave myself 2 hours to do it, and it was working within 30 minutes. Awesome! But, upon further testing, the loading screens refused to work. I still have no idea why – the code I added was never executed and yet it still affected the loading screens. The game itself played fine, but those bloody loading screens didn’t bloody work. I wasn’t willing to submit with what I knew would be a TRC failure so, once again, we’d missed our deadline.

We’d have to delay submission until next year.

To be continued…

Data Structures for Entity Systems – Contiguous memory

Original Author: Adam Martin

This year I’m working on two different projects that need an Entity System (ES). One of them is a non-game app written natively on iOS + Android. The other is an FPS in Unity3D.

To address this, I’m building a new ES library, Aliqua.org, and I’m already using it in an app that’s in alpha-testing.

I’ll be blogging experiences, challenges, and ideas as I go.

Background: focus on ES theory, or ES practice?

If you’re new to ES’s, you should read source code + articles from the ES wiki.

My posts focussed on theory: I wanted to inspire developers, and get people using an ES effectively. I was fighting institutionalised mistakes – e.g. the over-use of OOP in ES development – and I wrote provocatively to try and shock people out of their habits.

But I avoided telling people “how” to implement their ES. At the extreme, I feared it would end up specifying a complete Game Engine.

…OK. Fine. 7 years later, ES’s are widely understood and used well. It’s time to look at the practice: how do you make a “good” ES?

NB: I’ve always assumed that well-resourced teams – e.g. AAA studios – need no help writing a good ES. That’s why I focussed on theory: once you grok it, implementation concerns are no different from writing any game-engine code. These posts are aimed at non-AAA teams: those who lack the money (or expertise) to make an optimized ES first time around.

For my new ES library, I’m starting with the basics: Data Structures, and how you store your ES data in memory.

Where you see something that can be done better – please comment!

Aside on Terminology: “Processors, née Systems”

ES “Systems” should be batch-processing algorithms: you give them an array/stream of homogeneous data, and they repeat one algorithm on each row/item. Calling them “Processors” instead of “Systems” reduces confusion.

Why care about Data Structures?

There is a tension at the heart of Entity Systems:

  • In an ES game, we design our code to be Functional: independent, data-oriented, highly efficient for streaming, batching, and multi-threaded execution. Individual Processors should be largely independent, and easy to split out onto different CPU cores.
  • With the “Entity” (ID) itself, we tie those Functional chunks together into big, messy, inter-dependent, cross-functional … well, pretty much: BLOBs. And we expect everything to Just Work.

If our code/data were purely independent, we’d have many options for writing high-performance code in easy ways.

If our data were purely chunked, fixed at compile-time, we’d have tools that could auto-generate great code.

But combining the two, and muddling it around at runtime, poses tricky problems. For instance:

  1. Debugging: we’ve gone from clean, separable code you can hold in your head … to amorphous chunks that keep swelling and contracting from frame-to-frame. Ugh.
  2. Performance: we pretend that ES’s are fast, cache-efficient, streamable … but at runtime they’re the opposite: re-assembled every frame from their constituent parts, scattered all over memory
  3. Determinism: BLOBs are infamously difficult to reason about. How big? What’s in them? What’s the access cost? … we probably don’t know.

With a little care, ES’s handle these challenges well. Today I’m focussing on performance. Let’s look at the core need here:

  • Each frame, we must:
    1. Iterate over all the Processors
    2. For each Processor:
      1. Establish what subset of Entity/Component blobs it needs (e.g. “everything that has both a Position and a Velocity”)
      2. Select that from the global Entity/Component pool
      3. Send the data to the CPU, along with the code for the Processor itself

The easiest way to implement selection is to use Maps (aka Associative Arrays, aka Dictionaries). Each Processor asks for “all Components that meet [some criteria]”, and you jump around in memory, looking them up and putting them into a List, which you hand to the Processor.
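
As a sketch with illustrative toy types, the Map-based version of that selection looks something like this; note that every lookup is a pointer-chase to wherever the hash map happened to put the data:

    #include <unordered_map>
    #include <vector>

    typedef int EntityID;
    struct Position { float x, y; };
    struct Velocity { float dx, dy; };

    std::unordered_map<EntityID, Position> positions;
    std::unordered_map<EntityID, Velocity> velocities;

    // "All Entities that have both a Position and a Velocity", gathered into a
    // List that gets handed to the Movement Processor.
    std::vector<EntityID> SelectMovables()
    {
        std::vector<EntityID> result;
        for (const auto& kv : positions)
            if (velocities.count(kv.first))   // each lookup jumps around in memory
                result.push_back(kv.first);
        return result;
    }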

But Maps scatter their data randomly across RAM, by design. And the words “jump around in memory” should have every game-developer whimpering: performance will be bad, very bad.

NB: my original ES articles not only use Maps, but give complete source implementations using them. To recap: even in 2011, Android phones could run realtime 30 FPS games using this. It’s slow – but fast enough for simple games

Volume of data in an ES game

We need some figures as a reference point. There’s not enough detailed analysis of ES’s in particular, so a while back I wrote an analysis of Components needed to make a Bomberman clone.

…that’s effectively a high-end mobile game / mid-tier desktop game.

Reaching back to 2003, we also have the slides from Scott’s GDC talk on Dungeon Siege.

…that’s effectively a (slightly old) AAA desktop game.

From that, we can predict:

  • Number of Component-types: 50 for AA, 150 for AAA
  • Number of unique assemblages (sets of Component-types on an Entity): 1k for AA, 10k for AAA
  • Number of Entities at runtime: 20k for AA, 100k for AAA
  • Size of each Component: 10-50 primitives at 8 bytes (64 bits) each, i.e. roughly 100-500 bytes

How do OS’s process data, fast?

In a modern game the sheer volume of data slows a modern computer to a crawl – unless you co-operate with the OS and Hardware. This is true of all games. CPU and RAM both run at a multiple of the bus-speed – the read/write part is massively slow compared to the CPU’s execution speed.

OS’s reduce this problem by pre-emptively reading chunks of memory and caching them on-board the CPU (or near enough). If the CPU is processing M1, it probably wants M2 next. You transfer M2 … Mn in parallel, and if the CPU asks for them next, it doesn’t have to wait.

Similarly, RAM hardware reads whole rows of data at once, and can transfer it faster than if you asked for each individual byte.

Net effect: Contiguous memory is King

If you store your data contiguously in RAM, it’ll be fast onto the Bus, the CPU will pre-fetch it, and it’ll remain in cache long enough for the CPU(s) to use it with no extra delays.

NB: this is independent of the programming-language you’re using. In C/C++ you can directly control the data flow, and manually optimize CPU-caching – but whatever language you use, it’ll be compiled down to something similar. Careful selection and use of data-structures will improve CPU/cache performance in almost all languages

But this requires that your CPU reads and writes that data in increasing order: M1, M2, M3, …, M(n).

With Data Structures, we’ll prioritize meeting these targets:

  1. All data will be as contiguous in RAM as possible; it might not be tightly-packed, but it will always be “in order”
  2. All EntitySystem Processors will process their data – every frame (tick) – in the order it sits in RAM
    • NOTE: a huge advantage of ES’s (when used correctly) is that they don’t care what order you process your gameobjects. This simplifies our performance problems
  3. Keep the structures simple and easy to use/debug
  4. Type-safety, compile-time checks, and auto-complete FTW.

The problem in detail: What goes wrong?

When talking about ES’s we often say that they allow or support contiguous data-access. What’s the problem? Isn’t that what we want?

NB: I’ll focus on C as the reference language because it’s the closest to raw hardware. This makes it easier to describe what’s happening, and to understand the nuances. However, these techniques should also be possible directly in your language of choice. e.g. Java’s ByteBuffer, Objective-C’s built-in C, etc.

Usually you see examples like a simple “Renderer” Processor:

  • Reads all Position components
    • (Position: { float: x, float y })
  • Each tick, draws a 10px x 10px black square at the Position of each Component

We can store all Position components in a tightly-packed Array:

[diagram: a tightly-packed array of Position components]

This is the most efficient way a computer can store / process them – everything contiguous, no wasted space. It also gives us the smallest possible memory footprint, and lets the RAM + Bus + CPU perform at top speed. It probably runs as fast or faster than any other engine architecture.
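
A minimal sketch of that layout, using the two-float Position described above (the draw call is a hypothetical stand-in):

    struct Position { float x, y; };

    static Position g_positions[20000];       // every Position component, contiguous
    static int      g_positionCount = 0;

    void DrawBlackSquare(float x, float y, float w, float h);   // hypothetical renderer call

    void RendererProcessor()
    {
        // Walk the array in order: exactly the access pattern RAM, the bus and
        // the CPU prefetcher are built for.
        for (int i = 0; i < g_positionCount; ++i)
            DrawBlackSquare(g_positions[i].x, g_positions[i].y, 10.0f, 10.0f);
    }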

But … in reality, that’s uncommon.

The hard case: One Processor reads/writes multiple Component-types

To see why, think about how we’d update the Positions. Perhaps a simple “Movement” Processor:

  • Reads all Position components and all Velocity components
    • (Position: { float: x, float y })
    • (Velocity: { float: dx, float dy })
  • Each tick, scales Velocity.dx by frame-time, and adds it to Position.x (and repeats for .dy / .y)
  • Writes the results directly to the Position components

“Houston, we have a problem”

This is no longer possible with a single, purely homogeneous array. There are many ways we can go from here, but none of them are as trivial or efficient as the tight-packed array we had before.

Depending on our Data Structure, we may be able to make a semi-homogeneous array: one that alternates “Position, Velocity, Position, Velocity, …” – or even an array-of-structs, with a struct that wraps: “{ Position, Velocity }”.
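
The array-of-structs option mentioned above, sketched with the same toy structs: the Movement Processor then streams one contiguous array instead of two.

    struct Position { float x, y; };
    struct Velocity { float dx, dy; };
    struct Movable  { Position pos; Velocity vel; };   // the "{ Position, Velocity }" wrapper

    static Movable g_movables[20000];
    static int     g_movableCount = 0;

    void MovementProcessor(float frameTime)
    {
        for (int i = 0; i < g_movableCount; ++i)
        {
            g_movables[i].pos.x += g_movables[i].vel.dx * frameTime;
            g_movables[i].pos.y += g_movables[i].vel.dy * frameTime;
        }
    }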

…or maybe not. This is where most of our effort will go.

The third scenario: Cross-referencing

There’s one more case we need to consider. Some games (for instance) let you pick up items and store them in an inventory. ARGH!

…this gives us an association not between Components (which we could handle by putting them on the same Entity), but between Entities.

To act on this, one of our Processors will be iterating across contiguous memory and will suddenly (unpredictably) need to read/write the data for a different Entity (and probably a different ComponentType) elsewhere.

This is slow and problematic, but it only happens thousands of times per second … while the other cases happen millions of times (they have to read EVERYTHING, multiple times – once per Processor). We’ll optimize the main cases first, and I’ll leave this one for a later post.

Iterating towards a solution…

So … our common-but-difficult case is: Processors reading multiple Components in parallel. We need a good DS to handle this.

Iteration 1: a BigArray per ComponentType

The most obvious way forward is to store the EntityID of each row in our Arrays, so that you can match rows from different Arrays.

If we have a lot of spare memory, instead of “tightly-packing” our data into Arrays, we can use the array-index itself as the EntityID. This works because our EntityID’s are defined as integers – the same as an array-index.

Usage algorithm:

  • For iterating, we send the whole Array at once
  • When a Processor needs to access N Components, we send it N * big-arrays
  • For random access, we can directly jump to the memory location
    • The memory location is: (base address of Array) + (Component-size * EntityID), as in the sketch after this list
    • The base-address can easily be kept/cached with the CPU while iterating
    • Bonus: Random access isn’t especially random; with some work, we could optimize it further
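
Here is a sketch of that layout (same toy structs as before): one big array per ComponentType, with the EntityID doubling as the array index, so random access is just base address plus Component-size times EntityID.

    #define MAX_ENTITIES 20000

    struct Position { float x, y; };
    struct Velocity { float dx, dy; };

    static Position g_positionArray[MAX_ENTITIES];   // row N belongs to EntityID N
    static Velocity g_velocityArray[MAX_ENTITIES];   // rows line up across arrays
    static bool     g_hasVelocity[MAX_ENTITIES];     // crude "does Entity N have one?" flag

    void MovementProcessor(float frameTime)
    {
        for (int id = 0; id < MAX_ENTITIES; ++id)
        {
            if (!g_hasVelocity[id])
                continue;
            // Matching rows are found by index alone; no lookup structure needed.
            g_positionArray[id].x += g_velocityArray[id].dx * frameTime;
            g_positionArray[id].y += g_velocityArray[id].dy * frameTime;
        }
    }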

Problem 1: Blows the cache

This approach works for our “simple” scenario (1 Component / Processor). It seems to work for our “complex” case (multiple Components / Processor) – but in practice it fails.

We iterate through the Position array, and at each line we now have enough info to fetch the related row from the Velocity array. If both arrays are small enough to fit inside the CPU’s L1 cache (or at least the L2), then we’ll be OK.

  • Each instance is 500 bytes
  • Each BigArray has 20k entries
  • Total: 10 megabytes per BigArray

This quickly overwhelms the caches (even an L3 cache would struggle to hold a single BigArray, let alone multiple). What happens next depends a lot on both the algorithm (does it read both arrays on every row? every 10th row?) and the platform (how does the OS handle RAM reads when the CPU cache is overloaded?).

We can optimize this per-platform, but I’d prefer to avoid the situation.

Problem 2: memory usage

Our typeArrays will need to be approximately 10 megabytes each:

For 1 Component type: 20,000 Entities * 50 variables * 8 bytes each = 8 MB

…and that’s not so bad. Smaller components will give smaller typeArrays, helping a bit. And with a maximum of 50 unique ComponentTypes, we’ve got an upper bound of 500 MB for our entire ES. On a modern desktop, that’s bearable.

But if we’re doing mobile (Apple devices in 2014 still ship with 512 MB RAM), we’re way too big. Or if we’re doing dynamic textures and/or geometry, we’ll lose a lot of RAM to them, and be in trouble even on desktop.

Problem 3: streaming cost

This is tied to RAM usage, but sometimes it presents a bottleneck before you run out of memory.

The data has to be streamed from RAM to the CPU. If the data is purely contiguous (for each component-type, it is!), this will be “fast”, but … 500 MB data / frame? DDR3 peaks around 10 Gigabytes / second, i.e.:

Peak frame rate: 20 FPS … divided by the number of Processors

1 FPS sound good? No? Oh.

Summary: works for small games

If you can reduce your entity count by a factor of 10 (or even better: 100), this approach works fine.

  • Memory usage was only slightly too big; a factor of 10 reduction and we’re fine
  • CPU caching algorithms are often “good enough” to handle this for small datasets

The current build of Aliqua is using this approach. Not because I like it, but because it’s extremely quick and easy to implement. You can get surprisingly far with this approach – MyEarth runs at 60 FPS on an iPad, with almost no detectable overhead from the ES.

Iteration 2: the Mega-Array of Doom

Even on a small game, we often want to burst up to 100,000+ Entities. There are many things we could do to reduce RAM usage, but our biggest problem is the de-contiguous data (multiple independent Arrays). We shot ourselves in the foot. If we can fix that, our code will scale better.

[Figure: es-datastructures-structured-bigarray]

In an ideal world, the CPU wants us to interleave the components for each Entity. i.e. all the Components for a single Entity are adjacent in memory. When a Processor needs to “read from the Velocity and write to the Position”, it has both of them immediately to hand.

Problem 1: Interleaving only works for one set at a time

If we interleave “all Position’s with all Velocity’s”, we can’t interleave either of them with anything else. The Velocity’s are probably being generated by a different Processor – e.g. a Physics Engine – from yet another ComponentType.

[Figure: mega-array]

So, ultimately, the mega-array only lets us optimize one Processor – all the rest will find their data scattered semi-randomly across the mega-array.

NB: this may be acceptable for your game; I’ve seen cases where one or two Processors accounted for most of the CPU time. The authors optimized the DS for one Processor (and/or had a duplicate copy for the other Processor), and got enough speed boost not to worry about the rest

Summary: didn’t really help?

The Mega Array is too big, and it’s too interconnected. In a lot of ways, our “lots of smaller arrays – one per ComponentType” was a closer fit. Our Processors are mostly independent of one another, so our ideal Data Structure will probably consist of multiple independent structures.

Perhaps there’s a halfway-house?

Iteration 3: Add internal structure to our MegaArray

When you use an Entity System in a real game, and start debugging, you notice something interesting. Most people start with an EntityID counter that increases by 1 each time a new Entity is created. A side-effect is that the layout of components on entities becomes a “map” of your source code, showing how it executed, and in what order.

e.g. With the Iteration-1 BigArrays, my Position’s array might look like this:

[Figure: example contents of the Position BigArray]

  1. First entity was an on-screen “loading” message, that needed a position
  2. BLANK (next entity holds info to say if loading is finished yet, which never renders, so has no position)
  3. BLANK (next entity is the metadata for the texture I’m loading in background; again: no position)
  4. Fourth entity is a 3d object which I’ll apply the texture to. I create this once the texture has finished loading, so that I can remove the “loading” message and display the object instead
  5. …etc

If the EntityID’s were generated randomly, I couldn’t say which Component was which simply by looking at the Array like this. Most ES’s generate ID’s sequentially because it’s fast, it’s easy to debug (and because “lastID++;” is quick to type ;)). But do they need to? Nope.

If we generate ID’s intelligently, we could impose some structure on our MegaArray, and simplify the problems (a sketch follows the list below)…

  1. Whenever a new Entity is created, the caller gives a “hint” of the Component Types that entity is likely to acquire at some time during this run of the app
  2. Each time a new unique hint is presented, the EntitySystem pre-reserves a block of EntityID’s for “this and all future entities using the same hint”
  3. If a range runs out, no problem: we add a new range to the end of the MegaArray, with the same spec, and duplicate the range in the mini-table.
  4. Per frame, per Processor: we send a set of ranges within the MegaArray that are needed. The gaps will slow-down the RAM-to-CPU transfer a little – but not much
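As a rough C++ sketch of the hint-driven allocation above (all of the names, and the block size of 1024 IDs per range, are my own choices; the article doesn't prescribe an implementation):

#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using EntityID = std::uint32_t;
using Hint = std::uint64_t;               // e.g. a bitmask of expected ComponentTypes

struct Range { EntityID next, end; };     // IDs in [next, end) are still unassigned

class EntityIDAllocator {
public:
    EntityID create(Hint hint) {
        Range* r = currentRangeFor(hint);
        if (r == nullptr || r->next == r->end)  // first use of this hint, or range full:
            r = addRange(hint);                 // append a fresh block of IDs
        return r->next++;
    }

private:
    static constexpr EntityID kBlock = 1024;    // IDs reserved per range (arbitrary)

    EntityID m_nextFreeID = 0;                  // high-water mark of the MegaArray
    std::map<Hint, std::size_t> m_openRange;    // hint -> index of its current range
    std::vector<std::pair<Hint, Range>> m_miniTable;  // the mini-table of ranges

    Range* currentRangeFor(Hint hint) {
        auto it = m_openRange.find(hint);
        return it == m_openRange.end() ? nullptr : &m_miniTable[it->second].second;
    }

    Range* addRange(Hint hint) {
        m_miniTable.push_back({hint, Range{m_nextFreeID, m_nextFreeID + kBlock}});
        m_nextFreeID += kBlock;
        m_openRange[hint] = m_miniTable.size() - 1;
        return &m_miniTable.back().second;
    }
};

A Processor could then be handed only the ranges whose hints include its ComponentTypes, rather than the whole MegaArray.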

[Figure: es-datastructures-structured-megaarray]

Problem 1: Heterogeneity

Problem 1 from the MegaArray approach has been improved, but not completely solved.

When a new Entity is created that intends to have Position, Velocity, and Physics … do we include it as “Pos, Vel”, “Pos, Phys” … or create a new template, and append it at end of our MegaArray?

If we include it as a new template, and insist that templates are authoritative (i.e. the range for “Pos, Vel” templates only includes Entities with those Components, and no others) … we’ll rapidly fragment our mini-table. Every time an Entity gains or loses a Component, it will cause a split in the mini-table range.

Alternatively, if we define templates as indicative (i.e. the range for “Pos, Vel” contains things that are usually, but not always Pos + Vel combos), we’ll need some additional info to remember precisely which entities in that range really do have Pos + Vel.

Problem 2: Heterogeneity and Fragmentation from gaining/losing Components

When an Entity loses a Component, or gains one, it will mess-up our mini-table of ranges. The approach suggested above will work … the mini-table will tend to get more and more fragmented over time. Eventually every range is only one item long. At that point, we’ll be wasting a lot of bus-time and CPU-cache simply tracking which Entity is where.

NB: As far as I remember, it’s impossible to escape Fragmentation when working with dynamic data-structures – it’s a fundamental side effect of mutable data. So long as our fragmentation problems are “small” I’ll be happy.

Problem 3: Heterogeneity and Finding the Components within the Array

If we know that “Entity 4” starts at byte-offset “2048”, and might have a Position and Velocity, that’s great.

But where do we find the Position? And the Velocity?

They’re at “some” offset from 2048 … but unless we know all the Components stored for Entity 4 … and what order they were appended / replaced … we have no idea which. Raw array-data is typeless by nature…

Iteration 4: More explicit structure; more indexing tables

We add a table holding “where does each Entity start”, and tables for each Component stating “offset for that Component within each Entity”. Conveniently, this also gives us a small, efficient index of “which Entities have Component (whatever)”:

[Figure: es-datastructures-structured-megaarray-by-component]

Problem 1: non-contiguous data!

To iterate over our gameobjects, we now need:

  • One big mega-array (contiguous)
  • N x mini arrays (probably scattered around memory)

Back to square one? Not quite – the mini-arrays are tiny. If we assume a limit of 128,000 entities, and at most 8 kB of data for all Components on an Entity, our tables will be:

[ID: 17bits][Offset: 13 bits] = 30 bits per Component

…so that each mini-array is 1-40 kB in size. That’s small enough that several could fit in the cache at once.
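For illustration, here is one way those 30-bit entries might be packed (rounded up to a 32-bit word) and then used to locate a Component, sketched in C++; the helper names and the entityStart table name are mine, not the article's.

#include <cstdint>

using PackedEntry = std::uint32_t;

// [EntityID : 17 bits][byte offset : 13 bits] – 17 bits cover 128k Entities,
// 13 bits cover up to 8 kB of Component data per Entity.
inline PackedEntry pack(std::uint32_t entityID, std::uint32_t byteOffset) {
    return (entityID << 13) | (byteOffset & 0x1FFF);
}
inline std::uint32_t entityOf(PackedEntry e) { return e >> 13; }
inline std::uint32_t offsetOf(PackedEntry e) { return e & 0x1FFF; }

// With the "where does each Entity start" table from this iteration, finding a
// Component's bytes becomes:
//   char* data = megaArray + entityStart[entityOf(e)] + offsetOf(e);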

Good enough? Maybe…

At this point, our iterations are quite good, but we’re seeing some recurring problems:

  • Re-allocation of arrays when Components are added/removed (I’ve not covered this above – if you’re not familiar with the problem, google “C dynamic array”)
  • Fragmentation (affects every iteration after Iteration 1, which doesn’t get any worse simply because it’s already as bad as it could be)
  • Cross-referencing (which I skipped)

I’ve also omitted history-tracking – none of the DS’s above facilitate snapshots or deltas of game state. This doesn’t matter for e.g. rendering – but for e.g. network code it becomes extremely important.

There’s also an elephant in the room: multi-threaded access to the ES. Some ES’s, and ES-related engines (*cough*Unity*cough*), simply give-up on MT. But the basis of an ES – independent, stateless, batch-oriented programming – is perfect for multi threading. So there must be a good way of getting there…

…which gives me a whole bunch of things to look at in future posts :).

PS … thanks to:

Writing these things takes ages. So much to say, so hard to keep it concise. I inflicted early drafts of this on a lot of people, and I wanted to say “thanks, guys” :). In no particular order (and sorry in advance if final version cut bits you thought should be in there, or vice versa): TCE’ers (especially Dog, Simon Cooke, doihaveto, archangelmorph, Hypercube, et al), ADB’ers (Amir Ebrahimi, Yggy King, Joseph Simons, Alex Darby, Thomas Young, etc). Final edit – and any stupid mistakes – are mine, but those people helped a lot with improving, simplifying, and explaining what I was trying to say.

Implementing a code generator with libclang

Original Author: Tamás Szelei

DOI: 10.5708/szelei.2014.A.1

Introduction


The following article covers the process of implementing a practical code generator for C++ in detail. You can find the full source code for the article at Source Code Tales.

A code generator is a very useful asset in a larger C++ project. Due to the lack of introspection in the language, implementing the likes of reflection, script binding and serialization requires writing some sort of boilerplate that essentially keeps the data which is otherwise thrown away by the compiler. These solutions are either intrusive (heavily macro-based, thus hard to debug and require weird syntax in declarations) or fragile (the boilerplate must be constantly updated to follow the actual code, and might break without warning). One way to improve the robustness is to automate writing this boilerplate. In order to achieve this, we need to parse the code somehow, in other words, understand what information to keep. However, parsing C++ is an extremely complex task, and with the copious amount of weird corner cases, we are in for quite a ride if we attempt to do so.

Attempts to parse a “good enough” subset of C++ generally fail, or require the project to follow strict coding guidelines – that is, to avoid syntax that the parser can’t understand – and they may break at any time when someone commits code that doesn’t follow these guidelines. Fortunately, we don’t have to write a parser ourselves: the Clang project exposes its own C++ parser through a library, libclang. Its official documentation is pretty much only the Doxygen-generated reference, which is very useful, but not as an introduction to the usage of the library; due to the complex nature of the problem, it has a steep learning curve.

I am going to present the process of implementing a practical code generator for C++ using libclang and its Python bindings. “Practical” in the sense that it is not a general solution. What I’m presenting here is not meant to be taken as the concrete implementation I made but rather as one possible way of solving a problem, a detailed example. Using these ideas, it is possible to create an all-encompassing reflection solution or to generate code for existing libraries of any purpose. The aim is to write natural C++ syntax with minimal intrusive bits to provide the functionality. I encourage readers to experiment with the code and to try and implement a code generator from scratch.

I can’t go on without mentioning Eli Bendersky’s excellent post on the topic, which served as a great resource when I started gathering information.

Prerequisites

To understand everything in this article, I recommend that the reader knows:

  • Intermediate C++
  • Intermediate Python
  • What serialization and reflection are
  • What an AST (Abstract Syntax Tree) is

Required tools:

  • Python 2.7 (3.x will probably also work apart from little things)
  • LLVM 3.2+
  • libclang python bindings (I recommend installing with pip: pip install clang)
  • A text editor or IDE (mostly for editing Python code)

The example problem

In our example, we are implementing automatic script binding, so we don’t need to write and maintain binding code by hand. We also want to be able to omit certain parts of the source so that they are not taken into account when the binding boilerplate code is generated ((Such a feature is useful when the C++ layer is meant to serve as a low level implementation of features tied together by scripts. In some cases we want to keep functions public while hiding them from the scripting interface (e.g. debug functionality).)). Keep in mind that this article is not about automatic script binding ((There are plenty of great solutions for automatic script binding, far more versatile than what we implement here (SWIG is one of them).)). It is just one thing that can be done with code generation and used as an example here.

In the example, we are going to work with the following C++ class declaration:

class TextComponent
{
public:
    TextComponent();

    std::string text() const;
    void setText(const std::string& value);

    void superSecretFunction();

private:
    std::string m_text;
};

Adding scripting

Our goal is to be able to write the following in Python:

t = CodegenExample.TextComponent()
t.setText("Hello")
print t.text()
t.superSecretFunction()

– with the expected output being:

 
Hello
Traceback (most recent call last):
  File "src/test.py", line 7, in <module>
    t.superSecretFunction()
AttributeError: 'TextComponent' object has no attribute 'superSecretFunction'

Now let’s see the simple solution. We will utilize Boost.Python, a seasoned and battle-tried library, which allows us to write binding code in a very expressive manner. The following is the entire binding code in a separate source file which we will link with our executable. We also define an init_bindings() function, which does what its name says.

#include <boost/python.hpp>
#include "textcomponent.h"

using namespace boost::python;

BOOST_PYTHON_MODULE(CodegenExample)
{
    class_<TextComponent>("TextComponent")
        .def("text", &TextComponent::text)
        .def("setText", &TextComponent::setText)
    ;
}

void init_bindings()
{
    Py_Initialize();
    initCodegenExample();
}

After init_bindings is called, our TextComponent class is available to use in Python. The above code expresses exactly what we wanted to achieve: one constructor, and two member functions. We simply don’t bind the superSecretFunction, because we don’t want that to be available from Python.

All of the above is what we would do in a typical project to make a class scriptable. Our aim is to generate this code automatically.

Automation

Traversing the AST

Now we are going to inspect the abstract syntax tree (AST) of the header file and use that information to generate the above binding code.

Traversing the AST is performed with cursors. A cursor points to a node in the AST and can tell you what kind of node it is (for example, a class declaration), what its children are (e.g. the members of the class), and numerous other details. The first cursor you need points to the root of the translation unit, that is, the file you are parsing.

To obtain this cursor, we need to do the following in Python:

clang.cindex.Config.set_library_file('/usr/local/lib/libclang.so')
index = clang.cindex.Index.create()
translation_unit = index.parse(sys.argv[1], ['-x', 'c++', '-std=c++11', '-D__CODE_GENERATOR__'])

The index object is our main interface to libclang; this is where we normally initiate “talking to the library”.

The parameters of the parse call

The parse function takes a filename and a list of compiler flags. We need to specify that we are compiling a C++ header (-x c++), because otherwise libclang will assume it is C, based on the .h extension (and consequently produce an AST that misses most parts of our header file). This option will cause libclang to preprocess our file (resolve macros and includes) and then treat it as a C++ source. The other options should be self-explanatory: setting the standard we use in the parsed source, and providing the __CODE_GENERATOR__ macro definition, which will come in handy in most implementations. Now, back to processing the AST – recursively. The AST of the TextComponent class can be dumped like this (see dump_ast.py):

 
TRANSLATION_UNIT textcomponent.h
  +--CLASS_DECL TextComponent
     +--CXX_ACCESS_SPEC_DECL
     +--CONSTRUCTOR TextComponent
     +--CXX_METHOD text
     |  +--NAMESPACE_REF std
     |  +--TYPE_REF string
     +--CXX_METHOD setText
     |  +--PARM_DECL value
     |     +--NAMESPACE_REF std
     |     +--TYPE_REF string
     +--CXX_ACCESS_SPEC_DECL
     +--CXX_METHOD superSecretFunction
     +--CXX_ACCESS_SPEC_DECL
     +--FIELD_DECL m_text
        +--NAMESPACE_REF std
        +--TYPE_REF string

Comparing this syntax tree dump to the source above should help with understanding what libclang does when parsing the source code.

Did you mean recursion?

The libclang C API has a visitor-based way of traversing the AST. This is also available from the Python bindings via the function clang.cindex.Cursor_visit, but we are going to utilize the more Pythonic, iterator-based approach, which is an addition of the Python bindings. The ‘T’ stands for tree in the abbreviation, and the most straightforward way to process such a data structure is to recursively traverse it. It is pretty simple:

def traverse(cursor):
    # ...
    # do something with the current node here, i.e.
    # check the kind, spelling, displayname and act based on those
    # ...
    for child_node in cursor.get_children():
        traverse(child_node)

Useful properties of cursors

Cursors provide a rich set of information about each node in the AST. For our purposes, we will use the following:

  • kind: The kind of node we are looking at in the AST. Answers questions like: Is this a class declaration? Is this a function declaration? Is this a parameter declaration?
  • spelling: The literal text defining the token. For example, if the cursor points to a function declaration void foo(int x);, the spelling of this cursor will be foo.
  • displayname: Similar to spelling, but the displayname also contains some extra information which helps to distinguish between identically spelled tokens, such as function overloads. The displayname of the above example will be foo(int).
  • location.file: The source location where the node was found. This can be used to filter out content pulled in by includes, because usually we are only interested in the file actually being parsed.

If you are implementing something different, you might find the following properties useful, too: location, extent. Sometimes the only way to get a particular string is to read the source directly. With location and extent you can find the exact part of the file that you need to read.

Poor man’s code model

While we could generate the output directly as we traverse the AST (in an online manner), I find it clearer (and more reusable) to actually build a code model in which the C++ classes, functions (and whatever else is interesting for your purposes) are objects.

The following piece of code illustrates what I mean here and also showcases how a (very thin) object model of a class is constructed:

class Function(object):
    def __init__(self, cursor):
        self.name = cursor.spelling

class Class(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.functions = []

        for c in cursor.get_children():
            if c.kind == clang.cindex.CursorKind.CXX_METHOD:
                f = Function(c)
                self.functions.append(f)

It all really boils down to traversing a tree and filtering for certain elements. The for loop above does just that: if the current node is a C++ method (member function), then construct and store a Function object using the information found in that node.

This is a very simplistic code model: classes have names and member functions, and member functions have names. It is possible to gather much more than that, but for our purposes, this is mostly enough. By the way, the code above is almost half of all the code we need in our code generator!

Now let’s see the other half: how the classes are built. Reusing the traversal approach:

def build_classes(cursor):
    result = []
    for c in cursor.get_children():
        if (c.kind == clang.cindex.CursorKind.CLASS_DECL
            and c.location.file.name == sys.argv[1]):
            a_class = Class(c)
            result.append(a_class)
        elif c.kind == clang.cindex.CursorKind.NAMESPACE:
            child_classes = build_classes(c)
            result.extend(child_classes)

    return result

One important step here is that we are checking the location of the node. This ensures that we are only taking the contents of the file being parsed into account. Otherwise, we would end up with a huge mess of an AST due to the includes being, well, included. To put it all together, we would call the above function with the translation unit cursor (see above), to find all classes and their functions in the source file:

classes = build_classes(translation_unit.cursor)

Code generation

Now that we have a code model, we can easily process it and generate the code we need. We could iterate over the list of classes, print the class_… part in the binding code, then iterate their member functions… etc. This approach can work and is easy to implement, albeit not very elegant. We are going to do something way cooler: we will use templates to generate our code.

Templates. Duh?

Of course you already saw we are using templates in the Boost.Python code.

What is so cool about that?

Oh, I didn’t mean C++ templates. Generating text from data is a well-understood problem, with lots of great solutions from the web programming world. We are going to use the Mako template engine ((Another approach could be to avoid template engines overall and just use Python string formatting with {variables_like_this}.)), which has prominent users like reddit and python.org. Sceptical? Take a look at the template code we will use:

#include <boost/python.hpp>
#include "${include_file}"

using namespace boost::python;

BOOST_PYTHON_MODULE(${module_name})
{
% for c in classes:
    class_<${c.name}>("${c.name}")
    % for f in c.functions:
        .def("${f.name}", &${c.name}::${f.name})
    % endfor
    ;
% endfor
}

void init_bindings()
{
    Py_Initialize();
    init${module_name}();
}

This template directly uses the code model we defined above; c.name refers to the name of the Class object. It is easy to see how even this simple code model can be used to generate code for various purposes. You can register functions not only for script binding, but also with reflection libraries, which allow a huge variety of dynamic uses (serialization, RPC, a thin layer for script binding, etc.). Save that template to a file named bind.mako, and then using it is really just a few lines:

from mako.template import Template
classes = build_classes(translation_unit.cursor)
tpl = Template(filename='bind.mako')
print tpl.render(
                classes=classes,
                module_name='CodegenExample',
                include_file=sys.argv[1])

If we process textcomponent.h with our full script, this is what we get:

#include <boost/python.hpp>
#include "textcomponent.h"

using namespace boost::python;

BOOST_PYTHON_MODULE(CodegenExample)
{
    class_<TextComponent>("TextComponent")
        .def("text", &TextComponent::text)
        .def("setText", &TextComponent::setText)
        .def("superSecretFunction", &TextComponent::superSecretFunction)  // oops, should not be generated!
    ;
}

void init_bindings()
{
    Py_Initialize();
    initCodegenExample();
}

That is almost what we wanted. The remaining problem is that our script also bound the superSecretFunction, which we meant to hide.

Hiding member functions

Now, to tell our code generator script that some parts of the AST are different than others (namely, we want to ignore them), we need to use some annotation magic ((It is also possible to simply use the __CODE_GENERATOR__ macro to hide some parts, but that is quite ugly and intrusive, one of the things we wanted to avoid)). When I was experimenting, I tried using the new(-ish) C++11 [[attributes]] to mark functions, but libclang seemed to omit unknown attributes from the AST. That is correct behavior, as far as the standard is concerned: compilers should ignore unknown (non-standard) attributes. Clang simply chooses to ignore them while building the AST, which is unfortunate for our case. Luckily, clang has a language extension which can be used to apply string attributes to many kinds of nodes, and that information is readily available in the AST. With a graceful macro definition, we can use this extension without losing compatibility with other compilers. The syntax is the following:

__attribute__((annotate("something"))) void foo(int x);

Admittedly, that is a bit lengthy. We do need to employ conditional compilation for portability anyway, so let’s do the following:

#ifdef __CODE_GENERATOR__
#define HIDDEN __attribute__((annotate("hidden")))
#else
#define HIDDEN
#endif

This way our code generator will see the annotations, but compilers won’t. To use the annotation, we will need to revisit some of the code above. First, we write the annotation where we want it in the source:

#pragma once

#include <string>

#ifdef __CODE_GENERATOR__
#define HIDDEN __attribute__((annotate("hidden")))
#else
#define HIDDEN
#endif

class TextComponent
{
public:
    TextComponent();

    std::string text() const;
    void setText(const std::string& value);

    HIDDEN void superSecretFunction();

private:
    std::string m_text;
};

The AST produced from the above source:

 
TRANSLATION_UNIT textcomponent.h
  +--CLASS_DECL TextComponent
     +--CXX_ACCESS_SPEC_DECL
     +--CONSTRUCTOR TextComponent
     +--CXX_METHOD text
     |  +--NAMESPACE_REF std
     |  +--TYPE_REF string
     +--CXX_METHOD setText
     |  +--PARM_DECL value
     |     +--NAMESPACE_REF std
     |     +--TYPE_REF string
     +--CXX_METHOD superSecretFunction
     |  +--ANNOTATE_ATTR hidden
     +--CXX_ACCESS_SPEC_DECL
     +--FIELD_DECL m_text
        +--NAMESPACE_REF std
        +--TYPE_REF string

As you can see, the annotations are added as children to the annotated nodes. A little change of our script is needed where we build the code model:

def get_annotations(node):
    return [c.displayname for c in node.get_children()
            if c.kind == clang.cindex.CursorKind.ANNOTATE_ATTR]

class Function(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.annotations = get_annotations(cursor)

class Class(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.functions = []
        self.annotations = get_annotations(cursor)  # needed for the class-level check in the template

        for c in cursor.get_children():
            if c.kind == clang.cindex.CursorKind.CXX_METHOD:
                f = Function(c)
                self.functions.append(f)

We can filter out the unwanted elements (classes from includes and functions that are meant to be hidden) in two places: while building the code model, or when our mako template is rendered. Choosing either is a matter of taste; I opted for the latter (I feel it is better to keep the code model consistent with the source file). The modified template looks like this:

#include <boost/python.hpp>
#include "textcomponent.h"

using namespace boost::python;

BOOST_PYTHON_MODULE(${module_name})
{
% for c in classes:
    % if "scripted" in c.annotations:
    class_<${c.name}>("${c.name}")
    % for f in c.functions:
        % if not "hidden" in f.annotations:
        .def("${f.name}", &${c.name}::${f.name})
        % endif
    % endfor
    ;
    % endif
% endfor
}

void init_bindings()
{
    Py_Initialize();
    init${module_name}();
}

At this point, the generated binding code is almost identical to what we wrote by hand. There is one last issue we need to cover.

Hiding non-public members

Our example source only had public member functions. In a real-world scenario, this rarely happens. The code generator did not take access specifiers (public, private, protected) into account, but it is very important to do so: the generated binding code would not compile if it contained non-public members. Unfortunately, at the time of writing the Python bindings do not expose the access specifiers on the cursors. I recommend using my patched cindex.py to access this information ((I submitted a patch to the clang frontend project while writing the article and I will update the post if it gets approved)). The last revision of our Class constructor filters out the non-public members:

class Class(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.functions = []
        self.annotations = get_annotations(cursor)

        for c in cursor.get_children():
            if (c.kind == clang.cindex.CursorKind.CXX_METHOD and
                c.access_specifier == clang.cindex.AccessSpecifier.PUBLIC):
                f = Function(c)
                self.functions.append(f)

With this, our solution is pretty much complete. The generated source will look like the one we wrote by hand. You can find the full, integrated project in the git repository.

Vessel Post Mortem: Part 1

Original Author: Tony Albrecht

(This article has also been posted at Overbyte.com.au and is the first part of a 3 part series)

I started looking at Vessel in January 2013 – initially just in my evenings. In my ‘spare’ time. I was looking at it because John and Martin (the founders of Strange Loop Games and the creators of Vessel) had asked me to consider porting it to PS3. We knew each other from our time together at Pandemic Studios in Brisbane where I had worked closely with Martin in the Engine team while John worked on AI. I was keen to do the port but I wanted to be as sure as possible that I knew how much work was involved in it before committing.

[Screenshot: Vessel in action.]

I found the idea of porting such a game daunting – it had a custom rendering engine, a bespoke fluid physics system, and major parts of the game code written in Lua – but I figured it was time to step up and take on such a challenge. I have an immense amount of respect for John and Martin – they are both incredibly smart coders and I relished the chance of working with them again. I wanted to do it, but I didn’t want to let them down. They were going to be my clients, but they were also friends.

I started by looking at their code – in order to give a quote on the porting process I needed to have the game running so that I could profile it. Vessel had already been partially ported to the PS3 by a third party, but wasn’t compiling when I first got my hands on it. So I spent a few weeks of long evenings fixing the FileIO, the template compilation errors (GCC and MSVC disagree on some of the finer points of templates), and just commenting out anything I didn’t think was relevant in order to get the game up and running. Once it was working, I ran my trusty profiler over the game and tried to guess how much time it would take to get this baby to a shippable state. What I found was a little scary.


[Figure: Visualisation of the code changes made during the porting of Vessel to PS3.]

Vessel runs predominantly on three threads: the game thread, which runs all the Lua scripts that control the majority of the game logic and in-game objects; the render thread, which takes the output of the game thread and converts it to a format that can be fed to the SPUs (which then build the graphics command buffer and feed it to the GPU in parallel); and the physics thread, which manages all the physics processing – fluid dynamics, rigid body collisions, everything. John had built the physics system himself and it was pretty impressive (and complicated). It was too much to grok initially, so all I could do was look at the game’s performance on some of the worst levels, profile it there and see how much I needed to speed it up.

I investigated a few of the levels on the PS3 build and picked two that I thought were the worst. What they told me was that the physics thread was the primary cause of slowdown and so I concentrated my analysis on that thread. The game was running at about 60ms per frame in the worst levels I was testing – about 15 to 20 frames per second. From discussions with Strange Loop, I knew that the physics thread had to run at 60fps so this meant that I needed a factor of 4 speed up. From experience, I knew that this was something that was achievable. The game thread was running at around 30ms per frame and the Render thread was clocking in at around 5ms per frame. SPU rendering was pretty fast, and I figured that wouldn’t be much of a problem.

Months of fannying about took place and a contract was finally signed in June. So I did the only appropriate thing one could do in that situation – I did two frantic weeks of work on Vessel and then went on a 6 week family holiday to the UK.

DragonPetting2

The kids and I carefully feeding an English Dragon.

In anticipation of the Vessel contract I’d hired an experienced programmer, Ben Driehuis. Ben came highly recommended and had worked on the Bioshock series of games, plus, most importantly, he was living in the same city as me. Having spent the last 6 years working remotely I knew that my best bet of delivering this game on time was to be in the same physical space as my team. I threw Ben in the deep end. He was hired and then given work on platforms and engines that he’d never had to deal with before. Then when I buggered off to the UK he had to manage by himself, dealing with a performance contract on a big name title and then kicking on with Vessel in my absence. Poor bugger.

I also had a friend who was interested in the QA side of things, so James (an old Uni mate) was responsible for playing Vessel over and over and over again, reporting bugs to us and telling us that the performance improvements we’d just made hadn’t made any difference. James also helped us out on the testing for the TRC (Technical Requirements Checklist – a massive list of tests that need to be checked off in order to get your game through submission to Sony) and the actual submission of the game.

Once I got back from my holiday we moved into the new office and started working in earnest on the Vessel port. Our goal was to finish Vessel by the end of August – we had about four man-months to do the work, and I had carefully calculated how much time each section of the port would take. Surely that was enough time?

Ben and I divided up the work between us: I took on the optimisation of the physics and Ben was to deal with pretty much everything else – audio was broken, some of the in game objects didn’t animate, loading needed to be optimised, memory usage had to be reduced, we needed save games and trophies and a whole swathe of PS3 specific features implemented. Ben has written about some of the fun he had during the port here.

From my initial inspection of the code, I had noticed a lot of memory allocations taking place, so I was optimistic that I could reduce (or remove) these and scrape back a lot of frame time. I also figured that rearranging the memory usage to be more cache friendly would help the PlayStation 3’s main processor so I could leave the core of the physics code alone. The last thing I wanted to do was to port all of the physics to SPU.

Three weeks later I started porting the physics to SPU. The changes I was making were incrementally improving the performance, but I could see that we’d never hit the speed we needed without dramatic changes. Porting code to SPU isn’t like porting code from one platform to another. You can program SPUs in C++, but they have very limited memory available to them – only 256kb each. So, in order to port any significantly complex task, you need to understand how that task uses memory – where it is, what order it reads it, where it writes it – the things that most programmers of high level languages take for granted. Not only that, but you have to consider how the task will parallelise. Running on one SPU is good, but running on 6 is awesome. (I’ll go into more detail on the SPU porting process in later articles.)

The first SPU task I built was one that looked for drops that were near a set of objects. My first shot at this shaved almost 2ms off the frame time – the function in question dropped from a peak of 2.4ms to 0.6ms! That meant that in order to hit 16ms per frame I just needed to do this 21 more times!

A perilous journey

A perilous journey awaits.

Still, Ben and I kept hammering away at the code. By the end of the first week in August I’d shaved 15ms off the physics thread and yet that thread remained the bottleneck. Ben had freed up over 200MB of memory and we were close to running on a retail PS3. But we’d started to notice a few things: some of the levers and triggers in the game weren’t working, fluid shaders weren’t working properly, we were getting spikes of over 200ms in some cases, and the game still wasn’t playable – not just because of the broken levers, but because the AI of the fluros seemed to be broken too (some would teleport away).

It was evident at this point that we weren’t going to hit the end of August deadline. Other projects had taken some time from both Ben and myself, which meant we still had billable time available going into September, so I let Strange Loop know that things were going to take a little longer than I had expected. They were very understanding and comfortable with the release date pushing out a little more.

Things were progressing well though – by the end of August the physics thread was running at better than 33ms per frame – we were over halfway there! Save games were in place, loading was optimised, and trophies were functioning. Unfortunately, the game thread was slowing down – as Ben was fixing problems with the code and data, more things were being processed. For example, audio wasn’t working before and now there was a lot of audio being triggered so this obviously took more processing time.

It was the second week in September when I realised that I’d made a horribly flawed assumption about the game thread in Vessel. I discovered that it had to function in lock step with the physics thread and so it too had to run at 16ms per frame. Reducing the framerate of the physics wasn’t an option as it was fixed to 16ms per frame – experimentation with varying that resulted in fluid collisions breaking horribly.

So, at that point we had the physics running at sub-30ms per frame, and the game thread running at 30ms and above. We were close, but we still had a long way to go – we now had to figure out how to optimise the Lua heavy game thread down to 16ms per frame. Actually, it had to be faster than that – as the PS3 has only two hardware threads, the 3 software threads have to share those two hardware threads which meant that the game thread plus the render thread had to be faster than 16ms per frame.

Then Ben discovered the reason for the teleporting fluros – a compiler bug triggered only in optimised builds was causing fluros to incorrectly calculate their trajectory during jumps, making them disappear. Turning off optimisation for that function fixed the problem. That was a tricky bug to find, so we were pretty stoked – until we realised that this meant that the number of fluros in view was now higher than it was before. Much higher. More fluros means more processing – more physics, more drops, more rendering, more audio. This one fix resulted in the physics thread jumping back up to 50ms from around 24ms, the render thread jumping up to 12ms, and the game thread climbing to 40ms.

I was horrified. We were effectively back to where we had started. We now needed to save a total of 60ms over three threads. I wasn’t sure we could do it. It was like that bit in horror movies where the protagonist has killed the evil monster, only to have it rise up, stronger than it was before.

Revision visualisation of the code and data for the Vessel port.

The initial optimisations on a project are the easiest. As time goes on and all the low hanging fruit has been picked, the remaining optimisations are harder and harder to perform. We really had our work cut out for us – and we’d come too far to go back now.

To be continued…

Fragments of Him prototype – designer’s post-mortem: Scope, sexuality, and scripts

Original Author: Tino van der Kraan

<foreword>Hello dear #AltDev community. Thank you for including me in this community; hopefully your trust in allowing me to submit my contributions to this blog will not be misplaced. My name is Tino and I write from the perspective of a co-founder and creative at a young independent game development studio called SassyBot Studio. What you can read below is a look back on the narrative experience ‘Fragments of Him’, a well-received Ludum Dare game that is currently in development to become a full title. Hopefully, the contents of this blog will be valuable to you.</foreword>

A little over nine months ago, Elwin (lead programmer) and I put our faith in Mata Haggis when he proposed that we should make a narrative game for the then-upcoming Ludum Dare game jam. Ludum Dare is an online game jam event where developers around the world create a game under great time pressure, either alone (48-hour time limit) or in a small team (72-hour time limit). These games are encouraged to follow a theme that is announced at the start of the event.

The game that Mata had in mind was to be a narrative game. With the little game jam experience we had back then, we knew the scope had to be manageable if we were to finish the game at all. Mata explained we needed 3D characters, fully decorated indoor and outdoor scenes, several written and voiced dialogue lines for multiple playthroughs, and shader magic for some of the interactions. As you can imagine, we told him he was mad.

As with any mad doctor, he convinced us that this could be done and we were willing to give him the benefit of the doubt. In retrospect, that faith has not been misplaced. As of this writing, Fragments of Him has been played over 28,000 times on Kongregate. Now that we are working towards getting the full version of Fragments of Him funded, it might be a good idea to reflect on how the initial prototype came to be.

FoH8

Fragments of Him (Work in progress)

We are still very early in development and we do not want to reveal too much. What we can share is the complete post-mortem of Dr. Mata Haggis (@MataHaggis), written roughly a week after the game jam. It takes you through the process that Mata used to create the prototype that is the foundation for the upcoming full version.

All text onwards is written by Dr. Mata Haggis.

 

Introduction

My name is Dr. Mata Haggis and I was the narrative & game designer/producer on Fragments of Him, our entry to Ludum Dare 26.

I had never done a game jam before, so this was a new experience for me. I was fortunate enough to have two talented developers ask me to join their team. Between us we created a narrative game experience with one programmer, one artist (and his 3D modeller girlfriend for an evening), and myself in only 72 hours.

The game jam that enabled us to conceive Fragments of Him

Below you will find out what a narrative designer does on a game jam, what a producer can do, and a little more about the decisions that I made when creating the title.

Before we go any further you should play the game:

Play Fragments of Him on Kongregate

Seriously. I mean it. This post will have a lot of spoilers very soon, so go play the game now and come back in ten minutes.

Done that?

Really?

Okay, let’s talk about the process.

 

Starting

The theme was minimalism, and I wanted to do a game with a very prominent story in it. I worked on several ideas in my head, but the one that stuck was investigating why a person would choose to live in a minimalist style in their house. The result of this was the idea that the lead character had lost their partner and couldn’t stand to be reminded of the loss.

Fragments_of_Him-Gameplay

Gameplay of the Fragments of Him prototype

In narrative terminology, the loss of the partner would be referred to as ‘the inciting incident’. The process of creating minimalist spaces gave me a gameplay mechanic too.

I pitched the concept to the team – one artist and one programmer – and I was fortunate that they were enthusiastic about it. There were several key decisions that are worth examining here:

 

Scope

The theme was going to need a lot of art. Specifically, the world was going to need a lot of models in it. I immediately said that the world was going to be stylised and colour coded: the protagonist (the ‘hero’ of the story) would be blue, and yellow would be used to indicate the dead partner, along with objects related to that partner. Everything else in the game was going to be white – this would allow the artist to focus on building the space and objects without worrying about texturing any of it.

In terms of showing figures in the environment, again I needed to keep a close eye on the scope of the project. I asked the artist to create one generic character model which would be used for both the protagonist and the dead partner. In every scene, these figures would be posed in a tableau (a static pose that suggests narrative action). In this way we could give a powerful idea of character relationships without the difficulty of animating figures.

For the programmer, there were a few challenges that I had to consider. The audio and subtitles would need to be displayed, there would need to be highlighting and signposting of affordances, and the most difficult task was probably going to be the final scenes where the gameplay mechanic (clicking to remove objects) is reversed. By keeping these interactions very simple, I could limit the complexity of the task that I was giving to the programmer.

 

Sexuality

I voiced the lead character of the game, and I am male. In the game, the character talks about his dead boyfriend. I felt that the story would work perfectly with either a male or female partner character, but I also feel that non-heterosexual relationships are under-represented in gaming, and so I had a preference for making both characters male.

I described the outline of the story to the team and they had no feelings either way on the gender of the partner, and so we ended up with a story about coping with grief, where the lead characters happen to both be male.

When stories are told about the death of a non-heterosexual man (it is not defined if he was gay or bisexual), they often focus on stereotypical perceptions of gay lifestyle choices: drugs, promiscuity, clubbing, and of course HIV/AIDS. I didn’t want to tell a gay love story; I just wanted to tell a love story.

Fragments_of_Him-ApartmentMood

The reminders hurt too much.

I know that the audience for any story will be predominantly heterosexual, with then lower proportions of gay, bisexual, transgender, and other queer-identifying players. Part of my goal was to ignore the non-heterosexual elements and to write a story that anyone could relate to; in doing this, I wanted to completely normalise the lives of the men. To do this, I chose to focus on the small and relatable things in people’s lives.

I suspect that anyone who has had a break-up, especially after living with a partner, can relate to the quiet sadness of removing one towel from the bathroom. I think that feeling of sorrow is a universal experience that has nothing to do with sexual preferences, and that is what I wanted to convey in this game. In doing this, the sexuality of the characters became irrelevant in the sea of everyday memories.

As a side-benefit of the choice to go for a homosexual relationship, the artist only needed to make one body and animation rig, saving him a lot of time under the game jam’s time constraints!

 

How to write a good story quickly

I’ve been writing for several years and have cobbled together a system which works well for me. I’ve based it on several sources, but the main ones are ‘Save the Cat!’ by Blake Snyder and ‘Will Write for Shoes’ by Cathy Yardley.

Neither of these is a high-brow book on how to create your epic masterpiece, but both are very focussed on creating a tight, enjoyable story.

I put ideas from the books together and now here’s what I use whenever I start writing:

Before the inciting incident: ‘Save The Cat’

Show the life of the character before life goes wrong. The character does something that makes you like them (‘save the cat’).

5% Inciting incident

Something changes that forces the protagonist to act.

25% Plot point one: state the external motivation

What forces the protagonist to make this clear statement of their objective?

50% Plot point two: the low mid-point

It appears impossible to complete the external motivation; the protagonist loses hope.

75% Plot point three: Hope

The protagonist is given hope that they can fulfil their external motivation goal, but only if they truly dedicate themselves to it.

90 – 95% “The Black Moment”

The external motivation appears impossible to fulfil.

95% – 100% Resolution

The story concludes in a satisfying manner – this may be successful completion of both the external and internal motivations (a happy ending), a failure of the external motivation but a success of the internal motivation (common in comedies, romance, and tales of self-discovery), or success of the external motivation but failure of the internal motivation (common in tragedy and tales of self-discovery). It is not typical for a story to end with failure of both the external and internal motivations – that is the total failure of the character to grow or succeed, and it makes an audience wonder why they spent their time with the character.

I use this whenever I write and it’s working out pretty well for me so far!

In the case of Fragments of Him, as with other stories I write, I began from a feeling and worked back to an inciting incident. The feeling was a person clearing away all of their belongings to create a minimalist living space – why would they do this? This question led back to the inciting incident – the objects were related to grief at the sudden death of a partner.

From there I worked through the template, filling in the gaps. For Fragments of him, it looks like this:

Before the inciting incident: ‘Save The Cat’

Scene – Park. Feeding ducks, narrator talks about how good life is. Two characters – protagonist is blue, the ex is yellow. All other objects are yellow too. The player clicks on objects (or parts of objects) in the scene and they turn transparent.

5% Inciting incident

End of first level – the player has been removing the polygons, when everything is transparent except for the main character – the partner dies.

25% Plot point one: state the external motivation

Scene – House. Protagonist wants to remove all traces of the ex from his life.

50% Plot point two: the low mid-point

Scene – Street. It appears impossible to remove everything – everywhere he goes there are reminders. He doesn’t want to go outside.

75% Plot point three: Hope

Scene – Office. If he can remove everything from his interior spaces he feels like he might be able to cope.

90 – 95% “The Black Moment”

Scene – back in the empty House. Protagonist is sitting on the floor. Ex appears behind… The protagonist feels like he will never be free of the memories, even when everything is gone.

95% – 100% Resolution

As the player clicks to remove the ex from the House scene, the protagonist’s colour changes, blending the two into green – the ex has become part of the protagonist. The player clicks through the scenes, where the transparent objects are back. As the player clicks on the transparent objects they turn green too. This goes much faster than the removal. The protagonist understands that the ex is part of him. Things are different, but life will go on in a new way.

As you can see, I’ve used the basic structure from the template to create a narrative arc that is satisfying and also that integrates with the gameplay – at each step I have made sure the interaction with the game adds to the plot.

I chose locations where it would be viable to have few characters in the scene because of the limitations on the art scope.

Fragments_of_Him-ParkMood

The park scene of Fragments of Him

 

Dialogue

There are three variations of most of the instances of dialogue in the game. These are chosen randomly during each playthrough, giving a slightly different experience each time.

The dialogue was written with three key events in mind:

  • The start of a level
  • Removing a particular object or a percentage of objects
  • The end of a level

The start and end triggers would give the main plot points, and the objects would trigger smaller memories.

I recorded the audio with a €100 digital microphone in a makeshift audio booth constructed from a pair of curtains and a clothes rack. It’s not ideal, but it did the trick. I then edited the sound files into one- or two-sentence chunks with some antiquated software. This was very laborious and time-consuming for me, but sometimes design requires this kind of repetitive grunt work to get the project done.

 

Other audio

Ambient audio and music are absolutely essential in selling any emotional experience: I believed this going into the project, and now I am utterly convinced of it. In every scene there are several audio triggers built into the environment that work in a very subtle way to make the spaces feel more believable.

I have a suspicion that the audio space of a game may be more important than the visual style when it comes to creating emotional resonance. Most of this work will never be noticed by the player on a conscious level, but that it exists in the mix is important. In the apartment, did you hear the muffled footsteps of a neighbour going down the hall? Or in the office, did you notice the sound of typing outside the room, or the noise of a plane flying past overhead? Probably not, but they’re there, and they help you feel that the space you are in is alive.

For the music, I found a wonderful website of free music.

Fragments_of_Him-4

Everything makes me think of Him

Multi-Threaded Programming 2: Communication

Original Author: Joseph Simons

The next step in learning about multi-threaded programming is to see how threads communicate with each other. This is important for understanding why so many of the pitfalls of programming with multiple threads exist. I am going to stick with x86-64 based architectures for simplicity, but in my experience this applies to most computing devices.

If you want to read up on other things that I have covered, then here is my previous post in this series:

 

Reading From Memory

So to start, let’s see what happens when a thread reads from memory. First, the processor will request a load from a specified memory address so that the value can be stored in a local register. Since we are just dealing with a single process, I will avoid discussing the details behind the TLB, so just consider this a physical memory address.

This request will then pass through the various levels of cache (L1, L2, and usually L3 on an x86-64 processor), each of which checks whether it already holds the data at that address. If the data is not found in any of them, that is called a cache-miss and the request has to go all the way out to main memory. If it had been found, it would be called a cache-hit and the data at that address would get to the core much faster.

4960X-Core

In this case the L1 and L2 caches are specific to each core.

So after the address has made it all the way to the main memory of the computer, the data at that location begins its long trip back to the processor. Note that in the real world with virtual memory addresses, the location of the data could actually be on the hard drive, meaning that we need to wait for that slow thing to locate what we need before we can get access to it. Along the way back to our waiting core, each level of the cache is updated to have that data stored, so that any future access to it will be much, much faster.

Because each trip outside of the caches is slow, a single read will pull in data around the address requested. The size of this data is equal to the size of a single cache line, which is 64 bytes on current x86-64 processors. Laying out your data so that the things a thread touches together live in the same cache lines (and keeping data used by different threads in different lines) can therefore save a lot of trips to main memory. However, since the operating system is free to schedule threads on whichever core it likes (unless you set the thread affinity correctly), this type of optimization can be difficult to actually implement.
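
To make the cache line point concrete, here is a minimal sketch of my own (not from the original article), assuming a 64-byte cache line and C++11's alignas: two counters that sit next to each other share a line and make the cores fight over it ("false sharing"), while padding them apart keeps them in separate lines.

#include <atomic>
#include <thread>

// Packed next to each other, these two counters will normally share one
// 64-byte cache line, so two threads updating "their own" counter still
// bounce the same line between cores (false sharing).
struct PackedCounters
{
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Aligning each counter to the cache line size keeps them in separate
// lines, so each core can hold its line without interference.
struct PaddedCounters
{
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

int main()
{
    PaddedCounters counters;
    std::thread t1([&counters] { for (int i = 0; i < 1000000; ++i) counters.a++; });
    std::thread t2([&counters] { for (int i = 0; i < 1000000; ++i) counters.b++; });
    t1.join();
    t2.join();
    return 0;
}

Whether the padded version actually wins in practice depends on the hardware and on where the OS schedules the threads, which is exactly the affinity caveat above.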

 

intelgrafik

I stands for Instruction, AKA code. D is for Data.

Writing To Memory

A program is pretty useless without writing something back to memory, so let me briefly cover what happens when a thread does just that.

It starts by specifying what data to write and a specific memory address to write to, just like with the read earlier. The core will execute this write instruction (typically referred to as a store), which will be put in a store buffer and then applied to the cache, updating the relevant cache line accordingly. Only when that cache line needs to be replaced will the updated data finally make it to main memory. Fortunately, the thread that executed the write instruction doesn’t need to wait until the write completes before moving on to the next instruction.

One thing you do need to keep in mind is that with modern processors and compilers, writes can be buffered and reordered, so if other threads need to see your writes happen in a particular order you have to ask for that explicitly. A common instinct is to reach for volatile, which in terms of multi-threading in C/C++ (which I will be using for examples in later posts) doesn’t help you accomplish this.
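
As an illustrative sketch of what asking for that explicitly can look like (my example, assuming the C++11 std::atomic facilities rather than anything this article has introduced), the release store on the flag below guarantees that the earlier write to the payload is visible to any thread that observes the flag with an acquire load:

#include <atomic>
#include <cstdio>
#include <thread>

int payload = 0;                      // the data being published
std::atomic<bool> ready{false};       // the flag guarding it

void producer()
{
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. then publish the flag
}

void consumer()
{
    // Spin until the flag is visible; acquire pairs with the release above,
    // so once we see ready == true we are guaranteed to see payload == 42.
    while (!ready.load(std::memory_order_acquire)) { }
    std::printf("payload = %d\n", payload);
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    return 0;
}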

When multiple threads are writing to the same data location, whoever executes last is typically the winner (the actual behavior may depend on the underlying hardware). However, since threads are controlled by the operating system, guaranteeing which one that will be is nearly impossible. So you have to be extremely careful whenever you have multiple threads writing to the same area. Fortunately, we have a few tools that will help us do this, which I am going to cover next.
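
Before we get to those tools, here is a small, deliberately broken sketch of mine (not from the article) showing the "last writer wins" problem: two threads incrementing the same plain int is formally a data race in C/C++, and in practice increments are silently lost.

#include <cstdio>
#include <thread>

int counter = 0;  // shared and completely unprotected

void work()
{
    for (int i = 0; i < 100000; ++i)
        counter++;  // read-modify-write: the two threads can interleave here
}

int main()
{
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    // Expected 200000, but increments that raced each other are lost,
    // so the printed value is usually lower and differs from run to run.
    std::printf("counter = %d\n", counter);
    return 0;
}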

atom

Atomic Operations

Finally, we are going to touch on a vital piece of communicating between threads, and that is atomic operations. As I just mentioned, when dealing with multiple threads operating on the same data, guaranteeing the order of operations is nearly impossible. Even if one thread is executing ahead of another, it can be pre-empted by the operating system at any moment. Atomic operations fill in this important role. These are implemented directly on CPUs as operations that cannot be interrupted (performing multiple steps in a single instruction with specific constraints), so they will operate in a serial manner regardless of other thread or operating system interference.

The fundamental atomic operation is the Compare and Swap. What this does (as the name implies) is that it performs a compare of the data before swapping it with different data. This is so you know that you are operating on data that has not changed (since another thread could have come by and beaten you to the punch).

Let’s look at a simple example of an increment using some pseudocode. Assume we have a Compare and Swap function called CAS that returns a boolean indicating whether it was successful:

// our function signature
bool CAS(void* AddressToWrite, int CompareValue, int SwapValue);

static int x = 0;

int y = x;                          // take a local snapshot of x
while (CAS(&x, y, y + 1) == false)
{
    y = x;                          // another thread changed x first: fetch the new value and retry
}

Assuming this is in a function that can be called by multiple threads, we need to protect against the possibility that another thread changes the value of x between the time we read it into our local variable and the time we increment it. If that does happen, then we need to read in the new value and try again, keeping in mind that we can fall behind yet again due to some other thread. However, we should eventually succeed unless other threads are continuously hitting this section of code, which in a normal program would be near impossible.
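
For comparison, and assuming the standard C++11 atomics rather than the CAS function sketched above (this is my own illustration, not code from the article), the same retry loop might look like this; compare_exchange_weak refreshes the expected value for us whenever it fails:

#include <atomic>

std::atomic<int> x{0};

void increment()
{
    int expected = x.load();
    // On failure, compare_exchange_weak writes the current value of x back
    // into 'expected', so the loop body can stay empty.
    while (!x.compare_exchange_weak(expected, expected + 1))
    {
        // retry with the refreshed 'expected'
    }
}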

Also using our Compare and Swap function, we can implement a simple mutex lock. We can have a variable that will act as the lock value. Then we can attempt to acquire the lock by seeing if that value is 0 and setting it to 1, and release the lock in a similar but opposite manner. Some pseudocode for those is below.

Lock:

static int lock = 0;

// spin until we successfully change the lock from 0 (free) to 1 (held)
while (CAS(&lock, 0, 1) == false);

 

Unlock:

// technically this should always succeed assuming
// we successfully locked it in the first place
while (CAS(&lock, 1, 0) == false);
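
If you would rather not rely on a hand-written CAS function, std::atomic_flag from C++11 gives you the same spin-lock shape. Again, this is a sketch under that assumption rather than code from this article:

#include <atomic>

std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;

void lock()
{
    // test_and_set atomically sets the flag and returns its previous value,
    // so we spin while someone else already holds the lock.
    while (lock_flag.test_and_set(std::memory_order_acquire)) { }
}

void unlock()
{
    lock_flag.clear(std::memory_order_release);
}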

 

Next Time…

For the next post in this series, we are going to look at a couple of different simple algorithms and see how we can use what we have learned here in order to make them operate in a multi-threaded (or concurrent) manner.

If you have any interest in other areas dealing with multi-threaded programming, please let me know in the comments and I will see if I can make a future post answering any questions or topics you would like covered. Thanks for reading!