Implementing a code generator with libclang

Original Author: Tamás Szelei

DOI: 10.5708/szelei.2014.A.1

Introduction


The following article covers the process of implementing a practical code generator for C++ in detail. You can find the full source code for the article on Source Code Tales.

A code generator is a very useful asset in a larger C++ project. Due to the lack of introspection in the language, implementing the likes of reflection, script binding and serialization requires writing some sort of boilerplate that essentially keeps the data which is otherwise thrown away by the compiler. These solutions are either intrusive (heavily macro-based, thus hard to debug, and requiring weird syntax in declarations) or fragile (the boilerplate must be constantly updated to follow the actual code, and might break without warning). One way to improve robustness is to automate writing this boilerplate. In order to achieve this, we need to parse the code somehow; in other words, understand what information to keep. However, parsing C++ is an extremely complex task, and with the copious amount of weird corner cases, we are in for quite a ride if we attempt to do so.

Attempts to parse a “good enough” subset of C++ generally fail or require the project to follow strict coding guidelines – that is, to avoid syntax that the parser can’t understand – and may break at any time when someone commits code that doesn’t follow these guidelines. This is where the excellent libclang comes in: it exposes the Clang compiler’s own C++ parser through a stable C API (and Python bindings), so we can let a real compiler understand the code for us. Its official documentation is pretty much only the Doxygen-generated reference, which is very useful, but not as an introduction to the usage of the library; due to the complex nature of the problem, it has a steep learning curve.

I am going to present the process of implementing a practical code generator for C++ using libclang and its Python bindings. “Practical” in the sense that it is not a general solution. What I’m presenting here is not meant to be taken as the concrete implementation I made but rather as one possible way of solving a problem, a detailed example. Using these ideas, it is possible to create an all-encompassing reflection solution or to generate code for existing libraries of any purpose. The aim is to write natural C++ syntax with minimal intrusive bits to provide the functionality. I encourage readers to experiment with the code and to try and implement a code generator from scratch.

I can’t go on without mentioning Eli Bendersky’s excellent post on the topic, which served as a great resource when I started gathering information.

Prerequisites

To understand everything in this article, I recommend that the reader knows:

  • Intermediate C++
  • Intermediate Python
  • What serialization and reflection are
  • What an AST (Abstract Syntax Tree) is

Required tools:

  • Python 2.7 (3.x will probably also work apart from little things)
  • LLVM 3.2+
  • The libclang Python bindings (I recommend installing them with pip: pip install clang)
  • A text editor or IDE (mostly for editing Python code)

The example problem

In our example, we are implementing automatic script binding, so we don’t need to write and maintain binding code by hand. We also want to be able to omit certain parts of the source so that they are not taken into account when the binding boilerplate code is generated ((Such a feature is useful when the C++ layer is meant to serve as a low-level implementation of features tied together by scripts. In some cases we want to keep functions public while hiding them from the scripting interface (e.g. debug functionality).)). Keep in mind that this article is not about automatic script binding ((There are plenty of great solutions for automatic script binding, far more versatile than what we implement here (SWIG is one of them).)). It is just one thing that can be done with code generation, used here as an example.

In the example, we are going to work with the following C++ class declaration:

class TextComponent
{
public:
    TextComponent();

    std::string text() const;
    void setText(const std::string& value);

    void superSecretFunction();

private:
    std::string m_text;
};

Adding scripting

Our goal is to be able to write the following in Python:

t = CodegenExample.TextComponent()
t.setText("Hello")
print t.text()
t.superSecretFunction()

– with the expected output being:

 
Hello
Traceback (most recent call last):
  File "src/test.py", line 7, in <module>
    t.superSecretFunction()
AttributeError: 'TextComponent' object has no attribute 'superSecretFunction'

Now let’s see the simple solution. We will utilize Boost.Python, a seasoned and battle-tested library which allows us to write binding code in a very expressive manner. The following is the entire binding code, placed in a separate source file which we will link with our executable. We also define an init_bindings() function, which does what its name says.

#include <boost/python.hpp>
#include "textcomponent.h"

using namespace boost::python;

BOOST_PYTHON_MODULE(CodegenExample)
{
    class_<TextComponent>("TextComponent")
        .def("text", &TextComponent::text)
        .def("setText", &TextComponent::setText)
    ;
}

void init_bindings()
{
    Py_Initialize();
    initCodegenExample();
}

After init_bindings is called, our TextComponent class is available to use in Python. The above code expresses exactly what we wanted to achieve: one constructor, and two member functions. We simply don’t bind the superSecretFunction, because we don’t want that to be available from Python.
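
As a side note (not covered in the original article), once init_bindings() has run in the host application, an embedded script can import the module by name like any other; for example:

# Hypothetical test script executed by the embedded interpreter after init_bindings().
import CodegenExample

t = CodegenExample.TextComponent()
t.setText("Hello")
print t.text()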

All of the above is what we would do in a typical project to make a class scriptable. Our aim is to generate this code automatically.

Automation

Traversing the AST

Now we are going to inspect the abstract syntax tree (AST) of the header file and use that information to generate the above binding code.

Traversing the AST is performed with cursors. A cursor points to a node in the AST, and can tell what kind of node that is (for example, a class declaration) and what are its children (e.g. the members of the class), as well as numerous other information. The first cursor you need points to the root of the translation unit, that is, the file you are parsing.

To obtain this cursor, we need to do the following in Python:

import sys
import clang.cindex

clang.cindex.Config.set_library_file('/usr/local/lib/libclang.so')
index = clang.cindex.Index.create()
translation_unit = index.parse(sys.argv[1], ['-x', 'c++', '-std=c++11', '-D__CODE_GENERATOR__'])

The index object is our main interface to libclang; this is where we normally initiate “talking to the library”.
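
Before walking the tree, it can be worth checking whether the parse produced any errors; libclang is forgiving and will happily return a partial AST for broken input. The bindings expose the compiler diagnostics on the translation unit. A minimal sketch, not part of the original script:

# Optional sanity check: report parse errors before generating anything.
for diag in translation_unit.diagnostics:
    if diag.severity >= clang.cindex.Diagnostic.Error:
        print 'parse error:', diag.spelling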

The parameters of the parse call

The parse function takes a filename and a list of compiler flags. We need to specify that we are compiling a C++ header (-x c++), because otherwise libclang will assume it is C, based on the .h extension (and consequently produce an AST that misses most parts of our header file). This option causes libclang to preprocess our file (resolve macros and includes) and then treat it as C++ source. The other options should be self-explanatory: setting the standard we use in the parsed source, and providing the __CODE_GENERATOR__ macro definition, which will come in handy in most implementations. Now, back to processing the AST – recursively. The AST of the TextComponent class can be dumped like this (see dump_ast.py):

 
TRANSLATION_UNIT textcomponent.h
  +--CLASS_DECL TextComponent
     +--CXX_ACCESS_SPEC_DECL
     +--CONSTRUCTOR TextComponent
     +--CXX_METHOD text
     |  +--NAMESPACE_REF std
     |  +--TYPE_REF string
     +--CXX_METHOD setText
     |  +--PARM_DECL value
     |     +--NAMESPACE_REF std
     |     +--TYPE_REF string
     +--CXX_ACCESS_SPEC_DECL
     +--CXX_METHOD superSecretFunction
     +--CXX_ACCESS_SPEC_DECL
     +--FIELD_DECL m_text
        +--NAMESPACE_REF std
        +--TYPE_REF string

Comparing this syntax tree dump to the source above should help with understanding what libclang does when parsing the source code.
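
For reference, a dump like the one above can be produced with only a few lines of recursive code. The following is merely a sketch of what a script in the spirit of dump_ast.py might look like (the actual script in the repository may differ); we will look at traversal in more detail next:

import sys
import clang.cindex

def dump(cursor, indent=0):
    # Print this node's kind and spelling, then recurse into its children.
    print '%s%s %s' % ('  ' * indent, cursor.kind.name, cursor.spelling)
    for child in cursor.get_children():
        dump(child, indent + 1)

if __name__ == '__main__':
    clang.cindex.Config.set_library_file('/usr/local/lib/libclang.so')  # adjust to your setup
    index = clang.cindex.Index.create()
    tu = index.parse(sys.argv[1], ['-x', 'c++', '-std=c++11', '-D__CODE_GENERATOR__'])
    dump(tu.cursor)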

Did you mean recursion?

The libclang C API has a visitor-based way of traversing the AST. This is also available from the Python bindings via the function clang.cindex.Cursor_visit, but we are going to utilize the more Pythonic, iterator-based approach that the Python bindings add on top of the C API. The ‘T’ in AST stands for tree, and the most straightforward way to process such a data structure is to traverse it recursively. It is pretty simple:

def traverse(cursor):
    # ...
    # do something with the current node here, i.e.
    # check the kind, spelling, displayname and act based on those
    # ...
    for child_node in cursor.get_children():
        traverse(child_node)

Useful properties of cursors

Cursors provide a rich set of information about each node in the AST. For our purposes, we will use the following:

  • kind: The kind of node we are looking at in the AST. Answers questions like: Is this a class declaration? Is this a function declaration? Is this a parameter declaration?
  • spelling: The literal text defining the token. For example, if the cursor points to a function declaration void foo(int x);, the spelling of this cursor will be foo.
  • displayname: Similar to spelling, but the displayname also contains some extra information which helps to distinguish between identically spelled tokens, such as function overloads. The displayname of the above example will be foo(int).
  • location.file: The source location where the node was found. This can be used to filter out nodes that come from included headers, because usually we are only interested in the file being parsed.

If you are implementing something different, you might find the location and extent properties useful, too. Sometimes the only way to get a particular string is to read the source directly; with location and extent you can find the exact span in the file that you need to read.
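
As an illustration (not needed for the generator in this article), the extent offsets can be used to slice the raw source text of a node out of the file; a sketch, assuming the node comes from a regular file on disk:

def node_source(node):
    # Read the file the node comes from and slice it using the extent's byte offsets.
    extent = node.extent
    with open(extent.start.file.name) as source_file:
        contents = source_file.read()
    return contents[extent.start.offset:extent.end.offset]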

Poor man’s code model

While it would be possible to generate the output code during the traversal itself, in an online manner, I find it clearer (and more reusable) to actually build a code model in which the C++ classes, functions (and whatever else is interesting for your purposes) are objects.

The following piece of code illustrates what I mean here and also showcases how a (very thin) object model of a class is constructed:

class Function(object):
    def __init__(self, cursor):
        self.name = cursor.spelling

class Class(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.functions = []

        for c in cursor.get_children():
            if c.kind == clang.cindex.CursorKind.CXX_METHOD:
                f = Function(c)
                self.functions.append(f)

It all really boils down to traversing a tree and filtering for certain elements. The for loop above does just that: if the current node is a C++ method (member function), then construct and store a Function object using the information found in that node.

This is a very simplistic code model: classes have names and member functions, and member functions have names. It is possible to gather much more than that, but for our purposes, this is mostly enough. By the way, the above is almost half of all the code we need in our code generator!
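
If you do need more, the cursor has it readily available; depending on your version of the bindings, it also exposes the result type and the arguments of a function. A slightly richer Function could look roughly like this (a sketch, not used in the rest of the article):

class RichFunction(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        # Return type and parameter types as spelled in the source, e.g. 'std::string'.
        self.return_type = cursor.result_type.spelling
        self.parameter_types = [arg.type.spelling
                                for arg in cursor.get_arguments()]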

Now let’s see the other half: how the classes are built. Reusing the traversal approach:

def build_classes(cursor):
    result = []
    for c in cursor.get_children():
        if (c.kind == clang.cindex.CursorKind.CLASS_DECL
            and c.location.file.name == sys.argv[1]):
            a_class = Class(c)
            result.append(a_class)
        elif c.kind == clang.cindex.CursorKind.NAMESPACE:
            child_classes = build_classes(c)
            result.extend(child_classes)

    return result

One important step here is that we are checking the location of the node. This ensures that we are only taking the contents of the file being parsed into account. Otherwise, we would end up with a huge mess of an AST due to the includes being, well, included. To put it all together, we would call the above function with the translation unit cursor (see above) to find all classes and their functions in the source file:

classes = build_classes(translation_unit.cursor)

Code generation

Now that we have a code model, we can easily process it and generate the code we need. We could iterate over the list of classes, print the class_… part in the binding code, then iterate their member functions… etc. This approach can work and is easy to implement, albeit not very elegant. We are going to do something way cooler: we will use templates to generate our code.

Templates. Duh?

Of course you already saw we are using templates in the Boost.Python code.

What is so cool about that?

Oh, I didn’t mean C++ templates. Generating text from data is a well-understood problem, with lots of great solutions from the web programming world. We are going to use the Mako template engine ((One alternative is titen; another approach could be to avoid template engines altogether and just use Python string formatting with {variables_like_this}.)), with prominent users like reddit and python.org. Sceptical? Take a look at the template code we will use:

#include <boost/python.hpp>
#include "${include_file}"

using namespace boost::python;

BOOST_PYTHON_MODULE(${module_name})
{
% for c in classes:
    class_<${c.name}>("${c.name}")
    % for f in c.functions:
        .def("${f.name}", &${c.name}::${f.name})
    % endfor
    ;
% endfor
}

void init_bindings()
{
    Py_Initialize();
    init${module_name}();
}

This template directly uses the code model we defined above; c.name refers to the name of the Class object. It is easy to see how even this simple code model can be used to generate code for various purposes. You can register functions not only for script binding, but also with reflection libraries, which allow a huge variety of dynamic uses (serialization, RPC, a thin layer for script binding, etc.). Save that template to a file named bind.mako, and then using it is really just a few lines:

from mako.template import Template

classes = build_classes(translation_unit.cursor)
tpl = Template(filename='bind.mako')
print tpl.render(
    classes=classes,
    module_name='CodegenExample',
    include_file=sys.argv[1])

If we process textcomponent.h with our full script, this is what we get:

#include <boost/python.hpp>
#include "textcomponent.h"

using namespace boost::python;

BOOST_PYTHON_MODULE(CodegenExample)
{
    class_<TextComponent>("TextComponent")
        .def("text", &TextComponent::text)
        .def("setText", &TextComponent::setText)
        .def("superSecretFunction", &TextComponent::superSecretFunction)  // oops, should not be generated!
    ;
}

void init_bindings()
{
    Py_Initialize();
    initCodegenExample();
}

That is almost what we wanted. The remaining problem is that our script also bound the superSecretFunction, which we meant to hide.

Hiding member functions

Now, to tell our code generator script that some parts of the AST are different than others (namely, we want to ignore them), we need to use some annotation magic ((It is also possible to simply use the __CODE_GENERATOR__ macro to hide some parts, but that is quite ugly and intrusive, one of the things we wanted to avoid)). When I was experimenting, I tried using the new(-ish) C++11 [[attributes]] to mark functions, but libclang seemed to omit unknown attributes from the AST. That is correct behavior, as far as the standard is concerned: compilers should ignore unknown (non-standard) attributes. Clang simply chooses to ignore them while building the AST, which is unfortunate for our case. Luckily, clang has a language extension which can be used to apply string attributes to many kinds of nodes, and that information is readily available in the AST. With a graceful macro definition, we can use this extension without losing compatibility with other compilers. The syntax is the following:

__attribute__((annotate("something"))) void foo(int x);

Admittedly, that is a bit lengthy. We do need to employ conditional compilation for portability anyway, so let’s do the following:

#ifdef __CODE_GENERATOR__
#define HIDDEN __attribute__((annotate("hidden")))
#else
#define HIDDEN
#endif

This way our code generator will see the annotations, but compilers won’t. To use the annotation, we will need to revisit some of the code above. First, we write the annotation where we want it in the source:

#pragma once

#include <string>

#ifdef __CODE_GENERATOR__
#define HIDDEN __attribute__((annotate("hidden")))
#else
#define HIDDEN
#endif

class TextComponent
{
public:
    TextComponent();

    std::string text() const;
    void setText(const std::string& value);

    HIDDEN void superSecretFunction();

private:
    std::string m_text;
};

The AST produced from the above source:

 
TRANSLATION_UNIT textcomponent.h
  +--CLASS_DECL TextComponent
     +--CXX_ACCESS_SPEC_DECL
     +--CONSTRUCTOR TextComponent
     +--CXX_METHOD text
     |  +--NAMESPACE_REF std
     |  +--TYPE_REF string
     +--CXX_METHOD setText
     |  +--PARM_DECL value
     |     +--NAMESPACE_REF std
     |     +--TYPE_REF string
     +--CXX_METHOD superSecretFunction
     |  +--ANNOTATE_ATTR hidden
     +--CXX_ACCESS_SPEC_DECL
     +--FIELD_DECL m_text
        +--NAMESPACE_REF std
        +--TYPE_REF string

As you can see, the annotations are added as children of the annotated nodes. A little change to our script is needed where we build the code model:

def get_annotations(node):
    return [c.displayname for c in node.get_children()
            if c.kind == clang.cindex.CursorKind.ANNOTATE_ATTR]

class Function(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.annotations = get_annotations(cursor)

class Class(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.functions = []
        self.annotations = get_annotations(cursor)

        for c in cursor.get_children():
            if c.kind == clang.cindex.CursorKind.CXX_METHOD:
                f = Function(c)
                self.functions.append(f)

We can filter out the unwanted elements (classes from includes and functions that are meant to be hidden) in two places: while building the code model, or when our Mako template is rendered. Choosing either is a matter of taste; I went with the latter (I feel it is better to keep the code model consistent with the source file). The modified template looks like this:

#include <boost/python.hpp>
#include "${include_file}"

using namespace boost::python;

BOOST_PYTHON_MODULE(${module_name})
{
% for c in classes:
    % if "scripted" in c.annotations:
    class_<${c.name}>("${c.name}")
    % for f in c.functions:
        % if not "hidden" in f.annotations:
        .def("${f.name}", &${c.name}::${f.name})
        % endif
    % endfor
    ;
    % endif
% endfor
}

void init_bindings()
{
    Py_Initialize();
    init${module_name}();
}
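
For comparison, had we chosen to filter while building the code model instead, the Class constructor could simply drop hidden methods up front; a sketch of that alternative (not the approach used in this article):

class Class(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.annotations = get_annotations(cursor)
        # Model-side filtering: hidden methods never make it into the code model.
        self.functions = [Function(c) for c in cursor.get_children()
                          if c.kind == clang.cindex.CursorKind.CXX_METHOD
                          and 'hidden' not in get_annotations(c)]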

At this point, the generated binding code is almost identical to what we wrote by hand. (Note that this version of the template only emits classes that carry a "scripted" annotation, so the class declaration itself needs to be annotated the same way we annotated superSecretFunction, just with "scripted" instead of "hidden".) There is one last issue we need to cover.

Hiding non-public members

Our example source only had public member functions. In a real-world scenario, this rarely happens. The code generator did not take access specifiers (public, private, protected) into account, but it is important to do so: the generated binding code would not compile if it contained non-public members. Unfortunately, at the time of writing, the Python bindings do not expose the access specifiers on the cursors. I recommend using my patched cindex.py to access this information ((I submitted a patch to the Clang frontend project while writing this article and I will update the post if it gets approved.)). The last revision of our Class constructor filters out the non-public members:

class Class(object):
    def __init__(self, cursor):
        self.name = cursor.spelling
        self.functions = []
        self.annotations = get_annotations(cursor)

        for c in cursor.get_children():
            if (c.kind == clang.cindex.CursorKind.CXX_METHOD and
                c.access_specifier == clang.cindex.AccessSpecifier.PUBLIC):
                f = Function(c)
                self.functions.append(f)

With this, our solution is pretty much complete. The generated source will look like the one we wrote by hand. You can find the full, integrated project in the git repository.
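
For reference, here is roughly how the pieces above fit together in a single script. The file names (generate.py, bind.mako), the libclang path and the assumption that the Function, Class and build_classes definitions from earlier live in the same file are all mine, so adjust them to your project:

# generate.py -- hypothetical glue script.
# Usage: python generate.py textcomponent.h > bindings.cpp
import sys
import clang.cindex
from mako.template import Template

if __name__ == '__main__':
    clang.cindex.Config.set_library_file('/usr/local/lib/libclang.so')  # adjust to your setup
    index = clang.cindex.Index.create()
    translation_unit = index.parse(
        sys.argv[1], ['-x', 'c++', '-std=c++11', '-D__CODE_GENERATOR__'])

    classes = build_classes(translation_unit.cursor)

    tpl = Template(filename='bind.mako')
    print tpl.render(classes=classes,
                     module_name='CodegenExample',
                     include_file=sys.argv[1])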

Vessel Post Mortem: Part 1

Original Author: Tony Albrecht

(This article has also been posted at Overbyte.com.au and is the first part of a 3 part series)

I started looking at Vessel in January 2013 – initially just in my evenings. In my ‘spare’ time. I was looking at it because John and Martin (the founders of Strange Loop Games and the creators of Vessel) had asked me to consider porting it to PS3. We knew each other from our time together at Pandemic Studios in Brisbane where I had worked closely with Martin in the Engine team while John worked on AI. I was keen to do the port but I wanted to be as sure as possible that I knew how much work was involved in it before committing.

Vessel in action.

I found the idea of porting such a game daunting – it had a custom rendering engine, a bespoke fluid physics system, and major parts of the game code written in Lua – but I figured it was time to step up and take on such a challenge. I have an immense amount of respect for John and Martin – they are both incredibly smart coders and I relished the chance of working with them again. I wanted to do it but I didn’t want to let them down. They were going to be my clients, but they were also friends.

I started by looking at their code – in order to give a quote on the porting process I needed to have the game running so that I could profile it. Vessel had already been partially ported to the PS3 by a third party, but wasn’t compiling when I first got my hands on it. So I spent a few weeks of long evenings fixing the FileIO, the template compilation errors (GCC and MSVC disagree on some of the finer points of templates), and just commenting out anything I didn’t think was relevant in order to get the game up and running. Once it was working, I ran my trusty profiler over the game and tried to guess how much time it would take to get this baby to a shippable state. What I found was a little scary.


Visualisation of the code changes made during the porting of Vessel to PS3.

Vessel runs predominantly on three threads: the game thread, which runs all the Lua scripts that control the majority of the game logic and in-game objects; the render thread, which takes the output of the game thread and converts it to a format that can be fed to the SPUs, which then build the graphics command buffer and feed it to the GPU in parallel; plus the physics thread, which manages all the physics processing – fluid dynamics, rigid body collisions, everything. John had built the physics system himself and it was pretty impressive (and complicated). It was too much to grok initially, so all I could do was look at the game’s performance on some of the worst levels, profile it there and see how much I needed to speed it up.

I investigated a few of the levels on the PS3 build and picked two that I thought were the worst. What they told me was that the physics thread was the primary cause of slowdown and so I concentrated my analysis on that thread. The game was running at about 60ms per frame in the worst levels I was testing – about 15 to 20 frames per second. From discussions with Strange Loop, I knew that the physics thread had to run at 60fps so this meant that I needed a factor of 4 speed up. From experience, I knew that this was something that was achievable. The game thread was running at around 30ms per frame and the Render thread was clocking in at around 5ms per frame. SPU rendering was pretty fast, and I figured that wouldn’t be much of a problem.

Months of fannying about took place and a contract was finally signed in June. So I did the only appropriate thing one could do in that situation – I did two frantic weeks of work on Vessel and then went on a 6 week family holiday to the UK.

The kids and I carefully feeding an English Dragon.

In anticipation of the Vessel contract I’d hired an experienced programmer, Ben Driehuis. Ben came highly recommended and had worked on the Bioshock series of games, plus, most importantly, he was living in the same city as me. Having spent the last 6 years working remotely I knew that my best bet of delivering this game on time was to be in the same physical space as my team. I threw Ben in the deep end. He was hired and then given work on platforms and engines that he’d never had to deal with before. Then when I buggered off to the UK he had to manage by himself, dealing with a performance contract on a big name title and then kicking on with Vessel in my absence. Poor bugger.

I also had a friend who was interested in the QA side of things, so James (an old Uni mate) was responsible for playing Vessel over and over and over again, reporting bugs to us and telling us that the performance improvements we’d just made hadn’t made any difference. James also helped us out on the testing for the TRC (Technical Requirements Checklist – a massive list of tests that need to be checked off in order to get your game through submission to Sony) and the actual submission of the game.

Once I got back from my holiday we moved into the new office and started working in earnest on the Vessel port. Our goal was to finish Vessel by the end of August – we had about four man-months to do the work, and I had carefully calculated how much time each section of the port would take. Surely that was enough time?

Ben and I divided up the work between us: I took on the optimisation of the physics and Ben was to deal with pretty much everything else – audio was broken, some of the in game objects didn’t animate, loading needed to be optimised, memory usage had to be reduced, we needed save games and trophies and a whole swathe of PS3 specific features implemented. Ben has written about some of the fun he had during the port here.

From my initial inspection of the code, I had noticed a lot of memory allocations taking place, so I was optimistic that I could reduce (or remove) these and scrape back a lot of frame time. I also figured that rearranging the memory usage to be more cache friendly would help the PlayStation 3’s main processor so I could leave the core of the physics code alone. The last thing I wanted to do was to port all of the physics to SPU.

Three weeks later I started porting the physics to SPU. The changes I was making were incrementally improving the performance, but I could see that we’d never hit the speed we needed without dramatic changes. Porting code to SPU isn’t like porting code from one platform to another. You can program SPUs in C++, but they have very limited memory available to them – only 256kb each. So, in order to port any significantly complex task, you need to understand how that task uses memory – where it is, what order it reads it, where it writes it – the things that most programmers of high level languages take for granted. Not only that, but you have to consider how the task will parallelise. Running on one SPU is good, but running on 6 is awesome. (I’ll go into more detail on the SPU porting process in later articles.)

The first SPU task I built was one that looked for drops that were near a set of objects. My first shot at this shaved almost 2ms off the frame time – the function in question dropped from a peak of 2.4ms to 0.6ms! That meant that in order to hit 16ms per frame I just needed to do this 21 more times!

A perilous journey awaits.

Still, Ben and I kept hammering away at the code. By the end of the first week in August I’d shaved 15ms off the physics thread and yet that thread remained the bottleneck. Ben had freed up over 200MB of memory and we were close to running on a retail PS3. But we’d started to notice a few things: some of the levers and triggers in the game weren’t working, fluid shaders weren’t working properly, we were getting spikes of over 200ms in some cases, and the game still wasn’t playable, not just because of the broken levers but also because the AI of the fluros seemed to be broken (some would teleport away).

It was evident at this point that we weren’t going to hit the end of August deadline. Other projects had taken some time from both Ben and myself, so we still had billable time available going into September, and I let Strange Loop know that things were going to take a little longer than I had expected. They were very understanding and comfortable with the release date pushing out a little more.

Things were progressing well though – by the end of August the physics thread was running at better than 33ms per frame – we were over halfway there! Save games were in place, loading was optimised, and trophies were functioning. Unfortunately, the game thread was slowing down – as Ben was fixing problems with the code and data, more things were being processed. For example, audio wasn’t working before and now there was a lot of audio being triggered so this obviously took more processing time.

It was the second week in September when I realised that I’d made a horribly flawed assumption about the game thread in Vessel. I discovered that it had to function in lock step with the physics thread and so it too had to run at 16ms per frame. Reducing the framerate of the physics wasn’t an option as it was fixed to 16ms per frame – experimentation with varying that resulted in fluid collisions breaking horribly.

So, at that point we had the physics running at sub-30ms per frame, and the game thread running at 30ms and above. We were close, but we still had a long way to go – we now had to figure out how to optimise the Lua heavy game thread down to 16ms per frame. Actually, it had to be faster than that – as the PS3 has only two hardware threads, the 3 software threads have to share those two hardware threads which meant that the game thread plus the render thread had to be faster than 16ms per frame.

Then Ben discovered the reason for the teleporting fluros – a compiler bug triggered only in optimised builds was causing fluros to incorrectly calculate their trajectory during jumps, making them disappear. Turning off optimisation for that function fixed the problem. That was a tricky bug to find, so we were pretty stoked – until we realised that this meant that the number of fluros in view was now higher than it was before. Much higher. More fluros means more processing – more physics, more drops, more rendering, more audio. This one fix resulted in the physics thread jumping back up to 50ms from around 24ms, the render thread jumping up to 12ms and the game thread climbing to 40ms.

I was horrified. We were effectively back to where we had started. We now needed to save a total of 60ms over three threads. I wasn’t sure we could do it. It was like that bit in horror movies where the protagonist has killed the evil monster, only to have it rise up, stronger than it was before.

Revision visualisation of the code and data for the Vessel port.

The initial optimisations on a project are the easiest. As time goes on and all the low hanging fruit has been picked, the remaining optimisations are harder and harder to perform. We really had our work cut out for us – and we’d come too far to go back now.

To be continued…

Fragments of Him prototype – designer’s post-mortem: Scope, sexuality, and scripts

Original Author: Tino van der Kraan

<foreword>Hello dear #AltDev community. Thank you for including me in this community; hopefully your trust will not be misplaced in allowing me to submit my contributions to this blog. My name is Tino and I write from the perspective of a co-founder and creative at a young independent game development studio called SassyBot Studio. What you can read below is a look back on the narrative experience ‘Fragments of Him’, a well-received Ludum Dare game that is currently in development to be a full title. Hopefully, the contents of this blog will be valuable to you.</foreword>

A little over nine months ago, Elwin (lead programmer) and I put our faith in Mata Haggis as he proposed that we should make a narrative game for the, back then, upcoming Ludum Dare game jam challenge. Ludum Dare is an online game jam event where developers around the world create a game, either alone (48 hour time limit) or in a small team (72 hour time limit), under great time pressure. These games are encouraged to follow a theme that is announced at the start of the event.

The game that Mata had in mind was to be a narrative game. With the little game jam experience we had back then, we knew the scope had to be manageable if we were to finish the game at all. Mata explained we needed 3D characters, fully decorated indoor and outdoor scenes, several written and voiced dialogue lines for multiple playthroughs, and shader magic for some of the interactions. As you can imagine, we told him he was mad.

As with any mad doctor, he convinced us that it could be done, and we were willing to give him the benefit of the doubt. In retrospect, that faith has not been misplaced. As of this writing, Fragments of Him has been played over 28,000 times on Kongregate, and now that a full version of Fragments of Him has been funded it seems a good moment to reflect on how the initial prototype came to be.


Fragments of Him (Work in progress)

We are still very early in development and we do not want to reveal too much. What we can share is the complete post-mortem by Dr. Mata Haggis (@MataHaggis), written roughly a week after the game jam. It takes you through the process that Mata used to create the prototype that is the foundation for the upcoming full version.

All text onwards is written by Dr. Mata Haggis.

 

Introduction

My name is Dr. Mata Haggis and I was the narrative & game designer/producer on Fragments of Him, our entry to Ludum Dare 26.

I had never done a game jam before, so this was a new experience for me. I was fortunate enough to have two talented developers ask me to join their team. Between us we created a narrative game experience with one programmer, one artist (and his 3D modeller girlfriend for an evening), and myself in only 72 hours.

The game jam that enabled us to conceive Fragments of Him

Below you will find out what a narrative designer does on a game jam, what a producer can do, and a little more about the decisions that I made when creating the title.

Before we go any further you should play the game:

Play Fragments of Him on Kongregate

Seriously. I mean it. This post will have a lot of spoilers very soon, so go play the game now and come back in ten minutes.

Done that?

Really?

Okay, let’s talk about the process.

 

Starting

The theme was minimalism, and I wanted to do a game with a very prominent story in it. I worked on several ideas in my head, but the one that stuck was investigating why a person would choose to live in a minimalist style in their house. The result of this was the idea that the lead character had lost their partner and couldn’t stand to be reminded of the loss.


Gameplay of the Fragments of Him prototype

In narrative terminology, the loss of the partner would be referred to as ‘the inciting incident’. The process of creating minimalist spaces gave me a gameplay mechanic too.

I pitched the concept to the team – one artist and one programmer – and I was fortunate that they were enthusiastic about it. There were several key decisions that are worth examining here:

 

Scope

The theme was going to need a lot of art. Specifically, the world was going to need a lot of models in it. I immediately said that the world was going to be stylised and colour coded: the protagonist (the ‘hero’ of the story) would be blue, and yellow would be used to indicate the dead partner, along with objects related to that partner. Everything else in the game was going to be white – this would allow the artist to focus on building the space and objects without worrying about texturing any of it.

In terms of showing figures in the environment, again I needed to keep a close eye on the scope of the project. I asked the artist to create one generic character model which would be used for both the protagonist and the dead partner. In every scene, these figures would be posed in a tableau (a static pose that suggests narrative action). In this way we could give a powerful idea of character relationships without the difficulty of animating figures.

For the programmer, there were a few challenges that I had to consider. The audio and subtitles would need to be displayed, there would need to be highlighting and signposting of affordances, and the most difficult task was probably going to be the final scenes where the gameplay mechanic (clicking to remove objects) is reversed. By keeping these interactions very simple, I could limit the complexity of the task that I was giving to the programmer.

 

Sexuality

I voiced the lead character of the game, and I am male. In the game, the character talks about his dead boyfriend. I felt that the story would work perfectly with either a male or female partner character, but I also feel that non-heterosexual relationships are under-represented in gaming, and so I had a preference for making both characters male.

I described the outline of the story to the team and they had no feelings either way on the gender of the partner, and so we ended up with a story about coping with grief, where the lead characters happen to both be male.

When stories are told about the death of a non-heterosexual man (it is not defined if he was gay or bisexual), they often focus on stereotypical perceptions of gay lifestyle choices: drugs, promiscuity, clubbing, and of course HIV/AIDS. I didn’t want to tell a gay love story; I just wanted to tell a love story.


The reminders hurt too much.

I know that the audience for any story will be predominantly heterosexual, with smaller proportions of gay, bisexual, transgender, and other queer-identifying players. Part of my goal was not to draw attention to the non-heterosexual elements but to write a story that anyone could relate to; in doing this, I wanted to completely normalise the lives of the men. To do this, I chose to focus on the small and relatable things in people’s lives.

I suspect that anyone who has had a break-up, especially after living with a partner, can relate to the quiet sadness of removing one towel from the bathroom.  I think that feeling of sorrow is a universal experience that has nothing to do with sexual preferences, and that is what I wanted to convey in this game. In doing this, the sexuality of the characters became irrelevant in the sea of everyday memories.

As a side benefit of going for a homosexual relationship, the artist only needed to make one body and animation rig, saving him a lot of time under the game jam’s time constraints!

 

How to write a good story quickly

I’ve been writing for several years and have cobbled together a system which works well for me. I’ve based it on several sources, but the main ones are ‘Save the Cat!’ by Blake Snyder (the origin of the ‘save the cat’ beat used below) and ‘Will Write for Shoes’ by Cathy Yardley.

Neither of these is a high-brow book on how to create your epic masterpiece, but both are very focussed on creating a tight, enjoyable story.

I put ideas from the books together and now here’s what I use whenever I start writing:

Before the inciting incident: ‘Save The Cat’

Show the life of the character before life goes wrong. The character does something that makes you like them (‘save the cat’).

5% Inciting incident

 Something changes that forces the protagonist to act.

25% Plot point one: state the external motivation

What forces the protagonist to make this clear statement of their objective?

50% Plot point two: the low mid-point

It appears impossible to complete the external motivation; the protagonist loses hope.

75% Plot point three: Hope

The protagonist is given hope that they can fulfil their external motivation goal, but only if they truly dedicate themselves to it.

90 – 95% “The Black Moment”

The external motivation appears impossible to fulfil.

95% – 100% Resolution

The story concludes in a satisfying manner. This may be successful completion of both the external and internal motivations (a happy ending); failure of the external motivation but success of the internal one (common in comedies, romance, and tales of self-discovery); or success of the external motivation but failure of the internal one (common in tragedy and some tales of self-discovery). It is not typical for a story to end with failure of both the external and internal motivations – that is the total failure of the character to grow or succeed, and it makes an audience wonder why they spent their time with the character.

I use this whenever I write and it’s working out pretty well for me so far!

In the case of Fragments of Him, as with other stories I write, I began from a feeling and worked back to an inciting incident. The feeling was a person clearing away all of their belongings to create a minimalist living space – why would they do this? This question led back to the inciting incident – the objects were related to grief at the sudden death of a partner.

From there I worked through the template, filling in the gaps. For Fragments of Him, it looks like this:

Before the inciting incident: ‘Save The Cat’

Scene – Park. Feeding ducks; the narrator talks about how good life is. Two characters – the protagonist is blue, the ex is yellow. All other objects are yellow too. The player clicks on objects (or parts of objects) in the scene and they turn transparent.

5% Inciting incident

End of the first level – the player has been removing the polygons; when everything is transparent except for the main character, the partner dies.

25% Plot point one: state the external motivation

Scene – House. Protagonist wants to remove all traces of the ex from his life.

50% Plot point two: the low mid-point

Scene – Street. It appears impossible to remove everything – everywhere he goes there are reminders. He doesn’t want to go outside.

75% Plot point three: Hope

Scene – Office. If he can remove everything from his interior spaces he feels like he might be able to cope.

90 – 95% “The Black Moment”

Scene – back in the empty House. Protagonist is sitting on the floor. Ex appears behind… The protagonist feels like he will never be free of the memories, even when everything is gone.

95% – 100% Resolution

As the player clicks to remove the ex from the House scene, the protagonist’s colour changes, blending the two into green – the ex has become part of the protagonist. The player clicks through the scenes, where the transparent objects are back. As the player clicks on the transparent objects, they turn green too. This goes much faster than the removal. The protagonist understands that the ex is part of him. Things are different, but life will go on in a new way.

As you can see, I’ve used the basic structure from the template to create a narrative arc that is satisfying and also that integrates with the gameplay – at each step I have made sure the interaction with the game adds to the plot.

I chose locations where it would be viable to have few characters in the scene because of the limitations on the art scope.


The park scene of Fragments of Him

 

Dialogue

There are three variations of most of the instances of dialogue in the game. These are chosen randomly during each playthrough, giving a slightly different experience each time.

The dialogue was written with three key events in mind:

  • The start of a level
  • Removing a particular object or a percentage of objects
  • The end of a level

The start and end triggers would give the main plot points, and the objects would trigger smaller memories.
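To make the mechanics concrete, here is a rough sketch of how such a trigger-plus-variation system might hang together. This is illustrative C++ with invented names (Trigger, DialogueLine, DialogueSystem), not the actual Fragments of Him code, and it simplifies the object trigger to a pooled pick for brevity.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// The three events that can fire dialogue, as described above.
enum class Trigger { LevelStart, ObjectRemoved, LevelEnd };

// One line of dialogue with (typically three) recorded variations.
struct DialogueLine {
    std::vector<std::string> variations;

    const std::string& Pick() const {
        // Random choice is what makes each playthrough slightly different.
        return variations[std::rand() % variations.size()];
    }
};

class DialogueSystem {
public:
    void Add(Trigger trigger, DialogueLine line) {
        lines_[trigger].push_back(std::move(line));
    }

    // Called when a level starts or ends, or when an object is removed.
    // (In the full game a particular object would map to its own line;
    // picking from a pool keeps this sketch short.)
    void Fire(Trigger trigger) {
        std::vector<DialogueLine>& pool = lines_[trigger];
        if (pool.empty()) return;
        const DialogueLine& line = pool[std::rand() % pool.size()];
        Play(line.Pick());
    }

private:
    void Play(const std::string& clip) {
        // In the real game this would queue the audio clip and its subtitle.
        (void)clip;
    }

    std::map<Trigger, std::vector<DialogueLine>> lines_;
};
```

Keeping the trigger plumbing separate from the recorded variations is what makes it cheap to add an extra take of a line later without touching any gameplay code.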

I recorded the audio in a makeshift audio booth constructed from a pair of curtains and a clothes rack, using a €100 digital microphone. It’s not ideal, but it did the trick. I then edited the sound files into one- or two-sentence chunks with some antiquated software. This was very laborious and time-consuming for me, but sometimes design requires this kind of repetitive grunt work to get the project done.

 

Other audio

Ambient audio and music is absolutely essential in selling any emotional experience: I believed this going into this project and now I am utterly convinced of this. In every scene there are several audio triggers built into the environment that work in a very subtle way to make the spaces feel more believable.

I have a suspicion that the audio space of a game may be more important than the visual style when it comes to creating emotional resonance. Most of this work will never be noticed by the player on a conscious level, but that it exists in the mix is important. In the apartment, did you hear the muffled footsteps of a neighbour going down the hall? Or in the office, did you notice the sound of typing outside the room, or the noise of a plane flying past overhead? Probably not, but they’re there, and they help you feel that the space you are in is alive.

For the music, I found a wonderful website of free music.

Everything makes me think of Him