Q-003: What is one mistake you made recently…?

Original Author: Mike Acton

#2 Not fighting hard enough for ideas/features/etc.

Tue Aug 21 16:05:17 +0000 2012

@Fersis
Fersis

@mike_acton as an indie studio, if you run across some $ pay for work as you go, not for ‘future’ work that might not happen…

Tue Aug 21 16:16:01 +0000 2012

@ZookeyTK
Joshua Hughes

@mike_acton I read the comments section of a games article with mild feminist tones. Unfortunately I never seem to learn 🙁

Tue Aug 21 16:28:03 +0000 2012

@Wertle
Lisa Brown

@mike_acton: Forgetting scaling for a tool. “Wait, we have to think of when it’s going to be used by 50 people with plenty of assets!”

Tue Aug 21 16:29:38 +0000 2012

@Zavie
Julien Guertault

@Wertle Ouch. I think you should seriously consider adding masochist among your list of qualities.

Tue Aug 21 16:30:37 +0000 2012

@mike_acton
Mike Acton

@mike_acton Recursive behavior: waiting too long to make a hard decision I knew I had to make. Break your behavior patterns?!

Tue Aug 21 16:38:15 +0000 2012

@init2
Alex

@Zavie Haha. Or the opposite: “Wait. What? You’re fucking telling me that only ONE person will use this? I would’ve just used a text file!”

Tue Aug 21 17:03:23 +0000 2012

@mike_acton
Mike Acton

@mike_acton

Tue Aug 21 17:03:52 +0000 2012

@bentrem
Ben Tremblay

@mike_acton: but it sounded fun to write!

Tue Aug 21 17:07:25 +0000 2012

@Zavie
Julien Guertault

@mike_acton: but seriously, there are plenty of things perfectly fine if people work on their own, but that won’t do with a team.

Tue Aug 21 17:09:24 +0000 2012

@Zavie
Julien Guertault

@mike_acton: and as a developer testing while writing the actual thing, it’s easy to forget the real use case, say one year from now.

Tue Aug 21 17:10:27 +0000 2012

@Zavie
Julien Guertault

@bentrem of course, it’s not. I was using “recursive” in a wrong/metaphorical way. “Repetitive” was the accurate word. #DisruptiveWritingStyle

Tue Aug 21 22:06:12 +0000 2012

@init2
Alex

@mike_acton saying and thinking “no, no. I’ve got this!” when you haven’t. Illness + creative development = stress = more illness – progress

Wed Aug 22 02:32:58 +0000 2012

@DoctorMikeReddy
Dr. Mike Reddy

#1 is one of my own biggest problems. And I’m pretty sure I still do it a ton *less* than most.

Wed Aug 22 06:32:20 +0000 2012

@mike_acton
Mike Acton

@ozzy_at_work Your mistake was to read the manual? 🙂

Wed Aug 22 06:32:47 +0000 2012

@mike_acton
Mike Acton

Q-002: What’s one article or book you found particularly influential…?

Original Author: Mike Acton

Recently I’ve found myself returning to Anne Bogart’s A Director Prepares multiple times for thoughts related to direction of live theater, which I’ve found to be extremely relevant to game development. It’s a book I’ve recommended now many times!

The conversation from Twitter:

@mike_acton Steve McConnell’s Rapid Development is one.

Tue Aug 21 11:04:04 +0000 2012

@JurieOnGames
Jurie Horneman

@mike_acton Not really an article but the Arkham Asylum DICE talk by Sefton Hill was great. Second (for me) is Ed Catmull’s Stanford talk.

Tue Aug 21 11:07:07 +0000 2012

@Stitched
Peter Saumur

@SteveSwink is the one design book I always keep at work for reference & inspiration.

Tue Aug 21 11:07:35 +0000 2012

@lightbombmike
Mike Jungbluth

@mike_acton What kind of topic are you looking for?

Tue Aug 21 11:08:36 +0000 2012

@rupazero
Zoya Street

@mike_acton Edsger Dijkstra’s Turing Lecture

Tue Aug 21 11:10:41 +0000 2012

@brett_douville
brett_douville

@mike_acton I wish I liked the second game. Too complex.

Tue Aug 21 11:15:16 +0000 2012

@locust9
David Goldfarb

Tue Aug 21 11:34:15 +0000 2012

@pixelmager
Mikkel Gjoel

@mike_acton Computer Graphics Principles and Practice. Foley. van Dam et al might be old but stuff worth a place on the shelf

Tue Aug 21 11:36:34 +0000 2012

@DeanoC
Deano Calver

@mike_acton Blinn’s Through the graphics pipeline (and 2 others), I lost my set years ago and often still want to refer to bits.

Tue Aug 21 11:42:16 +0000 2012

@DeanoC
Deano Calver

@mike_acton Nothing in the last 5-6 years. Most things just get a single read (if that).

Tue Aug 21 14:48:19 +0000 2012

@noel_llopis
Noel Llopis

Tue Aug 21 14:51:03 +0000 2012

@noel_llopis
Noel Llopis

@mike_acton Good question. I know two books that I will re-read and that is “The Pragmatic Programmer” and “Thinking and Learning”

Tue Aug 21 14:51:44 +0000 2012

@daniel_collin
Daniel Collin

Tue Aug 21 14:55:06 +0000 2012

@noel_llopis
Noel Llopis

@mike_acton One of my favorites is “The Pragmatic Programmer” by Hunt & Thomas.

Tue Aug 21 15:14:43 +0000 2012

@niklasfrykholm
Niklas Frykholm

@mike_acton Looking forward to finishing this game so I can write more articles. Way too busy but I have a lot of material 🙂

Tue Aug 21 16:01:09 +0000 2012

@gafferongames
Glenn Fiedler

@mike_acton Abrash’s Black Book. Far beyond the graphics content I think it really nails the mindset required when optimizing code.

Tue Aug 21 16:04:31 +0000 2012

@m18e
Max Burke

@pixelmager Haven’t referred to the Foley & van Dam for years, now that I think of it. May need to flip it open again just to see.

Tue Aug 21 16:27:02 +0000 2012

@mike_acton
Mike Acton

@rupazero No particular topic. Just curious. I think a person’s choice says a lot. 🙂

Tue Aug 21 16:28:23 +0000 2012

@mike_acton
Mike Acton

@DeanoC I left my copy in the library at splash – had only opened it once since uni, fruitlessly.

Tue Aug 21 16:34:06 +0000 2012

@pixelmager
Mikkel Gjoel

Q-001: Where do you feel like you are being held back by the status quo in #gamedev?

Original Author: Mike Acton

This was the question that started my twitter #gamedev Q&A series. As of right now we’re just shy of having discussed 100 questions on twitter. Feel free to join in!

Where do you feel like you are being held back by the status quo in #gamedev?

— Mike Acton (@mike_acton) August 21, 2012

The conversation from Twitter:

@mike_acton There are a lot of industry beliefs about “what the player wants” that are limiting a lot of design choices.

Tue Aug 21 13:37:54 +0000 2012

@IADaveMark
Dave Mark

@IADaveMark Hard to enumerate common assumptions, but it’d be interesting to try.

Tue Aug 21 16:24:12 +0000 2012

@mike_acton
Mike Acton

@mike_acton From my own discipline “Players don’t want good AI.” While they don’t want to get *crushed* by AI, that doesn’t indict ALL AI.

Tue Aug 21 16:25:53 +0000 2012

@IADaveMark
Dave Mark

@mike_acton At some point “good AI” got defined as “AI that plays flawlessly” instead of “AI that plays like a human”.

Tue Aug 21 16:46:46 +0000 2012

@CastIrony
Joel Bernstein

@mike_acton if you think players don’t want good AI, your definition of what makes an AI good is flawed.

Tue Aug 21 17:07:11 +0000 2012

@slicedlime
Mikael Hedberg

@mike_acton I assume that wasn’t directed at me but rather at the people whose mentality I was paraphrasing?

Tue Aug 21 19:51:29 +0000 2012

@IADaveMark
Dave Mark

@mike_acton correct

Tue Aug 21 19:53:30 +0000 2012

@slicedlime
Mikael Hedberg

.@IADaveMark to be fair, most players have no clue what AI is, nor what GAME AI is, or is supposed to do (more than one job, etc)?

Tue Aug 21 22:14:00 +0000 2012

@init2
Alex

@init2 however, these are actual designers and producers etc that are making these statements.

Tue Aug 21 22:14:58 +0000 2012

@IADaveMark
Dave Mark

.@IADaveMark everytime s/b utters “dumb AI” (& leaves it at that) I stop listening/reading the review.

Tue Aug 21 22:16:32 +0000 2012

@init2
Alex

@IADaveMark oops! 😉

Tue Aug 21 22:17:11 +0000 2012

@init2
Alex

@mike_acton‘s thread — what mentalities are holding us back in game dev?

Tue Aug 21 22:46:56 +0000 2012

@IADaveMark
Dave Mark

Showcase Your Student Projects at AltDev Student Summit

Original Author: AltDevConf

The AltDev Student Summit is a chance for students, regardless of location, to engage with veteran game developers from some of the best companies around the world and learn first-hand what being a part of our industry is really like. It will take place November 10th and 11th, and is held entirely online through Google’s Hangouts On Air system.

As part of this exciting event, we are offering you the chance to present your current project to a panel of industry experts and receive feedback live during the conference.

We are looking for projects being undertaken by students or teams of students. People have asked how we’re defining students: we’re opening this up to any person registered as a full-time undergraduate, graduate, or doctoral (or equivalent) student. Projects should be sufficiently developed that they can be demonstrated in some fashion – this is not intended for concepts, but for in-progress projects.

The format will be similar to the “Dragon’s Den” show, so you will have the opportunity to describe your project and team and make your “pitch” to the panel. They will then ask questions to learn more about specific aspects of your project and give you some advice on how you can improve. After they have heard all the pitches, the panel will each have the opportunity to “invest” an imaginary $10,000 in any of the projects they’ve seen, and will be asked to justify their choices.

This is an excellent opportunity for you as students to get your current project in front of a lot of spectators and to make an impression on people already within the industry. Similar events have seen these fictitious offers of funding quickly turn into real business opportunities for the participants, but even if that doesn’t happen, it’s an excellent chance to practice pitching your projects.

The deadline for submission is the 13th of October. We hope you will consider applying to be a part of this!

For members of industry, we are still recruiting for judges for this event – please fill in this form!

Flow – A Coroutine Kernel for .Net

Original Author: Christian Schladetsch

Introduction

This post will present a small library called Flow that abuses .Net’s IEnumerable functionality, providing a Kernel for cooperative multitasking based on the concept of coroutines.
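Before looking at the library itself, here is a tiny self-contained illustration of the underlying trick (plain C#, not the Flow API): the compiler turns an iterator method into a state machine exposed through IEnumerator&lt;T&gt;, so each call to MoveNext() resumes the method at its most recent yield.

```csharp
using System.Collections.Generic;

public static class IteratorDemo
{
    // Each yield suspends the "coroutine"; MoveNext() resumes it there.
    public static IEnumerator<int> Counter()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static void Main()
    {
        var co = Counter();
        while (co.MoveNext())                       // one cooperative "step" at a time
            System.Console.Write(co.Current + " "); // prints "1 2 3 "
    }
}
```

This resumability is all a cooperative kernel needs: schedule many such enumerators and step each one in turn.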

The concepts of Timer, Future&lt;T&gt;, Channel&lt;T&gt;, Barrier and Trigger are introduced, as well as process Nodes and Groups.

All these ideas are wrapped within the context of a real-time cooperative Kernel.

Coroutines here are first-class objects that can be passed as arguments and returned as results.

No Singletons were harmed, or used, in the creation of this library.

The library is tiny at 30k, and builds when targeting .Net 2.0.

The complete source is released under the Boost License.

Please send any bugs or feature requests to the author.

Motivational Example

To get started, let’s have a look at some code that uses the Flow library. See also the test source code for this example:

public IEnumerator Consumer(IGenerator self, IChannel<int> channel)
{
    while (true)
    {
        IFuture<int> next = channel.Extract();   // get a future value from the channel
        yield return self.ResumeAfter(next);     // wait for the future to exist, or fail
        if (!next.Available)
            yield break;                         // there was no value; Complete
        Sum += next.Value;                       // consume it and continue
    }
}

Here we have the classic consumer/producer flow. We will get to the details later, but for now let’s get acquainted with the look and feel. Note that IGenerator, IChannel<T> and IFuture<T> are all in namespace Flow.

The consumer is the Coroutine passed as the first argument, and the producer is modelled as a Channel of integers passed as the second argument. The consumer repeatedly extracts the next element from the Channel until the Channel is Completed.

This is done by using the yield keyword. The line ‘yield return self.ResumeAfter(next)’ is to be read as “wait here until we get a value, or the channel is deleted”.

After flow returns to the Coroutine, we test to see if the Future has been set by testing its Available property. If the Future is not Available, the flow is halted and the Coroutine is Completed by the yield break statement.

Otherwise, we consume the new value by summing it with all previous values, then the flow continues.

If this seems very strange, don’t worry, there’s a lot of new concepts introduced here. We will be discussing these concepts in depth in the following sections, but first we will talk a little about the overall structure of the library and the core ideas.

Architecture

Before we delve into the implementation, we will take a bird’s-eye view of the library and how it is organised. It is quite straightforward.

All of the publicly visible elements of the library are exposed as interfaces. The implementations are internal to the library. This decouples the client code from the implementation, and will make future changes easier to roll out. More importantly however, the use of interfaces allows us to designate a precise level of constraint for objects as they are passed around the system and as process flows merge and diverge.

Core Concepts

At its heart, the system is based on the core idea of a Transient object. A Transient object is Active as soon as it is created, and remains so until it is Completed. When we Complete a Transient object, the object will fire its Completed event and set its Active property to false. A Transient that has been Completed will do no more work of its own accord. It will remain in effective limbo until collected by the .Net runtime.

We can chain the Completion of two Transient objects A and B by writing A.CompleteAfter(B). We can also delay Completion of a Transient by writing A.CompleteAfter(TimeSpan).

Almost all objects in the library implement ITransient, including IFuture&lt;T&gt;. This interface represents a potential Future value that has not Arrived yet. When the value is eventually set, the future will fire its Arrived event, then Complete itself. This is what was going on in the ‘yield return self.ResumeAfter(next)’ line of the first example, where the consumer Coroutine was waiting for the Future to be Completed. Completing a Transient object multiple times does nothing – only the first Completion will fire the Completed event.

Another key concept is the Generator, which is also a Transient. A Generator can be Suspended and Resumed. Generators do some work every time their Step method is called, unless they are Suspended or Completed. The result of that work is stored in the Value property.

Subroutines and Coroutines both derive from Generator. The key difference between them is how the work is done during a Step. For a Subroutine, the work is simply a method call. For a Coroutine, the work is to resume the Coroutine from the point of its last yield, or from the start of its method if it hasn’t been Stepped before.

Groups and Nodes

So far we have spoken about Transients, Futures, and Generators, but to manage them we need a few more concepts. First we have the idea of a Group, which contains a collection of other Transients, and fires events (Added, Removed) when the contents of the group changes. A Group is also a Generator, and when the Group is Suspended, it Suspends all Generators that it contains. Similarly, when a Group is Resumed, it Resumes all Generators within it. Stepping a Group does no work.

Then we have the Node, which is a Group with one addition: when Stepped, it also Steps all Generators within it. Note that Nodes are themselves Transients, so they can form a process flow hierarchy.

We also have Barriers and Triggers, both of which are also Groups. A Barrier is a Group that Completes itself when all added Transients have been Completed. A Trigger Completes itself when any of the objects in it is Completed.

You would use a Barrier when you want to pause execution until a collection of Transients have been Completed. An example may be waiting to start a game:

IEnumerator<bool> StartGame(IGenerator self, IEnumerable players, TimeSpan waitTime)
{
    ITimedBarrier barrier = self.Factory.NewTimedBarrier(waitTime);
    var acceptance = new List<IFuture<bool>>();

    // add each player's acceptance to the barrier
    foreach (var player in players)
    {
        IFuture<bool> accept = player.RequestAccept(self);  // send a request up to the UI
        acceptance.Add(accept);                             // keep a record of the future
        barrier.Add(accept);                                // add it to the barrier
    }

    // wait for the barrier to Complete: this may be due to a timeout, or all elements being Completed
    yield return self.ResumeAfter(barrier);

    // if the barrier timed out, not all players accepted in time
    if (barrier.HasTimedOut)
        yield break;

    // if any player did not accept, do not start
    foreach (var accept in acceptance)
        if (!accept.Available || !accept.Value)
            yield break;

    // run the game
    yield return self.ResumeAfter(RunGame());

    // end the game
    yield return self.ResumeAfter(EndGame());

    // reset game for next start
    Reset();
}

Say you want to pause game flow until any player presses a button:

IEnumerator<bool> WaitForAnyPlayerPress(IGenerator self, IEnumerable players)
{
    var trigger = self.Factory.NewTrigger();
    foreach (var player in players)
    {
        trigger.Add(player.RequestPress());   // push a button request up to the user interface
    }

    yield return self.ResumeAfter(trigger);   // wait until any player presses a button
    var firstPlayer = trigger.Reason;
    // do something with the knowledge that 'firstPlayer' was the first to press the button
}

Timers

Then we have two timers: a one-shot Timer that fires its Elapsed event once, and a Periodic Timer that regularly fires its Elapsed event.

Summary

There are other little bits and pieces, but these are the core concepts in the framework. I realise that’s a lot of information, but some examples are coming! In the meantime, you can always just read the test suite.

In summary so far, the Flow Library consists of a set of interfaces based on ITransient. From this we have Generators that can be Suspended and Resumed (Coroutines and Subroutines), Groups that contain other Transients (Barriers, Triggers and Nodes), and two timers: a one-shot and a periodic.

Communications

Channels are used for inter-Coroutine communication. See the test suites for Channels for more details on the implementation.

At the Top

This is all wrapped up in a top-level Factory for making new objects, and a Root Node that is Stepped when the Kernel is Stepped.

To make a new Kernel, use var kernel = Flow.Create.NewKernel();

From there, the Factory is available via kernel.Factory. Each ITransient also has access to the Kernel and Factory that made it.

To tick things over, simply call kernel.Step(). This will give every Generator that has been created by the Kernel a chance to do some work.
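To make the tick loop concrete, here is a hypothetical stand-in (plain C#, not the Flow Kernel itself; MiniKernel is an invented name for illustration) for what Step() conceptually does: give every live coroutine one MoveNext() and drop the ones that have finished.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the Kernel's Step(): one MoveNext per live coroutine.
public class MiniKernel
{
    private readonly List<IEnumerator> _live = new List<IEnumerator>();

    public void Add(IEnumerator coroutine) { _live.Add(coroutine); }

    // Returns true while any coroutine still has work to do.
    public bool Step()
    {
        _live.RemoveAll(co => !co.MoveNext());
        return _live.Count > 0;
    }
}
```

With the real library the driving loop has the same shape: call kernel.Step() once per frame until you shut down.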

Why Bother?

Transients, Generators, Nodes, Barriers, Channels… Oh My! Is all this stuff really needed?

Programming real-time systems such as games or distributed networked object models requires dealing with asynchronous events. These events may be user input, the result of other software processes, network input, or other hardware-based events. This is probably a good time to plug a previous article of mine on C++ events.

However, it’s not just spontaneous events that we are interested in. In order to reduce complexity and improve readability, we also need to be able to defer continuation of the current flow until some other process has completed. Here’s another motivational example, this time completely hypothetical:

IEnumerator RollDice(IGenerator self, IPlayer player)
{
    IFuture<int> roll = player.RequestRoll();  // push the request up to the user interface - the result is a future value
    yield return self.ResumeAfter(roll);       // wait for the result

    if (!roll.Available) yield break;          // player cancelled the roll, or the roll otherwise didn't happen

    var action = game.ProcessRoll(player, roll.Value); // business logic on the roll result - the return value is possibly another coroutine
    yield return self.ResumeAfter(action);     // wait for the action to complete - maybe other players can interject, play other cards, who knows

    if (!action.Available) yield break;        // action was cancelled

    if (action.Value.Processed) RedrawCards(); // if the action was processed, redraw the UI
}

This example shows the general gist of how the Flow library is intended to be used. If you need something external to happen, you resume after it has been completed.

You do not have to preserve state between Update calls because there is no Update. You do not need switch statements to find out what state you were in when you left the last Update. The process just… flows.

Have you noticed that a lot of work in your Update() methods is done just to determine where you were when you last left the Update method? Tired of managing what ‘State’ you are in?
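The pattern being criticised looks something like this: a hypothetical hand-rolled state machine for the dice-roll flow above (all names invented for illustration), where every Update() must first rediscover where the flow left off via an explicit state enum - exactly the bookkeeping that yield-based coroutines make disappear.

```csharp
// Hypothetical Update-driven equivalent of the RollDice coroutine.
public enum RollState { WaitingForRoll, WaitingForAction, Done }

public class RollDiceUpdate
{
    public RollState State { get; private set; } = RollState.WaitingForRoll;

    public void Update(bool rollArrived, bool actionDone)
    {
        switch (State)   // rediscover where we left off last frame
        {
            case RollState.WaitingForRoll:
                if (rollArrived) State = RollState.WaitingForAction;
                break;
            case RollState.WaitingForAction:
                if (actionDone) State = RollState.Done;
                break;
            case RollState.Done:
                break;   // nothing left to do
        }
    }
}
```

Two states are tolerable; with branches, cancellation, and nesting, this switch grows into exactly the brittle flow-control the article describes.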

Here’s another example, this time for a hypothetical network model:

private IEnumerator AbortJob(IGenerator self, string machine, int jobNumber)
{
    ITimedFuture<IPeer> peer = _peer.Connect(machine);  // connect to the remote machine
    yield return self.ResumeAfter(peer);                // wait until we connect, or time out
    if (!peer.Available || !peer.Value.Connected)
        yield break;                                    // we failed to connect

    IFuture<IWorkerProxy> worker = peer.Value.CreateProxy<IWorkerProxy>("Worker"); // get a local proxy to a remote instance
    yield return self.ResumeAfter(worker);              // wait for a response

    IFuture<IJobProxy> job = worker.Value.GetJob(jobNumber);  // query by job number
    yield return self.ResumeAfter(job);                       // wait for a response

    job.Value.Abort();                                        // kill the remote job
}

Basically, this connects to a remote machine, queries it for an instance called ‘Worker’ of type IWorkerProxy, uses that proxy to find a job with the given number, and then aborts that job.

As an exercise to the reader, I ask that you imagine what this would look like using threads.

You may be wondering ‘yeah but I’m not making a distributed game’ – well, perhaps that’s true, but even so the idea of retaining context between asynchronous calls, or between successive calls to Update(), is very expressive and makes code far easier to write and read.

What about Threads?

Writing non-trivial multi-threaded applications is hard. They are hard to write, read, test and maintain. Again, try to imagine what the previous examples would look like  without the ability to suspend local flow until a remote process or event completes.

Threads do not scale to tens of thousands. You may have 12 cores, but you can’t successfully deploy a system that has tens of thousands of threads. There is too much overhead per thread.

Threads can be inefficient – because you need to guard against resource contention, any shared data is expensive.

Have I mentioned that writing a non-trivial multi-threaded application that works is really hard?

Now, I realise that some of you will just think “yeah well, Christian just doesn’t understand how to write multi-threaded applications! It’s not all that hard”.

Yes, it really is hard to write correct multi-threaded applications. In any case, if you have a system such as a game that has thousands of entities, you cannot put each on its own thread, so you are stuck with convoluted flow-control and manual state management between Update calls. Sure, you can use callbacks and state machines and so on, but they become gnarly very quickly, and brittle, and error prone. And at best, what you will end up with is a poor-man’s Coroutine-based Kernel, even if you don’t realise it. As they say, within every large C program is a poorly written Lisp interpreter. And similarly, I claim that within any large interactive application is a poorly written Coroutine Kernel.

Coroutines are not a replacement for threads. One of the main advantages of coroutines is that they allow writing entity logic as if each entity was on its own thread – much state is stored in local variables, instead of being stored and restored between update calls – yet they avoid the race conditions that come with threads. Paraphrased from Bruce Dawson (Cygnus Software), from a discussion of this article on the #AltDev authors’ list:

“I’d previously said that the reason to avoid threads was to avoid their cost, and this is part of the reason, but probably not the main one. The fact that coroutines are not threaded, and therefore don’t need to worry about race conditions, locks, concurrent access, etc., is a significant part of their benefit. It’s really enormously huge. Unfortunately that means that if you put coroutines on multiple threads (to get more throughput) you lose one of their main advantages”.

If you are happy using threads, I wish you well on your way.

For those of us that seek sanity, readability, testability, repeatability and efficiency, let’s have a look at how a first-class coroutine library can be implemented in .Net.

Implementation

The implementation of the library is quite simple. I encourage you to pull a copy of the source and just browse around the test suites and read some of the code. It is quite small and  readable. The best way to understand it really is to just read the code.

Here’s Transient.cs:

internal class Transient : ITransient
{
    public event TransientHandler Completed;

    public IKernel Kernel { get; internal set; }

    public IFactory Factory { get { return Kernel.Factory; } }

    public bool Active { get; private set; }

    internal Transient()
    {
        Active = true;
    }

    public void Complete()
    {
        if (!Active)
            return;

        if (Completed != null)
            Completed(this);

        Active = false;
    }

    public void CompleteAfter(ITransient other)
    {
        if (!Active || other == null)
            return;

        if (!other.Active)
        {
            Complete();
            return;
        }

        other.Completed += tr => Complete();
    }

    public void CompleteAfter(TimeSpan span)
    {
        CompleteAfter(Factory.NewTimer(span));
    }
}

I hope this is all very obvious. Pay attention however to the CompleteAfter method. Here, if we are given a non-null Transient that has already been Completed, then we immediately Complete ourself and move on. Otherwise, we add a hook into the other’s Completed event, which when fired will Complete this Transient as well.

Basically, not very interesting and I hope almost boring. There are very few tricks in the library in general, just a build up of core concepts within a solid framework. It may be alien at first, but rest assured if something goes wrong, since you have the source, it will be easy to debug.

Note though that you may well need to add a Debug Trace system to the raw source. I didn’t do so for brevity and clarity, but despite what I said above, when you have nested Nodes and Barriers and Futures, unwinding an error can be tedious without logging information. If I was going to extend this library further, the very next thing I would add would be a logging system.

As another example, here’s the default implementation for a Future<T> value:

internal class Future<T> : Transient, IFuture<T>
{
    public event FutureHandler<T> Arrived;

    public bool Available { get; private set; }

    public T Value
    {
        get
        {
            if (!Available)
                throw new FutureNotSetException();
            return _value;
        }
        set
        {
            if (Available)
                throw new FutureAlreadySetException();
            _value = value;
            Available = true;
            if (Arrived != null)
                Arrived(this);
            Complete();
        }
    }

    private T _value;
}

It is what it is, I am not sure how I can add anything by talking about it. Perhaps the implementation of Coroutine will be juicier?

Making Coroutines

This is the implementation for Coroutines (see source):

internal class Coroutine : Generator, ICoroutine
{
    public override void Step()
    {
        if (!Running || !Active)
            return;
        if (_enumerator == null)
        {
            if (Start == null)
                CannotStart();
            _enumerator = Start();
            if (_enumerator == null)
                CannotStart();
        }
        if (!_enumerator.MoveNext())
        {
            Complete();
            return;
        }
        Value = _enumerator.Current;
        base.Step();
    }

    void CannotStart()
    {
        throw new Exception("Coroutine cannot start");
    }

    private IEnumerator _enumerator;

    internal Func<IEnumerator> Start;
}

Ok so let’s try to understand what is going on in the Step method. First, we do nothing if the Coroutine doesn’t Exist. This means that it has been previously Completed. We also do nothing if we are not Running, that is, if the Coroutine has been Suspended. So far so good.

Then we test if we have an _enumerator. This is like a program counter for coroutines. It manages the state we are in when we yield. If we do not have one, we see if we can make one from the strange-looking Start member field.

Its type is Func<IEnumerator>, which is a delegate that when invoked with no arguments returns an IEnumerator. This is then used to do work in the Coroutine.
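That delegate pattern can be seen in isolation with plain C# (independent of the library): the Func&lt;IEnumerator&gt; captures the method and its argument, and the body does not run until the enumerator is first stepped, because iterator bodies are lazy.

```csharp
using System;
using System.Collections;

public static class StartDemo
{
    public static int Calls;

    public static IEnumerator Work(int start)
    {
        Calls++;              // runs only on the first MoveNext()
        yield return start;
        yield return start + 1;
    }

    public static void Main()
    {
        // Capture method + argument, much like the Factory sets coro.Start.
        Func<IEnumerator> startFn = () => Work(10);
        IEnumerator e = startFn();  // creates the enumerator; Calls is still 0
        e.MoveNext();               // the first "Step" runs the body to its first yield
        Console.WriteLine(Calls + " " + e.Current);  // prints "1 10"
    }
}
```

This laziness is why the Coroutine can be constructed, scheduled, and only begin doing work when the Kernel first Steps it.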

An obvious question: where is this Start member set? It’s not here in the Coroutine class. So let’s look at the source for Factory.cs. Here’s one case, a Coroutine with one extra argument:

public ICoroutine<TR> NewCoroutine<TR, T0>(Func<IGenerator, T0, IEnumerator<TR>> fun, T0 t0)
{
    var coro = new Coroutine<TR>();
    coro.Start = () => fun(coro, t0);
    return Prepare(coro);
}

Now we are getting somewhere: the Factory sets the Start field of the Coroutine to a function object that invokes the function passed to the Factory, capturing the arguments that will be pushed through to the Coroutine instance when it sets _enumerator = Start().

Note also the pattern for all Coroutine signatures:

Func<IGenerator, T0, T1, ..., Tn, IEnumerator<TR>>

All Coroutine methods take as their first argument the Coroutine instance itself. It may be surprising that this is an IGenerator and not an ICoroutine<T> – the reason for this is ease of use. IGenerator does not include the return type information T, but otherwise contains all the functionality we need to control Subroutines as well. It’s just easier to use IGenerator as the ‘self’ argument to both Subroutines and Coroutines, without having to also include the return type, such as:

IEnumerator<string> Coro(ICoroutine<string> self) { ... }

Examples

The best place to see some examples is the test suite.

Future Work

During the writing of this article, it became clear that there is more to these ideas and this library than can be successfully covered in a single post.

So, while typically here I would point out future work to be done on the library itself, I will instead promise to write more about Coroutines and this library in the future. Pun intended.

Conclusion

This article presented a Coroutine-based Kernel for .Net, including a number of useful concepts for flow control, including:

  • Transient
  • Future<T>
  • Node
  • Barrier
  • Channel<T>
  • Kernel

The source code is freely available and can be used without permission in commercial products.

Want to donate to support indie #gamedev Greenlight fee?

Original Author: Mike Acton

Dejobaan Will Loan You $100 to Submit to Steam Greenlight

benkuchera I WILL LOAN A BUDDING INDIE DEVELOPER THE $100 TO SUBMIT TO GREENLIGHT if they are wonderful people with a wonderful idea.

— Dejobaan Games (@dejobaan) September 5, 2012

I’m not offering a loan, just a gift. I’m just happy to give back in some small way to a fellow #gamedev that both really deserves it and really needs it. I’m not rich, but I can do that.

To help make things simpler, if you would also like to just gift $100 to a #gamedev in need feel free to use the button below and I will find someone that needs it. You’ll have to trust my judgment though!





If you are an indie gamedev and this is for you, you can email me macton@gmail.com (Subject: “My Greenlight” so I can filter) and use the same guidelines as @dejobaan’s post


Want to get Kickstarted? Here’s what you need to know…

Original Author: Jason-Swearingen

Disclaimer:  This “guest” article was written by Eddie, Novaleaf’s Game Dev Lead.   

What does it take to run a successful Kickstarter campaign? This is the question we asked ourselves a few months ago when we decided to use Kickstarter to fund our new game, God of Puzzle. Since then, we have done everything we could think of to ensure the success of the campaign, and we would like to share our stories about this little adventure with you. We hope this information will help some projects get the backing they deserve.

A Disclaimer

Please note that we have just launched our kickstarter, so while we don’t know for sure that this approach will work, this post is the first of a series.  We’ll be sure to inform you of our progress (good or bad) bi-weekly as the data comes in.

The Research

We’ve looked at several hundred projects on Kickstarter over the past few months. We also didn’t have thekickbackmachine in the beginning, so finding failed projects wasn’t an easy thing to do. In addition to just looking at them, we also kept track of projects with a similar scope so that we could observe their progress over time. Finally, we analyzed the data and listed out why each project failed or succeeded.

The List

So, we ended up with a list of what we thought was crucial information on how to give our project the highest chance of success. I’ve listed the most important points below, but keep in mind that there are plenty of projects that didn’t do everything right but were still able to generate tons of money. So, following or not following the guidelines in this list doesn’t ensure success or failure; but doing everything on this list will definitely maximize your chances of success:

1. Make sure your project is awesome

Seems really obvious, right? Well, from my observation, most of the projects that failed were projects with mediocre or not-so-original ideas. When you have a project with an ingenious idea, you really don’t have to do much to get funded.

2. Present a team that can get the job done

If the scope of your project is big, make sure you have the team to back it up. Be honest and realistic about what you’re trying to achieve. If you’re a couple of college students, you’re unlikely to get funded if you’re asking to fund a huge project, like an MMO.

3. Show ONLY the cool stuff

This is extremely important, and you need to think carefully about this one. If you have some work-in-progress material that’s still pretty rough and doesn’t really represent the quality of the final product, do not show it. People will assume that what you’re showing is more or less what the final product will be. Instead, show only the best stuff you currently have. If your best stuff isn’t at a presentable stage, then maybe it’s too early to start a Kickstarter campaign.

4. Leave your audience wanting more

If you’re trying to fund a game by showing a playable alpha/beta, make sure you don’t show the whole game. Most people decide to back a project because they want to see the final product. If most of the final product is already there, available to play for free, they’ll lose the craving and may not fund the project.

5. Be honest and positive

People appreciate honesty, and people don’t like whiners. If you show any of these negative qualities in your pitch, some backers will be more reluctant to support you. For example, statements along the lines of “the mainstream game industry is for losers” will surely offend a lot of people, since many people working in the mainstream game industry are actively looking at Kickstarter projects.

6. Make important information easy to find

Don’t assume that people will read everything. Don’t assume that people will watch the entire video. Put the most important information first; if possible, put it in the video or use pictures instead of plain text. Researching other successful projects will give you plenty of ideas on how to make important information stand out.

7. Fair Rewards

This is another obvious one, but it’s difficult to get right, so make sure you put a lot of thought into it. People who are thinking about backing the project will look at the rewards before they make the final decision. Give back as much as possible and give something that people will value. Research what kind of rewards successful projects are offering and see if you can do better, or at least match them.

8. Find your audience, and bring them to your project

This is probably the hardest part of running a Kickstarter project. During our research, we came across a lot of information confirming that the traffic of people visiting Kickstarter just to browse the available projects is extremely low. You have to do everything within your power to promote your project and bring people to it. Your goal is to make it viral: get people to share links to your project on Facebook, make news sites and bloggers aware of your project, and get people on forums starting threads about it. If you’re good at this, and your project is interesting, you will get an unusually high number of backers.

9. Give updates and Interact with your backers

Don’t forget that a backer can change their mind at any time. They could raise their pledge amount, or back out completely. So, when you get a new backer, you want them to at least stick around until the end of the project. Providing frequent, interesting updates with new information is important. If a backer is excited about the project, he or she might tell their friends about it or even increase their pledge level. Start discussions, give some backers-only updates, and make them feel great for backing your project.

10. Create a professional-looking pitch, but don’t overdo it…

This one is not very obvious; according to our research, your pitch could be perceived as one of the following: poorly done, indie, professional, or “too much”. It’s obvious that your pitch should be perceived as either indie or professional, depending on whether you’re a small team or a professional game studio. And you definitely don’t want people to see your pitch as poorly done. So, what is “too much”? Too much means your audience sees that you’re wasting money on creating an expensive pitch (especially the video), so they’ll think that you’re not really low on cash. However, not everyone thinks the same way, so some people may actually appreciate the extra investment you put into the pitch. Use your judgement.

Until Next Time…

A lot of these are pretty obvious, right? Anyone who spent the time to properly research should end up with a similar-looking list, but I believe that skipping just one or two items from this list could mean the difference between failure and success.

Finally, I can’t say that all of this information is “proven”, because as I’m writing this, our own project has just been published on Kickstarter. So, as we gain more experience, we’ll make sure to give updates on any new and important information we discover.

 

C/C++ Low Level Curriculum Part 9: Loops

Original Author: Alex Darby

Welcome to the 9th post in this C/C++ low level curriculum series I’ve been doing. It’s been a long time since post 8 (way longer than I thought it was), a fact I can only apologise for. My 3 year old son stopped having a nap in the afternoon in late April and it’s totally ruined my productivity…

This post covers the 3 built-in looping control structures while, do-while, and for, as well as the manual if-goto loop (old school!); as usual, we look in some detail at what the assembly generated by the compiler looks like. Did I forget about the new range-based for loop that was added in the C++11 standard? Nope. If you have access to a C++11-compliant compiler you’re more than welcome to look at that yourself – think of it as homework…

Here are the backlinks for preceding articles in the series (warning: it might take you a while, the first few are quite long):

  1. /2011/11/09/a-low-level-curriculum-for-c-and-c/
  2. /2011/11/24/c-c-low-level-curriculum-part-2-data-types/
  3. /2011/12/14/c-c-low-level-curriculum-part-3-the-stack/
  4. /2011/12/24/c-c-low-level-curriculum-part-4-more-stack/
  5. /2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/
  6. /2012/03/07/c-c-low-level-curriculum-part-6-conditionals/

 

A brief history of looping

It occurred to me that a sensible order to cover the looping constructs of the C/C++ language might be to address them in the order in which they were introduced into the language.

A couple of years back a friend showed me a brilliant website / article that covered the evolution of the C programming language. It was very interesting, and from what I can remember, contained information on the order in which the various features of the C compiler were added – including which looping construct came first. I tried to find it on t’ internet, but failed. Feel free to link me up in a comment if you happen to know where it is…

Since I couldn’t find the article/website in question, I’ve decided to cover them in order of the amount of work they do automatically for the programmer, which in my opinion is: if-goto, while, do-while, and finally for.

This seems to me to be a sensible order for 2 reasons; firstly because it’s likely to be the order in which they were introduced into programming languages, and secondly because the concepts encapsulated by these constructs sort of build on each other in that order.

 

if-goto

From our previous excursions into the land of assembly we are already familiar with the concept of jumping the execution address, and with the concept of ‘conditional jumping’ (i.e. conditionally changing the execution address). The most direct way to loop the execution of a piece of code several times (as opposed to the simplest to type) is to use the high level keywords that correspond to these assembly level concepts.

We are already familiar with the keyword if, but we’ve not really covered goto – possibly the most maligned of all the language features of C/C++, and almost certainly the most banned by corporate coding standards.

Personally I don’t think that goto is inherently evil; if you’re interested, the Wikipedia page on goto contains a fair amount of detail on (and links to) the arguments for and against it.

The purpose of this article is not to discuss the merits of goto or, for that matter, operator overloading so let’s get on with it.

Here’s the first code snippet (see the previous article for how to set up a project that will just accept this code…)

#include "stdafx.h"

#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))

int main(int argc, char* argv[])
{
    int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int iSum       = 0;
    int iLoop      = 0;

LoopStart:
    if( iLoop < ARRAY_SIZE(k_aiData) )
    {
        iSum += k_aiData[ iLoop ];
        ++iLoop;
        goto LoopStart;
    }

    return 0;
}

You should be able to see that this code is simply looping over the values in the array k_aiData and summing them, other than the use of if and goto it’s essentially a standard loop to iterate an array.

The pre-processor macro ARRAY_SIZE that I’ve used here is a simple way to make dealing with statically allocated arrays less error prone. Essentially we could initialise the array k_aiData with any number of elements we wanted to and the rest of the code would still just work. There are simple ways to achieve this in a type safe manner using templates too, but I chose to use a macro here because a readable version of the code takes up less vertical space than the template.

If you are wondering why I am not incrementing iLoop inside the square brackets, this is so that the high level code that is doing the work of the loop is identical across all code snippets.

If you are also wondering why I am using the prefix as opposed to postfix version of operator++ then well done to you – award yourself 6.29 paying-attention points. In this case it makes no difference to the assembly generated, but in these days of operator overloading it’s generally better to use the prefix version as a point of good practice – unless of course you require postfix behaviour (the first comment on the first answer to this question on Stack Overflow should prove illuminating if you don’t know what the implications of the different behaviours are).

Since we’re using two keywords that have a very clear relationship to assembly level concepts, it’s reasonable to assume that the disassembly for this code will be pretty much as we wrote it at the high level. As we all know, we should never assume; so let’s check our assumptions.

Here is the debug x86 disassembly for the looping section:

 1      11: LoopStart:
 2      12:     if( iLoop < ARRAY_SIZE(k_aiData) )
 3  00BB1299  cmp         dword ptr [ebp-2Ch],8
 4  00BB129D  jae         LoopStart+1Eh (0BB12B7h)
 5      13:     {
 6      14:         iSum += k_aiData[ iLoop ];
 7  00BB129F  mov         eax,dword ptr [ebp-2Ch]
 8  00BB12A2  mov         ecx,dword ptr [ebp-28h]
 9  00BB12A5  add         ecx,dword ptr [ebp+eax*4-24h]
10  00BB12A9  mov         dword ptr [ebp-28h],ecx
11      15:         ++iLoop;
12  00BB12AC  mov         eax,dword ptr [ebp-2Ch]
13  00BB12AF  add         eax,1
14  00BB12B2  mov         dword ptr [ebp-2Ch],eax
15      16:         goto LoopStart;
16  00BB12B5  jmp         LoopStart (0BB1299h)
17      17:     }
18      18:
19      19:     return 0;
20  00BB12B7  xor         eax,eax

As expected, the disassembly for this is very straightforward, and you should be familiar with almost all of it from previous posts.

As we saw in the first article on conditionals, the assembly code (lines 3 & 4) that maps to the if statement (line 2) tests the logical opposite of the high level code. This is because the high level if conceptually ‘steps into’ the curly brackets it controls if its test passes, whereas the assembly has to jump past the assembly code generated by the content of the if in order to not execute it (remember: curly brackets are a high level convenience for programmers!).

In this case, line 3 compares iLoop (at address [ebp-2Ch]) to 8 (the size of the array obtained from ARRAY_SIZE is a compile time constant), and (line 4) uses jae (jump if above or equal) to conditionally jump execution to LoopStart+1Eh (0BB12B7h) – which is the memory address immediately after the assembly generated by the content of the curly brackets controlled by the if statement.

The next block of assembly adds the iLoop-th element of k_aiData to iSum. By this point, we should all be familiar with the assembly for adding two integers, and the way in which the elements of k_aiData are accessed is the only real new assembly code idiom that we’re seeing in this disassembly.

The instruction that accesses the iLoop-th element from the array is doing a surprising amount of work for an assembly instruction; certainly this is the first time that we’ve seen any significant computation being performed within a single line of assembly code, and it’s all occurring in the square brackets in the place that usually contains the address of the value we wish to access.

So, let’s look at it in detail:

 9  add ecx,dword ptr [ebp+eax*4-24h]

When line 9 is executed, the eax register holds the value of iLoop and [ebp-24h] is the address of the array k_aiData.

Since k_aiData is an array of int, the address of k_aiData[ 0 ] is [ebp-24h] and sizeof( int ) is 4 on the x86, it should be pretty obvious that the computation [ebp+eax*4-24h] on line 9 equates to the memory address of the iLoop-th element of k_aiData.

If you’re having trouble seeing it, here is the address computation seen in the disassembly rearranged step by step so that we can swap out the registers and memory addresses for the high level variables:

ebp+eax*4-24h

= ebp + ( eax*4 ) + ( -24h )

= ebp + ( -24h ) + ( eax*4 )

= ( ebp - 24h ) + ( eax*4 )

= &k_aiData[ 0 ] + ( iLoop * sizeof( int ) )

Now we’ve examined the new elements of the disassembly we’ve not seen before, the rest of this post should clip along fairly quickly 🙂

So, after the value stored in the iLoop-th element of k_aiData has been added to iSum, all that remains is to ++iLoop ( lines 12-14) and then jump back to the label at the start of the loop (line 16).

Clearly this will continue until iLoop >= 8, and so we can see that the assembly is isomorphic with the high level code.

 

Why add Looping Constructs?

Since looping behaviour can be achieved using just if-goto, this raises the question “Why did Dennis Ritchie (sadly no longer with us) bother with the rest of the looping constructs available in C?”

There are three main reasons that spring to my mind, the first is efficiency (of typing rather than execution), the second is robustness, and the third is clarity of intent.

Writing a loop using the if-goto idiom involves a fair amount of typing, and loops are very common in most code bases. No-one likes to type more than they have to – especially programmers. Since the programmers using the language were probably originally the programmers of the language it was more or less an inevitability that a more textually terse method of writing loops would come about.

Secondly, and more importantly, the code involved in writing any two given if-goto loops is very similar, and doing it by hand is more error-prone (as well as tedious) than using a construct specifically made to handle looping, which removes the need for the explicit goto and its associated label.

Thirdly, and possibly even more importantly, an explicit looping construct makes the intent of the code far clearer. Both if and goto have plenty of uses other than looping, so any programmer coming along later to read code containing an if-goto loop would have to expend significant mental effort just to see that the code is in fact a loop; which would clearly be very bad.

Taken together, these three reasons mean that you will almost certainly never write a loop using if-goto for any reason other than just for fun; and you certainly won’t need to write one. The only reason I am covering it is because I feel that it’s worth considering as a step in the evolution of looping constructs in languages.

 

while

So, we come to while. The while loop is basically an automatic if-goto, and we will see this when we look at the disassembly (which is essentially why I covered the if-goto in the first place).

Here’s the code snippet upgraded to use while

#include "stdafx.h"

#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))

int main(int argc, char* argv[])
{
    int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int iSum       = 0;
    int iLoop      = 0;

    while( iLoop < ARRAY_SIZE(k_aiData) )
    {
        iSum += k_aiData[ iLoop ];
        iLoop++;
    }

    return 0;
}

Clearly the high level code looks neater already, and (more importantly) the manual elements of putting the if and goto in the right places have been removed; so it’s a lot harder to do something wrong as a result of human error, and it’s instantly obvious that the code is looping over the content of the array k_aiData.

Much better – well done programming language designers of yesteryear!

Now let’s have a look at the (dis)assembly that it generates…

    11:     while( iLoop < ARRAY_SIZE(k_aiData) )
013E1299  cmp         dword ptr [ebp-2Ch],8
013E129D  jae         main+77h (13E12B7h)
    12:     {
    13:         iSum += k_aiData[ iLoop ];
013E129F  mov         eax,dword ptr [ebp-2Ch]
013E12A2  mov         ecx,dword ptr [ebp-28h]
013E12A5  add         ecx,dword ptr [ebp+eax*4-24h]
013E12A9  mov         dword ptr [ebp-28h],ecx
    14:         iLoop++;
013E12AC  mov         eax,dword ptr [ebp-2Ch]
013E12AF  add         eax,1
013E12B2  mov         dword ptr [ebp-2Ch],eax
    15:     }
013E12B5  jmp         main+59h (13E1299h)
    16:
    17:     return 0;
013E12B7  xor         eax,eax

Almost entirely unsurprisingly, the assembly that has been generated from the while is essentially identical to that generated for the if-goto we just looked at – only the addresses that are being jumped to have changed.

This is the sort of thing that restores my faith in humanity; well, in compiler programmers specifically but they’re still human. I assume.

 

do-while

Let’s move swiftly on with the code snippet for the next type of loop, the do-while.

#include "stdafx.h"

#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))

int main(int argc, char* argv[])
{
    int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int iSum       = 0;
    int iLoop      = 0;

    do
    {
        iSum += k_aiData[ iLoop ];
        ++iLoop;
    }
    while( iLoop < ARRAY_SIZE(k_aiData) );

    return 0;
}

Essentially the same code, but now we’re testing the loop’s exit condition at the end of each loop rather than at the beginning.

All being sane in the universe, I think it would be reasonable to expect the assembly generated for this code to turn out very similar to the previous two loops – except that the testing code is likely to be after the body of the loop rather than before it….

    11:     do
    12:     {
    13:         iSum += k_aiData[ iLoop ];
00CC1299  mov         eax,dword ptr [ebp-2Ch]
00CC129C  mov         ecx,dword ptr [ebp-28h]
00CC129F  add         ecx,dword ptr [ebp+eax*4-24h]
00CC12A3  mov         dword ptr [ebp-28h],ecx
    14:         ++iLoop;
00CC12A6  mov         eax,dword ptr [ebp-2Ch]
00CC12A9  add         eax,1
00CC12AC  mov         dword ptr [ebp-2Ch],eax
    15:     }
    16:     while( iLoop < ARRAY_SIZE(k_aiData) );
00CC12AF  cmp         dword ptr [ebp-2Ch],8
00CC12B3  jb          main+59h (0CC1299h)
    17:
    18:     return 0;
00CC12B5  xor         eax,eax

As expected then, the code doing the work of the loop and incrementing iLoop is basically identical.

Also as expected, the conditional jump that keeps the loop going is a little different – it’s using the jump instruction jb (jump if below) so, unlike pretty much all the other assembly code we’ve looked at generated by high level conditionals, this is testing the same condition as the high level code – but why?

As discussed earlier, the high level language concept of ‘curly bracket scope’ doesn’t exist at the assembly level. Despite this, the compiler has to generate assembly code that is logically isomorphic with the high level code; so in order to satisfy the high level behavioural constraint of ‘stepping into’ the curly bracketed code if a pre-condition is met, the assembly skips over the code within the curly brackets if the condition isn’t met.

So, since the looping condition is a post-condition in a do-while loop (i.e. at the end of the ‘curly bracket scope’ it controls) the high level code and assembly code both need to jump back to the start of the loop if the looping condition is met, and so the test in the assembly code is the same as that at the high level.

 

for

So, we come to the for loop, the loop you probably use the most often.

The for loop was the looping construct that worked the hardest for you until the C++11 standard introduced the ‘range-based’ for to the language this time last year (not counting the various template-based solutions). Unfortunately (although it’s obviously supported in the recently released VC2012) support for the C++11 standard is patchy at best on most video game platforms, so the for loop is still the default solution.

Let’s take a second to look at the ‘anatomy of a loop’. More or less any looping code has 3 responsibilities in addition to the work it does per iteration of the loop:

a) declare and/or initialise loop state variables

b) test loop exit condition

c) update state variables for the next loop

These 3 responsibilities define the scope and manner of the iteration the loop is doing, and therefore can be seen as the ‘fingerprint’ of that iteration.

The for loop is a ‘language level refactoring’ that gathers these three responsibilities into one construct giving them textual adjacency, thus making the entire fingerprint visible in one place.

Whilst this is pretty obvious when you stop to examine it, the importance of explicitly stating this should not be underestimated.

Why? Let’s look at for compared to while, replacing the code with the corresponding a, b, or c from the list above.

for( a; b; c )
{
    //do work
}

as opposed to:

a;
while( b )
{
    //do work
    c;
}

So, the for loop takes up less vertical space than the while (in this instance at least) but what, if anything, are the other advantages:

  • variables declared by a in the for are scoped to the loop. Smaller scope == less entropy == fewer bugs.
  • c is obviously distinct from the work code of the loop in the for, but not so in the while (be honest: how many times have you accidentally written an infinite while because you forgot to increment at the end?)
  • the adjacency of a, b, and c in the for allows possible bugs with loop conditions to be spotted more easily

Whoever invented the for loop deserves a pat on the back, because for takes the improvements made by the while and do-while loops to the next level – by reducing human error and increasing the clarity of intent even further.

I looked it up, and it turns out that the earliest equivalent to for I found by googling is the DO loop of FORTRAN, from John Backus’s team at IBM. Since that’s about as close to an answer as I feel I need to get, I now invite you to join me in a posthumous air high-five to John to celebrate his team’s sterling work.

Let’s look at one now shall we? Here’s the code snippet:

#include "stdafx.h"

#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0]))

int main(int argc, char* argv[])
{
    int k_aiData[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int iSum       = 0;

    for( int iLoop = 0; iLoop < ARRAY_SIZE(k_aiData); ++iLoop )
    {
        iSum += k_aiData[ iLoop ];
    }

    return 0;
}

…and here’s the disassembly (n.b. I un-ticked the ‘Show symbol names’ check box in the disassembly display options for this…)

 1      10:     for( int iLoop = 0; iLoop < ARRAY_SIZE(k_aiData); ++iLoop )
 2  00DC1292  mov         dword ptr [ebp-2Ch],0
 3  00DC1299  jmp         00DC12A4
 4  00DC129B  mov         eax,dword ptr [ebp-2Ch]
 5  00DC129E  add         eax,1
 6  00DC12A1  mov         dword ptr [ebp-2Ch],eax
 7  00DC12A4  cmp         dword ptr [ebp-2Ch],8
 8  00DC12A8  jae         00DC12B9
 9      11:     {
10      12:         iSum += k_aiData[ iLoop ];
11  00DC12AA  mov         eax,dword ptr [ebp-2Ch]
12  00DC12AD  mov         ecx,dword ptr [ebp-28h]
13  00DC12B0  add         ecx,dword ptr [ebp+eax*4-24h]
14  00DC12B4  mov         dword ptr [ebp-28h],ecx
15      13:     }
16  00DC12B7  jmp         00DC129B
17      14:
18      15:     return 0;
19  00DC12B9  xor         eax,eax

Sooooo … this one looks a little different, right? It’s not very different though, just re-organised a little:

  1. Lines 2-3: initialise iLoop (i.e. [ebp-2Ch]) to 0, and then jump over lines 4-6
  2. Lines 4-6: increment iLoop
  3. Lines 7-8: compare iLoop with 8 and exit the loop by jumping to line 19 if iLoop >= 8 (n.b. a pre-condition check, so the opposite of the high level test)
  4. Lines 11-14: index the array and accumulate the sum of element values (this should look very familiar by now)
  5. Line 16: jump back to line 4

So, the assembly in each of steps 1, 2, and 3 implements one of the semi-colon separated parts of the for loop’s ‘parameters’; in fact, steps 1 to 3 correspond to a (initialise), c (increment), and b (test exit condition) respectively in our ‘anatomy of a loop’ list above.

Only steps 1 and 3 are executed on the first iteration of the loop, and only steps 2 and 3 on all other iterations.

Also note that steps 2 and 3 are in the opposite order in the assembly compared to the high level code – this is, again, down to the disparity between high level nicety and low level execution.

So, the assembly that is generated from a for loop is more or less as you might expect. We’ve covered all the (non-templated-non-C++11) looping constructs now, end of story – next article. Move along please.

 

Wait! I’m not quite finished!

Hold on! The reason the last post was about how to look at optimised assembly is mostly because I wanted to look at the optimised assembly generated by the C++ looping constructs in this post.

So, rather than re-compile all the snippets one by one, let’s set up the project CPPLLC_Part9MoreLoops.

This file contains a simple program that has 4 functions in addition to main – they are:

  • SumGoto – sums the elements of an array using an if-goto loop
  • SumWhile – sums the elements of an array using a while loop
  • SumDo – sums the elements of an array using a do-while loop, and
  • SumFor – sums the elements of an array using a for loop
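The bodies of those functions aren’t reproduced here, but minimal versions might look something like this (a sketch with signatures inferred from the calls in main; the project’s actual code may differ in detail):

```cpp
#include <cassert>

// sketches of the four summing functions, one per looping construct
int SumGoto(const int *aiData, int iDataCount)
{
    int iSum = 0, iLoop = 0;
Loop:
    if (iLoop >= iDataCount)
        goto Done;
    iSum += aiData[iLoop];
    ++iLoop;
    goto Loop;
Done:
    return iSum;
}

int SumWhile(const int *aiData, int iDataCount)
{
    int iSum = 0, iLoop = 0;
    while (iLoop < iDataCount)
        iSum += aiData[iLoop++];
    return iSum;
}

int SumDo(const int *aiData, int iDataCount)
{
    // a do-while runs its body at least once, so guard against an empty range
    if (iDataCount <= 0)
        return 0;
    int iSum = 0, iLoop = 0;
    do {
        iSum += aiData[iLoop++];
    } while (iLoop < iDataCount);
    return iSum;
}

int SumFor(const int *aiData, int iDataCount)
{
    int iSum = 0;
    for (int iLoop = 0; iLoop < iDataCount; ++iLoop)
        iSum += aiData[iLoop];
    return iSum;
}
```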

All very straightforward really. The only unusual thing you might notice is that main looks like this:

int main( int argc, char* argv[] )
{
    // array and a nice const for the size
    const int k_iArraySize = 8;
    int       k_aiData[ k_iArraySize ] = { 0, 1, 2, 3, 4, 5, 6, 7 };

    int iSumGoto  = SumGoto ( k_aiData, atoi( argv[ 1 ] ) );
    int iSumWhile = SumWhile( k_aiData, atoi( argv[ 2 ] ) );
    int iSumDo    = SumDo   ( k_aiData, atoi( argv[ 3 ] ) );
    int iSumFor   = SumFor  ( k_aiData, atoi( argv[ 4 ] ) );

    std::cout << iSumGoto << iSumWhile << iSumDo << iSumFor;
    return 0;
}

So it’s using command line arguments as input, and printing to stdout for output. This is a relatively simple way to prevent the overzealous optimising compiler from removing all the code – we force it to keep it in there by doing input and output at runtime.

Before we compile and run it, you’ll also need to make a couple of changes in your project’s property pages – make sure you have the ‘Release’ build configuration selected…

The first is to pass some command line arguments to the code – apart from any other reasons, this is shockingly naive code and will crash if it doesn’t get the arguments it expects, so add the following (which will make it iterate k_aiData fully for each function):

screenshot of adding command line parameters in the project's property pages

 

We also need to turn off function inlining or the compiler will optimise away all the function calls making the disassembly much harder to follow:

screenshot of properties page showing how to turn off function inlining

Final pre-launch check: add a breakpoint to the C++ line in each loop that sums the loop’s elements (i.e. ‘iSum += xxxx’), and off we go!

 

Optimised Disassembly O’clock!

Build and run the code and you should end up with your debugger stopped on the breakpoint you have put in SumGoto.

Right click and choose ‘Go To Disassembly’ and you should see something like the image below – but before we look at it in detail, a brief aside is needed:

The code in main that calls SumGoto looks like this:

00DB191A  push        eax  
00DB191B  lea         esi,[ebp-24h]  
00DB191E  call        SumGoto (0DB1880h)

eax (which contains k_iArraySize) is pushed onto the stack, but the address of k_aiData[ 0 ] (which is stored at [ebp-24h]) is loaded into esi rather than being pushed onto the stack.

“Wait!” I hear you say, “They just did who in a whatnow? I thought we covered calling conventions, and no-one said anything about using esi for parameter passing!”

Don’t worry about this for now, just accept that – for whatever reason – in this case the address of k_aiData[ 0 ] is being passed via the esi register (I investigate this in the article’s epilogue if you’re really interested).

So, here’s the disassembly for SumGoto:

make sure you have the same view options checked in the context menu, or your disassembly may look very different!

Interestingly this bears little visible relation to the debug disassembly we looked at for the if-goto earlier. So let’s pick it apart to see what it’s doing differently:

  1. 00DB1880 to 00DB1884 – function prologue of SumGoto.
  2. 00DB1885 – moving function parameter iDataCount (i.e. the number of loops) into the edi register.
  3. 00DB1888 to 00DB188E – initialising registers ecx, edx, ebx, and eax to 0 (n.b. anything XOR itself is 0).
  4. 00DB1890 to 00DB1893 – compare edi (number of loops remaining) with 2; if less, jump to 00DB18A7 (2nd instruction in step 9), otherwise continue.
  5. 00DB1895 – another new assembly instruction; dec decreases its register operand by 1 – in this case edi (iDataCount).
  6. 00DB1896 to 00DB1899 – we know that the address of k_aiData[0] is in esi, so from the address calculation in the square brackets it is pretty obvious that these two lines are indexing into k_aiData and summing the odd and even elements into edx and ecx respectively.
  7. 00DB189D – is incrementing eax by two. eax clearly contains the count of elements that have been looped over so far – because…
  8. 00DB18A0 to 00DB18A2 – …are comparing eax to edi. If eax < edi execution jumps back to step 6.
  9. 00DB18A4 to 00DB18AB – this ties in with the decrement to edi made at step 5. Since the code is looping and summing 2 elements at a time, this code checks if iDataCount was odd or even. If odd it jumps to step 11, if even it jumps to step 12.
  10. 00DB18AD – leaves ecx unchanged. What is it for? It’s essentially a nop instruction (no operation); nop instructions are used in assembly code for various reasons, such as maintaining memory alignment of certain instructions (the first answer to this question on Stack Overflow explains it sufficiently for our requirements at this point). In any case, both possible code paths through step 9 will skip this instruction entirely.
  11. 00DB18B0 – if iDataCount was odd, this code moves the value of the array element that would have been missed by iterating 2 elements at a time into ebx.
  12. 00DB18B3 – this uses lea to add the sums of odd and even elements of k_aiData that have been accumulating in edx and ecx and store them in eax (remember, eax is used to return integer values from functions).
  13. 00DB18B6 – this is actually the start of the epilogue of SumGoto – restoring edi to the value it held before SumGoto was called. There’s no particular reason for this to have been put before the next instruction; optimising compilers re-order this sort of thing relatively often, and as long as the code generated is correct it’s not worth worrying about too much.
  14. 00DB18B7 – this line adds the value from ebx (see step 11) to the sum to be returned in eax.
  15. 00DB18B9 to 00DB18BB – function epilogue of SumGoto.

Ouch. That seems far more complex than the debug assembly code for the if-goto loop. You may have to read through it a few times before you satisfy yourself about how it works – I recommend stepping through it in the debugger looking at the registers in a watch window.

Somewhat surprisingly, SumWhile and SumFor look pretty much exactly like SumGoto, but SumDo is way smaller:

SumDo:
00DB1830  xor         eax,eax  
00DB1832  xor         ecx,ecx  
00DB1834  add         eax,dword ptr [esi+ecx*4]  
00DB1837  inc         ecx  
00DB1838  cmp         ecx,edx  
00DB183A  jl          SumDo+4 (0DB1834h)  
00DB183C  ret

This is incredibly simple to follow, and much more the sort of thing I would have intuitively expected to see for all of the looping constructs – but there is method to the compiler’s seeming madness…

Summing two elements per iteration of the loop, as the assembly of SumGoto, SumWhile, and SumFor does, is actually a form of loop unrolling. Although (in this code) the compiler doesn’t know how many iterations of the loop it will end up doing, it can still improve the overall ‘looping instructions to working instructions’ ratio of the loop by this pairwise unrolling. Over a large enough array it should be faster than code that is not unrolled in the same way.
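In C++ terms, the pairwise unrolling is roughly equivalent to the following sketch (my reconstruction, not the compiler’s literal output; the two accumulators mirror ecx and edx, and the trailing check handles an odd element count):

```cpp
#include <cassert>

// sketch of the compiler's pairwise unrolling (names are mine)
int SumUnrolled2(const int *aiData, int iDataCount)
{
    int iSumEven = 0;   // accumulates elements 0, 2, 4, ... (like ecx)
    int iSumOdd  = 0;   // accumulates elements 1, 3, 5, ... (like edx)
    int i = 0;

    if (iDataCount >= 2)                       // the 'cmp edi,2' guard
    {
        const int iPaired = iDataCount - 1;    // the 'dec edi'
        for (; i < iPaired; i += 2)            // two elements per iteration
        {
            iSumEven += aiData[i];
            iSumOdd  += aiData[i + 1];
        }
    }

    int iSum = iSumEven + iSumOdd;             // the 'lea' combining both sums
    if (i < iDataCount)                        // odd count: one element left over
        iSum += aiData[i];
    return iSum;
}
```

Splitting the sum across two independent accumulators also breaks the dependency chain between additions, which is part of why this form can run faster on real hardware.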

By changing the compiler options (under C/C++ -> Optimisation) from ‘Maximize Speed (/O2)’ to ‘Minimize Size (/O1)’ you can generate assembly that looks a lot more like you would expect. Since /O2 is the default for release build configurations under Visual Studio 2010 I thought I should explain this assembly, and I leave looking at the assembly generated by /O1 as an exercise for you, dear reader 🙂

Conclusions

So, there we have it: looping constructs, and a genuine taste of the differences between optimised and debug assembly – albeit in a massively simplified scenario compared to real code.

What should we take away from this? Well, I guess primarily the point of this was to demonstrate that whilst the optimising compiler is constrained to generate assembly code that is isomorphic with your high level code, you should never take it for granted that the code it generates will look how you expect it to.

This should, I think, about finish up the program control / structural aspects of C/C++ and leave us free to move on to look at the way other mechanics of the language work at the assembly level.

I feel that there might possibly be a post on the range based for and on recursion at some point, but we’ll see – feel free to leave a comment if you think there’s something glaring that I’ve left out and I’ll try to rectify that before moving on…

Finally, a hearty thank you to all the AltDevAuthors who chipped in with sage advice on this post – Tony, Paul, Ted, Bruce, Ignacio, and Rich.

 

Epilogue

Notes on [ebp+eax*4-XXh]

This is an instance of x86’s base + index*scale + displacement addressing form; see this article on Wikipedia for a summary of the limits.

Regardless of this, it is commonly used in conjunction with another x86 assembly instruction called lea (load effective address) (as seen in the optimised SumGoto assembly), which will load the result of the address computation (rather than the value at that address) into a specific register.

When I’ve seen the mnemonic lea in the disassembly window it has most often been used for this purpose – though don’t assume that it is! Since we’re not (necessarily) assembly programmers, we don’t need to worry about this too much but I thought I’d mention it.

Notes on Using esi to pass parameters to functions

So, this is certainly not what we’d expect given the coverage of calling conventions we did earlier in the series.

I googled for at least 10 minutes (clearly not exhaustive, but usually long enough to find a trail to an answer) and couldn’t find any specific information pertaining to the use of esi to pass parameters in a documented calling convention; however I did find several other people who had observed this behaviour and were looking for answers about it.

So, in the spirit of discovery I decided to see what happened if I compiled the looping functions (SumGoto, SumWhile, SumDo, and SumFor) into a separate library and then linked to that library instead of having them compile inside the same logical compilation unit as main. As anticipated, this sorted out the parameter passing so that it conformed to the cdecl calling convention, no more kooky use of esi to pass the array.


Final take away point: if something makes no sense when you’re debugging, don’t assume anything – put on your Deerstalker and Sherlock Holmes your way to the bottom of it.

A final note on the genesis of loops

I’ve already mentioned that I didn’t find the page on the history of C that I was looking for, so I can’t say with any degree of certainty which order the various looping constructs were actually added to the language.

As for the history of looping: my personal gut feeling on this matter is that whoever first coined the use of Sigma in mathematical notation is probably the father (or mother) of programmatic looping, but whoever invented knitting is the true originator 😉

 

 

 

Bringing Regal OpenGL to Native Client

Original Author: John McCutchan

Porting your game to Native Client and Android just got a lot easier. The new OpenGL portability library ‘Regal’ emulates legacy GL features such as immediate mode and fixed function pipeline. Regal is the ‘Write Once, Run Everywhere’ GL library. Read on for more details on Regal and how it got ported to Native Client.

At the end of July I joined Google to help game developers make amazing games for the web using open technologies. Regal is an open source project led by Nigel Stewart. When an application linked with Regal makes a GL call, it doesn’t go directly to the GPU driver; instead it is handled in one of several ways inside Regal. Most GL calls are forwarded directly to the native GL library while others are emulated by Regal (when possible given GPU hardware restrictions). This approach allows Regal to become the ‘Write Once, Run Everywhere’ GL library. Regal offers an impressive set of features:

  • Consistent OpenGL API that runs on all major platforms: Windows, Linux, Mac, iOS, Chrome (Native Client), and Android.
  • Emulation of deprecated OpenGL functionality. Some examples are immediate mode, the fixed function pipeline, and GL_QUADS.
  • Emulation of modern OpenGL functionality, for example, Direct State Access (DSA) and Vertex Array Object (VAO) are emulated when your native GL doesn’t support them.
  • Licensed under the 2 clause BSD license. In other words: you get the source, you can change the source,  and you are not obligated to share your changes even if you use them for commercial purposes. If you need to extend Regal but aren’t able to share your changes with the public, it’s cool.
  • Enhanced GL debuggability. Regal keeps an internal copy of the GL state so it’s possible to set a breakpoint and inspect the GL state machine directly. Regal also has extensive logging which enables API call traces. API traces are a powerful source of data that, for example, when analyzed can be used to eliminate redundant GL state changes, reducing application CPU usage.

Presently, NaCl supports OpenGL ES 2.0. WebGL is also based on ES 2.0, and the two have equivalent feature sets. ES 2.0 has a subset of GL 2.0 features – think of it as GL 2.0-lite. Most games are written against OpenGL 2.0 proper. What makes it difficult to port from GL 2.0 to GL ES 2.0? Well, let’s look at one example in depth: ES 2.0 removed support for the fixed function pipeline (FFP). The FFP allows primitives to be drawn without writing custom vertex and fragment shader programs. Removing it introduces two problems for the porter: first, any calls to FFP functions will fail to compile because the function definitions have been removed. Second, developers must re-implement any features from the fixed function pipeline they depend on. This kind of development is akin to pulling on a loose string: you often end up unravelling tonnes of code that subtly depended on legacy GL functionality. It’s messy and near impossible to do piecemeal. As I’ve already mentioned, Regal emulates the FFP along with other legacy GL functionality. Porting the game goes from a tedious exercise to a simple recompile.

Porting Regal

As mentioned above, when a game linked with Regal makes a GL call it is either forwarded to native GL or emulated by Regal. The first step in porting Regal to NaCl was finding the code that forwards the calls to the native GL library. In order to do that, I needed to understand how Regal worked internally.

Internally Regal keeps track of multiple dispatch tables. The Regal dispatch table is a structure containing many function pointers, one for each OpenGL entry point.

struct DispatchTable {
    // GL_VERSION_1_0
    void (REGAL_CALL *glAccum)(GLenum op, GLfloat value);
    void (REGAL_CALL *glAlphaFunc)(GLenum func, GLclampf ref);
    void (REGAL_CALL *glBegin)(GLenum mode);
    void (REGAL_CALL *glBitmap)(GLsizei width, GLsizei height, GLfloat xorig, GLfloat yorig, GLfloat xmove, GLfloat ymove, const GLubyte *bitmap);
    void (REGAL_CALL *glBlendFunc)(GLenum sfactor, GLenum dfactor);
    // ...
};

When a new Regal OpenGL context is initialized each Regal dispatcher overrides certain function pointers in its dispatch table. Dispatchers include: logging, debugging, emulation, NaCl, and a dynamic loader.

For the logging, debugging, and emulation dispatchers, Regal keeps a stack of dispatch tables. This stack allows, for example, the logging dispatcher to call the real GL function after it has logged the call: each dispatcher does its own work, then hands the call down to the next dispatch table in the stack.

Note: The logging and debug layers add function calls between you and the native GL functions. Emulation of legacy GL functionality can have non-trivial performance impact. Luckily, each of these Regal features can be compiled out or disabled at runtime.
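A minimal sketch of the layering idea (the names and two-layer setup are illustrative, not Regal’s actual internals): a ‘logging’ table sits in front of a ‘driver’ table and forwards each call down the stack.

```cpp
#include <cassert>

// illustrative single-entry dispatch table
struct MiniDispatch {
    int (*glGetError)();
};

static int g_logCount = 0;

static int driverGlGetError() { return 0; }          // stands in for the native GL call
static MiniDispatch g_next = { driverGlGetError };   // next table down the stack

static int loggingGlGetError()
{
    ++g_logCount;                 // 'log' the call...
    return g_next.glGetError();   // ...then forward it to the next dispatch table
}

static MiniDispatch g_top = { loggingGlGetError };   // the table the application calls through
```

The application only ever calls through the top table, so layers can be added or removed without the application noticing.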

When I started my port there was no NaCl dispatcher, so I added support to the Regal dynamic loader. The dynamic loader works by querying for the address of each GL function the first time it is called. It does this by calling the platform equivalent of:

void* getProcAddress(const char* functionName);

Under NaCl this is implemented by calling dlsym. I didn’t need to go the full dynamic loading route, so I just created a table mapping GL function names to wrapper functions I wrote, which call into the native NaCl GL interface. By implementing a NaCl-specific getProcAddress I was able to watch Regal start up and respond to calls from my GL demo. A little while later I had added enough wrapper functions to get a triangle on the screen. Not just any triangle, but a triangle built with immediate mode that used the fixed function pipeline – something not possible with plain OpenGL ES 2.0.
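The name-to-wrapper mapping can be sketched like this (purely illustrative: the wrapper names and table contents are hypothetical, and the real wrappers call through the NaCl GL interface):

```cpp
#include <cassert>
#include <cstring>

// hypothetical wrappers standing in for functions that call the NaCl GL interface
static void wrapClear()      { /* would call the NaCl GL interface's Clear */ }
static void wrapClearColor() { /* would call the NaCl GL interface's ClearColor */ }

struct ProcEntry {
    const char *name;
    void (*func)();
};

static const ProcEntry k_procTable[] = {
    { "glClear",      wrapClear },
    { "glClearColor", wrapClearColor },
};

// NaCl-flavoured getProcAddress: resolve a GL entry point by name
void *getProcAddress(const char *functionName)
{
    const unsigned count = sizeof(k_procTable) / sizeof(k_procTable[0]);
    for (unsigned i = 0; i < count; ++i)
        if (std::strcmp(k_procTable[i].name, functionName) == 0)
            return reinterpret_cast<void *>(k_procTable[i].func);
    return 0;   // unknown entry point
}
```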

 

Success! Now that I had OpenGL 1.0 features running on top of OpenGL ES 2.0 I wanted to see about OpenGL 3.0 features. I pinged the Regal guys and they suggested I try out Direct State Access. DSA worked like a charm and I started to see how powerful Regal will be when put in the hands of Native Client developers. Cut down, the code looked like this:

{
    // direct state access
    glMatrixLoadIdentityEXT(GL_PROJECTION);
    glMatrixLoadIdentityEXT(GL_MODELVIEW);

    // immediate mode
    glBegin(GL_TRIANGLES);
    glColor3f(1.0, 0.0, 0.0);
    glVertex3f(0, 0, 0);
    glColor3f(0.0, 1.0, 0.0);
    glVertex3f(1, 0, 0);
    glColor3f(0.0, 0.0, 1.0);
    glVertex3f(0, 1, 0);
    glEnd();
}

Doing it right

At this point I reached out to Nigel Stewart by sending him a pull request on GitHub. Nigel was great about pulling in my changes even if they were not perfect. Being relaxed about work-in-progress pull requests and iterating with contributing developers is a great way to build momentum for your project.

Nigel and I both wanted the NaCl port to be a first class platform supported by Regal. The next day I woke up and noticed that Regal now contained a proper NaCl dispatcher built using Regal’s code generation system. Over the week Nigel and I iterated back and forth until the NaCl port was seamlessly integrated with the rest of Regal.

Getting things small

Regal has multiple implementations of every function from every version of OpenGL. One for logging, one for debugging, etc. “With great power comes large binary sizes.” I may have messed that quote up. NaCl executables are downloaded over the web, so every byte counts both for the end user and the person who pays the bandwidth bill. Now that the port was functional I needed to get the binary size down. The initial weigh-in was 16MB, and that was just the .text segment, which means even stripping the binary wouldn’t help. Most features in Regal are optional and many are only useful during development – RegalDispatchLog.cpp, I’m looking at you. After throwing logging off the island the .text segment slimmed down to 9.5MB, a 40% drop. Nigel spent some time making it possible to build Regal with many features turned off at compile time. After flipping all those switches, the .text segment weighed in at only 0.2MB, a 98% drop in binary size.

Built-in Shrink Ray

By correctly configuring the web server hosting the NaCl application we can get things even smaller. Web browsers support receiving data that has been gzipped by the server. Make sure this is enabled, it will save you money and make your users happier!

Using Regal in your NaCl projects

Using Regal in your NaCl game is the same as using any other GL library. After you’ve created a NaCl GL context and interface you call one function to initialize the Regal GL context:

// Initialize Regal using the NaCl GL context and interface

RegalMakeCurrent(opengl_context, ppb_opengl_interface);

Optionally, you can capture logging output by making this call:

// Initialize Regal logging system

glLogMessageCallbackREGAL(regalLogCallback);

The callback function should have the following definition:

typedef void (*RegalLogCallback)(GLenum stream, GLsizei length, const GLchar *message, GLvoid *context);

You must add the following to your link line:

-lRegal

What about Android?

Android games are also limited to OpenGL ES 2.0. Lucky for Android developers, Regal is already ported to Android. If you are doing game development on Android and are porting a desktop GL game, consider targeting Regal to accelerate your porting efforts.

Conclusion

Porting your game to NaCl and Android just got a lot easier. Regal removes many of the hurdles and will get you up and running quickly. Once you’re up and running you can whittle your usage of desktop OpenGL features down to zero. The key here is that with Regal it can be a gradual move away from desktop OpenGL to OpenGL ES.

Regal offers some compelling features for all OpenGL games, even desktop GL games. For example, I’m sure Regal’s built in logging and debug dispatchers will save some frustration.

Getting Started

  1. If you haven’t already, go and download the NaCl SDK for your platform.
  2. Get the latest Regal from GitHub.
  3. Pick your favourite GL game and start porting it to NaCl.
  4. Get involved in the NaCl community by joining the forums.

Bugs

Note – Regal is still in early development; use it with caution. If you run into a bug with the NaCl port, shoot me an email and I will help fix it.


A new way of organizing header files

Original Author: Niklas Frykholm

Recently, I’ve become increasingly dissatisfied with the standard C++ way of organizing header files (one .h file and one .cpp file per class) and started experimenting with alternatives.

I have two main problems with the ways headers are usually organized.

First, it leads to long compile times, especially when templates and inline functions are used. Fundamental headers like array.h and vector3.h get included by a lot of other header files that need to use the types they define. These, in turn, get included by other files that need their types. Eventually you end up with a messy nest of header files that get included in a lot more translation units than necessary.

Sorting out such a mess once it has taken root can be surprisingly difficult. You remove an #include statement somewhere and are greeted by 50 compile errors. You have to fix these one by one by inserting missing #include statements and forward declarations. Then you notice that the Android release build is broken and needs additional fixes, or a fix introduces a circular header dependency that needs to be resolved. Then it is on to the next #include line — remove it, rinse and repeat. After a day of this mind-numbingly boring activity you might have reduced your compile time by four seconds. Hooray!

Compile times have an immediate and important effect on programmer productivity and through general bit rot they tend to grow over time. There are many things that can increase compile times, but relatively few forces that work in the opposite direction.

It would be a lot better if we could change the way we work with headers, so that we didn’t get into this mess to begin with.

My second problem is more philosophical. The basic idea behind object-oriented design is that data and the functions that operate on it should be grouped together (in the same class, in the same file). This idea has some merits — it makes it easier to verify that class constraints are not broken — but it also leads to problems. Classes get coupled tightly with concepts that are not directly related to them — for example things like serialization, endian-swapping, network synchronization and script access. This pollutes the class interface and makes reuse and refactoring harder.

Class interfaces also tend to grow indefinitely, because there is always “more useful stuff” that can be added. For example, a string class (one of my pet peeves) could be extended with functionality for tokenization, path manipulation, number parsing, etc. To prevent “class bloat”, you could write this code as external functions instead, but this leads to a slightly strange situation where a class has some “canonized” members and some second-class citizens. It also means that the class must export enough information to allow any kind of external function to be written, which kind of breaks the whole encapsulation idea.

In my opinion, it is much cleaner to organize things by functionality than by type. Put the serialization code in one place, the path manipulation code in another place, etc.

My latest idea about organization is to put all type declarations for all structs and classes in a single file (say types.h):

struct Vector3 {
	float x, y, z;
};

template <class T>
class Array {
public:
	Array() : _capacity(0), _size(0), _data(0) {}
	~Array() {free(_data);}

	unsigned _capacity;
	unsigned _size;
	T *_data;
};

class IFileSystem;
class INetwork;

Note that types.h has no function declarations, but it includes the full data specification of any struct or class that we want to use “by value”. It also has forward declarations for classes that we want to use “by reference”. (These classes are assumed to have pure virtual interfaces. They can only be created by factory functions.)

Since types.h only contains type definitions and not a ton of inline code, it ends up small and fast to compile, even if we put all our types there.

Since it contains all type definitions, it is usually the only file that needs to be included by external headers. This means we avoid the hairy problem with a big nest of headers that include other headers. We also don’t have to bother with inserting forward declarations in every header file, since the types we need are already forward declared for us in types.h.

We put the function declarations (along with any inline code) in the usual header files. So vector3.h would have things like:

inline Vector3 operator+(const Vector3 &a, const Vector3 &b)
{
	Vector3 res;
	res.x = a.x + b.x;
	res.y = a.y + b.y;
	res.z = a.z + b.z;
	return res;
}

.cpp files that wanted to use these operations would include vector3.h. But .h files and other .cpp files would not need to include the file. The file gets included where it is needed and not anywhere else.

Similarly, array.h would contain things like:

template <class T>
void push_back(Array<T> &a, const T &item)
{
	if (a._size + 1 > a._capacity)
		grow(a);
	a._data[a._size++] = item;
}

Note that types.h only contains the constructor and the destructor for Array<T>, not any other member functions.

Furthermore, I prefer to design classes so that the “zero-state” where all members are zeroed is always a valid empty state for the class. That way, the constructor becomes trivial, it just needs to zero all member variables. We can also construct arrays of objects with a simple memset().

If a class needs a more complicated empty state, then perhaps it should be an abstract interface-class instead of a value class.
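A minimal sketch of the zero-state idea (the struct and helper are hypothetical, and this relies on the common platform behaviour that an all-bits-zero pointer is a null pointer):

```cpp
#include <cstring>

// hypothetical value type whose zero-state is its valid empty state
struct Blob {
	unsigned _size;
	char    *_data;
};

// construct a whole array of "empty" Blobs with a single memset --
// safe here only because all-bits-zero is the designed empty state
inline void make_empty(Blob *blobs, unsigned count)
{
	std::memset(blobs, 0, count * sizeof(Blob));
}
```

With this convention a constructor, if one exists at all, only has to zero the members.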

For IFileSystem, file_system.h defines the virtual interface:

class IFileSystem
{
public:
	virtual bool exists(const char *path) = 0;
	virtual IFile *open_read(const char *path) = 0;
	virtual IFile *open_write(const char *path) = 0;
	...
};

IFileSystem *make_file_system(const char *root);
void destroy_file_system(IFileSystem *fs);

Since the “open structs” in types.h can be accessed from anywhere, we can group operations by what they do rather than by what types they operate on. For example, we can put all the serialization code in serialization.h and serialization.cpp. We can create a file path.h that provides path manipulation functions for strings.

An external project can also “extend” any of our classes by just writing new methods for it. These methods will have the same access to the Vector3 data and be called in exactly the same way as our built-in ones.
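For example (the helper below is hypothetical, not from the Bitsquid codebase), an external project could extend Vector3 with a free function that has exactly the same access as the built-in operators:

```cpp
struct Vector3 {
	float x, y, z;   // as declared in types.h
};

// hypothetical external extension: just another free function over the open struct
inline float length_squared(const Vector3 &v)
{
	return v.x * v.x + v.y * v.y + v.z * v.z;
}
```

Nothing distinguishes this function from the ones shipped with vector3.h, which is the point: there are no second-class citizens.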

The main drawback of this model is that internal state is not as “protected” as in standard object-oriented design. External code can “break” our objects by manipulating members directly instead of using methods. For example, a stupid programmer might try to change the size of an array by manipulating the _size field directly, instead of using the resize() method.

Naming conventions can be used to mitigate this problem. In the example above, if a type is declared with class and the members are preceded by an underscore, the user should not manipulate them directly. If the type is declared as a struct, and the members do not start with an underscore, it is OK to manipulate them directly. Of course, a stupid programmer can still ignore this and go ahead and manipulate the members directly anyway. On the other hand, there is no end to the things a stupid programmer can do to destroy code. The best way to protect against stupid programmers is to not hire them.

I haven’t yet written anything really big in this style, but I’ve started to nudge some files in the Bitsquid codebase in this direction, and so far the experience has been positive.

This has also been posted to The Bitsquid blog.