Iterate Better, Iterate Faster

Original Author: Tom Gaulton

“Iteration means the act of repeating a process usually with the aim of approaching a desired goal or target or result.” (Wikipedia)

In the game development business, if you’ve ever talked to anyone about “iteration” there’s a good chance you were discussing iteration times – how long it takes to perform one iteration, i.e. the time it takes to get from making a change to seeing the effect of that change.

Usually you’ll be either moaning about how painfully long that iteration time is, or gushing about how amazingly fast it is, depending on the tools in question. I do too, I think we all do. It’s universally acknowledged that long iteration times are bad. Not only do they mean that tasks take longer, but there’s a good chance that during all the waiting around you’ll switch tasks to fill the time, and then get completely distracted and forget what you were doing in the first place. Hands up if you’ve ever waited a minute for your game to load then clicked straight past the thing you were meant to be testing… then rebooted and done it again. Yep, me too. What about staring at an empty e-mail on your screen and wondering why you clicked New Message in the first place? There must have been a reason, before you got sidetracked onto that other thing.

I once read a study (I can’t find the link now, so you’ll have to take my word for it) which showed that women multi-task slightly more effectively than men, but that both women and men multi-task incredibly badly and should really just stick to doing one thing at a time. So if long iteration times are leading to more multi-tasking and less focus, that’s just a double helping of lost productivity.

There’s plenty of documentation and discussion about reducing iteration times and optimising workflow, including some articles right here on #altdevblogaday (for example /2011/04/23/optimizing-workflow-2/). These provide valuable insights, from optimising your C++ headers, to ensuring you can boot your game straight to any point in any level without having to click through 5 menus and fight a horde of enemies. But iteration time isn’t the whole story, there’s a forgotten variable in the iteration formula – and that’s the number of iterations required.

There’s no point using method A, where the iteration time is 20 seconds and it takes 20 attempts to get the right result, when method B takes 1 minute per iteration but only 3 goes to get the result you desire. That’s nearly 7 minutes of iterating versus 3. Method B is far from perfect, but clearly better in this case.

The holy grail of development, then, is a blisteringly fast iteration time, coupled with a low number of iterations. Which is why I encourage you to iterate better, not just faster.

So what factors lead to an increase in the number of iterations? And how can we tackle them?

1. Complicated processes. The more complex a process is, the more likely that a mistake will occur, and that’s an iteration wasted. To be honest, this point applies just as much to iteration times (a more complicated process naturally takes longer) but I wanted to throw it out there anyway. The solution? Simplify your processes wherever possible.

2. Unreliable processes. If the tool you’re using crashes 50% of the time, you’ll have to make the change twice as many times to get it right. Demand the tools that you need to get your job done effectively.

3. Poor feedback. Not knowing why your change didn’t give the desired result is a recipe for endless iteration and often, even when you do get to the result you wanted, you haven’t really understood why – which means you’ll be back again later when your change goes wrong (cue more iteration). Invest time in improving the feedback, or heckling the person that can. I came across this a while back when I switched from C++ to Python for a slice of my development, my iteration times went down massively but the number of iterations went up initially because I didn’t have any debugging tools (printf doesn’t count!). Investing time in finding good debugging aids paid dividends in the long run.

4. Infinite monkey syndrome. If your iteration time is really low, the incentive to think about the change you’re making will drop. Why bother calculating the amount of memory you need to allocate when you can just try a few values until you hit the right one? In certain situations there is no right or wrong result and in these cases iterating fast and often is the best way to go, but for programmers and designers especially, making changes based purely on empirical evidence is asking for trouble. It seems quicker initially, but it rarely turns out that way. I’d wager that even the most dedicated and hard working amongst us will fall into this trap from time to time, so keep an eye out for it. If you need to, write “I am not a random monkey” on a post-it and attach it to your monitor 🙂

5. Multitasking. I said it before and I’ll say it again, don’t multitask if you can avoid it. Even if you’re waiting for a build process to complete, try to keep focussed on the change you’re making – consider ways in which the change might not give the desired result, so that when something does turn out wrong you might already know why. I regularly cancel a code build part way through when I realise I’ve overlooked something, which feels a lot more efficient than switching to my e-mail for a bit, then running the broken code and having to debug it when I’ve already started to lose focus.

The conclusion that I’ve drawn from my experience, and hope you will too, is that the number of iterations required to achieve a result is just as important as the speed of one iteration. Of course you should analyse your processes to find ways of speeding them up, but also step back for a moment and look for ways you can reduce the number of iterations you need to do in the first place. Aim to get things right first time – you won’t always manage it, but if you set yourself that goal then you’ll get things right in less time than you expect.


Game Loops on iOS

Original Author: Kwasi Mensah

(This is a repost of an article originally from our site that gives a high-level overview of how we manage our game loop.)

Introduction

(This post assumes you’re familiar with C, Objective-C and Cocoa.)

Just like TV shows and films, a video game’s visual representation can be broken down into frames: single snapshots of the state of the game world at a given time. By showing these frames many times per second we give players the illusion of continuous motion.

The loop below is a high level description of what the game has to do each frame to draw the current snapshot.

//Basic game loop
while(1)
{
    ProcessInput();
    DoLogic();
    Render();
}

Even though the iOS API doesn’t make this clear, there’s a similar loop going on; we just have less control over it.

All iOS apps start with a call to UIApplicationMain, which kicks off an NSRunLoop – for our purposes, a fancy event queue. Whenever something interesting but non-critical happens (input, for example) it gets put on the back of this queue and processed in the order it’s received.

So how do we emulate our loop above using this? We know that with UIApplicationDelegate we don’t need ProcessInput anymore. We can take care of input events as soon as they come in with touchesBegan, touchesEnded and touchesCancelled. In general, input is going to require some type of platform-specific callbacks, so as long as we leave breathing room in our game loop for processing these we should be fine.*1

Naive Try at a Game Loop

We know we want to calculate a new frame fairly frequently. Let’s pull the number 60 times per second out of thin air for now.

//Bad iPhone game loop
-(void) doFrame:(id)data
{
    DoLogic();
    Render();
}

- (BOOL)application:(UIApplication *)application
    didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    [self doFrame:nil];
    [NSTimer scheduledTimerWithTimeInterval:(1.0/60.0) target:self
        selector:@selector(doFrame:) userInfo:nil repeats:YES];
    return YES;
}

The above won’t cut it. The resolution of an NSTimer is 50-100 milliseconds.*2 1/60th of a second is roughly 16 milliseconds, so our best-case timer resolution is about three times coarser than the interval at which we want our function called. There are also a host of other problems with using repeating NSTimers.*2 We could try messing with the NSObject performSelector family of functions, but you’re going to run into other problems, especially with our friend v-sync.

The Basics of Vertical Sync (v-sync)

Monitors take a real (but very small) amount of time to draw the images on our screens. Furthermore, there’s a gap between when we finish drawing one image and when we’re ready to draw another.

Draw–Rest–Draw–Rest–Draw

The vertical refresh rate of a monitor is how many frames it draws per second. This number varies between monitors and regions. Since we’re only concerned with iOS in this post, we’re going to focus on 60 frames per second, i.e. about 16 milliseconds per frame (remember the number I pulled out of thin air), because that’s the vertical refresh rate of iOS devices.*3

Vertical synchronization (v-sync) refers to waiting for the screen to finish being drawn, i.e. waiting for the start of a “Rest” state.

With this information, let’s take a closer look at our Render function:

void Render()
{
    InitDrawing();   // *6
    ...
    FinishDrawing(); // *7
}

InitDrawing() waits for v-sync, if it hasn’t happened yet, before letting us draw any objects.

So what happens if we don’t take v-sync into account when we draw? We’ll be stuck inside of InitDrawing() waiting for v-sync when we could be doing other useful things, like dealing with touches. In fact, it was debugging the lagginess of VoiceOver (Apple’s screen reader tech) in my game that made me realize that I was stuck inside of InitDrawing().

Note that even if we were in a perfect environment and able to call doFrame every 16 msecs, if it’s not synchronized with v-sync we’ll still waste a lot of time waiting for it.

We’re Halfway There

So how do we make sure doFrame gets called 60 times a second without having to wait for v-sync? Fortunately, CADisplayLink comes to our rescue.*8 It functions like the NSTimer above, but it has good resolution and is timed with v-sync (sort of; we’ll talk about that later).

//Better but not quite right yet.
-(void) doFrame:(id)data
{
    DoLogic();
    Render();
}

- (BOOL)application:(UIApplication *)application
    didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    mFrameLink = [CADisplayLink displayLinkWithTarget:self
        selector:@selector(doFrame:)];
    mFrameLink.frameInterval = 1;
    [mFrameLink addToRunLoop:[NSRunLoop mainRunLoop]
        forMode:NSDefaultRunLoopMode];
    return YES;
}

If we can ensure that every call to doFrame will always take less than 16 msecs, then the above will work, and you can go home and get back to coding. And dear God, blog to the rest of us about how you’re doing that. However, if any of our frames take even a little bit longer, you’re gonna be in for a shocking surprise. Let’s draw some pretty pictures.

Let’s say we have a frame that takes 21 msecs (needed a little extra time to decode your awesome background music).

Timeline: 0---16---32---48
Frame 1:  0----21

Since we took longer than our allotted 16 msecs, CADisplayLink won’t wait until the 32 msec mark to call doFrame. It will call it immediately.*9 It’s easy to think “that’s ok, I normally only take 10 msecs so this will be straightened out by the next frame”. But remember our section on v-sync. By calling doFrame right away we’re going to be stuck in InitDrawing() until the 32 msec mark. So our timeline after the second frame will look like*13:

Timeline: 0---16---32---48
Frame 1:  0-----21
Frame 2:        21----42

Our second frame has to wait until the 32 msec mark before it can start rendering and then it will take 10 msec to draw. And remember, you passed the 32 msec mark while drawing this second frame which means doFrame is going to be called immediately again.

Timeline: 0---16---32---48---64
Frame 1:  0-----21
Frame 2:        21----42
Frame 3:              42---58

The timelines above only take into account calling doFrame. There’s no breathing room for input processing, which will make your app unresponsive and laggy until the storm settles. We know this will settle itself out, because the amount of time spent waiting for v-sync decreases with each frame, but the fact remains that one frame of suckiness can cause an avalanche of woe.

It’s hard to track down if you don’t know what you’re looking for. Instruments will still tell you that you’re rendering at 60 frames per second. But you’ll see a lot of time spent in some combination of semwait_signal and usleep because you’re stuck waiting for v-sync.

We could’ve put a quick end to this by enforcing that if we miss a v-sync we just suck it up and wait for the next one i.e. don’t get run over chasing the bus you missed.

Timeline: 0---16---32---48
Frame 1:  0-----21
Frame 2:           32--42

This gives us the breathing room we need to still process input and makes sure that one errant frame doesn’t screw us over in the long run. Fortunately, this is eerily similar to problems physics engines have run into and Glenn Fiedler has an excellent article that’s the basis of how we’re going to deal with this. *10

Conclusion

We’re going to remember how long the past frame took and bank it.*11 Then the next time we call doFrame we subtract 16 msecs from the bank. If the bank has no more time left in it then we know we won’t have to wait for v-sync. Otherwise, we do nothing because we know we’re being triggered for a v-sync that we missed.

Also note the call to fmod when a frame takes longer than expected. If we miss multiple v-syncs (while sitting at a breakpoint, for example) there’s only one call to doFrame.*9 If we didn’t have this, we could get a huge amount of time in our bank and we wouldn’t render until it was all subtracted out, one frame at a time.

//Pseudo-code for what I'm currently using.
-(void) doFrame:(id)data
{
    static double bank = 0;
    double frameTime = mFrameLink.duration * mFrameLink.frameInterval;

    bank -= frameTime;
    if( bank > 0 )
    {
        // We're being triggered for a v-sync we already missed; skip it.
        return;
    }
    bank = 0;

    PlatformHiResTimer timer; // *12
    timer.Start();
    DoLogic();
    Render();
    double elapsed = timer.ElapsedTime() * PlatformHiResTimer::sNanoSecToSec;

    bank = elapsed;
    if( elapsed > frameTime )
    {
        // Cap the bank so one long stall can't starve rendering for ages.
        bank = frameTime + fmod( elapsed, frameTime );
    }
}

This should take into account sporadic frames that run longer than usual. It isn’t meant to protect against a stream of frames that consistently take longer than 16 msecs; if that’s the case you need to lower your frame rate.

It took a fair amount of experimenting to figure this out. While this discussion focuses on iOS, it’s meant to serve as a baseline for setting up game loops on any platform.

Notes

Note 1: If you’re really set on delaying input handling you can use the -touches* functions to queue them until you call ProcessInput. Be careful of lag, especially if your game is supposed to work with VoiceOver (Apple’s screen reader tech). The VoiceOver cursor relies on touch events being processed off of the run loop even if the events aren’t passed down to the app.

Note 2: See Apple’s NSTimer documentation for a discussion of timer resolution and the drawbacks of repeating timers.

Note 3: Our final solution actually isn’t dependent on knowing vertical refresh rate. It’s just easier to discuss with a firm number in mind.

Note 4: EAGLContext presentRenderbuffer returns before v-sync happens. It’s the next render command that will be forced to wait for v-sync if it’s called too early. A lot of the Google results for glClear being slow on the iPhone are probably related to not waiting for v-sync.

Note 5: See the Wikipedia article on vertical synchronization.

Note 6:

void InitDrawing()
{
    [EAGLContext setCurrentContext: sGLContext];
    glColor4f( 0, 0, 0, 1 );
    glClear(GL_COLOR_BUFFER_BIT);

    //You can also draw a screen filling black quad. This hasn't been
    //a bottleneck for me.
}

Note 7:

void FinishDrawing()
{
    glBindRenderbufferOES(GL_RENDERBUFFER_OES, sColorRenderBuffer);
    [sGLContext presentRenderbuffer:GL_RENDERBUFFER_OES];
}

Note: it seems like iOS always uses double buffering.

Note 8: See Apple’s CADisplayLink class reference.

Note 9: CADisplayLink‘s behavior is actually more complicated. If your frame misses multiple v-sync events, the selector seems to only be called once for all of the missed v-sync events. I also believe that the CADisplayLink object’s duration property is for the most recent v-sync event. Furthermore, the duration property can vary significantly from the time between calls to the selector even when we don’t miss a frame. Performing the selector seems to be put off if the NSRunLoop is busy when v-sync happens. The moral of the story: while CADisplayLink gives us much better resolution than an NSTimer, calls to doFrame aren’t guaranteed to line up with v-sync.

Note 10: Glenn Fiedler’s “Fix Your Timestep!” article, at gafferongames.com.

Note 11: Glenn would use the term accumulator but I think “bank” is more intuitive.

Note 12: See Apple’s Technical Q&A QA1398, which covers implementing a high-resolution timer with Mach absolute time on Mac OS X.

Note 13: For the timelines we’re assuming DoLogic() doesn’t take a noticeable amount of time. It makes drawing the timelines easier. Note that without this assumption, you could get lucky and have the work in DoLogic() push the call to Render() past v-sync. But you’re playing with fire as it’s going to be very hard to make sure that happens on a consistent basis.

We could also call Render() before DoLogic() to make sure we start rendering as soon as v-sync is done. However, we’ll be rendering a frame behind the game. This can cause wonkiness, especially if we’ve processed input that’s changed the state of the game but we haven’t called DoLogic to validate/fix it.

It makes more sense to me to always call DoLogic() first. When doFrame() takes less than 16 msecs it doesn’t make a difference. When doFrame() takes longer than 16 msecs, I think the final version of doFrame() handles it better than the trickeration needed to call Render() first. I’ve seen a fair number of game programming books advocate calling Render() first, so I might be missing out on something.


Defining a SIMD Interface: Redux

Original Author: Don Olmstead

In a previous article I discussed implementing a SIMD interface which could then be used to build a math library. By abstracting the underlying SIMD architecture the math library can work on multiple platforms without requiring a platform specific rewrite. When moving to a different platform the interface is the only portion that needs to be satisfied and the rest of the mathematics code should just work.

Math library diagram

Now that all sounds good on paper, but how does it perform? This was the most glaring omission of the previous article, and one I hope to rectify in this article by providing an actual performance comparison.

One of the articles I referenced, Gustavo Oliveira’s piece on designing SIMD vector libraries, provides the benchmark used here; my modified version is available on GitHub.

SIMD Implementations

The performance tests compare five different SIMD vector implementations. The underlying SIMD code for each implementation is equivalent; they all use the same intrinsics for each function/method. The actual comparison is of how well the compiler generates assembly for the various levels of indirection.

Implementation #1 – VMath

 
namespace VMATH
{
    typedef __m128 Vec4;

    inline Vec4 VLoad(float *pVec)
    {
        return(_mm_load_ps(pVec));
    }

    inline Vec4 VReplicate(float f)
    {
        return _mm_set_ps1(f);
    }

    inline Vec4 VLoad(float x, float y, float z, float w)
    {
        return(_mm_set_ps(x, y, z, w));
    }

    inline Vec4 VAdd(Vec4 va, Vec4 vb)
    {
        return(_mm_add_ps(va, vb));
    }

    inline Vec4 VSub(Vec4 va, Vec4 vb)
    {
        return(_mm_sub_ps(va, vb));
    }

    inline Vec4 VMul(Vec4 va, Vec4 vb)
    {
        return(_mm_mul_ps(va, vb));
    }

    inline Vec4 VDiv(Vec4 va, Vec4 vb)
    {
        return(_mm_div_ps(va, vb));
    }

    inline void VStore(float *pVec, Vec4 v)
    {
        _mm_store_ps(pVec, v);
    }

    inline Vec4 VBc(Vec4 v)
    {
        return(_mm_shuffle_ps(v, v, _MM_SHUFFLE(3,3,3,3)));
    }

    inline Vec4 Dot(Vec4 va, Vec4 vb)
    {
        Vec4 t0 = _mm_mul_ps(va, vb);
        Vec4 t1 = _mm_shuffle_ps(t0, t0, _MM_SHUFFLE(1,0,3,2));
        Vec4 t2 = _mm_add_ps(t0, t1);
        Vec4 t3 = _mm_shuffle_ps(t2, t2, _MM_SHUFFLE(2,3,0,1));
        Vec4 dot = _mm_add_ps(t3, t2);
        return (dot);
    }

    inline Vec4 Sqrt(Vec4 v)
    {
        return(_mm_sqrt_ps(v));
    }

    inline void GetX(float *p, Vec4 v)
    {
        _mm_store_ss(p, v);
    }
}
This is Gustavo’s vector implementation. The SIMD type is declared purely, via a typedef. It also forgoes operator overloading, preferring to use a procedural interface.
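To make the comparison concrete, here is a minimal C++ usage sketch of my own (not part of the benchmark) that computes the length of a vector through the procedural interface above; the other four implementations express the same arithmetic through operators or static methods:

#include <xmmintrin.h>

// Hypothetical client code: length of a 4-component vector via VMath.
float VectorLength(float *pVec) // pVec must be 16-byte aligned
{
    using namespace VMATH;

    Vec4 v   = VLoad(pVec); // load four floats into a SIMD register
    Vec4 dot = Dot(v, v);   // dot product, replicated across all lanes
    Vec4 len = Sqrt(dot);   // per-lane square root

    float result;
    GetX(&result, len);     // store the x lane back to scalar memory
    return result;
}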

Implementation #2 – XNAMath

This is the vector implementation that comes with the DirectX SDK, and can be found in the include directory within the SDK. Like VMath it declares the SIMD type purely. It contains both a procedural interface and operator overloading.

Implementation #3 – VClass

 
namespace VCLASS
{
    class Vec4
    {
    public:
        inline Vec4() {}

        inline Vec4(float *pVec)
            : xyzw(_mm_load_ps(pVec))
        { }

        inline Vec4(float f)
            : xyzw(_mm_set_ps1(f))
        { }

        inline Vec4(const __m128& qword)
            : xyzw(qword)
        { }

        inline Vec4(float x, float y, float z, float w)
            : xyzw(_mm_set_ps(x, y, z, w))
        { }

        inline Vec4(const Vec4& copy)
            : xyzw(copy.xyzw)
        { }

        inline Vec4& operator= (const Vec4& copy)
        {
            xyzw = copy.xyzw;
            return *this;
        }

        inline Vec4& operator+=(const Vec4 &rhs)
        {
            xyzw = _mm_add_ps(xyzw, rhs.xyzw);
            return *this;
        }

        inline Vec4& operator-=(const Vec4 &rhs)
        {
            xyzw = _mm_sub_ps(xyzw, rhs.xyzw);
            return *this;
        }

        inline Vec4& operator*=(const Vec4 &rhs)
        {
            xyzw = _mm_mul_ps(xyzw, rhs.xyzw);
            return *this;
        }

        inline Vec4 operator+(const Vec4 &rhs) const
        {
            return Vec4(_mm_add_ps(xyzw, rhs.xyzw));
        }

        inline Vec4 operator*(const Vec4 &rhs) const
        {
            return Vec4(_mm_mul_ps(xyzw, rhs.xyzw));
        }

        inline Vec4 operator-(const Vec4 &rhs) const
        {
            return Vec4(_mm_sub_ps(xyzw, rhs.xyzw));
        }

        inline Vec4 operator/(const Vec4 &rhs) const
        {
            return Vec4(_mm_div_ps(xyzw, rhs.xyzw));
        }

        inline void Store(float *pVec) const
        {
            _mm_store_ps(pVec, xyzw);
        }

        inline void Bc()
        {
            xyzw = _mm_shuffle_ps(xyzw, xyzw, _MM_SHUFFLE(3,3,3,3));
        }

        static inline Vec4 Dot(const Vec4& va, const Vec4& vb)
        {
            const __m128 t0 = _mm_mul_ps(va.xyzw, vb.xyzw);
            const __m128 t1 = _mm_shuffle_ps(t0, t0, _MM_SHUFFLE(1,0,3,2));
            const __m128 t2 = _mm_add_ps(t0, t1);
            const __m128 t3 = _mm_shuffle_ps(t2, t2, _MM_SHUFFLE(2,3,0,1));

            return Vec4(_mm_add_ps(t3, t2));
        }

        static inline Vec4 Sqrt(const Vec4& va)
        {
            return Vec4(_mm_sqrt_ps(va.xyzw));
        }

        static inline void GetX(float *p, const Vec4& v)
        {
            _mm_store_ss(p, v.xyzw);
        }

    private:
        __m128 xyzw;
    };
}

This is a vector implementation where the SIMD type is wrapped in a class. It uses operator overloading.

As an aside, for those who have read Gustavo’s article: the reason the original VClass implementation performed so poorly was the copy constructor generated by Visual Studio. It was not being inlined, which resulted in the performance deficiency. Here an explicit copy constructor and assignment operator were added, and the operators were modified so they don’t create a named temporary instance.

Implementation #4 – VClassTypedef

 
namespace VCLASS_TYPEDEF
{
    ///////////////////////////////////////////
    // SIMD TYPEDEF
    ///////////////////////////////////////////

    typedef __m128 simd_type;
    typedef const __m128& simd_param;

    inline simd_type VLoad(float *pVec)
    {
        return _mm_load_ps(pVec);
    }

    inline simd_type VLoad(float f)
    {
        return _mm_set_ps1(f);
    }

    inline simd_type VLoad(float x, float y, float z, float w)
    {
        return _mm_set_ps(x, y, z, w);
    }

    inline simd_type VBAdd(simd_param va, simd_param vb)
    {
        return _mm_add_ps(va, vb);
    }

    inline simd_type VBSub(simd_param va, simd_param vb)
    {
        return _mm_sub_ps(va, vb);
    }

    inline simd_type VBMul(simd_param va, simd_param vb)
    {
        return _mm_mul_ps(va, vb);
    }

    inline simd_type VBDiv(simd_param va, simd_param vb)
    {
        return _mm_div_ps(va, vb);
    }

    inline void VBStore(float *pVec, simd_param v)
    {
        _mm_store_ps(pVec, v);
    }

    inline simd_type VBc(simd_param v)
    {
        return _mm_shuffle_ps(v, v, _MM_SHUFFLE(3,3,3,3));
    }

    inline simd_type VBDot(simd_param va, simd_param vb)
    {
        const simd_type t0 = _mm_mul_ps(va, vb);
        const simd_type t1 = _mm_shuffle_ps(t0, t0, _MM_SHUFFLE(1,0,3,2));
        const simd_type t2 = _mm_add_ps(t0, t1);
        const simd_type t3 = _mm_shuffle_ps(t2, t2, _MM_SHUFFLE(2,3,0,1));

        return _mm_add_ps(t3, t2);
    }

    inline simd_type VBSqrt(simd_param v)
    {
        return _mm_sqrt_ps(v);
    }

    inline void VBGetX(float *p, simd_param v)
    {
        _mm_store_ss(p, v);
    }

    ///////////////////////////////////////////
    // Vec4
    ///////////////////////////////////////////

    template <typename Real, typename Rep>
    class vector4
    {
    public:
        inline vector4() { }

        inline vector4(Real *pVec)
            : _rep(VLoad(pVec))
        { }

        inline vector4(Real f)
            : _rep(VLoad(f))
        { }

        inline vector4(Real x, Real y, Real z, Real w)
            : _rep(VLoad(x, y, z, w))
        { }

        inline vector4(const simd_type& rep)
            : _rep(rep)
        { }

        inline vector4(const vector4& copy)
            : _rep(copy._rep)
        { }

        inline vector4& operator= (const vector4& copy)
        {
            _rep = copy._rep;
            return *this;
        }

        inline vector4& operator+= (const vector4& rhs)
        {
            _rep = VBAdd(_rep, rhs._rep);
            return *this;
        }

        inline vector4& operator-= (const vector4& rhs)
        {
            _rep = VBSub(_rep, rhs._rep);
            return *this;
        }

        inline vector4& operator*= (const vector4& rhs)
        {
            _rep = VBMul(_rep, rhs._rep);
            return *this;
        }

        inline vector4 operator+ (const vector4& rhs) const
        {
            return vector4(VBAdd(_rep, rhs._rep));
        }

        inline vector4 operator- (const vector4& rhs) const
        {
            return vector4(VBSub(_rep, rhs._rep));
        }

        inline vector4 operator* (const vector4& rhs) const
        {
            return vector4(VBMul(_rep, rhs._rep));
        }

        inline vector4 operator/ (const vector4& rhs) const
        {
            return vector4(VBDiv(_rep, rhs._rep));
        }

        inline void Store(Real *pVec) const
        {
            VBStore(pVec, _rep);
        }

        inline void Bc()
        {
            // VBc returns the shuffled value, so assign it back.
            _rep = VBc(_rep);
        }

        static inline vector4 Dot(const vector4& va, const vector4& vb)
        {
            return vector4(VBDot(va._rep, vb._rep));
        }

        static inline vector4 Sqrt(const vector4& va)
        {
            return vector4(VBSqrt(va._rep));
        }

        static inline void GetX(Real *p, const vector4& v)
        {
            VBGetX(p, v._rep);
        }

    private:
        Rep _rep;
    };

    typedef vector4<float, simd_type> Vec4;
}

This is a vector implementation where the SIMD type is passed as a template parameter for the class. The SIMD type being passed is declared purely using a procedural interface.

Implementation #5 – VClassSIMDType

 
namespace VCLASS_SIMDTYPE
{
    ///////////////////////////////////////////
    // SIMD CLASS (Same as VCLASS)
    ///////////////////////////////////////////

    class simd_type
    {
    public:
        inline simd_type() {}

        inline simd_type(float *pVec)
            : xyzw(_mm_load_ps(pVec))
        { }

        inline simd_type(float f)
            : xyzw(_mm_set_ps1(f))
        { }

        inline simd_type(const __m128& qword)
            : xyzw(qword)
        { }

        inline simd_type(float x, float y, float z, float w)
            : xyzw(_mm_set_ps(x, y, z, w))
        { }

        inline simd_type(const simd_type& copy)
            : xyzw(copy.xyzw)
        { }

        inline simd_type& operator= (const simd_type& copy)
        {
            xyzw = copy.xyzw;
            return *this;
        }

        inline simd_type& operator+=(const simd_type &rhs)
        {
            xyzw = _mm_add_ps(xyzw, rhs.xyzw);
            return *this;
        }

        inline simd_type& operator-=(const simd_type &rhs)
        {
            xyzw = _mm_sub_ps(xyzw, rhs.xyzw);
            return *this;
        }

        inline simd_type& operator*=(const simd_type &rhs)
        {
            xyzw = _mm_mul_ps(xyzw, rhs.xyzw);
            return *this;
        }

        inline simd_type operator+(const simd_type &rhs) const
        {
            return simd_type(_mm_add_ps(xyzw, rhs.xyzw));
        }

        inline simd_type operator*(const simd_type &rhs) const
        {
            return simd_type(_mm_mul_ps(xyzw, rhs.xyzw));
        }

        inline simd_type operator-(const simd_type &rhs) const
        {
            return simd_type(_mm_sub_ps(xyzw, rhs.xyzw));
        }

        inline simd_type operator/(const simd_type &rhs) const
        {
            return simd_type(_mm_div_ps(xyzw, rhs.xyzw));
        }

        inline void Store(float *pVec) const
        {
            _mm_store_ps(pVec, xyzw);
        }

        inline void Bc()
        {
            xyzw = _mm_shuffle_ps(xyzw, xyzw, _MM_SHUFFLE(3,3,3,3));
        }

        static inline simd_type Dot(const simd_type& va, const simd_type& vb)
        {
            const __m128 t0 = _mm_mul_ps(va.xyzw, vb.xyzw);
            const __m128 t1 = _mm_shuffle_ps(t0, t0, _MM_SHUFFLE(1,0,3,2));
            const __m128 t2 = _mm_add_ps(t0, t1);
            const __m128 t3 = _mm_shuffle_ps(t2, t2, _MM_SHUFFLE(2,3,0,1));

            return simd_type(_mm_add_ps(t3, t2));
        }

        static inline simd_type Sqrt(const simd_type& va)
        {
            return simd_type(_mm_sqrt_ps(va.xyzw));
        }

        static inline void GetX(float *p, const simd_type& v)
        {
            _mm_store_ss(p, v.xyzw);
        }

    private:
        __m128 xyzw;
    };

    ///////////////////////////////////////////
    // Vec4
    ///////////////////////////////////////////

    template <typename Real, typename Rep>
    class vector4
    {
    public:
        inline vector4() { }

        inline vector4(Real *pVec)
            : _rep(pVec)
        { }

        inline vector4(Real f)
            : _rep(f)
        { }

        inline vector4(Real x, Real y, Real z, Real w)
            : _rep(x, y, z, w)
        { }

        inline vector4(const simd_type& rep)
            : _rep(rep)
        { }

        inline vector4(const vector4& copy)
            : _rep(copy._rep)
        { }

        inline vector4& operator= (const vector4& copy)
        {
            _rep = copy._rep;
            return *this;
        }

        inline vector4& operator+= (const vector4& rhs)
        {
            _rep += rhs._rep;
            return *this;
        }

        inline vector4& operator-= (const vector4& rhs)
        {
            _rep -= rhs._rep;
            return *this;
        }

        inline vector4& operator*= (const vector4& rhs)
        {
            _rep *= rhs._rep;
            return *this;
        }

        inline vector4 operator+ (const vector4& rhs) const
        {
            return vector4(_rep + rhs._rep);
        }

        inline vector4 operator- (const vector4& rhs) const
        {
            return vector4(_rep - rhs._rep);
        }

        inline vector4 operator* (const vector4& rhs) const
        {
            return vector4(_rep * rhs._rep);
        }

        inline vector4 operator/ (const vector4& rhs) const
        {
            return vector4(_rep / rhs._rep);
        }

        inline void Store(Real *pVec) const
        {
            _rep.Store(pVec);
        }

        inline void Bc()
        {
            _rep.Bc();
        }

        static inline vector4 Dot(const vector4& va, const vector4& vb)
        {
            return vector4(simd_type::Dot(va._rep, vb._rep));
        }

        static inline vector4 Sqrt(const vector4& va)
        {
            return vector4(simd_type::Sqrt(va._rep));
        }

        static inline void GetX(Real *p, const vector4& v)
        {
            simd_type::GetX(p, v._rep);
        }

    private:
        Rep _rep;
    };

    typedef vector4<float, simd_type> Vec4;
}

This is a vector implementation where the SIMD type is passed as a template parameter for the class. The SIMD type is wrapped in a class.

Test Setup

The application was built using Visual Studio 2010 utilizing the June 2010 DirectX SDK. The project file was converted from the original Visual Studio 2008 file with all the options kept intact.

The computer running the application is a MacBook Pro running Windows 7, with a 2.2 GHz Core 2 Duo and an Nvidia GeForce 8600M GT. The number of calculations done per frame is set to the maximum for each test.

Results

Cloth test settings

Implementation   Average Time (ms)
VMath            77.66
XNAMath          75.85
VClass           74.90
VClassTypedef    74.89
VClassSIMDType   74.98

For the cloth simulation the VClass and VClassTypedef fight for the #1 slot, followed by the VClassSIMDType. The XNAMath implementation isn’t too far behind in the #4 slot, and bringing up the rear is the VMath library.

3-band equalizer settings

Implementation   Average Time (ms)
VMath            18.15
XNAMath          25.34
VClass           24.28
VClassTypedef    23.55
VClassSIMDType   25.25

For the 3-band equalizer the VMath implementation trounces the competition. Once again the VClass and VClassTypedef exchange the #2 and #3 slots. Next up is the VClassSIMDType, and at the end is XNAMath.

Conclusion

The VClass implementations perform better than the XNAMath implementation in both tests, and better than the VMath implementation in the cloth simulation. Because of this, the SIMD interface appears to be a viable approach.

I plan on creating a math library using this technique, with the SIMD type implemented using a procedural interface. I’ve started a repository on GitHub and will be developing it into a full-fledged solution.

Feel free to follow its development!

Lightweight Lua Bindings

Original Author: Niklas Frykholm

A scripting language, such as Lua, can bring huge productivity gains to a game project. Quick iterations, immediate code reloads and an in-game console with a read-eval-print-loop are invaluable tools. A less obvious benefit is that introducing a scripting language creates a clear dividing line between “engine” and “gameplay” code with a well defined API between them. This is often good for the structure of the engine, at least if you intend to use it for more than one game.

The main drawback is of course performance. It is a scary thing to discover late in a project that the game is slow because the script is doing too much. Especially since bad script performance cannot always be traced back to bugs or bad algorithms. Sure, you get those as well, but you can also get problems with “overall slowness” that cannot easily be traced back to specific bottlenecks or hot spots. There are two reasons for this. First, the slowness of script code compared to C, which means that everything just takes more time. And second, the fact that gameplay code tends to be “connection” rather than “compute” heavy which means there is less to gain from algorithmic improvements.

Part of this is a management issue. It is important to monitor the script performance (on the slowest target platform) throughout the production so that measures can be taken early if it looks like it will become a problem. But in this article I will focus on the technical aspects, specifically the C-to-Lua bindings.

It is important to note that when I am talking about performance in this article I mean performance on current generation consoles, because that is where performance problems occur. PC processors are much more powerful (especially when running virtual machines, which tend to be brutal to the cache). The extra cores on the consoles don’t help much with script execution (since scripts are connection heavy, they are hard to multithread). And the PC can run LuaJIT which changes the game completely.

This may of course change in future generation consoles. If anyone from Sony or Microsoft is reading this, please add support for JITting to your next generation ventures.

Lua bindings

Apart from optimizing the Lua interpreter itself, optimizing the bindings between Lua and C is the best way of achieving a general performance improvement, since the bindings are used whenever Lua calls some function in the C code which in a typical game happens constantly.

The standard way of binding an object on the C side to Lua is to use a full userdata object. This is a heap allocated data blob with an associated metatable that can be used to store the methods of the object. This allows the user to make a call like:

game_world:get_camera():set_position(Vector3(0,0,0))

In many ways, this is the easiest and most convenient way of using objects in Lua (a minimal sketch of this binding style appears after the list below), but it comes with several performance problems:

  • Any time an object is passed from C to Lua, such as the camera in get_camera() or the vector created by Vector3(0,0,0), memory for the object must be allocated on the heap. This can be costly.
  • All the heap objects must be garbage collected by Lua. Calls such as get_camera() create temporary objects that must be collected at some later time. The more garbage we create, the more time we need to spend in garbage collection.
  • Making use of many heap allocated objects can lead to bad cache performance. When the C side wants to use an object from Lua, it must first fetch it from Lua’s heap, then (in most cases) extract an object pointer from its data and look up the object in the game heap. So each time there is an extra cache miss.
  • The colon method call syntax world:get_camera() actually translates to something like (I’ve simplified this a bit, see the Lua documentation for details) world._meta_table["get_camera"](world). I.e., it creates an extra table lookup operation for every call.
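For reference, here is a minimal sketch of the full-userdata approach using the standard Lua C API (the Camera type and the LookupCamera helper are placeholders of mine); every call allocates a fresh userdata on the Lua heap and attaches the type’s metatable:

extern "C" {
#include <lua.h>
#include <lauxlib.h>
}

struct Camera; // engine-side type

static Camera *LookupCamera(lua_State *L); // assumed helper: reads the world argument

// Pushes a heap-allocated full userdata wrapping a Camera pointer.
static int world_get_camera(lua_State *L)
{
    Camera *cam = LookupCamera(L);

    // Allocates a new block on the Lua heap every call...
    Camera **ud = (Camera **)lua_newuserdata(L, sizeof(Camera *));
    *ud = cam;

    // ...and attaches the metatable holding Camera's methods,
    // which is what makes camera:set_position(...) work.
    luaL_getmetatable(L, "Camera");
    lua_setmetatable(L, -2);
    return 1;
}

static int camera_set_position(lua_State *L)
{
    // luaL_checkudata verifies the metatable, i.e. the type.
    Camera **ud = (Camera **)luaL_checkudata(L, 1, "Camera");
    // ... read the vector argument and set (*ud)'s position ...
    (void)ud;
    return 0;
}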

We can get rid of the first two issues by caching the Lua objects. I.e. instead of creating a new Lua object every time get_camera() is called, we keep a reference to the object on the Lua side and just look it up and return it every time it is requested. But this has other disadvantages. Managing the cache can be tricky and it creates a lot more objects in the Lua heap, since the heap will now hold every object that has ever been touched by Lua. This makes garbage collection take longer and the heap can grow uncontrollably during the play of a level, depending on which objects the player interacts with. Also, this doesn’t solve the issue with objects that are truly temporary, such as Vector3(0,0,0).

A better option is to use what Lua calls light userdata. A light userdata is essentially just a C pointer stored in Lua, with no additional information. It lives on the Lua stack (i.e. not the heap), does not require any memory allocations, does not participate in garbage collection and does not have an associated metatable. This addresses all our performance problems, but introduces new (not performance-related) issues, listed below; a minimal sketch of the approach follows the list:

  • Since the objects don’t have metatables we cannot use the convenient colon syntax for calling their methods.
  • Light user data objects do not carry any type information, they are just raw pointers. So on the C side we have no way of telling if we have been called with an object of the right type.
  • Lifetime management is trickier since objects do not have destructors and are not garbage collected. How do we manage dangling pointers in Lua?
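Here is the light-userdata flavor of the same binding (again, a sketch with my placeholder names): the pointer goes straight onto the Lua stack, with no allocation, no garbage collection and no metatable:

extern "C" {
#include <lua.h>
}

struct Camera; // engine-side type

static Camera *LookupCamera(lua_State *L); // assumed helper

static int world_get_camera(lua_State *L)
{
    // Just a raw pointer on the stack: no heap allocation, no GC --
    // and therefore no type information and no colon syntax either.
    lua_pushlightuserdata(L, LookupCamera(L));
    return 1;
}

static int camera_set_position(lua_State *L)
{
    // No built-in type check here; see the marker trick below.
    Camera *cam = (Camera *)lua_touserdata(L, 1);
    // ... read the vector argument and set cam's position ...
    (void)cam;
    return 0;
}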

Colon syntax

With light user data we cannot use the colon syntax to look up methods. Instead we must call global functions and pass in the objects as parameters. But we can still make sure to organize our methods nicely, i.e., put all the functions that operate on World objects in a table called World. It might then look something like this:

Camera.set_position(World.get_camera(game_world), Vector3(0,0,0))

If you are used to the object oriented style this way of writing can feel awkward at first. But in my experience you get accustomed to it quite quickly. It does have some implications which are not purely syntactical though. On the plus side, this style of writing makes it easy to cache the method lookups for better performance:

local camera_set_position = Camera.set_position
local world_get_camera = World.get_camera

camera_set_position(world_get_camera(game_world), Vector3(0,0,0))

This transformation is so simple that you can easily write a script that performs it on your entire code base.

The main drawback is that we are no longer doing dynamic method lookup, we are calling one specific C method. So we can’t do virtual inheritance with method overrides. To me that is not a big problem because firstly, I think inheritance is vastly overrated as a design concept, and secondly, if you really need virtual calls you can always do the virtual method resolution on the C side and get the benefits while still having a static call in Lua.

Type checking

For full userdata we can check the type by looking at the metatable. The Lua library function luaL_checkudata provides this service. Since light userdata is just a raw pointer to Lua, no corresponding functionality is offered. So we need to provide the type checking ourselves. But how can we know the type of an arbitrary C pointer?

An important thing to notice is that type checking is only used for debugging. We only need to know if a function has been called with the right arguments or not. So we don’t actually need to know the exact type of the pointer, we just need to know if it points to the thing we expect. And since this is only used for bug detection, it doesn’t matter if we get a few false positives. And it is fine if the test takes a few cycles since we can strip it from our release builds.

Since we just need to know “is the object of this type?”, we can make the test different for each type. So for each type, we can just pick whatever test fits that type best. Some possibilities are:

  • Store a known four byte type marker at the start of the object’s memory. To verify the type, just dereference the pointer and check that the first four bytes match the expected marker. (This is the method I use most frequently; a minimal sketch follows this list.)
  • Keep a hash table of all objects of the specified type and check if it is there.
  • For objects that are allocated from a pool, check that the pointer lies within the range of the pool.
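A minimal sketch of that first option (the marker value and struct layout are my own illustration):

#include <stdint.h>

// Each engine type reserves a distinctive 32-bit marker as its first field.
static const uint32_t CAMERA_MARKER = 0x43414d30; // "CAM0"

struct Camera
{
    uint32_t marker; // must be first, so any raw pointer can be probed
    // ... the rest of the camera data ...
};

// Debug-only check: does this pointer plausibly point at a Camera?
// False positives are possible but acceptable, since this is only
// used for bug detection and is stripped from release builds.
inline bool IsCamera(const void *p)
{
    return p && *(const uint32_t *)p == CAMERA_MARKER;
}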

Object lifetimes

There are two approaches you can take to ownership of objects in the Lua interface. They can either be Lua owned and destroyed by the garbage collector or they can be owned by the C side and destroyed by explicit function calls. Both approaches have their advantages, but I usually lean towards the latter one. To me it feels more natural that Lua explicitly creates and destroys cameras with World.destroy_camera() rather than cameras just popping out of existence when the garbage collector feels they are no longer used. Also, since in our engine, Lua is an option, not a requirement, it makes more sense to have the ownership on the C side.

With this approach you have the problem that Lua can hold “dangling pointers” to C objects, which can lead to nasty bugs. (If you took the other approach, you would have the opposite problem, which is equally nasty.)

Again, for debugging purposes, we would want to do something similar to what we did with the type information. We would like to know, in debug builds, if the programmer has passed us a pointer to a dead object, so that we can display an error message rather than exhibit undefined behavior.

This is a trickier issue and I haven’t found a clear cut solution, but here are some of the techniques I have used:

  • Clear out the marker field of the object when it is freed. That way if you attempt to use it later you will get a type error. Of course, checking this can cause an access violation if the memory has been returned to the system.
  • For objects that get created and destroyed a lot, such as particles or sound instances, let Lua manage them by IDs rather than by raw pointers.
  • Keep a hash table of all known live objects of the type.
  • Let Lua point to the object indirectly through a handle. Use some bits of the pointer to locate the handle and match the rest to a counter in the handle so that you can detect if the handle has been released and repurposed for something else. (A sketch of this follows below.)
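A minimal sketch of that last option (the bit split and sizes are just an example): the value handed to Lua packs a slot index with a generation counter, and the counter must match the slot’s current generation for the handle to still be live:

#include <stdint.h>

struct Slot { void *object; uint32_t generation; };

static const uint32_t MAX_SLOTS = 1024; // must match the index bits below
static Slot g_slots[MAX_SLOTS];

// Pack: low 10 bits = slot index, remaining bits = generation counter.
inline uintptr_t MakeHandle(uint32_t index, uint32_t generation)
{
    return ((uintptr_t)generation << 10) | index;
}

// Returns the object, or null if the slot has been freed (and possibly
// repurposed) since the handle was created.
inline void *ResolveHandle(uintptr_t handle)
{
    uint32_t index      = (uint32_t)(handle & 0x3ff);
    uint32_t generation = (uint32_t)(handle >> 10);
    return (g_slots[index].generation == generation)
        ? g_slots[index].object : 0;
}

// On destroy: bump the generation so stale handles stop resolving.
inline void FreeSlot(uint32_t index)
{
    g_slots[index].object = 0;
    ++g_slots[index].generation;
}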

Conclusions

Using light instead of full userdata does make things more inconvenient. But as we have seen, there are tricks that help overcome many of these inconveniences.

We still haven’t looked at the truly temporary objects, such as Vector3(0,0,0). In my next article I will discuss what can be done about them.

(This post originally appeared on the BitSquid blog.)

Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes

Original Author: Nick Darnell


Almost a year ago I wrote a couple of posts on Hierarchical Z-Buffer Occlusion Culling (HZB-OC).

One very glaring issue I left largely untouched was the workflow issue. In the implementations I’m aware of, artists currently have to author all, or a large percentage, of the volumes by hand. Ideally these simplified occlusion volumes would be generated by a tool in the art pipeline instead.

A few weeks ago I started looking for a new side project and decided to try and flesh-out a very rough idea I had for automatically generating occlusion volumes.  This is still a work in progress but I wanted to share my current progress because I think it has some value even at this early stage.

The Problem

An ideal occlusion volume has some important features,

  1. Conservativeness – Doesn’t extend beyond the surface of the mesh
  2. Simplicity – The occlusion volume is made of very few triangles or is fast to render
  3. Volume Conservation – Closely matches the original mesh’s volume
  4. Dynamic – Some games have large movable occluders or destroyable walls

Normal methods of simplifying a mesh, such as typical triangle simplification, cause problems in both the area of conservativeness and volume conservation.  In some cases you can use the physics collision mesh if available.  One thing to be aware of is that when the physics volume is larger than the visual mesh it can cause false occlusions, leading to things popping into view.  Also, not every mesh needs a physics volume, nor are physics volumes always meshes.  Will Vale’s talk from GDC 2011 covered using physics collision meshes for HZB-OC pretty thoroughly; you should check it out.

The Technique

Let me start by summarizing the technique I’ve been developing for generating the occlusion volumes before I go in-depth into each step in the process.

  1. Find all the voxels completely inside a mesh
  2. Find the voxel at the densest point in the volume
  3. Expand a box from this point until all sides collide with the mesh surface or another box
  4. Repeat 2-3 until you’ve consumed X% of the total volume
  5. Filter small or useless boxes (Unimplemented)
  6. Use a Constructive Solid Geometry (CSG) algorithm to merge the boxes you create

1. Voxelization

First you have to determine all the voxels completely inside the mesh.  That way we can have complete confidence that anything we generate contained inside these voxels will be conservative.  It also gives us a very easy way of quantifying the total volume and the volume remaining.

The voxelization algorithms I ended up using come from the paper Fast Parallel Surface and Solid Voxelization on GPUs [Schwarz 2010].  I used section 3.1 for surface voxelization and section 4.1 for solid voxelization.  Both are required because a voxel is considered part of the solid volume if the center of the voxel is inside the mesh.  We want the entire voxel to be inside the mesh, though, so you have to remove the excess voxels from the solid set using the surface set.

My first implementation was in C++ and entirely on the CPU, which took about 20 seconds to run on a mesh with ~10,000 triangles and a 50³ voxel volume.  In the version I’m presenting here I moved to C#; when I did that, the time went up to about 60 seconds.  So I ended up porting a similar version of the CPU implementation to the GPU using OpenCL (Cloo); on the same data it ran in about 6 seconds on my Nvidia 540M, with no attempt to optimize.

One unfortunate limitation of Schwarz’s solid voxelization algorithm is that it requires watertight meshes.  The reason for this is that you’re shooting a ray down each column of voxels and, every time you intersect a triangle, xor’ing the bit for each voxel beyond the intersection, signaling whether that part of the column is inside or outside.  Voxels that are inside the mesh will be xor’ed an odd number of times, whereas outside voxels will be xor’ed an even number of times.  So if there is a hole in the mesh, you’d leave an incomplete trail of xors in that column, leaving every voxel from the last triangle to the bounding box marked as ‘inside’.
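A minimal CPU sketch of that xor pass for a single voxel column (my own simplification of the idea, not Schwarz’s GPU implementation):

#include <vector>
#include <stdint.h>

// One column of voxels along the ray direction. For each triangle the ray
// hits, flip every voxel from the hit cell to the end of the column. With a
// watertight mesh each inside voxel ends up flipped an odd number of times;
// a hole leaves the incomplete trail of xors described above.
void XorColumn(std::vector<uint8_t> &column, const std::vector<size_t> &hitCells)
{
    for (size_t i = 0; i < hitCells.size(); ++i)
        for (size_t z = hitCells[i]; z < column.size(); ++z)
            column[z] ^= 1; // toggle inside/outside
}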

Among the papers I’ve run across that might contain solutions to this problem is Complete Polygonal Scene Voxelization.

2. Find The Highest Density Voxel

In this step you’ll find the voxel that is furthest away from any external area, excluding any voxel that has already been enclosed by a box in step 3.  Because the number of empty voxels likely far exceeds the number of inside voxels touching an empty one, you’ll probably want to measure distance against that boundary list instead of determining the distance from the closest empty voxel.  One way to implement this is sketched below.

For my project this was implemented on the CPU, but this is probably something you could do faster on the GPU if this ever became too slow for your needs.
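A sketch of my own (assuming a dense 0/1 grid): run a multi-source breadth-first search seeded from every inside voxel that touches an empty or out-of-bounds cell; the last voxel the search reaches is the one furthest from any external area:

#include <vector>
#include <queue>
#include <stdint.h>

struct Dims { int x, y, z; };

// grid[x + d.x*(y + d.y*z)] is 1 for inside voxels (not yet claimed by a
// box), 0 otherwise. Returns the index of the densest voxel, or -1.
int FindDensestVoxel(const std::vector<uint8_t> &grid, Dims d)
{
    std::vector<int> dist(grid.size(), -1);
    std::queue<int> open;

    // Seed: inside voxels with at least one empty or out-of-bounds neighbor.
    for (int z = 0; z < d.z; ++z)
    for (int y = 0; y < d.y; ++y)
    for (int x = 0; x < d.x; ++x)
    {
        int i = x + d.x * (y + d.y * z);
        if (!grid[i]) continue;
        bool boundary =
            x == 0 || x == d.x-1 || y == 0 || y == d.y-1 ||
            z == 0 || z == d.z-1 ||
            !grid[i-1] || !grid[i+1] ||
            !grid[i-d.x] || !grid[i+d.x] ||
            !grid[i-d.x*d.y] || !grid[i+d.x*d.y];
        if (boundary) { dist[i] = 0; open.push(i); }
    }

    // BFS inward; the last voxel popped is furthest from any empty area.
    int densest = -1;
    while (!open.empty())
    {
        int i = open.front(); open.pop();
        densest = i;
        int x = i % d.x, y = (i / d.x) % d.y, z = i / (d.x * d.y);
        const int nx[6] = { x-1, x+1, x,   x,   x,   x   };
        const int ny[6] = { y,   y,   y-1, y+1, y,   y   };
        const int nz[6] = { z,   z,   z,   z,   z-1, z+1 };
        for (int k = 0; k < 6; ++k)
        {
            if (nx[k] < 0 || ny[k] < 0 || nz[k] < 0 ||
                nx[k] >= d.x || ny[k] >= d.y || nz[k] >= d.z) continue;
            int n = nx[k] + d.x * (ny[k] + d.y * nz[k]);
            if (grid[n] && dist[n] < 0) { dist[n] = dist[i] + 1; open.push(n); }
        }
    }
    return densest;
}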

3. Box Expansion

Once you’ve found the densest voxel you’re going to create a 1³ voxel box at that location.  You’ll then proceed to iteratively expand each side of the box in voxel space until you can’t expand any side of the box without entering an empty voxel or another box that has already been placed.

As you verify each expansion of the box you’ll mark the enclosed voxels so that the next time you choose the densest voxel you can exclude any voxel already enclosed in a box.
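Here is a rough C++ sketch of that expansion loop (my own reconstruction; CanOccupy is an assumed helper that rejects any box touching an empty or already-claimed voxel):

struct Box { int min[3]; int max[3]; }; // inclusive voxel bounds

// Assumed helper: true if every voxel in b is inside the mesh and not
// already claimed by a previously placed box.
bool CanOccupy(const Box &b);

Box ExpandBox(const int seed[3])
{
    // Start with a 1-cubed box at the densest voxel.
    Box box = { { seed[0], seed[1], seed[2] },
                { seed[0], seed[1], seed[2] } };

    bool grew = true;
    while (grew)
    {
        grew = false;
        for (int axis = 0; axis < 3; ++axis)
        {
            Box tryBox = box; // grow the +axis face by one voxel
            ++tryBox.max[axis];
            if (CanOccupy(tryBox)) { box = tryBox; grew = true; }

            tryBox = box;     // grow the -axis face by one voxel
            --tryBox.min[axis];
            if (CanOccupy(tryBox)) { box = tryBox; grew = true; }
        }
    }
    return box; // caller then marks the enclosed voxels as claimed
}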

Currently the expansion order is uniform; however, I suspect a better approach would be some small amount of prediction, allowing me to grow one side more than another to maximize the potential volume of each box.

The best implementation of this AABB expansion methodology would be to use an optimization solver, but that could potentially take quite a while to run.

I’m still looking for alternatives to the AABB box expansion method of generating the simplified geometry. The axis aligned boxes aren’t the best for models at non-right angles.  Perhaps a better approach could be divined with OBBs but that’s just conjecture.

I briefly considered the Iso-Surface method Xi Wang mentioned in his Automated Level of Detail Generation for HALO: REACH talk at GDC 2011 but that generates a lot of triangles and a volume larger than the voxel volume you start with. So that just gets us right back to the problem we started with.

4. Repeat 2-3

I ended up having two different stopping conditions.  If an absolute percentage of the remaining volume is reached, I stop adding new boxes.  Alternatively, if the last box created consumed too small a percentage of the total volume, that also stops the box expansion process.

Some other things I didn’t implement, but that could be tried, are setting a maximum number of boxes, or treating a box that is too small (in voxel space, instead of percentage) as a cutoff point.

5. Filter Boxes

This portion of the technique I haven’t tried but makes a lot of sense depending on your stopping condition.  If several small boxes are generated before one of the cutoff conditions is met during step 4, you’ll probably want to ignore them before moving to step 6 to reduce your occluder triangle count.

6. Merge Boxes With Constructive Solid Geometry

Drawing individual boxes even as a single draw call would cause lots of overdraw.  Instead we should take our collection of boxes and combine them into a single mesh before we finish.  To do this we’re going to use Constructive Solid Geometry (CSG).

Soon after I realized I was going to need CSG (or something akin to it) to solve this problem, I was pointed at a new book featuring a CSG implementation by Sander van Rossen and Matthew Baranowski.  The implementation is pretty slick and I was able to get it up and running pretty quickly.

The Results: Time-Lapse

The box expansion runs in a fraction of a second, but I slowed it down and broke it up over several frames so that you could see how the occlusion shape evolves.

The Results: Different Models

To get an idea of the results on different models I’ve provided some screenshots.  The settings I used for these were to cut off after 90% of the volume had been consumed by boxes.

[Screenshots: occlusion volume results on four different models]

Notes

In situations where the technique doesn’t work very well, allow an artist to specify a custom mesh.  In some cases an option to use the same mesh for both the visual and the occluder would probably be a good idea (like for planes).

You will likely want to generate the occluders at export time.  Then, during the cook stage in your level editor, merge some number of the occluders together to reduce draw calls.

Download

Voxelization Occlusion Generator.zip
License: MIT

Thanks

Thanks to Stephen Hill for telling me about Rossen / Baranowski’s Constructive Solid Geometry implementation and for giving me feedback on this post.



Two small iOS tricks

Original Author: Gustavo Ambrozio

Sorry

Well, I got back from WWDC and there was just too much to do, so I’ve been neglecting my blog a little bit. But since I already missed one post on AltDevBlogADay and today I was about to miss another (3 strikes and I’m out???), I decided to write up something quick but maybe useful for all you iOS devs out there.

I’ll share two tricks I recently had to use for Snap. One I learned during one of the labs at WWDC; it’s an old but very hidden trick that’s not covered by NDA, so I can share it. The other is something that I hacked on my own but got somewhat validated by an Apple engineer I showed it to, so now I feel more confident showing it in public…

First trick

Snap is a camera app and my users were asking me to implement zooming. I studied the API a bit and there was no way to tell the camera to zoom. What I came up with (and Apple’s engineers that work with this API have said it’s the right thing to do) was to change the frame of the camera preview layer so that it would “bleed” out of the view and then give the illusion of zoom. After taking the picture I have to crop but that’s another story.

My problem was that when I changed the frame of the layer, even though I was not applying any animation, the system would animate my change and the interface felt a little weird. It felt like the zoom was “bouncing”. It’s hard to explain but the result was not good and I could not figure out how to remove this animation.

During one of the labs I asked an Apple engineer about this, and as he was about to go looking for the answer another attendee from across the table who overheard me said he knew how to do this, and very quickly guided us to the documentation where this little trick is hidden.

So, from the “Temporarily Disabling Layer Actions“ section of the Core Animation documentation:

[CATransaction begin];
[CATransaction setValue:(id)kCFBooleanTrue
                 forKey:kCATransactionDisableActions];
// Do what you want with your layer
[CATransaction commit];

So there you have it. Very obscure, but it works. You can also just change the animation duration, if that’s what you want:

[CATransaction begin];
[CATransaction setValue:[NSNumber numberWithFloat:10.0f]
                 forKey:kCATransactionAnimationDuration];
// Do whatever you want with your layer
[CATransaction commit];

Second trick

Another problem I faced with Snap was that, even though the image is saved mostly in the background using blocks and GCD (more about this in another post…), composing the image still made the user wait. I could have done the composition in the background too, but that would involve copying a lot of memory, which I didn’t want to do on the iPhone. It’s fast enough not to be a real problem, but I didn’t like that the interface froze while I composed the image with the notes, leaving the user staring at an unresponsive device.

So, I decided to use MBProgressHUD to at least show something to the user. My problem was that I had a lot of calls to the method that generates the image, and the callers expect to get the UIImage back. As the calls are made on the main run loop and the method takes too long, the interface froze and the HUD never showed.

Yes, I could have refactored everything to use GCD and callback blocks, but I had to release an update and didn’t have much time. So I decided to pump the main run loop myself:

// Will return my image here
__block UIImage *img = nil;
// To indicate the work has finished
__block BOOL finished = NO;

// This will execute on another thread. High priority so it's fast!
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), ^{
    // Call my very long method and indicate we're finished
    img = [[self annotatedImage] retain];
    finished = YES;
});

// This will probably execute even before my image method above
MBProgressHUD *hud = [MBProgressHUD showHUDAddedTo:view animated:YES];
hud.labelText = label;

// Get the main run loop
NSRunLoop *runLoop = [NSRunLoop currentRunLoop];
// Run until finished
while (!finished) {
    [runLoop runUntilDate:[NSDate dateWithTimeIntervalSinceNow:0.01f]];
}
// Hide the HUD
[MBProgressHUD hideHUDForView:view animated:YES];

// Return my image that is now composed
return [img autorelease];

Even though it’s kind of an ugly hack, it can be used in situations where you really do have to make the user wait, and where a synchronous call on the main thread is what you already have or what is fastest for you to implement.

I don’t recommend this for every situation. There are cases where it might lead to a deadlock in your app, so test a lot if you decide to use it. It worked for me. And, as I said, I showed it to an Apple engineer during one of the labs and he said it was a good solution to the problem.

I have a fork of MBProgressHUD and have used these principles to build a category for it that does this and can even be cancelled by the user. That version is even hackier, so I won’t go into it right now for lack of time – but if you’d like to read about it, just ask in the comments and I’ll write it up.

An afterthought about WWDC

One of the things I learned during last year’s WWDC is that, even though the sessions are great, the labs are even better. The sessions are not open for questions, and they’re usually up on iTunes less than two weeks after the event, so this year my main priority was the labs. I went to every lab I could think of, and even went to some twice.

So, my advice to any WWDC attendees: forget the sessions and go to the labs! You can watch the sessions later at home, but you only have access to the great engineers who build the stuff we use for those five days, so make the most of it. Even if you think your question is stupid, don’t be shy – go to a lab and ask it. These guys are great and always willing to help. This is consulting from Apple that is well worth the US$1600. I’d even dare to say that 1600 is cheap! (Don’t tell Apple though…)

I even bumped into a guy who helped me last year. He remembered me and my problem, and tried to help me again this year even though my new problem was not at all related to his expertise. Nice guy. Thanks again, Sam. See you next year.

Oh, and have I mentioned that you should get Snap!

I’ll be writing here every 15 days, but I try to post at least once a week on my own blog – visit and subscribe.

Well, that’s it. Sorry for the quick post – I’ll come up with something better next time. If you have any comments, please leave them here and I’ll try to respond and correct whatever you come up with.


Micro Talks

Original Author: Paul Evans

This is a quick write-up of my thoughts about the micro talk format, its use at Lionhead’s “Creative Day”, and the IGDA Austin micro talks (held in the television studio used for the Austin City Limits show).

The “Micro Talk” format

Every participant is given ten minutes to talk. A timekeeper near the front of the stage flashes up cards when the speaker is near the end of their slot so they can wrap up the talk. There is also a longer message for when a speaker goes over the limit.

There are variations where the time limit is less strict.  Sometimes there is a defined period of time for Q&A.  I recall that as part of my Bachelor’s degree we had strict time limits and a Q&A for project presentations, so those from an academic background may be familiar with this style already.

Ten minutes is time enough to cover a large subject at a very high level or a very specific topic in detail. This forces focus and sets the pace for all the talks.

For the speaker, ten minutes should be relatively easy to fill with something they know well. If anything, boiling a subject down to ten minutes may be the harder problem.

For the audience, ten-minute bite-sized chunks of content should be digestible even when a particular topic holds little interest for them.

Creative Day 2011 at Lionhead Studios

A similar format was used for Creative Day at Lionhead Studios.  We did not use the term “micro talks”, but that is what the format was at its core (though without the very strict timekeeping). Everyone who participated was given a short time to introduce their project and demonstrate it. Some ran under and some ran over, but every presentation got part or all of the audience interested. Each team was given a couple of days of studio time to complete their own project (and as much of their own time as they wanted, of course).

The Lionhead event has been quite well covered in the industry press. The project I worked on was one of the lower-profile ones that just gets a mention in Edge. I worked with another developer on a project built with node.js.  Our proof of concept was implemented in a few days.

The demo itself had a few technical problems due to us trying to improvise a wireless network.  It got a bit overwhelmed by a cinema full of developers trying to connect to it with laptops, iPads and mobile phones… but overall the presentation was well received and the multiplayer capability was demonstrated. We kept things going long enough to show Peter drowning little people by lowering some land into the ocean, which got a good laugh.

It was an emotionally charged experience presenting to all the talent of the studio – especially considering some of the mind-blowing things that were demoed before we were up.  Keep in mind that most of the people presenting were not at all used to being in the public eye, yet in the relatively safe environment of that cinema full of colleagues, everyone shone. Every single presentation had something interesting to add to the day, and people really got to know each other better.

If IGDA Guildford ever gets going again, meetings in this format should happen.  There are enough people working there to support it!

IGDA Austin Micro Talks (2011-06-25)

My first IGDA Austin meeting was in the micro talk format and provided a great introduction to some of the people in the chapter.  The talks took place in a television studio and were recorded, so they should end up on YouTube at some point.  See the austingamedevs.org website for further details.

John Henderson acted as master of ceremonies, introducing each speaker and announcing breaks between the speakers. At the end of the breaks he took the opportunity to introduce some of the volunteers around the hall that were running the event.

Jon Jones opened the evening with a talk about making yourself more attractive on the job market by differentiating yourself. His example used a character artist portfolio with a few stereotypes that got laughs from the audience. The thrust of the talk was to be a little different by thinking creatively, and to network actively. Many of his own career advances had come from networking, and he said that he pushed himself to get out there. He made special note to try not to make enemies, as you never know how that will come back to bite you later.

Denis Loubet gave a presentation about how much luck is involved in creating a profitable iPhone game. He presented without the aid of slides. The message I took away was that it is better to prototype an idea and ship early before investing too much, as you can never be quite sure where lightning will strike. Rovio made a few games before their success with Angry Birds.

Kalev Tait talked about various ideas for game balancing.

Fred Schmidt made the case for how awesome Austin is, covering the history of the area and how the creative industry fuels the economy. He argued that there needs to be more engagement across the educational, public and private sectors to better support the many small companies doing amazing things in the area.

Damion Schubert spoke about the power of innovation – where it is useful and when it should be reined in. Innovation can take many forms, from simplification of an idea to something completely new. An innovative game should be explainable in a sentence and have true resonance with the audience. Swing big, make sure the players get to use your innovation and support it through every other element of game play. Remember that an idea is only innovative if it is better, and that it should be less expensive than making a “me-too” product.

“From Startup to Survival” by Quoc Tran was an interesting study of what can make the difference between success and failure in a startup. It boiled down to: focus, underdo the competition, and release early. Resources are limited, and if cash is a problem, ship sooner. The feedback generated and the market position achieved by shipping sooner are more important than a complete feature set.

Carl Canga gave a presentation about how important investing for retirement is. Despite his slides not working, Carl delivered a well-rehearsed talk. It was certainly sobering for me, and something I will make a priority as soon as my means allow.

Conclusion

Presenting and demonstrating in front of an audience, even for just ten minutes, may seem daunting, but there is no doubt in my mind that it is worth it. It is a great introduction to public speaking and one I personally intend to try again. I encourage you to try it out – at your studio, at an IGDA meeting, or wherever you can.

The conversations during the breaks at events like these are also invaluable. It is a great way to make friends and eventually introduce people who can help each other out. Get yourself out there!

Never underestimate how important it is to communicate the knowledge you have to other people. What you write or present will never be useful to everyone – but even if you only help a few people out, it is worth it.  Why not try writing for #AltDevBlogADay?

About this article

I wrote this sitting in something Danni (my wife) made, listening to a bluegrass jam session. I’ve only been in the Austin area for a few weeks, and although I am still adjusting to the baking hot weather it has been a good experience so far. I have no doubt that this place is another great creative hub.

I taught my cache to lie

Original Author: Jonathan Adamczewski

One day soon, I hope to finish this PhD thing I’ve had going for a while now. I’ve done some research, I’ve written a thesis (aka dissertation), and I recently received feedback from two examiners. I now have the task of applying their feedback to the thesis text – which, after a nine-month break, seems simultaneously very familiar and very foreign.

As I’ve revisited the text, I’ve noticed a number of things that I wish I’d been able to complete, or had done differently – which is to be expected, I suppose. There was one part, though, that caught my eye as worth writing about here. It was included in the thesis for that very reason – because it was eye-catching (any excuse to take a break from pages and pages of text) – but it was something of an aside that I didn’t have the chance to look at more closely. I don’t consider it particularly profound or world-changing, but I do think it’s a neat little trick.

The context

The Cell BE SDK from IBM included a Julia set raytracer that made use of a software cache — you can read about the program in an early SDK Installation and User’s Guide (pdf, see the last four pages).

So, that program produces images that look like this (small version — click for full size):

It’s a blobby thing in a box!

To quote from the above-linked document:

[there are] five cubemap texture lookup passes – 3 refraction lookups, a reflection lookup, plus a background lookup.

and it is these texture lookups that make use of the software cache.

The goal

When you’re implementing a cache, it’s good to know how good a job it’s doing. You can count hits and misses and other things, and build a wide range of statistics. That’s fine. And it so happened that I was implementing a cache(-like thing).

The problem is that you end up with broad averages that convey no specific information about how the cache is behaving in particular parts of the program. Also, the tables of numbers you get back are hideously dull to look at.

Wouldn’t it be nice to see the cache performance somehow? To be able to visualise which texture lookups were hits, and which were misses?

The trick

I taught my cache to lie.

When a request for some texture data is received, the cache handles the request as it normally would — fetching the data from main memory if it needs fetching, or just locating it in the SPE’s local store. Then — and here’s the lie — rather than returning the data it was asked for, the cache returns black if the access was a hit, or white if it was a miss. The results look something like this (again, clicky for big version):

It’s a stripy blobby thing on stripes!

And it reveals various things about the cache and the rendering algorithm. You can see how texture colours are processed for each of the passes. The texture data is tiled, and you can probably work out the tile sizes and cache line size from the picture if you really wanted to. You can see how well the cache performs in different parts of the image, and plenty of other details.
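In code, the change is tiny. Here is a minimal sketch of the idea in C++ — a toy stand-in for the SPE software cache, not the code from the thesis:

#include <cstdint>
#include <unordered_set>

struct Colour { float r, g, b; };

// Toy stand-in for the software cache -- just enough to show the trick.
struct SoftwareCache
{
    std::unordered_set<std::uint32_t> lines;
    bool Contains(std::uint32_t line) const { return lines.count(line) != 0; }
    void Fetch(std::uint32_t line)          { lines.insert(line); } // "DMA" on a miss
};

// The lie: service the request as normal, but return black for a hit and
// white for a miss, instead of the texel that was fetched.
Colour LookupVisualised(SoftwareCache& cache, std::uint32_t line)
{
    const bool hit = cache.Contains(line);
    cache.Fetch(line);                       // the real lookup still happens
    return hit ? Colour{0.0f, 0.0f, 0.0f}    // hit  -> black
               : Colour{1.0f, 1.0f, 1.0f};   // miss -> white
}

Because the cache still does all of its normal work, the rendered image becomes a per-pixel map of cache behaviour at zero extra bookkeeping cost.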

The interesting task from here would (perhaps) be to look for clues to writing a better cache — or to improving the algorithm so that you don’t need a generalised cache :P. There are plenty of other things you could do to convey more information about the operation of the cache as well.

The end

To be honest, when I first got it working I didn’t understand much of what I could see. It was only after digging through the code that I gained a clearer understanding of what was happening, and what these pixels actually meant. I made it as far as understanding a lot of the Why? questions that the image evokes, but — for various reasons — didn’t get to working out how to apply that understanding.

Regardless of its usefulness, I think this was a neat little trick 🙂


A Word of Warning!

Original Author: Michael A. Carr-Robb-John

Somewhere in the world there is a group of individuals who have dedicated a great deal of their professional careers to building the basic tools we use every day — in this instance I’m thinking of the compilers and linkers we use to build our games. I couldn’t even guess how many man-hours have gone into their research and development; certainly far more than I have spent writing any individual game.

Given all this time and effort, is it really the wisest course of action to globally turn off the warnings these tools generate? The key word there, in case you missed it, was “globally”. I admit there are sometimes very valid reasons for turning off warnings, but those exceptions should be local to the issue and well documented, not globally disabled.

The reason I’m getting on my soapbox about this is that not too long ago I wrote a beautiful function that made the morning dew on flowers glisten in the sunshine… okay, there wasn’t much sunshine, it didn’t involve flowers and there wasn’t much glistening. The function itself is not important: I wrote it, compiled the code, and when I ran it, it didn’t do what I was expecting. Okay, debug time. Twenty minutes later I tracked down the issue, and it was a bug in my code — I hold my hand up to that. What annoys me is that there is a compiler warning that would have pointed me to the issue within moments of hitting the compile button, had it been enabled!

Of course, when you discover that an important warning has been globally disabled, you have to dig a little deeper to see what else is turned off. What followed was a stream of “What!”, “You’re kidding!” and “Who did that!” as I discovered other warnings that had been disabled globally.

It is so easy for a code base to accumulate issues and patch-ups over its lifetime, but disabling warnings globally does nothing except cripple our ability to write and maintain solid code. If you have got this far in my rant, might I recommend doing a search in your projects for “#pragma warning” and seeing what turns up — it would be interesting to hear what you discover.
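For contrast, here is what a local, well-documented disable can look like (MSVC syntax; the warning number and the surrounding code are purely illustrative):

#include <cstring>

void CopyName(char (&buffer)[64], const char* source)
{
    if (std::strlen(source) >= sizeof(buffer))
        return;  // too long to copy safely; bail out

    // Disable the warning for this one call only, document why it is safe,
    // then restore the compiler's judgement for everything that follows.
#pragma warning(push)
#pragma warning(disable : 4996)  // 'strcpy' flagged as unsafe; bounded by the check above
    std::strcpy(buffer, source);
#pragma warning(pop)
}

The push/pop pair keeps the exception scoped to the one line that needs it, and the comment records the justification for the next person who reads the code.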

Now that I’ve got that off my chest, lunch anyone?

Michael

Working For Myself – First Two Months

Original Author: Keith Judge

Lionhead Studios – in some ways it feels like it was a previous life, in others like it was yesterday. I thought I’d write a post-mortem style post about my experiences so far.

What Went Right

  • Productivity – When I began, I was excited and determined to write a game on my own, but also somewhat apprehensive about having the mental discipline to stay productive. A common reaction from people when I told them about my new studio was to ask how I was going to get any work done at home with a family. So far, I’m happy to report that those fears were unfounded: my enthusiasm has overridden my natural laziness, and the kids have proven less of a distraction than expected. We’ll see if this extends to shipping a game, though I’m quite optimistic at present.
  • Learning – Even with a decade’s experience, I would never claim to know everything (or even a substantial portion) of what is required to build a game. In the last two months my learning has accelerated: subjects such as physically based lighting, rigid body physics, DirectX 11 and a smattering of the Win32 API have entered my lexicon.
  • Great Free Software – I’ve been able to build my game at an accelerated rate thanks to some very handy open source software that is available at no cost. Blender and others have saved me significant chunks of time. When I finish my game I may be able to donate code or money (if I make any) back to the projects.
  • DirectX 11 – It may be seen as a gamble to limit myself to a smaller portion of PC gamers (though not that small according to the monthly Valve Hardware Survey), but supporting only Windows Vista/7 has made development of the game engine a lot simpler. This is because I haven’t had to rely on any vendor specific DirectX 9 driver hacks (INTZ, RAWZ, instancing, etc) in order to make things work, plus DirectX 11 has a much better debug layer and useful new features. Although I’m developing my engine with only the GeForce 8800 GTX I have at my disposal, I hope there will only be minor issues in making the game work with any other DirectX 10 or 11 hardware – and there’s simply a smaller set of hardware to test which helps with my limited resources. Also, the knowledge gained in using DirectX 11 will be useful if I need to do any contracting to top up the coffers. If nothing else, this technology also gives me something to differentiate my game from most other indie games.
  • Finding Free Days Out At The Weekend – I’ve been keeping my weekends free so far to spend with the family rather than with my PC. My wife and I became National Trust members a few months ago, so there’s plenty of free (except for fuelling the car) days out locally that we can take the family to at the weekends. For those of you not in the UK, the National Trust is a large organisation which looks after lots of old houses, gardens, monuments, castles, palaces, etc around the country, and a large chunk of those are in or near Surrey, where I happen to live.
  • #guessmycompanyname – I ran a little Twitter/Facebook game in May where I invited people to guess the name of my new company, and posted periodic clues. I offered a copy of my first game to the first person to guess correctly. The game lasted a few hours, generating lots of Twitter messages (though not much Facebook activity), before two people correctly guessed “Razorblade Games” at the same time – I’ll be giving them both the prize when the time comes. At that point I turned on the company website, Twitter and Facebook accounts for business. After an initial flurry of interest, things have waned, which is no surprise since I’ve not announced my game yet. I was a little disappointed none of my Twitter followers who work in the press reported on the new studio, but that would have been a bonus, as it wasn’t the objective – I plan to start talking to the press when the game is demo-able and I have screenshots and video to back it up.
  • More Time With The Family – I’m almost always home for mealtimes, am keeping weekends free and have time to play with the children every day. As a result I feel as though I’ve grown closer to my family.
  • I’m Generally Happier – I’m on the way to achieving a lifetime ambition, and this is a great boost for me. I feel less stressed about work, there’s no office politics or bureaucracy to deal with, and I can work more or less when I feel like it (exceptions below!). My wife often comments that I’m happier now than I’ve been for years.

What Went Wrong

  • Very Little Social Life – My social life has declined massively since working for myself – mostly down to the lack of money, but also because there’s no pub to nip into on the way home from work. I’ve been to a couple of parties since and had the occasional beer or five with friends, though I can quite easily go an entire week without leaving the house or having a face-to-face conversation with an adult other than my wife. If we can solve the money issue (detailed below), this problem may fix itself.
  • No Work Colleagues – Related to social life, I miss working with my friends and having people to chat with about what I’m actually working on. My wife will listen to me, but she doesn’t understand the details of normalised specular or rigid body interpenetration, so it’s harder to have work conversations. Twitter and MSN are proving to be a substitute of sorts, but not really a replacement for a good old chinwag. In future I may collaborate with others, but I want to get at least one game done on my own. I’m also going to the Indie Day of the Develop Conference in Brighton in July (and hopefully to other, future meet-ups), so that will be a great opportunity to chat with other game developers in person.
  • Sharing the Office With My Wife – My wife also does some work from home and we started out time-sharing the office, but that turned out not to be practical. Currently, my wife’s PC is on the dining table downstairs so we can both work in parallel in the evenings, but this isn’t a permanent solution and we’ll have to fork out for a desk for her soon and move some stuff around to fit it in. She’s a little annoyed I’ve essentially booted her out of what used to be her office, but she understands the necessity.
  • Looking After The Kids Is Time Not Spent Making My Game – When my wife does her contracting work away from home, I have to look after the children, as paid childcare is out of our budget and neither my mum nor her parents live nearby. This has occasionally meant 2-3 frustrating days at a time where I’m unable to work (the kids’ afternoon nap times excluded). There’s not much we can do about this, though: my wife’s contracting work brings in essential funds, so we’re just going to have to grin and bear it for the time being.
  • Game Hint On The Website Is Too Subtle – On the company website there’s a little square to the left of the logo that is slightly darker than the background. Click on it, and the website’s colour scheme will reverse and the text will change slightly. Now there’s another square slightly darker than the background to the right of the logo, click this and things will go back to how they were. This is actually a hint about the game I’m building and I was hoping this would generate some chatter and interest, but this seems to have not been the case. I don’t think anyone has found it without being told there’s something there to find (and those people found it by looking at the source code, not finding it naturally). However, this sort of ARG style thing is something I’m interested in playing with further when it comes to promoting my game, so keep your eyes peeled.

What Isn’t Decided Yet

  • Money – I’m working entirely self-funded (more accurately, wife-funded), but money is quite tight. As I said above, my social life has suffered as a result, but we’ve been getting by. Unfortunately, there’s no guarantee my game will make any money, even if it turns out as good as it seems in my head. I’m probably going to have to do some contracting work before too long to top up the coffers (open to offers, hint hint) or try to obtain some outside financing (again, I’m open to any offers!).
  • The Game – My game isn’t finished yet. Although it seems brilliant in my head, I worry about whether it will be any good to the extent that it sometimes keeps me up at night. Perhaps this is the life of a tortured creator, but it hasn’t put me off – far from it!

Overall I’d say the good stuff vastly outweighs the bad, and I’m raring to go for another two months and beyond.