What is gimbal lock and why do we still have to worry about it?

Original Author: Niklas Frykholm

If you have ever worked with rotations and Euler angles you are probably at least somewhat familiar with the phrase “gimbal lock”. But like many things concerning rotations, angles and spaces it can be tricky to visualize and get a good grasp of.

Sometimes it feels like every time I need to think about gimbal lock I have forgotten everything about it and have to go back to the beginning and ask myself: OK, but what is it really that is happening?

Hopefully, this article will take care of that problem.

The Wikipedia page shows how gimbal lock can happen in a mechanical system. But it isn’t necessarily self-evident how this translates to the computer game world. In the computer there are no mechanical limitations; we can rotate an object however we like. How can anything be “locked”?

Euler angles

When we are using Euler angles, we represent an object’s orientation as three consecutive rotations around the object’s axes. We can choose the axes and the order in which we apply the rotations arbitrarily, and depending on what we choose we get different Euler representations. So XYZ is the Euler representation where the first angle rotates the object around its X-axis, the second around its (new) Y-axis and the third around its (new) Z-axis. YZX gives us a different representation. We can even have representations with repeated axes, such as XZX.

So if we want to talk about the “Euler angles” of an object, we really must also talk about what axes we are rotating around and in what order. Otherwise we have no idea at all what we are talking about. Unfortunately, many articles about Euler angles are pretty sloppy with this and throw around terms like yaw, pitch and roll as if they had completely well-defined and unambiguous meanings. I prefer to use more wordy, but descriptive names, such as euler_xyz[0] that unambiguously state the axis rotation order and the index of the angle we are talking about.

An object has three rotational degrees of freedom and it is quite easy to see that the three Euler angles for a particular axis order (XYZ) are enough to define any possible orientation of an object. Note though that the representation is not unique. There are many possible Euler angles that represent the same orientation. For example, adding 360 degrees to any of the three angles will give us a different representation that results in the same object orientation.
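To make the XYZ convention and the non-uniqueness concrete, here is a minimal Python sketch. It assumes column-vector rotation matrices and angles in degrees; the helper names (axis_rotation, euler_xyz_to_matrix and so on) are made up for this illustration and do not come from any particular engine.

```python
import math

def axis_rotation(axis, degrees):
    """3x3 matrix for a rotation of `degrees` around the X, Y or Z axis."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def multiply(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_xyz_to_matrix(euler_xyz):
    # Rotating around the object's own (new) axes means each later rotation
    # is multiplied on the right: R = Rx * Ry * Rz.
    rx = axis_rotation("x", euler_xyz[0])
    ry = axis_rotation("y", euler_xyz[1])
    rz = axis_rotation("z", euler_xyz[2])
    return multiply(multiply(rx, ry), rz)

def max_difference(a, b):
    """Largest absolute difference between two matrices, element by element."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

base = euler_xyz_to_matrix((10, 20, 30))
plus_360 = euler_xyz_to_matrix((370, 20, 30))   # euler_xyz[0] + 360
flipped = euler_xyz_to_matrix((190, 160, 210))  # (x + 180, 180 - y, z + 180)

# Both differences are down at floating-point rounding level.
print(max_difference(base, plus_360))
print(max_difference(base, flipped))
```

The third triple shows that the ambiguity is not only about whole turns: for the XYZ order, (x + 180, 180 - y, z + 180) describes the same orientation as (x, y, z).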

So “gimbal lock” doesn’t mean that there are rotations that can’t be expressed as Euler angles. We can express any rotation in Euler angle form. Given an object, we can convert its orientation to Euler angles, and from that orientation we can rotate the object however we like and convert the new orientation to other Euler angles.

So what exactly is it that is “locked”? It seems we can do whatever we like.

The “lock” in gimbal lock

The term “gimbal lock” comes from the mechanical world. If the problem had originated in the world of computers, it would probably have been called something less confusing, such as “Euler angle flip” or “coordinate singularity”.

Because, in the computer world, there is really nothing that gets “locked”. Instead, the problem is this: when the Euler angles have particular values, there are orientations that are very similar to the current orientation which can’t be achieved by just making small changes to the Euler angles. In particular, for an order with three distinct axes such as XYZ, this happens when the middle angle is at ±90 degrees, so that the first and third rotation axes coincide.

So even though the orientations are “close” in the real world, they are not close in the Euler representation. In fact, at least one of the Euler angles will have to flip 180 degrees in order for us to represent the new orientation.
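A small Python sketch can show the coinciding axes directly. It assumes the XYZ order with column-vector matrices and angles in degrees, and all function names are hypothetical. With the middle angle at 90 degrees, the first and third angles rotate around the same axis, so only their sum matters and one degree of freedom is gone.

```python
import math

def axis_rotation(axis, degrees):
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def multiply(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_xyz_to_matrix(euler_xyz):
    # Intrinsic X, then Y, then Z: R = Rx * Ry * Rz.
    rx = axis_rotation("x", euler_xyz[0])
    ry = axis_rotation("y", euler_xyz[1])
    rz = axis_rotation("z", euler_xyz[2])
    return multiply(multiply(rx, ry), rz)

def max_difference(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

# Middle angle at 90 degrees: only euler_xyz[0] + euler_xyz[2] matters.
locked_a = euler_xyz_to_matrix((10, 90, 20))
locked_b = euler_xyz_to_matrix((30, 90, 0))
locked_c = euler_xyz_to_matrix((0, 90, 30))

# Middle angle at 45 degrees: the same trade between first and third angle
# now produces a visibly different orientation.
free_a = euler_xyz_to_matrix((10, 45, 20))
free_b = euler_xyz_to_matrix((30, 45, 0))

print(max_difference(locked_a, locked_b), max_difference(locked_a, locked_c))
print(max_difference(free_a, free_b))
```

The first line prints two differences at rounding-error level, while the second prints a clearly non-zero one.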

So one of the angles has to flip? What is the big deal? Can’t we just flip it and get on with our stuff?

We can, as long as the angles only represent instantaneous “snapshots” of the object’s orientation. However, if the angles represent key frames in an animation and we want to interpolate between those key frames we run into trouble. If one of the angles flips 180 degrees between two key frames and we interpolate between those values, we will see the object animating through all those 180 degrees. In the viewport, we will see the object doing a “flip” or “roll” that shouldn’t be there.

Note that it is only the interpolation that creates this unwanted behavior. If we just displayed the actual key frames and didn’t interpolate between them — everything would look right. We could work in Euler angles as much as we liked and be as close to the gimbal lock position as we wanted and no-one would ever know.
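The flip can be reproduced with two hand-picked key frames. The sketch below assumes the XYZ order, angles in degrees, and an exporter that keeps the middle angle at or below 90 degrees, so the orientation (0, 91, 0) gets stored as the equivalent (180, 89, 180). All names are hypothetical.

```python
import math

def axis_rotation(axis, degrees):
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def multiply(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_xyz_to_matrix(euler_xyz):
    rx = axis_rotation("x", euler_xyz[0])
    ry = axis_rotation("y", euler_xyz[1])
    rz = axis_rotation("z", euler_xyz[2])
    return multiply(multiply(rx, ry), rz)

def angle_between(a, b):
    """Angle in degrees of the single rotation that takes orientation a to b."""
    trace = sum(a[k][i] * b[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

key0 = (0.0, 89.0, 0.0)
key1 = (180.0, 89.0, 180.0)  # same orientation as (0, 91, 0)

# Naive interpolation of the Euler angles, halfway between the keys.
halfway = tuple((a + b) / 2.0 for a, b in zip(key0, key1))

keys_apart = angle_between(euler_xyz_to_matrix(key0), euler_xyz_to_matrix(key1))
detour = angle_between(euler_xyz_to_matrix(key0), euler_xyz_to_matrix(halfway))
print(keys_apart)  # about 2 degrees: the key frames are close
print(detour)      # nearly 180 degrees: the in-between frame is far from both
```

The two key frames are about two degrees apart, yet the interpolated frame in the middle is almost a half turn away from either of them.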

So the only thing we need to fix to get rid of gimbal lock is the interpolation. If you have done any work with 3D graphics you probably already know the answer: use quaternions instead of Euler angles to represent orientations. Quaternions don’t have the weird singularity points that Euler angles have and we can interpolate between any keyframes by just lerping the quaternions.
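For reference, here is roughly what lerping two quaternions looks like in Python. This is a sketch under a few assumptions: unit quaternions stored as (x, y, z, w) tuples, a normalization step after the blend (often called nlerp), and a sign flip so the blend takes the shorter way around. The function names are made up for the example.

```python
import math

def nlerp(q0, q1, t):
    """Normalized linear interpolation between unit quaternions (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q describe the same orientation; pick the nearer one.
        q1 = tuple(-c for c in q1)
    blended = tuple(a + (b - a) * t for a, b in zip(q0, q1))
    length = math.sqrt(sum(c * c for c in blended))
    return tuple(c / length for c in blended)

def rotation_about_z(degrees):
    """Unit quaternion for a rotation of `degrees` around the Z axis."""
    half = math.radians(degrees) / 2.0
    return (0.0, 0.0, math.sin(half), math.cos(half))

# Halfway between "no rotation" and "90 degrees around Z".
halfway = nlerp(rotation_about_z(0), rotation_about_z(90), 0.5)
print(halfway)
print(rotation_about_z(45))
```

At the midpoint the result matches a 45 degree rotation around Z. Away from the midpoint, nlerp does not move at perfectly constant angular speed; slerp does, at a higher cost.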

It doesn’t matter if the animation package is using Euler angles internally, as long as we convert everything to quaternions before we do the interpolation. Note that interpolation in quaternion space is not the same as interpolation in Euler space though, so to get as close as possible to what the animator intended, we probably want to sample the animation at our target frame rate and generate our quaternion key frames from those samples, rather than directly converting the animator’s key frames (which may be further apart).
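The resampling step could look something like the following Python sketch. It assumes the animator's curve is piecewise linear between (time, euler_xyz) keys, which is a simplification since real packages use spline curves, and it assumes the XYZ order with quaternions stored as (x, y, z, w). Every name here is hypothetical.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)

def axis_quat(axis_index, degrees):
    """Rotation of `degrees` around axis 0 (X), 1 (Y) or 2 (Z)."""
    half = math.radians(degrees) / 2.0
    q = [0.0, 0.0, 0.0, math.cos(half)]
    q[axis_index] = math.sin(half)
    return tuple(q)

def euler_xyz_to_quat(euler_xyz):
    qx, qy, qz = (axis_quat(i, euler_xyz[i]) for i in range(3))
    return quat_multiply(quat_multiply(qx, qy), qz)

def sample_euler_curve(keys, t):
    """Evaluate the (assumed piecewise linear) Euler curve at time t."""
    if t <= keys[0][0]:
        return keys[0][1]
    for (t0, e0), (t1, e1) in zip(keys, keys[1:]):
        if t <= t1:
            u = (t - t0) / (t1 - t0)
            return tuple(a + (b - a) * u for a, b in zip(e0, e1))
    return keys[-1][1]

def bake_quaternion_keys(keys, fps):
    """Sample the Euler curve at the target frame rate, one quaternion per frame."""
    start, end = keys[0][0], keys[-1][0]
    count = int(round((end - start) * fps)) + 1
    return [euler_xyz_to_quat(sample_euler_curve(keys, start + i / fps))
            for i in range(count)]

# Two animator keys one second apart, baked at 4 frames per second.
animator_keys = [(0.0, (0.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 90.0))]
baked = bake_quaternion_keys(animator_keys, 4)
print(len(baked))
```

Because the Euler curve is evaluated first and only the dense samples are converted, the quaternion interpolation never has to bridge more than one frame of the animator's motion.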

Well, there is one caveat actually. If we have more than 180 degrees of rotation in a single frame we can’t represent that nicely with quaternions. Quaternions always lerp the shortest path between two orientations and you can’t represent several “laps” of rotation with quaternions as you can do with Euler angles (by setting one of the angles to 9000 degrees, for example). But you can fix that by sampling at a higher frame rate if you need to represent really fast rotations with quaternions.

So with that we can say good bye and good riddance to Euler angles and never have to worry about their sorry gimbal locking asses ever again.

Or so you may think…

The return of gimbal lock

I certainly thought so, until I started working on the new cutscene animation system for our level editor.

You see, animators really like to work with curves. They like to see a visual representation of what the animation will do to an object over time with key points that can be moved and handles that can be adjusted to change the slope of the curve.

Curves! Animators love them!

Quaternions are great for interpolation, but they are no good for curve editing.

Sure, you could probably draw some curves that represented a quaternion (the laziest thing would be to just draw the x, y and z components of the quaternion), but those curves wouldn’t mean anything to an animator, the way the Euler angle curves do. They wouldn’t be able to do anything with them.

So, animators want curves with keyframe interpolation. Curves need Euler angles. But what happens when we mix Euler angles with keyframe interpolation? Presto! Our old friend the gimbal lock is back again! Haven’t we missed him.

That’s it. We’re stuck. Gimbal lock is here to stay and the animators will just have to work around it.

And we have to add support for all the usual tricks and workarounds that animators use to get around gimbal lock, such as changing the axis order (from XYZ to XZX or another of the twelve possible axis orders), converting to quaternion and back again, applying an “Euler filter”, etc.
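As an illustration of the last workaround, here is a rough Python sketch of one way an Euler filter can work. This is an assumption about the general idea, not the algorithm of any specific animation package: for each key it considers the equivalent XYZ representations of the same orientation (whole turns, plus the flipped triple (x + 180, 180 - y, z + 180)) and keeps whichever lies closest to the previous key. Function names are made up.

```python
def nearest_equivalent(angle, reference):
    """Shift `angle` by whole turns so it lands as close to `reference` as possible."""
    return angle - 360.0 * round((angle - reference) / 360.0)

def euler_filter_xyz(keys):
    """Rewrite XYZ Euler keys (degrees) so consecutive keys stay numerically close.

    Every output key describes the same orientation as the input key.
    """
    filtered = [tuple(keys[0])]
    for x, y, z in keys[1:]:
        previous = filtered[-1]
        candidates = [(x, y, z), (x + 180.0, 180.0 - y, z + 180.0)]
        best_cost, best_key = None, None
        for candidate in candidates:
            wrapped = tuple(nearest_equivalent(a, p)
                            for a, p in zip(candidate, previous))
            cost = sum(abs(a - p) for a, p in zip(wrapped, previous))
            if best_cost is None or cost < best_cost:
                best_cost, best_key = cost, wrapped
        filtered.append(best_key)
    return filtered

# The second key is a flipped representation of an orientation near the first.
print(euler_filter_xyz([(0.0, 89.0, 0.0), (180.0, 89.0, 180.0)]))
```

After filtering, the second key becomes (0, 91, 0), so interpolating the Euler curves no longer sends the object through a half turn.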

But who said this game engine gig should be easy?

This has also been posted to The Bitsquid blog.

Launch Day 2.0

Original Author: Kyle-Kulyk

Vex Blocks. Despite the weeks we’ve had the game in the hands of testers, a new bug had come to light late Saturday evening. While not game breaking it still needed to be addressed. I stayed up working on the problem until solved. Chat messages, frantic on my part, went back and forth between myself and my partner until between the two of us we were able to smooth things out. I then prepped the build for the next morning’s launch and finally after a long day allowed myself to push back away from my desk just after 1am. I tried and failed to sleep after that.

The late night adrenalin rush of fixing a bug on the eve of our self-imposed deadline had fired up my brain and my thoughts erratically leapt from point to point like popcorn popping while anxiety knotted up in my stomach and settled in for a winter’s nap. I stared at my bedside table for an undetermined length of time before I eventually drifted off into a fitful sleep.

At some point during the night my wife woke up to comfort our baby girl, Hanna, and had fallen asleep in her room. My day started when my wife came back to bed and informed me that our 3-year-old, Jake, was awake and calling for Daddy. I squinted, noticed the sun was up so there was no chance at another hour of sleep and rolled out of bed just as Jake hit the living room and started honking the horn of an obnoxiously noisy pink riding car meant for his sister. I settled him down with a cup of a fruit/veggie juice I have yet to have the courage to taste and turned on “The Cat in the Hat knows a lot about that” before heading to my office to launch our second title.

Launch day this time was different. With our Vex on other platforms, in other app stores. There’s another team working on our third title that needs checking in on. There are communications that have been piling up, unanswered, while we pushed to complete. The gaping maw of uncertainty that greeted me after the first game released didn’t exist this time out. I know what needs to be done and we can’t afford to slow down.

So while the Cat in the Hat explained to my 3-year-old the benefits of singing underwater with a whale, I sat down to release our new game. The sun tried to poke through the clouds outside my office window but it just couldn’t decide whether it wanted to clear up or add to the snow on my back deck. I noticed the time and remembered that Daylight Savings had struck again. I could almost feel that hour suddenly sucked away from me like a voodoo spell setting in as I lamented the time shift. I shook the sudden tiredness off and a quick check of my emails included a brief message from our other developer, my brother-in-law Will. No doubt he had a late night as well. If he wasn’t awake due to work, I’m sure his sleep would have been interrupted by their first child, only weeks old, waking and bellowing his displeasure. I took the few cosmetic changes to Vex Blocks that Will had pushed up to our server, installed the new build and then uploaded the apk file to Google Play. Then I hit “Publish”.

That was it. I leaned back in my chair and despite being told by an online message that it could take hours to populate to Google’s servers, I punched in the URL just to see if it was visible. It wasn’t. I went through my email and then checked a few links to press releases that were scheduled to go live that morning to herald the Vex Blocks release. I clicked refresh on the Google Play site again. Still nothing. I flipped back and forth between press releases, finding the announcements somehow comforting and real, like the launch of the game after ten months wasn’t just a dream. I’d then refresh the Google Play link. I don’t know how many times I did this before the baby monitor in the living room announced that Hanna was awake. I fired off a message to Will for whenever his sleep-deprived old bones made their way into his office that morning, informing him that the deed was done.

The links to the game on our Facebook and web pages were live. The press releases were out. The game was published. I went to rescue my daughter from the confines of her crib. She smiled her huge, baby smile when I entered her room, just like she always does and I took her into the living room where she proceeded to make short work of some Cheerios I placed out on the coffee table for her to keep her occupied. My son jumped up on the couch beside me and we talked and watched TV, letting my wife sleep in a bit. Little Einsteins was on now. The rocket ship apparently runs off leg pats, so we patted our laps faster and faster to rev up Rocket as we were instructed, raised our hands as high as we could and announced together “Blast off!” Hanna turned from where she was standing, leaning against our ottoman, swaying a bit on her unsteady little legs. She giggled at us.

It was a good day.

30 Weeks of Game Development

Original Author: Forrest Smith

Outland Games is an iOS game I’ve been working on at Uber Entertainment since late last summer. We just released the game to the app store this week (iTunes – $0.99). This post is the first in a short series on its development.

Evolution

On every project I’ve ever shipped I wished I could have chronicled the game’s production from start to finish. It’s a really interesting process that the public never gets to see.

The scale and starting point of Outland Games enabled us to record progress in a way that I’ve never been able to do before. We used Fraps to capture a video at the end of almost every week of development [1]. Eka took those videos and edited them into the following video.

We also took that video and compressed it to a single image.

That’s what game development looks like – warts and all. I’m not sure I’ve ever seen a video put together like this one. Certainly nothing so raw.

Scope

The team behind Outland Games was small and consisted of just five primary team members.

Forrest Smith – coding/design – 7 months
Chandana Ekanayake – art/ui/design/trailer – 4 months
Aung Zaw Oo – animation/modeling – 3 months
Ben Golus – effects/misc awesome – 3 months
Howard Mostrom – audio – 2 months

Unity3d is all the rage these days but this game was made using a custom engine. It’s actually the same codebase as Planetary Annihilation. I’ll cover that in more detail in another post.

The game never had any official milestones. I think it can be broken down into the following stages.

Prototype (weeks 1-6)

When we started the goal was to create an endless runner using internal tech and the Monday Night Combat IP. What that meant precisely we didn’t know.

After implementing and playing with various jump mechanics we settled into the double jump you see in the final version. The remarkably basic grey and red block based gameplay from the very first week was actually pretty fun!

We also tested various art creation strategies. Our initial plan for character animation was hand drawn sprite flipbooks. This idea was quickly scrapped in favor of a 3dsmax skeleton with attached sprites. (Which would later become full 3d meshes.)

Much experimentation was done in regards to physics, dash mechanics, level design, and art style. Howard wanted audio hook ups from week 1 and insisted it wasn’t a “real game” until it had sound. Unfortunately sound kept getting pushed back and didn’t get configured until the end of the prototype phase. Sorry Howard!

Europe (weeks 7-8)

During September I went to Germany and Amsterdam on a two week trip. At this point I was still the only full time member on Outland Games so development was temporarily stalled. Beer consumption levels however reached an all-time high.

Tech (weeks 9-13)

Because we used early stage internal tech there were a lot of feature gaps to fill. Adding build tool support for iOS. Fleshing out the bare bones UI system. Defining and/or implementing export/import processes for compressed textures, meshes, and animations. Plus various game specific systems.

Much of this work didn’t actually show up on screen. It’s easy to work for weeks on systems that are invisible to the eye. This can be super frustrating at times. Particularly when attempting to record weekly videos.

Content (weeks 14-25)

The basic components of functionality were finally in place at the start of week 14 despite the game looking quite meager. If you exclude the non-productive Thanksgiving week and out-of-town Christmas week the content phase was ten weeks.

By week 25 the game had two environments, animated enemies, coins, pickups, profile save/load, upgrade progression, full UI including store, layered music, voice overs, and the whole shebang. Although not quite ready for primetime the game was functional and, more importantly, fun.

Ship (weeks 26-30)

The final push was focused on one clear objective – shipping.

I was tempted to call this phase polish but that’s not quite right. A lot of polish did go in. Final animations. Carefully tuned audio. Continuously tweaked level design. But too many features weren’t finished to call it just polish. GameCenter leaderboards and app store IAP weren’t even started until the very end. Even the in-game intro cinematic was a last weekend addition.

After three submissions the game was finally approved. We gave a great sigh of relief and consumed much scotch.

History Keeping

The hardest part of capturing footage is having the willpower to record every week. It’s easy to push back the capture because a major feature is almost done. Game dev is often an emotional rollercoaster and when you’re in a low it’s hard to muster the motivation to click record.

I actually stopped making the videos around week 16. Every week felt like too little progress was being made to justify a video. After going gold and watching the early videos I was hit with a wave of inspiration so I synced Perforce week by week to re-build and capture [2].

In hindsight progress was much more visible than I realized! It’s easy to feel perpetually “one month away” which in turn feels like no progress is being made. The truth is that you are getting a lot done even if it’s taking longer than initially planned [3].

All I can say is to stick with it. Capture every single week. Even if you are almost done with some cool new thing. When the project is shipped and you look back you may be surprised at how much was getting done.

Conclusion

That’s a brief review of the development of Outland Games. We’re super happy with the final product and thrilled to finally release an evolution video. It’s a unique insight into the dev process that I hope folks find informative and entertaining.

Footnotes

[1] Despite being an iOS game 95% of development was done on PC.

[2] synced? sync’d? sunk? sank? These all feel wrong.

[3] Game dev always takes longer than initially planned.

A Bug in Object Replication and Message Reordering

Original Author: Niklas Frykholm

The Bitsquid network system supports a peer-to-peer model with migration of network objects, i.e., changing the owner of a network object from one peer to another. This recently led to a rare race condition.

To understand this bug you must first understand a little bit about how our network system works.

Background

The entire network model is based on a packet delivery system (on top of UDP) that provides ACKs for unreliable packets as well as a reliable (and ordered) packet stream between any two network endpoints. At the next layer we have implemented a remote-procedure-call service for Lua as well as an object replication system.

Games can use these services however they like, but our recommendation is to do as much as possible with the object replication system and as little as possible with RPC calls, since using explicit RPC messages tends to require more bandwidth and be more error prone.

The network can be run in both client-server and peer-to-peer mode. The only difference is that in client-server mode, the server relays all messages (clients never talk directly to each other) and owns most of the network objects. (Clients can own objects in client-server mode; in that case the changes to the objects are relayed by the server.)

Objects are replicated with a message stream that looks something like this:

A: CREATE [wait for ack] UPDATE_1 UPDATE_2 ... UPDATE_n DESTROY

Here, A (the owner of the object) first sends a reliable message that creates the object. When it has received an ACK for that message, it starts sending updates, informing the other players about changes to the object. (By monitoring ACKs, it knows which changes the other players have received, so it only sends updates when necessary and it will resend updates if the messages are lost.) Finally, at some future point, the object is destroyed, through another reliable message.
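The owner-side bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name, the version counter and the method names are all invented here, and the real system tracks ACKs per UDP packet rather than per object.

```python
class OwnerView:
    """What the owner remembers about one replicated object and one remote peer."""

    def __init__(self):
        self.create_acked = False  # has the peer ACKed the reliable CREATE?
        self.version = 0           # bumped on every local change to the object
        self.acked_version = 0     # newest version the peer has confirmed

    def change(self):
        self.version += 1

    def on_create_ack(self):
        self.create_acked = True

    def on_update_ack(self, version):
        # ACKs can arrive late or out of order, so never move backwards.
        self.acked_version = max(self.acked_version, version)

    def needs_update(self):
        # No UPDATE before the CREATE is ACKed, and none when the peer is
        # already up to date. An un-ACKed change keeps this True, which is
        # what makes the owner resend after a lost packet.
        return self.create_acked and self.acked_version < self.version

view = OwnerView()
view.change()
print(view.needs_update())  # still waiting for the CREATE ack
view.on_create_ack()
print(view.needs_update())  # now the pending change should be sent
```

One such record per (object, peer) pair is enough to decide each frame which peers still need an UPDATE.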

The UPDATE messages are sent on the unreliable stream (for maximum performance), so they can potentially arrive before CREATE or after DESTROY. But this is not a problem, because we simply ignore UPDATE messages that arrive out of order.

This approach ensures that everybody that participates in the game session will see the same set of objects with the same properties (at least eventually, keeping in mind that messages can be delayed).

Migration

Migration complicates this picture somewhat.

Migrating a network object means changing the owner of the object from one peer to another. There are a number of reasons why you might want to do that. First, if a player drops out of the game, the objects owned by that player may need to be taken over by somebody else. Second, in a peer-to-peer game we may want to load balance, so that each peer is managing about the same number of objects. Finally, sometimes a particular player is interacting directly with a particular object (picking up a rock, etc). It can then be beneficial to make that player owner of the object, so that the interaction is not affected by network latency.

In our network, migration is implemented with a reliable MIGRATE message that tells everybody in the session about the object’s new owner. The migration message is always sent by a special peer, the HOST of the game session, to ensure that peers do not compete for the ownership of an object.

So if we look at a message stream with migration involved, it looks something like this:

     A:  C Ua Ua Ua Ua Ua
  HOST:                    M_ab
     B:                          Ub Ub Ub Ub Ub Ub Ub Ub D

If you are an experienced network programmer you should start to smell trouble at this point.

The problem is that while the message system provides an ordered stream of messages between any two endpoints, there is no ordering of messages between different endpoints.

Consider an additional network peer X. There is an ordered stream of messages A → X. There is also an ordered stream of messages B → X. But there is no guaranteed ordering between the messages sent from A and the messages sent from B and HOST. So, suppose the messages from A → X are delayed. Then X could see the following message stream:

M_ab Ub Ub Ub D C Ua Ua Ua

So X gets a request to migrate the object before it has been created. And the creation message arrives after DESTROY. In other words, a complete mess.

To be sure, this only happens if the object gets migrated really close to being created or deleted and if there are asymmetric network delays on top of that. But of course, it always happens to someone.
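To see the mess concretely, here is a Python sketch of a deliberately naive receiver fed with the two message orders. The receiver, the object id "rock" and the tuple layout are all hypothetical and only meant to illustrate the failure; they are not the engine's actual code.

```python
class NaiveReceiver:
    """Hypothetical receiver that trusts messages to arrive in a sensible order."""

    def __init__(self):
        self.owners = {}  # object id -> current owner

    def handle(self, kind, obj, sender, new_owner=None):
        if kind == "CREATE":
            self.owners[obj] = sender
        elif obj not in self.owners:
            return  # MIGRATE, UPDATE or DESTROY for an unknown object: ignored
        elif kind == "MIGRATE":
            self.owners[obj] = new_owner
        elif kind == "DESTROY":
            del self.owners[obj]
        # An UPDATE for a known object would apply the new state here.

def run(stream):
    receiver = NaiveReceiver()
    for message in stream:
        receiver.handle(*message)
    return receiver.owners

C = ("CREATE", "rock", "A")
Ua = ("UPDATE", "rock", "A")
M_ab = ("MIGRATE", "rock", "HOST", "B")
Ub = ("UPDATE", "rock", "B")
D = ("DESTROY", "rock", "B")

in_order = [C, Ua, Ua, Ua, M_ab, Ub, Ub, Ub, D]
delayed = [M_ab, Ub, Ub, Ub, D, C, Ua, Ua, Ua]  # what X sees when A is slow

print(run(in_order))  # the object is gone, as expected
print(run(delayed))   # the object lives on forever, still "owned" by A
```

In the delayed case X ends up with a ghost object that nobody will ever destroy, while every other peer has already forgotten about it.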

The Fix

There are many possible ways of fixing this. Here are some:

  • We could impose a global message ordering. We could make sure that the reliable message streams are globally ordered to prevent “paradoxes” of this kind. I.e., if HOST sends M_ab after receiving C, no peer should receive M_ab before C. Unfortunately, this is not as easy as it sounds. For example, what if A dies before it has sent C to X? In that case, that failed delivery will also block the channels HOST → X and B → X, since they are not allowed to deliver any messages before X has received C.

  • We could use a migration handshake. We could do some kind of handshake procedure to make sure that everybody has received M_ab, before B takes over ownership. But this would require a lot of extra messages and temporarily put the object in limbo.

  • We could fix the ACKs. We could make it so that X doesn’t ACK M_ab until C has arrived, thus forcing HOST to keep resending it, until we are ready to receive it. This would work, but would require us to implement ACKing of individual messages. Currently, we just ACK an entire UDP packet (containing many messages) on reception, which is simpler and more performant.

  • We could create an internal message queue. We could queue up migration, create and delete messages in some sort of internal queue if they arrive out of order and try to fix things up later. This is a truly horrible “solution” that increases code complexity and is likely to cause lots of confusing bugs in the future.

All these solutions are probably workable, but they all have the drawback of increasing complexity. And I really don’t like to increase the complexity of network code. Reasoning about network code is hard enough as it is, so we should always strive for the simplest solution possible.

So, instead, the first thing I did was to simplify the problem by eliminating the host from the equation. I simply let the new owner send out the migration message instead of the host:

  A:  C Ua Ua Ua Ua Ua
  B:                    M_ab Ub Ub Ub Ub Ub Ub Ub Ub D

This is already a lot better. Now we only have two parties to worry about (apart from X), instead of three.

We still want the host to be in charge of migration. Otherwise we run into tricky problems of what should happen if several peers try to assume ownership of an object at the same time. So we let the host initiate the migration by sending a message to the new owner (B). Then, B is responsible for notifying everybody else about this.

With this approach, we can use the same “wait for ack” trick that we used during creation to make sure that B doesn’t send any updates to peers that haven’t acked the migration:

  A:  C [wait] Ua Ua Ua Ua Ua
  B:                           M_ab [wait] Ub Ub Ub Ub Ub Ub Ub Ub D

We still haven’t completely solved the problem: X can still see weird message orderings such as:

  M_ab   C   D
  M_ab   D   C

But this won’t be a problem as long as we do two things:

  • We let MIGRATE act as a CREATE message, if we get MIGRATE for an object that doesn’t exist.

  • We ignore “old” CREATE messages. (The C that arrives after M.)

To be able to distinguish old messages I introduced a migration counter. This is just a number that starts at zero when the object is created and is increased (by HOST) every time the object is migrated.

We tag all CREATE, DESTROY and MIGRATE messages with the migration counter and simply ignore “old” messages. With this approach, the message streams will look like this:

  A:  C_0 [wait] Ua Ua Ua Ua Ua
  B:                             M_ab_1 [wait] Ub Ub Ub Ub Ub Ub Ub Ub D_1

We can now verify that all possible message orderings that X can see work correctly:

  C_0      M_ab_1   D_1   -- ok, the expected order
  M_ab_1   C_0      D_1   -- ok, M_ab_1 creates the object with migration counter 1 and C_0 is ignored
  M_ab_1   D_1      C_0   -- ok, M_ab_1 creates the object with migration counter 1 and C_0 is ignored

The system works equally well if there are multiple migration steps:

  A:  C_0 [wait] Ua Ua
  B:                    M_ab_1 [wait] Ub Ub Ub
  C:                                            M_bc_2 [wait] Uc Uc Uc D_2

No matter in which order the messages arrive we will end up in the correct state.
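The claim can be checked by brute force. The Python sketch below is an illustration, not the engine's implementation: the Receiver class and message tuples are invented, and it assumes the receiver keeps remembering the highest migration counter it has seen for an object even after DESTROY, which is what lets it recognize a late C_0 as old. The only ordering constraint applied is that messages from the same sender stay in order.

```python
from itertools import permutations

class Receiver:
    def __init__(self):
        self.owners = {}    # live objects: id -> owner
        self.counters = {}  # highest migration counter seen, kept after DESTROY

    def handle(self, kind, obj, owner, counter):
        seen = self.counters.get(obj, -1)
        if kind == "DESTROY":
            if counter >= seen:
                self.owners.pop(obj, None)
                self.counters[obj] = counter
        elif counter > seen:
            # CREATE, or MIGRATE (which acts as CREATE for an unknown object).
            self.owners[obj] = owner
            self.counters[obj] = counter
        # Everything else is an "old" message and is ignored.

def valid_orderings(messages, ordered_pairs):
    """All arrival orders that respect per-sender ordering."""
    for order in permutations(messages):
        if all(order.index(a) < order.index(b) for a, b in ordered_pairs):
            yield order

def final_state(order):
    receiver = Receiver()
    for message in order:
        receiver.handle(*message)
    return receiver.owners

C_0 = ("CREATE", "obj", "A", 0)
M_ab_1 = ("MIGRATE", "obj", "B", 1)
D_1 = ("DESTROY", "obj", "B", 1)
M_bc_2 = ("MIGRATE", "obj", "C", 2)
D_2 = ("DESTROY", "obj", "C", 2)

# One migration: M_ab_1 and D_1 both come from B, so they stay in order.
single = list(valid_orderings([C_0, M_ab_1, D_1], [(M_ab_1, D_1)]))
# Two migrations: M_bc_2 and D_2 both come from C.
double = list(valid_orderings([C_0, M_ab_1, M_bc_2, D_2], [(M_bc_2, D_2)]))

print(len(single), all(final_state(o) == {} for o in single))
print(len(double), all(final_state(o) == {} for o in double))
```

Every ordering ends with the object destroyed, and the single-migration case enumerates exactly the three orderings listed earlier.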

This has also been posted to The Bitsquid blog.