How we manage the virtual team

Original Author: Kyle-Kulyk

Email communication only can sometimes lead to unintended consequences

Itzy Interactive is, at its core, three primary individuals working from their home offices. We've found it useful, and cost effective, to contract out specific work around the world, but the main group consists of three: myself, the ex-financial guy; Will, my brother-in-law and friend of almost 20 years who's been programming for the last 15; and Cole, a twenty-something programmer and designer I met and worked with while re-training for a career outside the brokerage industry after losing my job to an economy in free fall. You wouldn't think that managing a three-person team would present many problems, but without the financing for a centralised studio, the challenges of managing a team working from home with email as the primary means of communication became apparent quite quickly.

Managing our projects this way produced unique problems. During development of our first mobile title, Itzy3d, the main issue was one of communication. From the start we were using Mercurial for version control and Acunote as our project management software. This let us keep track of our task lists while letting each member know what the others were currently working on, in the hope that we didn't trip over each other or duplicate work. The thought was that using version control and tracking and reporting our progress would keep everyone moving forward together.

The problems we ran into were decidedly human problems. They were issues of miscommunication that quickly led to bruised egos. As all of us can attest, email is a limited means of communication, and the longer we went without speaking to each other, the more issues would simmer. We quickly found that using only email was detrimental to the well-being of the team. Recommendations were often taken as criticism, omissions made team members feel their comments were being ignored, and issues that seemed straightforward to one team member would be misinterpreted by another. Added to the mix, the delay between emails due to workload or different working hours often left comments to fester in the minds of team members until they were blown out of proportion by the time clarifying emails arrived. As much as we'd like to think we were all professional enough to address these types of issues rationally, the reality was that steps needed to be taken to mitigate problems before they grew into more than they needed to be.

Luckily we were always able to address the challenges as they arose, usually by initiating a group voice chat. This helped us maintain our team dynamic, but we certainly couldn't spend an hour each day chatting about what we were currently working on just to keep everyone apprised and limit confusion. There had to be some sort of balance. While working on Itzy3d we decided that, to make sure everyone was moving in the same direction, we would endeavor to meet face to face at least every month and a half. These meetings were always informal and we would alternate the location. What we found was that the meetings not only made sure we were all on the same page on the project, but they also served as a useful break from the monotony of the solitary coding existence we had condemned ourselves to during the regular working week, while helping us become more at ease with each other. In the same room we could work on our “common ground.”

 

Just a regular, weekly scrum

Now that we've recognized the benefits of these face-to-face meetings, we've taken steps to meet more regularly without the day off required to meet in person. Starting with the development of our new title, Vex Blocks, we've mandated a regularly scheduled weekly meeting via webcam using Google Hangouts. During these scrum meetings, even though we continue to track our tasks and progress in Jira, we still take the time to recap our weekly progress and outline what we plan to work on in the coming week. Then we take time for a quick brainstorming session to share ideas we had during the week. While many may balk at the thought of using video in these chats due to an aversion to video in general, we've found team members are more likely to pay attention to what's being said when they can see that everyone else is paying attention.

So far this seems to have completely eliminated the type of conflicts we experienced during the development of Itzy3d.  We still try to have our physical meetings due to the much needed break it affords, but the weekly scrum meetings have proven invaluable in keeping everyone on track while helping build upon our existing team dynamic.


The Depth Jam

Original Author: Jonathan Blow

Last week, I got together with three other designers for a four-day intensive design retreat known as the Depth Jam. The other attendees were Chris Hecker of SpyParty, Marc ten Bosch of Miegakure, and Daniel Benmergui of Storyteller.

This event was an experiment. Chris and I had been discussing the idea for a while, but it took us a couple of years to get around to it. One of my personal goals in setting up this event was to find a new way to stimulate my professional development, because the old ways were not cutting it any more.

The Depth Jam was designed in reaction to shortcomings of other game-related events. In order to explain the design choices behind the Depth Jam, I will speak critically of these other events, in order to highlight the problems that the Depth Jam is meant to address. If you are a fan of these events, organize some of them, or otherwise identify closely with them, then this will be uncomfortable. The best I can do here is to assure you that this isn’t attack-style negativity; it is criticism that comes from years of carefully considering these situations and thinking hard about how to make things better.

Conferences

When I first started working in video games I learned a lot from conferences and lectures. The few days I spent at the Computer Game Developers’ Conference in 1996 were eye-opening, even though I wasn’t comfortable enough as a game developer to know how to make effective use of that time. As years go by and we get better at what we do, a natural shift occurs: in the beginning, we are mostly deriving benefit from other attendees and presenters; later on, we are mostly providing benefit to other attendees, getting little out of it ourselves. I have been to the Game Developers Conference 17 times now. I find that I still do get something from attending, but it really takes a lot of work and there are very few people I can expect to learn from.

All the smart programmers I know complain about conferences and consider them basically useless (except for the smart programmers who are also on conference advisory boards). I certainly developed this kind of frustrated “there’s nothing good here to see” attitude in the early 2000s, but to mitigate this situation I shifted into being much more of a conference presenter than an attendee. A lot of creative energy went into planning new conference sessions and making them good. This helped extend the useful life of conferences, because I was learning a great deal by running sessions. After about eight years, though, this ran its course and I had gotten the bulk of what I was going to get from this arrangement.

Game Jams

In a typical game jam, developers gather for 2-4 days to do a working sprint, the goal being to produce a finished game entirely during the event. I was lucky enough to be around for the Indie Game Jam, which probably started the game jam trend (though I was often too busy working on last-minute GDC lectures to make jam games, sadly!)

From the first Indie Game Jam. Pictured: Brian Jacobson, Sean Barrett, Ken Demarest, Charles Bloom, Jonathan Blow, Brian Sharp, Doug Church.

The Indie Game Jams were very different from the jams we see today. The IGJ was founded on the idea of exploring the design ramifications of a crazy technical question; we would provide attendees with some code to accomplish some technical feat, then they would see where that would lead in terms of design. For example, the question for the first IGJ was “Graphics hardware is pretty fast now; what will people design if we give them an engine that can draw 100,000 little sprite guys on the screen at once?” We were picky about who we invited: attendees had to be designer-programmers who were actually good at programming, because dealing with new technology on a short timeframe can be very challenging.

Also from the first Indie Game Jam. Pictured: Art Min, Charles Bloom, Robin Walker, Thatcher Ulrich, Brian Jacobson, Zack Booth Simpson, and … I don’t know, Justin Hall? Charles Bloom again?

In contrast, contemporary game jams are more open. The idea is that it’s great to make a game, any game, and that even if you don’t manage to finish, you were still part of a community. This community spirit is often upheld as the best part of a game jam.

I think these jams can be really nice for beginners, to show people that making a game is something within reach, and to help them meet other people interested in making games. For experienced developers, though, I think these jams are not so good, because the jams are part of an overall social context that supports stagnation.

Experienced developers would do well to continually hone their craft and push beyond their comfort zone, but jams celebrate low expectations and provide Warm Community Feelings that create a comfort cocoon. “You made a game that is not very interesting? You didn't finish a game at all? Well, you are still part of a community that likes you because you are participating in a game jam, and that is what is truly important.”

I don’t think there is anything inherently wrong with desiring a feeling of community, but we must take care to separate the notion of participating in the community from the notion of success. Participating in a game jam is not an indicator of success at game development — nor is attending the Independent Game Summit or Indiecade. I think that nice feelings are nice, but you want to build an ecosystem where the nice feelings coincide with behavior that will make developers robust and powerful and interesting in the long term. You don’t want nice feelings to act as a soporific. Activities that are good for beginners are often stagnant for intermediate developers.

What to do about this? I think if there were a category of game jams with higher expectations, so that advanced jams were differentiated from beginner jams, that would be a nice start. The IGJ model definitely asked more of participants, but I don’t think the IGJ is the right thing going into the future, because even if you are playing with challenging technology, the time-limited format prevents you from going deep. IGJ was nice in 2002, but today it would just contribute further to the problem outlined in Chris Hecker’s rant Please Finish Your Game. We see lots of wacky but shallow game designs all over the web, and it feels like a problem, or at least a vast sea of potential unreached.

As Chris says in his write-up about the Depth Jam, “game jams are shallow by design.” Because they are shallow, I don’t feel they are the right place for me to develop as a game design practitioner.

Retreat-Like Gatherings

There have been a few retreat-like gatherings for forward-thinking game designers: see for example Phrontisterion and Project Horseshoe, though I find Project Horseshoe's written reports unhelpful.

The obvious problem with these events is that they are mostly just a bunch of talking. When people get together for a bunch of talking, most of them just say a bunch of bullshit (here I mean bullshit in the Harry G. Frankfurt sense) that is unfocused and untethered to reality. If it is not of critical importance whether what people say is right, then most of it is not going to be right.

I really like the retreat model as a basic template. Over the past few years I’ve been to a number of retreats, mostly for things like meditation or silent existential contemplation. There’s a lot good about going to a far-away place where you are not bothered by the concerns of everyday life. But looking at Phrontisterion and Horseshoe, the question arises: how does one design a retreat like this that is not just a bunch of talking?

I knew some of the answer, because I’ve seen success in dealing with a similar question in a neighboring realm:

Local Developer Meet-Ups

Many cities around the world have regular meet-ups; once a month or so, game developers get together at a bar or something, and just chat and be social. I generally don’t go to these because bars are antagonistic to interesting conversations and because attendees tend toward the neophyte side, which means we have the same problem as at conferences: there’s little benefit to be had for an experienced developer. But these events also have the Horseshoe problem, in that the conversation is not focused and doesn’t really matter, so people just say a bunch of nonsense. If all you want is to get drunk and be social, these events are fine, but they don’t do much for professional development.

Two years ago I started a series of monthly developer meetups in the San Francisco Bay Area; the idea was to keep discussion quality high by (a) holding the meetings in quiet places conducive to good discussion, like someone’s house (b) inviting only active game designers [this was not a “game industry” meeting, it was a “thoughtful game designer” meeting], and (c) starting each meeting with one of the attendees presenting their own work. After the presentation, we discuss that work specifically for a while; then at some point, the discussion naturally dissolves into separate, more-general discussions.

(I have been to some developer meet-ups that also began with presentations, but which to me did not manage to establish a culture of quality. One example was the Austin IGDA meetings back in 2002-2003. I think there were many subtle things preventing quality from rising, mostly having to do with the intention of the event: the goal of the meeting was to drive attendance, not to be deeply interesting; also, they were “game industry” meetings, with all the associated issues. Scale matters too: the Austin IGDA meetings were bigger, and they occurred in offices or at places like Dave & Buster's, which did not encourage a personal connection to the presentation or the other attendees.)

The meetings in the Bay Area were very successful at keeping discussion quality relatively high. The situation wasn’t perfect, but it was much better than your typical bar meeting. (I would have attended these regularly even if I were not involved in organizing them). Other attendees really enjoyed the meetings as well. After about a year, though, I stopped arranging the meetings because we seemed to have exhausted the supply of people willing to present, and I didn’t want to start having meetings that were not kicked off by solid presentations. Because everyone had a good time, there’s a reasonable probability that in the future we will pick these up again. I think we might be able to improve the discussions further by reducing the presenter-vs-audience asymmetry somehow (see the Depth Jam section below).

Key to the success of these events was having specific, concrete issues to talk about: a specific game being presented, so that discussion could be anchored to the details of that specific game. It’s also important that the game was being presented by its author; if the session is just someone talking about someone else’s game that they liked or didn’t like or just want to say something about, it is too easy for the discussion to be useless bullshit.

It seemed like this model could be applied to ground a retreat so that it would not just be a bunch of talking.

The Depth Jam idea

Now, we come to the actual Depth Jam. We settled on the basic idea very quickly: the jam would occur in a relaxing retreat-like environment. There would be a limited number of participants; four seemed like a good number. Each participant would have a good game that he has already been working on for a while, and which presents some deep and interesting problem he would like to solve; this problem serves as a focus for discussion. The fact that every participant has a game under discussion means that every participant has “skin in the game”, which keeps discussion tethered and unfrivolous.

I’d like to emphasize this last point because it can be subtle but it is crucial. Suppose some people are showing their games and being criticized and generally having a rough time due to all the stress that happens naturally when having one’s creation dissected; and meanwhile, the people who are criticizing do not have any obligations, and they are just tossing in comments from the peanut gallery. This situation creates a weird imbalance. The comments and criticism will not be as thoughtful as they could be, yet they will be taken very hard by the people showing the games.

If everyone is having their creations dissected, there’s only one class of attendee instead of two. It is easier to empathize and avoid unnecessary harshness. People are going to be more careful that their ideas and criticisms are thoughtful, because they are acutely aware of wanting careful input when it comes to their own game.

The Format

Our first proposal for the Depth Jam had us spending one entire day on each game, so that we could dive into each game at maximum depth without context-switching. However, as the idea churned, we decided it would be beneficial to allow some iteration. Why not split the day into two time slots, so that each game gets half a day and we cycle through the four games twice? This would allow time to modify the games based on the discussion, which would give the discussion even more teeth: now we’re not just talking about a particular problem in a particular game (with some kind of tendency to wax philosophical), we’re actually figuring out how the author will address the problem here at the retreat, with the results of that approach to be plainly visible in the next session.

In our pre-jam planning meeting, we decided to go even further this way, breaking each day into four slots, cycling through all the games four times (so that each game had a two-hour slot each day). This seemed to work pretty well and it allowed a lot of iteration, but it might have been excessive. Daniel thinks there was too much context-switching and he might have been able to think more effectively if given more stability. My impression is that the talking was great for the first couple of days and degraded in quality toward the end (but was still worthwhile even then). At least two attendees did not have a problem with this, though, and found that the final discussions were very valuable for them.

For the next jam I would propose a mixed format: we’d start with two 4-slot days, just like we did this time, followed by a day with no-talking, all-working-and-quiet-and-taking-a-walk, followed by a final 4-slot day. (Or, perhaps I would lengthen the retreat to 5 days and put the rest day in the middle.)

The nice thing about the 4-slot-per-day format, which would not have been true of the 2-slot-per-day format, is that it gives us the maximum amount of calendar time for ideas to stew between iterations. I have long been appreciative of the role of calendar time in good design. Sometimes it doesn't matter how many hours you pack in trying to get things done; the good ideas will arrive unpredictably, and maybe you just need to allow time for this to happen. I was curious whether this principle would also be true on the timescale of a 4-day retreat. For me, at least, it was; I got my best idea in the shower on the morning of day 3, in response to discussions we'd had on day 2. During my session on day 3 we discussed and refined the idea, and on day 4 I showed an implementation of it.

Creature Comforts

We spent some money renting a nice beach house and ordering catered food; the cost for the event was around $5000. Chris goes into more detail about this in his write-up. You don’t need to spend any money on an event like this, but if you can afford it I recommend spending some money to help create a nice environment that minimizes stress and factors away concerns like “what are we going to eat for dinner” and “who is going to do the dishes”. The purpose is twofold: first, it helps you focus on the subject at hand; second, the minimization of external stresses helps you deal with the potential added stresses of working really hard and disagreeing with people all the time.

If you think that the Depth Jam will help you make more headway on even one deep problem than you would have otherwise, and thus make your game better, it’s easy to think that the game will also sell a little better and that costs in the neighborhood of $1250 per attendee are easily justified.

Attitude

Discussions can easily turn into arguments, or at least energy-sucking disagreements. As peoples’ energy gets drained, they become more irritable, so there’s a feedback loop lurking here. A little bit of this happened at our Depth Jam, but we saw it happening and course-corrected, so that the final day’s discussion was reasonable.

Prior to the jam, though, we had not thought much about this. I think it would be helpful in future for the attendees to go in knowing that irritability is likely to happen; the mere fact of this awareness probably helps the situation, and anyway, a little bit of psychologically-aware pacing at the beginning would have gone a long way.

Games and Attendees

It's very important that the attendees be capable both of participating in good discussions and of acting on those discussions to improve their games within a short timeframe. This latter requirement basically limits attendees to being competent designer/programmers. If someone can't program, it's hard to see how he can participate meaningfully in this style of depth jam, because it would be very hard to iterate. Possibly if someone is a level/world/puzzle designer who can build scenarios quickly in UDK or Unity or whatever, it can be made to work, but I still think that person would be feeling the limitations of being unable to make algorithmic changes.

I don’t think that having teams of people would work, at least not for the format described here, because it would dilute the energy and bog down the iteration process. If you have one designer and one programmer trying to do the job that the other people are doing as single designer/programmers, your duo is probably going to have a hard time keeping up. If the jam is made up entirely of duos, it’s going to make discussions a lot messier and lower-energy (twice as many people talking about the same number of games). At least one of us felt that four people was already too many for high-quality discussions, because you keep having interesting ideas but have to wait for other people to stop talking in order to say them, and by that time the discussion may have moved on to a different topic.

We have tossed around some ideas about how to scale the jam to larger groups, but haven’t come up with anything that is fully convincing yet.

It's important that the games be high-quality efforts that pose problems that everyone will be interested in. We were fortunate to have four very interesting games: The Witness, a first-person puzzle game with a heavy emphasis on nonverbal communication; Chris Hecker's SpyParty, an espionage game about subtle human behavior; Marc ten Bosch's Miegakure, a four-dimensional puzzle game; and Daniel Benmergui's Storyteller.

If time permits I may do a detailed write-up of the issues we discussed for each game and the resolutions we reached (though care must be taken here, as we don’t want to disclose aspects of these games that the designers would rather keep secret).

Conclusions

I am happy with the way this first Depth Jam went. I think we can certainly tweak the format to improve it, but already it is a useful tool in my further development as a game designer. I got much more out of this four-person, four-day event than I do from attending a conference. It seems appropriate to me to do a Depth Jam every six months. Provided we are organized enough to get the next one together, we’ll adjust the format and see how it goes!

See also:

Chris Hecker’s write-up
Daniel Benmergui’s write-up
Miegakure web site

(This article was crossposted to the Witness Development Blog.)


AltDevConf 2012 and 2013!

Original Author: Luke Dicken


It’s now been almost four months since we held the first AltDevConf, and although things have continued to be absolutely hectic in that time, it felt like it was well past time to post a summary of the event, as well as lay out our plans for the future.

AltDevConf 2012

The first ever AltDevConf was a great success. We ran 26 hours of material from some incredible speakers who are established names within the industry, as well as some great up-and-coming talent. We managed to find a five-hour window in which most of the world would be semi-awake, and crammed those 26 hours of content into two days of parallel sessions within that window, for a total conference length of just ten hours.

None of this would have been possible without the incredible support we had from the community, both here on AltDevBlogADay and in the wider sense. From helping to organise the conference, to spreading the word, presenting material and even providing feedback afterwards, the success of the AltDevConf is an amazing testament to what we as a community can achieve and we’re sincerely grateful for how much help we received in pulling this off.

We’d be remiss if we didn’t take a minute to thank the attendees. From the outset the big concern was whether we would get enough people to attend the conference to call it a success. We were really worried that we had absolutely no guarantees that we weren’t getting all of the speakers set up to present to an electronic equivalent of tumbleweed and crickets. But you all came; you registered and you attended and again, we couldn’t call it a success without you doing that, so thank you!

Of course, it's all very well waffling about it, so here are some numbers to go along with this. Our average per-hour attendance across the whole conference was 276 people. We had a monster turnout for “Cross Platform Game Development in C#”, presented by Matthieu Laban, Philippe Rollin, and Miguel de Icaza, with 328 attendees – by far our best attended session – although several other sessions managed to break 200 attendees.

You can see videos of every session (except one which was withheld at the request of the speaker) on our YouTube channel. These are now receiving hundreds, and in some cases thousands, of views, and we're delighted to have reached such a large audience after the event itself with these recordings.

However, as successful as we feel the event was, there’s always scope for improvement. We’re definitely conscious that there are parts of the process that could run more smoothly – we’re already working to address this. Equally though, there might be things that we’ve overlooked, so we’re open to any feedback you want to provide – please feel free to email feedback@altdevconf.com or hit the comments at the end of this article!

Announcing AltDevConf ‘13

So today, we're pleased to announce that we'll be holding AltDevConf '13 in around a year. One of the major pieces of feedback we got from speakers was that the proximity to GDC made life a bit awkward, and we were well aware that the Christmas and New Year period also complicated things. To fix this, we have decided to push the conference back to the far side of GDC. We want people to give great sessions at AltDevConf, but not at the expense of their GDC sessions (or, more realistically, vice versa). We haven't got a specific date nailed down yet, but we're thinking we should leave at least six weeks after the end of GDC.

Meanwhile, we’re going to spend some time working out the kinks in our “conference pipeline”, as well as starting to put together the infrastructure and support we will need to run an even better, and even bigger AltDevConf in 2013, covering a wider range of material with the same emphasis on high quality sessions from experienced developers. We’ll be putting out a call for proposals, with plenty of notice once we have all the pieces in place. Again, if there’s something you particularly want to see this time that you feel we overlooked, then we want to hear from you either through the article comments or by emailing feedback@altdevconf.com

One Last Thing – AltDevConf Student Summit (Fall 2012)

Given the prospect of not having an event for around a year, we decided to try something a wee bit different in the meantime. We noticed that although the Education track of the 2012 AltDevConf was aimed at Educators, there was interest in it from students, and in the conference as a whole. A lot of game developers are already doing outreach to students in a very focused manner – for example giving guest lectures at local colleges. What we are going to do is hold an entire event based around this kind of session. We want to give the devs a much broader, global audience for their talks, and we want to give students a central place for this sort of thing, rather than relying on the goodwill of the local developer community. We’re calling this the AltDevConf Student Summit.

We're hoping that the Student Summit will be held in the Fall of 2012 (again, exact dates are to be decided), and will feature, amongst other things, established names in game development telling students what they truly need to know to prepare for life and success in the industry.

Stay tuned to AltDevBlogADay.com and @AltDevConf to keep up to date on our announcements.

Why Kompu Gacha Was Banned

Original Author: Tyler York

The Japanese social gaming market is substantial, kompu gacha is now illegal in Japan, and the market's two kingpins, GREE and DeNA, are swearing up and down that the new regulation will not cripple their businesses. So what is “kompu gacha”? What made it so valuable to the kingpins of the Japanese social gaming space? And why was it made illegal?

[Image: kompu gacha explanation. Image source: InsideSocialGames]

Kompu gacha, or “complete gacha”, is a system that strongly incentivizes the gacha monetization method. Gacha is similar to a prize vending machine at a carnival: you pay a small amount of money to receive an item at random. Kompu gacha expands on this mechanic by offering players an extremely valuable grand prize for completing a set of gacha prizes. Since the gacha prizes are awarded at random, it's very hard to complete a set. If you do the math, each grand prize can be worth hundreds of dollars in revenue on average.
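To make “do the math” concrete, here is a back-of-the-envelope sketch of the coupon collector arithmetic behind completing a gacha set. The set size, price per pull, and equal prize odds are made-up numbers for illustration; real games weight the rare prizes far lower, which pushes the expected cost much higher.

#include <cstdio>

int main() {
    const int    setSize      = 10;   // prizes needed to complete the set (hypothetical)
    const double pricePerPull = 3.0;  // dollars per gacha pull (hypothetical)

    // Expected pulls to collect all N equally likely prizes: N * (1 + 1/2 + ... + 1/N)
    double expectedPulls = 0.0;
    for (int i = 1; i <= setSize; ++i)
        expectedPulls += double(setSize) / i;

    printf("Expected pulls: %.1f, expected cost: $%.2f\n",
           expectedPulls, expectedPulls * pricePerPull);   // ~29.3 pulls, ~$87.87
    return 0;
}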

[Graph: social game company ARPU comparison]

This means big money for Japanese social game companies, whose monetization metrics are record-setting. The extent of the reliance on kompu gacha can vary by game (competitive games rely on it much more than casual games), but the overall ARPU lift is clear to see in the graph above. However, kompu gacha as a monetization method isn't evil: in fact, it's one that players overwhelmingly enjoy. Kompu gacha mechanics are incredibly popular among players, who enjoy the thrill of possibly winning that grand prize. The use of these mechanics has often been viewed as a win-win for developers and players. Which begs the question:

Why was kompu gacha made illegal?

Kompu gacha is essentially an extension of the core “gacha” mechanic, which gives the player the ability to pay for a chance at a random reward. Random reward schedules are similar to the “mystery box” mechanic commonly used by American social game companies. The reward is virtual, so this is not explicitly gambling, but the virtual items often have a virtual currency value that can be tied to a real-money amount. This method has escaped regulation in the past because players can never take their money out of the system, so whether they spent the money “gambling” in game or simply purchasing virtual goods was irrelevant.

However, while gacha itself is not being made illegal, kompu gacha compounded the issue because it has a much lower chance of a much higher payout. This made kompu gacha mechanics feel too close to gambling for Japan's Consumer Affairs Agency, which banned the practice on May 18th. In addition, there were two extreme, well-publicized cases: a middle school boy spent $5,000 in a month, and a younger student spent $1,500 in three days. While GREE and DeNA have set up their own consumer protection bodies to combat these issues, the government still decided to take additional action.

The kompu gacha scandal teaches two key lessons. First, players love real-money betting on both virtual and real rewards. And second, social game companies should create a safe, self-regulated environment to prevent excess and restrict players under the age of 18. Many social games' similarity to real-money gambling means that they should be given the same care and attention that gambling companies give their games. All reputable gambling companies, including Betable, are required by law to provide self-exclusion features for gambling addicts and to vigilantly restrict players under the age of 18. As social casino gaming explodes onto Facebook and iOS, it's increasingly important that game companies act responsibly, lest they succumb to a similar fate as kompu gacha.

This was also posted on the Betable Game Monetization Blog.


x64 ABI: Intro to the Windows x64 Calling Convention

Original Author: Rich Skorski

I've become fascinated with x64 code recently, and have taken on a quest to learn about it. There's a fair amount of information on the net, but there isn't nearly as much for x64 as there is for x86. Some of the sources I've found were wishy-washy, too, since they were created before or shortly after the rules were agreed upon. I have found very little that explains the performance considerations that aren't immediately apparent and would come as a surprise to x86 experts.

If you're here, I'm sure you're just as interested in this as I am. Let me tell you what I know…

What is an ABI?

ABI stands for Application Binary Interface.  It’s a set of rules that describe what happens when a function is called in your program, and answers questions like how to handle parameters and the stack for a function call, what registers (if any) are special, how big data types are…those sorts of things.  These are the rules that the compiler guys follow when they’re determining the correct assembly to use for some bit of code.  There are a lot of rules in the x64 ABI, but the rules that are most open to interpretation make up what’s known as the calling convention.

What is a calling convention?

A calling convention is a set of rules in an ABI that describes what happens when a function is called in your program.  That only applies to an honest to goodness call.  If a function is inlined, the calling convention does not come into play.  For x86, there are multiple calling conventions. If you don’t know about them, Alex Darby does a great job explaining them: start with C/C++ Low Level Curriculum part 3: The Stack and read the later installments as well.

Differing ABIs

An ABI can be specific to a processor architecture, OS, compiler, or language. You can use that as the short answer as to why Win32 code doesn't run on a Mac: the ABI is different. Don't let the compiler-specific implementations scare you, though. The rules for an OS and processor are quite solid, so every compiler has to follow those; the differences lie in how they define the calling convention.

If you think about it, a processor doesn’t know exactly what the stack or functions are.  Those are the crucial parts of a calling convention.  There are processor instructions that facilitate the implementation of the concepts, but it’s up to programmers to use them for great justice.  The compiler takes care of most of that, so we’re at the whims of their implementation when it comes to calling conventions.  It’s more likely that the calling convention rules are influenced by the programming language than anything else.

The finer details will only be a burden if you’re linking targets built by different compilers.  Even then you might not run into any problems because the calling convention is currently standardized for a given platform.  I only mention it in case you read this sometime after it was written and the compilers have diverged.  If it comes to that, certainly consult vendor documentation.

It's worth highlighting that the idea of having multiple calling conventions is unique to 32-bit Windows. The reason for that is partly legacy and partly because there are few registers compared to other architectures. Raymond Chen had a series explaining some of the history. Here's the 1st in the 5-part series: The history of calling conventions, part 1.

What do you mean by x64?

The label x64 refers to the 64-bit processor architectures that extend the x86 architecture. Its full name is x86-64. You can run x86 code on these processors. The x86-64 moniker might be something you see in hardware documentation, but x64 is probably what you'll see most often. AMD and Intel chips have different implementations and instructions, and thus their own distinct names. AMD's is conveniently named AMD64. Intel has a few: IA-32e, EM64T, and Intel 64. EM64T and Intel 64 are synonymous, the latter being the most prominent in Intel's docs. They say there are “slight incompatibilities” between IA-32e and Intel 64, but I don't know what they are. If you are curious, they're buried somewhere in these docs: MSDN – x64 Architecture.

Are there different x64 calling conventions?

On Windows, there is only one calling convention aptly named the “Windows x64 calling convention.”  On other platforms there is another: the “System V calling convention.”  That “V” is the roman numeral 5.  System V is the only option on those systems.  So there are 2 calling conventions, but only 1 will be used on a given platform.

There are some similarities between the Windows and System V calling conventions, but don't let that fool you: it would be dangerous to treat them as interchangeable. I myself made that mistake (or would have, if I were developing outside of a Windows environment). There's also the syscall, which is a direct call into the kernel; syscalls follow different rules than the functions you'll be writing.

I won’t be discussing System V or syscalls here.  I’m not familiar enough with either to speak well about them, and as a game developer you may never deal with them.   But be aware that they exist.

A tip of the hat toward consistency

A theme you’ll see with the Windows x64 calling convention is consistency.  The fact that there aren’t optional calling conventions like there were for Windows x86 is an example of that.  The stack pointer doesn’t move around very much, and there aren’t many “ifs” in the rules regarding parameter passing.  I wasn’t part of any decisions about the calling convention, so I can’t be certain.  But looking at how it turned out I get the impression any decision that may seem peculiar was made for consistency.  I’m not suggesting that alternative solutions would have led to unbearable pain and destruction.  I’m merely suggesting a reason why the calling convention is the way it is.

How does the Windows x64 calling convention work?

The first 4 parameters of a function are passed in registers.  The rest go on the stack.  Different registers will be used for floats vs. integers.  Here’s what registers will be used and the order in which they’ll be used:

Integer: RCX,  RDX,  R8,  R9

Floating-point: XMM0,  XMM1,  XMM2,  XMM3

Integer types include pointers, references, chars, bools, shorts, ints, and longs.  Floating-point includes floats and doubles.

All parameters have space reserved on the stack, even the ones passed in registers. In fact, there's stack space for 4 parameters even if your function doesn't have any params. Those parameter slots are 8 bytes each, so that's at least 32 bytes on the stack for every function (every function actually has at least 48 bytes on the stack…I'll explain that another time). This stack area is called the home space. There are a few reasons behind this home space:

  1. If the registers need to be used for something else, the called function can store the data in the home space without moving the stack pointer.
  2. It keeps the stack structure easy to determine.  That’s very handy for debugging, and perhaps necessary for x64′s stack metadata (another point I’ll come back to another time).
  3. It’s easier to implement variable argument and non-prototyped functions.

Don’t worry, it’s not as bad as it sounds.  Sure, it can be wasteful and it can destroy apps with excessive recursion if you don’t increase the available stack space.  However, the space may not be wasted as often as you think.  The calling convention says that the home space must exist.  It merely suggests what it should be used for.  The compiler can use it for whatever it wants, and an optimized build will likely make great use of it.  But don’t take my word for it, keep an eye on your stack if you start working on an x64 platform.
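To picture where the home space lives, here's a sketch of the stack as seen at the entry point of a called function (offsets only, not compiler output):

// Stack layout at function entry, Windows x64 (sketch):
//
//   [RSP + 0x28]  5th parameter (further stack parameters above it)
//   [RSP + 0x20]  home space for R9  (4th parameter)
//   [RSP + 0x18]  home space for R8  (3rd parameter)
//   [RSP + 0x10]  home space for RDX (2nd parameter)
//   [RSP + 0x08]  home space for RCX (1st parameter)
//   [RSP + 0x00]  return address pushed by the call instruction

The caller allocates those 32 bytes before issuing the call, so the callee can spill its register parameters there without touching RSP.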

The return value is quite easy to explain: Integers are returned in the RAX register; Floats are returned in XMM0.

Member functions have an implicit 1st parameter for the “this” pointer.  Take a moment to think about how that’s different from the x86 calling convention…  If you decided there’s no difference, then give yourself some bonus points!  The “this” pointer will be treated as an integer parameter, ergo it will use the RCX register.  Ok, ok, it’s using the full RCX register instead of only the ECX portion, but you get the point.
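As a quick illustration (the class and method names here are hypothetical):

struct Player {
    int health;
    void TakeDamage(int amount) { health -= amount; }
};

void Hit(Player* p) {
    p->TakeDamage(5);   // 'p' (the "this" pointer) is passed in RCX and 5 in RDX,
                        // just as if the declaration were TakeDamage(Player* this_, int amount)
}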

With regards to function calls, there are registers that are labeled as volatile or non-volatile. Volatile registers can be used by the called function without storing the original contents (if the calling function cares, it needs to store them before the call). Non-volatile registers must contain their original value when the called function returns. Here's a table that labels them: MSDN – Register usage.

Notice that the SSE registers are used for float parameters.  Float operations will be taken care of by SSE instructions.  The x87/MMX registers and instructions are available, but I’ve yet to see them used in a Windows x64 program.  If your code uses the x87/MMX registers, the MSDN says that they must be considered volatile across function calls.  As a game developer, you may not care about this at all.  In fact, you may welcome it (I do).  Be aware that this means x64 code uses the same precision for floating-point intermediate values as the operands being used.  On x86, you had the power to use up to 80-bits for intermediate results.  Bruce Dawson will explain that much better than I can: Intermediate floating point precision.

Which parameter gets which register?

There is a 1:1 mapping between parameters and registers.  If you mix types, you still only get 4 parameters in registers.  Take a look at this function declaration:

int DoStuff( float param1, short param2, bool param3, double param4, int param5 );

Where does the bool go?  What register do you think will hold param2?  Here’s what the registers will look like:

XMM0 = param1
RDX = param2
R8 = param3
XMM3 = param4

Param5 isn’t in any register even though only 2 of the registers reserved for integer parameters are used.  Also, param2 and param3 get their own registers even though they could share the same one and still have room to spare. The compiler will not combine multiple parameters into a single register, nor will it stretch a single parameter over multiple registers.

This makes debugging a little easier.  If you want to know where a parameter is in memory or registers, you only need to know where it is in the function’s parameter list.  You won’t have to examine the types that came before it.  This also makes it easier to support unprototyped functions.  There will be details on that in a bit.

Structs

A struct might be packed into an integer register.  For that to happen, the struct must be <= 8 bytes, and its size must be a power of 2.  Meeting that criteria will also allow the struct to be returned in a register.  Even if you mix float and integer member types, it will be placed in an integer register if able.

If a struct doesn’t fit that description, then the original object is copied into a temporary object on the stack.  That temporary object’s address is passed to the function following the same rules as integers and pointers (first 4 in registers, rest on the stack).  It’s the caller’s responsibility to maintain these temporary objects.
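A quick sketch of both cases (the struct names are hypothetical; sizes assume the usual MSVC x64 layout):

struct Small { float x, y; };        // 8 bytes, a power of 2: fits in an integer register
struct Big   { float x, y, z; };     // 12 bytes, not a power of 2: passed by address

int UseSmall(Small s);   // the 8 bytes of 's' arrive packed into RCX
int UseBig(Big b);       // RCX holds the address of a caller-owned temporary copy of 'b'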

You might be wondering what happens if you return a struct that can't fit in a register. Well, those functions sneak in an extra 1st parameter, just like the “this” pointer. This first parameter is the address of a temporary object maintained by the caller. The RAX register is still used to return the address of that temporary return object. This provides the function with the address of the return object without requiring conditional logic to determine which register holds a parameter. If, for instance, RAX held the return object's address at entry, then certain functions would have to store RAX upon entry and others wouldn't.
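In other words, a declaration like the first line below behaves roughly like the commented one (a conceptual sketch using the hypothetical Big struct from above, not actual compiler output):

Big MakeBig(int a);                   // what you write
// Big* MakeBig(Big* result, int a);  // how it is effectively called: the caller passes the
                                      // address of a temporary in RCX, 'a' moves to RDX, and
                                      // RAX returns that same temporary's address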

Structs will not use the SSE registers.  Ever.  If you have a struct that is a single float, it will get passed to the function in an integer register or on the stack.  We’ll talk about why that’s a performance concern another time.

SSE types

Surprisingly, SSE types are handled the same as a struct for the most part.  Even though the SSE registers are perfect in this situation, they will still have a temporary put on the stack and have an address passed to the function.  I find that super frustrating, but it does make sense.  Remember that the home space reserves space for 4 separate 8 byte parameters.  There’s no room for a 16 byte SSE parameter.  So instead of messing with that consistent behavior, SSE types use the rules already explained up to this point.  Another point for consistency.  It also makes it easier to implement vararg functions which are explained below.

Unlike structs, return values will go in the XMM0 register.  Hooray for that, at least.  If you’re using shiny new hardware that has AVX extensions, then these rules apply to the __m256 types as well, and the YMM0 register is used for the return value.

Varargs

Since different registers are used to pass parameters of different types, you may be scratching your head wondering how vararg functions are handled. Integers are treated the same, but floats have their value duplicated in the integer registers. This lets the called function store values in the home space without needing to decide which register to use: it always stores the integer register. If it later decides one of those varargs was a float, it can use the SSE register as is or load the value into an SSE register directly from the home space. This is another reason why SSE types aren't passed in the SSE registers; they wouldn't fit in the integer registers.
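For example, here's a sketch of a familiar vararg call and where its arguments end up:

#include <cstdio>

int main() {
    std::printf("%f\n", 3.14);  // the format string pointer is passed in RCX; the double 3.14
                                // goes in XMM1, and the same bits are duplicated into RDX so
                                // printf can spill its integer registers to the home space and
                                // read every argument from memory
    return 0;
}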

Unprototyped functions

The C89 standard allowed you to call a function without a function prototype. I had no idea that was ever possible until reading up on the x64 calling conventions. This feature, paired with varargs, is another reason why there's a 1:1 mapping between parameters and registers and why SSE types aren't passed in SSE registers. It may even be the only reason, and everything else is just a side effect. Regardless, this is how things are. The x64 calling convention was able to get rid of vestigial pieces like calling convention options and nearly drop x87, but this bit of history sticks with us.

There’s still more to talk about!

There’s a lot here, and there’s a lot more to talk about.  I’ll let you digest this for now, and we’ll fill in some of the gaps and answer questions you’re likely to come up with later.

Here’s some of what you can expect in the next post(s):

  • How the stack is controlled and how it behaves
  • Exception handling and stack walking
  • Performance considerations
  • RIP-relative addressing

Working with brands, utilizing player emotion, and other lessons in game monetization

Original Author: Tyler York

Last week, our SF Game Monetization meetup group hosted its second speaker event, Maximize Your Virtual Goods Revenue. We had over 150 people attend to socialize and watch three awesome speakers share what they’ve learned about game monetization. Check out each speaker’s presentation below.

(Author's note: I can't get the embedded Slideshare presentations to work. Please click on the title to go to each slide deck.)

 

Leveraging Branded Virtual Goods, by YuChiang Cheng – Founder & CEO of WGT

YuChiang Cheng is the co-founder of WGT (World Golf Tour), the #1 online golf game with a community of millions of online players, making it the largest golf website in the world. Prior to WGT, YuChiang served on the executive team at WagerWorks, launching Virgin Games, World Poker Tour and Hard Rock Casino. WagerWorks was acquired by IGT for $90 million in 2005.

In his presentation, YuChiang shared the pros and cons of working with large, established brands and licensing their IP for your game. This is a great primer for anyone looking into IP for your game so that you can maximize your return on investment, negotiate appropriately and avoid the potential pitfalls of a licensing deal.

 

Best Practices for Maximizing Revenue in Free-to-Play Games, by Josh Burns – Associate Director, Products at 6Waves

Josh is an Associate Director, Products for 6waves, the largest global publisher of independent games on Facebook, iOS, and Android. Josh has worked with developers on more than 50 games across Facebook, iOS, and Android to provide game advisory, including Kingdoms of Camelot, Ravenwood Fair and Mall World. Prior to joining 6waves in early 2010, Josh held a hybrid market research, analytics and product management role at Electronic Arts.

In his talk, Josh shared his learnings from working with some of the top social games of the last 5 years on their monetization and virality strategies. With 50 slides and 11 Best Practices, this deck is a great playbook of tactics for any free-to-play game.

 

Color of Money (Monetization of Emotion), by Max Skibinsky – Founder of Hive7 (Sold to Playdom), Founder of Inporia

Max Skibinsky is a serial entrepreneur, angel investor and start-up mentor who has spent the past 17 years in Silicon Valley. He bootstrapped his first consulting startup over a decade ago, working with clients such as Netscape, AOL, and Electronic Arts. He founded Hive7, one of the very first social gaming companies, which produced the first Facebook MMOG, Knighthood, which grew to over 6 million players. In 2010 Hive7 was sold to Playdom/Disney. Most recently, Max co-founded mobile e-commerce startup Inporia, which secured investments from Y-Combinator, Ron Conway, NEA, Clearstone & 500 Startups.

Max’s presentation, Color of Money, talks about the psychology behind game monetization and what really drives the user’s intent to purchase. When a player’s emotions are involved (such as their investment in a character, or their desire for revenge against a foe), they have a much stronger incentive to pay.

This was originally posted on the Betable blog.


That’s Not Normal–the Performance of Odd Floats

Original Author: Bruce-Dawson

Denormals, NaNs, and infinities round out the set of standard floating-point values, and these important values can sometimes cause performance problems. The good news is, it’s getting better, and there are diagnostics you can use to watch for problems.

In this post I briefly explain what these special numbers are, why they exist, and what to watch out for.

This article is the last of my series on floating-point.

The special float values include:

Infinities

Positive and negative infinity round out the number line and are used to represent overflow and divide-by-zero. There are two of them.

NaNs

NaN stands for Not a Number, and these encodings have no numerical value. They can be used to represent uninitialized data, and they are produced by operations that have no meaningful result, like infinity minus infinity or sqrt(-1). There are about sixteen million of them; they can be signaling or quiet, but there is otherwise usually no meaningful distinction between them.

Denormals

Most IEEE floating-point numbers are normalized – they have an implied leading one at the beginning of the mantissa. However this doesn’t work for zero so the float format specifies that when the exponent field is all zeroes there is no implied leading one. This also allows for other non-normalized numbers, evenly spread out between the smallest normalized float (FLT_MIN) and zero. There are about sixteen million of them and they can be quite important.

If you start at 1.0 and walk through the floats towards zero then initially the gap between numbers will be 0.5^24, or about 5.96e-8. After stepping through about eight million floats the gap will halve – adjacent floats will be closer together. This cycle repeats about every eight million floats until you reach FLT_MIN. At this point what happens depends on whether denormal numbers are supported.

If denormal numbers are supported then the gap does not change. The next eight million numbers have the same gap as the previous eight million numbers, and then zero is reached. It looks something like the diagram below, which is simplified by assuming floats with a four-bit mantissa:

[Diagram: float spacing near FLT_MIN with denormals supported (four-bit mantissa)]

With denormals supported the gap doesn’t get any smaller when you go below FLT_MIN, but at least it doesn’t get larger.

If denormal numbers are not supported then the last gap is the distance from FLT_MIN to zero. That final gap is then about 8 million times larger than the previous gaps, and it defies the expectation of intervals getting smaller as numbers get smaller. In the not-to-scale diagram below you can see what this would look like for floats with a four-bit mantissa. In this case the final gap, between FLT_MIN and zero, is sixteen times larger than the previous gaps. With real floats the discrepancy is much larger:

[Diagram: float spacing near FLT_MIN without denormals – the final gap to zero is sixteen times larger than its neighbors (four-bit mantissa, not to scale)]

If we have denormals then the gap is filled, and floats behave sensibly. If we don’t have denormals then the gap is empty and floats behave oddly near zero.
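If you want to see those gaps for yourself, here's a small sketch using the standard nextafterf function:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main() {
    // Gap between adjacent floats just below 1.0 - about 5.96e-8
    printf("gap below 1.0:           %g\n", 1.0f - nextafterf(1.0f, 0.0f));
    // Gap just below FLT_MIN - the smallest denormal, about 1.4e-45 (when denormals are supported)
    printf("gap below FLT_MIN:       %g\n", FLT_MIN - nextafterf(FLT_MIN, 0.0f));
    // The first float above zero - that same smallest denormal
    printf("smallest positive float: %g\n", nextafterf(0.0f, 1.0f));
    return 0;
}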

The need for denormals

One easy example of when denormals are useful is the code below. Without denormals it is possible for this code to trigger a divide-by-zero exception:

float GetInverseOfDiff(float a, float b)
{
    if (a != b)
        return 1.0f / (a - b);
    return 0.0f;
}

This can happen because only with denormals are we guaranteed that subtracting two floats with different values will give a non-zero result.

To make the above example more concrete, let's imagine that ‘a’ equals FLT_MIN * 1.125 and ‘b’ equals FLT_MIN. These numbers are both normalized floats, but their difference (0.125 * FLT_MIN) is a denormal number. If denormals are supported then the result can be represented (exactly, as it turns out), but the result is a denormal that only has twenty-one bits of precision. The result has no implied leading one, and has two leading zeroes. So, even with denormals we are starting to run into reduced precision, which is not great. This is called gradual underflow.

Without denormals the situation is much worse and the result of the subtraction is zero. This can lead to unpredictable results, such as divide-by-zero or other bad results.

Even if denormals are supported it is best to avoid doing a lot of math at this range, because of reduced precision, but without denormals it can be catastrophic.
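Here's that concrete case as code, if you want to experiment with it (toggling the flush-to-zero setting shown later changes the result):

#include <float.h>
#include <stdio.h>

int main() {
    float a = FLT_MIN * 1.125f;
    float b = FLT_MIN;
    // With denormal support the difference is exactly 0.125 * FLT_MIN (a denormal);
    // with denormals flushed to zero the difference is 0.0, and 1.0f / (a - b) blows up.
    printf("a - b = %g\n", a - b);
    return 0;
}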

Performance implications on the x87 FPU

The performance of Intel's x87 units on these NaNs and infinities is pretty bad. Doing floating-point math with the x87 FPU on NaNs or infinities caused a 900 times slowdown on Pentium 4 processors. Yes, the same code would run 900 times slower if passed these special numbers. That's impressive, and it makes many legitimate uses of NaNs and infinities problematic.

Even today, on a SandyBridge processor, the x87 FPU causes a slowdown of about 370 to one. I’ve been told that this is because Intel really doesn’t care about x87 and would like you to not use it. I’m not sure if they realize that the Windows 32-bit ABI actually mandates use of the x87 FPU (for returning values from functions).

The x87 FPU also has some slowdowns related to denormals, typically when loading and storing them.

Historically AMD has handled these special numbers much faster on their x87 FPUs, often with no penalty. However I have not tested this recently.

Performance implications on SSE

Intel handles NaNs and infinities much better on their SSE FPUs than on their x87 FPUs. NaNs and infinities have long been handled at full speed on this floating-point unit. However denormals are still a problem.

On Core 2 processors the worst-case I have measured is a 175 times slowdown, on SSE addition and multiplication.

On SandyBridge Intel has fixed this for addition – I was unable to produce any slowdown on ‘addps’ instructions. However SSE multiplication (‘mulps’) on Sandybridge has about a 140 cycle penalty if one of the inputs or results is a denormal.

Denormal slowdown – is it a real problem?

For some workloads – especially those with poorly chosen ranges – the performance cost of denormals can be a huge problem. But how do you know? By temporarily turning off denormal support in the SSE and SSE2 FPUs with _controlfp_s:

#include <float.h>

// Flush denormals to zero, both operands and results.
_controlfp_s( NULL, _DN_FLUSH, _MCW_DN );

// Put denormal handling back to normal.
_controlfp_s( NULL, _DN_SAVE, _MCW_DN );

This code does not affect the x87 FPU, which has no flag for suppressing denormals. Note that 32-bit x86 code on Windows always uses the x87 FPU for some math, especially with VC++ 2010 and earlier. Therefore, running this test in a 64-bit process may provide more useful results.

If your performance increases noticeably when denormals are flushed to zero then you are inadvertently creating or consuming denormals to an unhealthy degree.
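One way to run that experiment is to time the same workload twice, once with denormals flushed to zero and once with the default behaviour. The sketch below is only an illustration of the idea: it assumes MSVC on Windows (for _controlfp_s) and uses a toy workload as a stand-in, one that decays values toward zero, which is a classic way to generate denormals (IIR filters and reverb tails do exactly this); in a real test you would substitute the code you actually suspect:

#include <float.h>
#include <chrono>
#include <cstdio>

// Toy workload: repeatedly decay a value that spends most of its time in the
// denormal range. Substitute the code you actually want to measure.
static float RunWorkload()
{
    float sum = 0.0f;
    for (int outer = 0; outer < 2000; ++outer)
    {
        float v = 1e-37f;            // starts a little above FLT_MIN
        for (int i = 0; i < 15000; ++i)
        {
            v *= 0.999f;             // drifts into the denormal range and stays there
            sum += v;
        }
    }
    return sum;
}

static double TimeWorkload()
{
    auto start = std::chrono::steady_clock::now();
    volatile float result = RunWorkload(); // volatile keeps the work from being optimized away
    (void)result;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main()
{
    double defaultTime = TimeWorkload();

    // Flush denormals to zero, both operands and results (SSE/SSE2 math only).
    _controlfp_s(NULL, _DN_FLUSH, _MCW_DN);
    double flushedTime = TimeWorkload();

    // Put denormal handling back to normal.
    _controlfp_s(NULL, _DN_SAVE, _MCW_DN);

    printf("default (denormals on): %.3f s\n", defaultTime);
    printf("flush-to-zero:          %.3f s\n", flushedTime);
    // A large gap between the two times suggests this workload is dominated by denormals.
    return 0;
}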

If you want to find out exactly where you are generating denormals you could try enabling the underflow exception, which triggers whenever one is produced. To do this in a useful way you would need to record a call stack and then continue the calculation, in order to gather statistics about where the majority of the denormals are produced. Alternately you could monitor the underflow bit to find out which functions set it. See this paper.

Don’t disable denormals

Once you prove that denormals are a performance problem you might be tempted to leave denormals disabled – after all, it’s faster. But if disabling them gives you a speedup, that means you are using denormals a lot, which means that if you disable them you are going to change your results – your math is going to get a lot less accurate. So, while disabling denormals is tempting, you might want to investigate why so many of your numbers are so close to zero. Even with denormals in play the accuracy near zero is poor, and you’d be better off staying farther away from zero. You should fix the root cause rather than just addressing the symptoms.

Playing (with) Video

Original Author: Niklas Frykholm

So you want to play some video? Shouldn’t be too hard, right? Just download some video playing library and call the play_video() function. Easy-peasy-lemon-squeezy.

Well, you have to make sure that the video is encoded correctly, that the library works on all platforms and plays nice with your memory, file, sound and streaming abstractions, and that the audio and video don’t desynchronize, which for some inexplicable reason seems to be a huge problem.

But this is just technical stuff. We can deal with that. What is worse is that video playback is also a legal morass.

There are literally thousands of broad patents covering different aspects of video decompression. If you want to do some video coding experiments of your own you will have to read, understand and memorize all these patents so that you can carefully tip-toe your code and algorithms around them.

Of course, if you had a big enough pool of patents of your own you might not have to care as much, since if someone sued you, you could sue them right back with something from your own stockpile. Mutually assured destruction through lawyers. Ah, the wonderful world of software patents.

So, creating your own solution is pretty much out of the question. You have to pick one of the existing alternatives and do the best you can with it. In this article I’m going to look at some different options and discuss the advantages and drawbacks of each one:

  • Just say no

  • Bink

  • Platform specific

  • H.264

  • WebM

There are other alternatives that didn’t make it to this list, such as Dirac, Theora, and DivX. I’ve decided to focus on these five, since in my view H.264 is the best of the commercial formats and WebM the most promising of the “free” ones.

An initial idea might be: Why not just do whatever it is VLC does? Everybody’s favorite video player plays pretty much whatever you throw at it and is open source software.

Unfortunately that doesn’t work, for two reasons. First, VLC’s code is a mix of GPL and LGPL stuff. Even if you just use the LGPL parts you will run into trouble on platforms that don’t support dynamic linking. Second, the VLC team doesn’t really care about patents and just infringes away. You probably cannot afford to do the same. (As a result, there is a very real threat that VLC might be sued out of existence.)

A quick introduction

Before we start looking at the alternatives I want to say something short about what a video file is, since there is some confusion in the matter, even among educated people.

A video file has three main parts:

  • Video data (H.264, DivX, Theora, VP8, …)

  • Audio data (MP3, AAC, Vorbis, …)

  • A container format (Avi, Mkv, MP4, Ogg, …)

The container format is just a way of packing together the audio and video data in a single file, together with some additional information.

The simplest possible container format would be to just concatenate the audio data to the video data and be done with it. But typically we want more functionality. We want to be able to stream the content, i. e. start playing it before we have downloaded the whole file, which means that audio and video data must be multiplexed. We also want to be able to quickly seek to specific time codes, so we may need an index for that. We might also want things like audio tracks in different languages, subtitling, commentary, DVD menus, etc. Container formats can become quite intricate once you start to add all this stuff.

A common source of confusion is that the extension of a video file (.avi, .mkv, .mp4, .ogg) only tells you the container format, not the codecs used for the audio and video data in the container. So a video player may fail to play a file even though it understands the container format (because it doesn’t understand what’s inside it).

Option 1: Just say no

Who says there has to be video in a game? The alternative is to do all cut scenes, splash screens, logos, etc in-game and use the regular renderer for everything. As technology advances and real-time visuals come closer and closer in quality to offline renders, this becomes an increasingly attractive option. It also has a number of advantages:

  • You can re-use the in-game content.

  • Production is simpler. If you change something you don’t have to re-render the entire movie.

  • You don’t have to decide on resolution and framerate, everything is rendered at the user’s settings.

  • You can dynamically adapt the content, for example dress the players in their customized gear.

  • Having everything be “in-game visuals” is good marketing.

If I was making a game I would do everything in-game. But I’m not, I’m making an engine. And I can’t really tell my customers what they can and cannot do. The fact is that there are a number of legitimate reasons for using video:

  • Some scenes are too complex to be rendered in-game.

  • Producing videos can be simpler than making in-game content, since it is easier to outsource. Anybody can make a video, but only the core team can make in-game content and they may not have much time left on their hands.

  • Playing a video while streaming in content can be used to hide loading times. An in-game scene could be used in the same way, but a high-fidelity in-game scene might require too much memory, not leaving enough for the content that is streaming in.

As engine developers it seems we should at least provide some way of playing video, even if we recommend to our customers to do their cutscenes in-game.

Option 2: Bink

Bink from RAD game tools is as close as you can get to a de facto standard in the games industry, being used in more than 5800 games on 14 different platforms.

The main drawback of Bink is the pricing. At $ 8500 per platform per game it is not exactly expensive, but for a smaller game targeting multiple platforms that is still a noticeable sum.

Many games have quite modest video needs. Perhaps they will just use the video player for a 30 second splash screen at the start of the game and nothing more. Paying $ 34 000 to get that on four platforms seems excessive.

At Bitsquid our goal has always been to develop an engine that works for both big budget and small budget titles. This means that all the essential functionality of an engine (animation, sound, gui, video, etc) should be available to the licensees without any additional licensing costs (above what they are already paying for an engine). Licensees who have special interest in one particular area may very well choose to integrate a special middleware package to fulfill their needs, but we don’t want to force everybody to do that.

So, in terms of video, this means that we want to include a basic video player without the $ 8500 price tag of Bink. That video player may not be as performant as Bink in terms of memory and processor use, but it should work well enough for anyone who just wants to play a full screen cutscene or splash screen when the CPU isn’t doing much else. People who want to play a lot of video in CPU taxing situations can still choose to integrate Bink. For them, the price and effort will be worth it.

Option 3: Platform specific

One approach to video playing is to not develop a platform-independent library but instead use the video playing capabilities inherent in each platform. For example, Windows has Windows Media Foundation, MacOS has QuickTime, etc.

Using the platform’s own library has several advantages. It is free to use, even for proprietary formats, because the platform manufacturers have already paid the license fees for the codecs. (Note, though, that for some formats you need a license not just for the player, but for the distribution of content as well.) The implementation is already there, even if the APIs are not the easiest to use.

The biggest advantage is that on low-end platforms, using the built-in platform libraries can give you access to special video decoding hardware. For example, many phones have built-in H.264 decoding hardware. This means you can play video nearly for free, something that otherwise would be very costly on a low-end CPU.

But going platform specific also has a lot of drawbacks. If you target many platforms you have your work cut out for you in integrating all their different video playing backends. It adds an additional chunk of work that you need to do whenever you want to add a new platform. Furthermore, it may be tricky to support the same capabilities on all different platforms. Do they all support the same codecs, or do you have to encode the videos specifically for each platform? Do all platforms support “play to texture” or can you only play the videos full screen? What about the sound? Can you extract that from the video and position it as a regular source that reverbs through your 3D sound world? Some platforms (e.g. Vista) have almost no codecs installed by default, forcing you to distribute codecs together with your content.

Since we are developing a generic engine we want to cover as many platforms as possible and minimize the effort required to move a project from one platform to another. For that reason, we need a platform independent library as the primary implementation. But we might want to complement it with platform specific libraries for low end platforms that have built-in decoding hardware.

Option 4: H.264 (MPEG-4, AVC)

Over the last few years H.264 has emerged as the most popular commercial codec. It is used in Blu-ray players, video cameras, on iTunes, YouTube, etc. If you want a codec with good tool support and high quality, H.264 is the best choice.

However, H.264 is covered by patents. Patents that need to be licensed if you want to use H.264 without risking a lawsuit.

The H.264 patents are managed by an entity known as MPEG LA. They have gathered all the patents that they believe pertain to H.264 into a “patent pool” that you can license all at once, with a single agreement. That patent pool contains 1700 patents. Yes, you read that right. The act of encoding or decoding an H.264 file is covered by 1700 patents. You can find the list in all its 97 page glory at The Bitsquid blog.


The devolution of gaming culture

Original Author: Kyle-Kulyk

Gaming culture has a problem and that problem has a lot to do with gamers themselves.  To be clear, I’m not talking about all gamers but rather a subset of gamers whose antisocial behaviour and habits drive people away from gaming.  Analysts at Piper Jaffray recently conducted a survey which found that nearly 66% of the high school students surveyed across the US claimed they were losing interest in traditional videogames, while slightly over 66% stated they were interested in social, mobile games, an increase from the 34% who answered the same question the year prior.  Gaming as we know it is changing for a variety of reasons, and one of those reasons is that gamers have chosen to turn on each other as well as the people who make the videogames they play.  While gaming culture tries to evolve and leave the primordial seas, certain gamers are busy running along the shore with sharpened sticks trying to force us all back in.

The problem lies with the internet and the anonymity it affords its users.  This effect certainly isn’t limited to just gaming circles but as gamers tend to be a largely wired group of individuals the impact is pronounced.  Gaming has always had a social side but over the decades that’s changed, and you could certainly argue, not for the better.  Back in the 80’s and 90’s, gamers would flock to arcades or journey to friend’s houses to partake in the hottest, latest releases.  To illustrate how it’s changed, imagine four friends over for an afternoon session of GoldenEye sitting in their family den.  Now imagine one of those children lets loose with a barrage of profanity laced, racist, homophobic rants aimed at his fellow gamers.  Or imagine someone’s little sister is also invited to play and subjected to a stream of masturbation and rape jokes.  There’s a very good chance that the child would simply never be invited back for another GoldenEye marathon.  There’s also a chance that little Jimmy’s mother, having overheard the obscene rant would never allow that child in her house ever again and would make a quick phone call to inform the offending child’s parents of their unacceptable behaviour.

This type of antisocial behaviour infects network gaming and social interactions across the internet and therein lies the difference between gaming culture now and gaming culture then.  There are few, if any, social repercussions in gaming today and the impact of these behaviours eats at the fun factor of gaming for a large number of gamers, children and adult alike.  It doesn’t matter if you’re looking to game socially on your console or if you’re looking to partake in a general gaming discussion on the internet, odds are your experience will be sullied by another gamer hiding behind their internet pseudonym.

Researchers refer to this as “toxic disinhibition”.  The anonymity that the internet and online gaming networks offer often results in the complete abandonment of social restrictions that would generally be present in face to face interactions, such as in the days when we gamed locally with other people in the room.  The result of this “trolling” is that we see more and more gamers being turned off of gaming, or seeing their enjoyment of games lessened, thus inhibiting the growth of gaming culture.  The impact of this online disinhibition also affects developers, who can find themselves loath to engage their own fan base for fear of fanboy backlash through internet flaming.

Recently, gamers made headlines for their disproportionate backlash against Mass Effect 3 developer Bioware.  The actions of certain gamers painted all gamers as whiny, entitled children prone to screaming fits when denied their pacifier.  Bioware found itself facing an FTC complaint while its writers and staff were targets of hate campaigns and death threats as some found the end of their latest game offering to be unsatisfactory.  Other gamers and non-gamers alike shook their collective heads in disbelief.  Blizzard also suffered a ridiculous backlash from gamers when screenshots of their now released Diablo 3 title were deemed “too bright” by some prior to the game’s launch, prompting Blizzard to mock the users by releasing screenshots containing unicorns shooting rainbows from their posteriors.  While Blizzard used the experience to have a bit of fun, the example illustrates a growing trend among gamers to instantaneously and viciously attack developers and other gamers alike for even perceived slights, and as a whole the gaming community becomes a less inviting place.

The online disinhibition effect certainly isn’t limited to gaming forums either, as the development community itself isn’t immune from unprofessional behaviour.  I’ve seen my own personal blog postings regarding my development experiences targeted by other developers leveling harsh and often unfair criticisms.  For example, I’ve had a developer lambast the simple inclusion of our company logo on a splash page because “no one cares about your company”.  I’ve even had a local developer I didn’t know and had never met criticize my company online for the slight of not consulting with their group prior to launching our first game.  This type of challenging behaviour is far more likely to be witnessed online than in face to face interactions or official business communication, and unfortunately it is becoming more prevalent.

They say in general you need a thick skin to blog, but the backlash I received from a recent blog post made me question the value of blogging my own experiences as a developer.  I posted a personal blog listing some of the complete, all-in-one game engines available that may be of interest to independent developers.  While researching game engines for my company I would have found such a blog useful as I looked for a game engine that offered features I required, such as Android and iOS porting, and clearly the blog was not meant as an in-depth review piece.  As I had not had the opportunity to try each engine I listed, I made sure to note that where I was unfamiliar with the engine I was simply relaying information and opinions from various reviews I had come across, and I provided links to each product so users could conduct further research.  Rather than promote a thoughtful discussion on the merits of various game engines as I had intended, or provide a starting point for further research, the resulting comments were almost all attacks against me personally and against my attempt to inform other indie developers.

The comments included people calling me a liar, posters comparing my blog to vulgar activities, and writers incensed that I didn’t wholeheartedly endorse their particular favorite engine.  I even read claims that I was intentionally trying to harm product reputations, despite the fact I noted these opinions were sometimes not even mine but were simply being passed along when I lacked particular knowledge of the product being discussed.  I was frankly shocked by the lack of decorum I witnessed in response to a personal blog intended to simply inform and facilitate further research, and if other developers hadn’t contacted me directly to offer their support (with one commenting he would have done it publicly if not afraid of being “flamed” himself) I most likely would have never written another blog regarding my game development experiences.  This general lack of professionalism would never be tolerated in a workplace environment.  Indeed, when gamers and developers are afraid to share ideas due to fear of reprisal it’s time to take a hard look at the current situation and what repercussions this could have for our industry as a whole.  As a community, we should not allow this type of behaviour to propagate.  I’ve been subjected to all manner of hate mail and threats from casual gamers and stalking fanboys alike over the years writing opinion pieces regarding the games industry; however, the lack of professionalism I’ve witnessed since becoming a developer myself truly surprised me.

Gaming culture is suffering due to experiences like this, due to experiences like those Bioware recently endured, and due to the ongoing profane, racist and homophobic behaviour tolerated every day in online gaming matches and in internet gaming forums.  The anonymity of the internet, mixed with complacency among gamers and developers, has led to this situation and the associated cyberbullying that goes along with it, but as the genie is out of the bottle with regards to the internet, there is little that we can do to curb its impact.  The removal of anonymity in online gaming by the companies that operate these networks could potentially result in fewer incidents, as people are less inclined to act in socially unacceptable manners when their real names and locations are attached to their actions; however, this system would still rely on reporting tools that already exist but are underutilized by the majority of gamers.  Most prefer to simply ignore the problem, and this does nothing to stem the rise of anti-social behaviour in the gaming community.

As more teens turn towards social gaming, where they can exercise more direct control over their social interactions through things like Facebook friend lists, as more potential gamers are turned off by what they see of gamers in the news, and as more core gamers turn away from online gaming and game forums because of the sliding social environment, today’s gaming culture must change or it will face decline.  We’re already seeing traditional game sales slide as gamers look elsewhere, and a shift towards mobile games is evident.  Partial blame falls on gamers themselves for creating and tolerating an increasingly toxic gaming culture that runs contrary to the social spirit videogames created for many of us while gaming in the 80’s and 90’s, and even some developers are letting professionalism standards slump in their online communications, which is itself the start of a slippery slope.  We can never go back to the way gaming was, but we can shape the future of gaming culture for the better by being conscious of where it went wrong and why.

SQL Server: High performance inserts

Original Author: Ted Spence

In order to write high performance SQL code, we need to know more about what’s going on underneath the hood. We won’t need to go into exhaustive details on internals, but we’ll go in depth into a simple challenge and show you how you can create robust scalability one feature at a time.

Other articles in this series:

  • Part 1: Effective SQL Server Techniques

    Storing Data In SQL Server

    Let’s imagine that we’re building the data layer for a multiplayer game. The data layer probably contains tons of different persistent storage devices – user logs, game activity diagnostic data, auction houses, persistent user data, and so on. For today’s lesson we’ll create a database of item drops from monster kills.

    To start I will assume you’re not using an ORM. Plenty has been written about why ORM systems aren’t always the right idea, so proceed there at your own risk.

    Revision One: A Bad Design

    I’ll present, first of all, the kind of database design we may all have seen at one time or another. Here’s a table that’s not really optimized for massive data storage; it’s ugly but workable:

    CREATE TABLE item_drops_rev1 (
        -- This is the reference to the Items system ID number; it's not a foreign key
        -- because the "Items" table is on a different database server
        item_id BIGINT,

        -- This is the internal monster class name, e.g. "Creature.Wolf" or "Player.NPC"
        monster_class VARCHAR(50),

        -- The Zone ID refers to a database on the same server, so we built it as a
        -- foreign key
        zone_id INT FOREIGN KEY REFERENCES world_zones(zone_id),

        -- The position and time where and when the kill happened
        xpos REAL,
        ypos REAL,
        kill_time datetime DEFAULT GETDATE()
    )

    We can probably live with this table structure. It’s not normalized, but it gets the job done. I’ll explain in a moment what we can do to improve on it.

    On the other hand, for the client code, I’m going to show you the wrong way to get things done: I’ll create a SQL injection vulnerability. Any web-facing application or service layer is at risk for SQL injection, and frankly it’s easy to overlook something and get lazy and only discover your server hacked a month later. If you ever see this kind of code, this is by far the worst thing you can ever do:

    SqlConnection conn = new SqlConnection(my_connection_string);
    conn.Open();
    // This is a massive SQL injection vulnerability - don't ever build your own SQL statements with string formatting!
    String sql = String.Format(@"INSERT INTO item_drops_rev1
        (item_id, monster_class, zone_id, xpos, ypos)
        VALUES
        ({0}, '{1}', {2}, {3}, {4})", item_id, monster_class, zone_id, xpos, ypos);
    SqlCommand cmd = new SqlCommand(sql, conn);
    cmd.ExecuteNonQuery();
    // Because this call to Close() is not wrapped in a try/catch/finally clause, it could be missed if an
    // exception occurs above.  Don't do this!
    conn.Close();

    What’s bad about it? Do you see how the SQL statement is being assembled by a string parser? SQL statements are basically dynamic code. You would never let a random user type code into a web browser and execute it on your server, so don’t do the same thing with your database. Because SQL as a language has lots of features for combining multiple statements, someone could write a malicious monster_class string that would have terrible side effects. A simple attack might be “‘;UPDATE players SET gold=gold+10000;–”, but there are lots of other more subtle attacks too. SQL Server even has a feature called xp_cmdshell() that can allow a SQL injection vulnerability to turn into a sitewide security breach.

    So don’t ever write your own SQL statements using string formatting; you instead want to use parameterized queries. The other thing to notice in the code below is the “using” statement. When the .NET execution environment exits a “using() { }” clause, it automatically disposes all resources that were obtained in the initialization. With those two items said, here’s what good code looks like:

    using (SqlConnection conn = new SqlConnection(my_connection_string)) {
        conn.Open();
        string sql = "INSERT INTO item_drops_rev1 (item_id, monster_class, zone_id, xpos, ypos) VALUES (@item_id, @monster_class, @zone_id, @xpos, @ypos)";
        using (SqlCommand cmd = new SqlCommand(sql, conn)) {
            cmd.Parameters.Add("@item_id", item_id);
            cmd.Parameters.Add("@monster_class", monster_class);
            cmd.Parameters.Add("@zone_id", zone_id);
            cmd.Parameters.Add("@xpos", xpos);
            cmd.Parameters.Add("@ypos", ypos);
            cmd.ExecuteNonQuery();
        }
    }

    Besides being easier to read, this approach guarantees that SQL injection will not be a problem. So this is where I’ll start.

    This code, using the SQLCommand and parameters, gets the job done and it’s fast. Running a test on an old laptop I have around, I was able to insert 10,000 records in 7.768 seconds. This kind of performance is good enough for most companies; running on a proper server it would be 3-4 times faster and everyone would be happy. For a business application this code is fine, easy to maintain, and reliable.

    That said, we’re a gaming company, and what works for us is a bit different than the code used by an accounts payable system that processes a few tens of thousands of records per day. You’re probably going to put this code up in front of a web service that gets called every time a zone server in your game records a kill, so maybe 10,000 kills in 7-8 seconds isn’t good enough. Let’s see what we can find to make improvements.

    Performance Limitations Of Revision One

    In order to make this database insert process fast, let’s start looking at what happens when you execute this command. I’ll overlook all the internal stuff that we don’t have control over, and show you instead a list of a few problems this code suffers from:

    • Connection setup. Opening a database connection is expensive, but the .NET client library uses connection pooling to retrieve a previously opened connection with an identical connection string. If you’re not using C# for your client library, make certain that your environment provides pooling!
    • Network latency. Every command is a round trip between the application server and the database server, so it pays to keep them on a fast, dedicated network segment. You can configure a VLAN to do this, and most servers nowadays have lots of spare ethernet ports.
    • Statement parsing. The SQL Server parser is so fast its timing is often negligible; but we can still help cut down on its work by using stored procedures instead of straight SQL.
    • Locking. To maintain ACID compliance, SQL Server has to ensure that all database changes are serialized. It does this by using locks to ensure that database inserts don’t conflict with each other. Most SQL Servers are multi-user machines: in addition to your “INSERT” statement, the SQL server is probably handling dozens or hundreds of other clients at the same time. Maybe client A is inserting item drop records, while client B is running reports on these records, and client C is sending stats to the community manager in realtime.

      Well, when SQL Server is doing all of these things, it generates locks to ensure that A, B, and C each receive sensible results. SQL Server guarantees that each statement is executed while the database is in a fully predictable state, and managing those locks takes time. Additionally, if SQL Server is just overloaded with too many clients, it doesn’t matter how well written your query is – it will be slow! You need to take a holistic look at the server’s load levels, and continue to monitor each and every client that connects to the server to see where optimizations can take place.

    • Constraints. Each constraint (a foreign key, a default, or a “check” statement) takes a non-zero amount of time to test. SQL Server guarantees that each record inserted, updated, or deleted will continue to satisfy all constraints on the table. Do you really need foreign keys for large, high volume tables?
    • Hadoop cluster on Amazon EC2 that you keep paying for each month.
    • Varchars. Varchars are essential tools, but they can also lead to lots of unexpected overhead. Each time you store a variable length column, SQL Server has to do more memory management to pack records together. Strings can easily consume hundreds of bytes of memory each. If you put a VARCHAR column in an index, SQL Server has to execute tons of O(string length) comparisons as it searches through the B-tree, whereas integer compare instructions are limited only by memory latency and processor frequency.
    • Normalization trade-offs. Database designers wage constant war between normalizing and denormalizing; don’t ever let yourself get stuck on one side of the fence or the other.

    After all this overhead, your query will return. In most cases, the query is so fast you won’t even realize all this overhead happened. However, there’s one more thing to keep in mind:

    • Disk IO. SQL Server will eventually write the data you’ve just inserted to disk. In SQL Server, data is first written to a transaction log, then the transaction log is merged with the permanent database file when a backup is executed. This happens in the background, so it won’t delay your query directly; but each transaction that occurs must have its own lump of disk IO. You can reduce this overhead by giving SQL server physically separate disks for transaction logs and main DB files. Of course, the best solution is to actually reduce the number of transactions you execute.

    As you can see, there’s a lot to consider when optimizing SQL. The good news is that these limitations are manageable, provided you pay attention to them.

    Revision Two: Free Improvements

    Given what we’ve learned, let’s rewrite the SQL Insert command and see what benefit we can get. First, we can normalize our database by moving “monster_class” into its own table. We can then load into our client application the list of monster classes and use a hashtable to find the ID number to insert into the database directly. That reduces the amount of data stored in each record, and it makes each record identical in size. It offloads some of the work that SQL server has to perform and distributes it to the application tier.

    With these improvements, we’ll see a reduction in data storage size per record:

    [Diagram: one row in memory – the old vs. new record layouts]

    These changes cut down the size of the record noticeably. The original record used to take between 41 and 91 bytes, probably averaging 60 or so bytes per record; the new record takes exactly 35 bytes. Since SQL server stores records in 8K pages, that means that you can now store over 200 records per page, whereas the old system could only hold about 130. This increases the number of rows you can insert before SQL Server succumbs to memory pressure, it speeds up your inserts, and since each record is a fixed length, SQL Server reduces the amount of offset calculation work that’s done to rapidly scan through records in memory.

    Next, let’s eliminate our constraints. This should be done when you’ve tested your code and know that removing the constraint won’t make your data integrity suffer. In the initial table, we had two constraints: a foreign key and a default value. If we’re not really worried about ensuring that each zone ID is rigidly correct, we can forego the foreign key. That means for each insert we no longer need to test the zone database to ensure each ID exists. For a business application, that kind of foreign ID testing is valuable; but for my game I’ll just write a test script to check once per day that no bad zone IDs are being written.

    Finally, I’ll replace the custom SQL text with a stored procedure. This can potentially reduce parser overhead, although our program is simple enough that the parser overhead is pretty low and highly likely to be cached anyways.

    The revised table, stored procedure, and client program is now capable of inserting 10,000 records into my ancient laptop in about 6.552 seconds, for a performance boost of ~15%. The code now looks like these snippets:

    CREATE TABLE item_drops_rev2 (
        item_id BIGINT,

        -- This is now a pointer to a lookup table
        monster_class_id INT,

        -- Zone no longer has a FOREIGN KEY constraint.  SQL Server will allow bad
        -- values to be loaded, and it's our responsibility to handle them.
        zone_id INT,
        xpos REAL,
        ypos REAL,

        -- No longer has a DEFAULT constraint; this means we have to insert the date
        -- ourselves, but it reduces the work on SQL Server
        kill_time datetime
    )

    -- This procedure allows SQL Server to avoid virtually all parser work
    CREATE PROCEDURE insert_item_drops_rev2
        @item_id BIGINT, @monster_class_id INT, @zone_id INT, @xpos REAL, @ypos REAL
    AS

    INSERT INTO item_drops_rev2
        (item_id, monster_class_id, zone_id, xpos, ypos, kill_time)
    VALUES
        (@item_id, @monster_class_id, @zone_id, @xpos, @ypos, GETDATE())
    using (SqlConnection conn = new SqlConnection(my_connection_string)) {
        conn.Open();
        using (SqlCommand cmd = new SqlCommand("insert_item_drops_rev2", conn)) {

            // Setting the command type ensures that SQL Server doesn't need to do any complex parsing
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@item_id", item_id);
            cmd.Parameters.Add("@monster_class_id", monster_class_id);
            cmd.Parameters.Add("@zone_id", zone_id);
            cmd.Parameters.Add("@xpos", xpos);
            cmd.Parameters.Add("@ypos", ypos);
            cmd.ExecuteNonQuery();
        }
    }

    This is a great set of simple improvements. However, there’s one more change you can make, and that’s the big one.

    Revision Three: Lazy Writes

    Think about your data requirements. Is it essential that every record be written to the database immediately? Can it wait a minute? Five minutes? What would the penalty be if a record was lost? How many records could you lose before your business failed?

    For a bank, losing any monetary transactions can be life or death; but for a game you can define your own risk level. Let’s imagine that, for our monster drop table, we only want to risk losing at most five minutes’ worth of data. This gives us an incredible opportunity for performance improvement: we’ll keep a list of records to insert and batch-insert them once every five minutes! SQL Server offers us a great way to reduce overhead when we batch up work like this: the Transaction.

    A transaction is basically an atomic lump of work. SQL Server guarantees that each transaction will either succeed and be written to the database permanently, or fail completely and no data will be written to the database. This sounds complicated, but keep in mind that transactions also allow SQL Server to forego lots of extra overhead. If you execute ten statements all by themselves, you get ten lock-based overheads; but if you wrap them in a transaction you only have to do your lock-based overhead once.

    The only downside to batching is that we can’t use GETDATE() anymore, since records may be inserted a few minutes after they were generated. Instead we must preserve all the data for all records in memory while waiting for the time to insert a batch. The new code looks like this:

    using (SqlConnection conn = new SqlConnection(my_connection_string)) {
        conn.Open();

        // This transaction tells SQL Server to obtain locks once and keep them on hand until the transaction is committed
        SqlTransaction trans = conn.BeginTransaction();
        for (int i = 0; i < INSERT_SIZE; i++) {
            using (SqlCommand cmd = new SqlCommand("insert_item_drops_rev3", conn)) {

                // Setting the command type ensures that SQL Server doesn't need to do any complex parsing
                cmd.CommandType = CommandType.StoredProcedure;

                // Joining a transaction means that SQL Server doesn't have to close and re-open locks
                cmd.Transaction = trans;
                cmd.Parameters.Add("@item_id", item_id[i]);
                cmd.Parameters.Add("@monster_class_id", monster_class_id[i]);
                cmd.Parameters.Add("@zone_id", zone_id[i]);
                cmd.Parameters.Add("@xpos", xpos[i]);
                cmd.Parameters.Add("@ypos", ypos[i]);
                cmd.Parameters.Add("@kill_time", kill_time[i]);
                cmd.ExecuteNonQuery();
            }
        }

        // This statement tells SQL Server to release all its locks and write the data to disk.
        // If your code had thrown an exception above this statement, SQL Server would instead
        // do a "rollback", which would undo all the work since you began this transaction.
        trans.Commit();
    }

    With this change, we’re now able to insert 10,000 records in 2.605 seconds. That’s about a 66% performance improvement – pretty major! Even more importantly, wrapping sensible batches of work into a transaction significantly reduces database contention and can really alleviate concurrency bottlenecks. Of course, if we weren’t batching these inserts together, adding a transaction would be a slight performance penalty; but whenever you’re grouping lots of commands into one transaction you’ll cut down overhead.

    Revision Four: Table Parameters

    You can see that the above approach really saved us a ton of time. However, to insert 10,000 records into the database we’re still contacting the database 10,000 times. In fact, each call to “cmd.ExecuteNonQuery” generates a roundtrip message from your client application to the database. What if there was a way that we could insert all ten thousand records, but only contact the database server once?

    The good news is that SQL Server 2008 introduced an incredible new capability called “Table Parameters”. Table Parameters work by grouping tons of records together into a single parameter that is passed to a stored procedure or SQL statement. This essentially converts overhead performance penalties from O(number of records) to O(number of times you batch insert). Additionally, by reducing the number of SQL commands being executed, you dramatically reduce database contention and improve performance for other programs.

    Here’s the final insert code including table parameters. You may notice that I’ve removed the BeginTransaction() and Commit() calls – those only boost our performance when we’re doing more than one ExecuteNonQuery() command at a time. So here goes:

    CREATE TYPE item_drop_bulk_table_rev4 AS TABLE (
        item_id BIGINT,
        monster_class_id INT,
        zone_id INT,
        xpos REAL,
        ypos REAL,
        kill_time datetime
    )

    CREATE PROCEDURE insert_item_drops_rev4
        @mytable item_drop_bulk_table_rev4 READONLY
    AS

    INSERT INTO item_drops_rev4
        (item_id, monster_class_id, zone_id, xpos, ypos, kill_time)
    SELECT
        item_id, monster_class_id, zone_id, xpos, ypos, kill_time
    FROM
        @mytable
    DataTable dt = new DataTable();
    dt.Columns.Add(new DataColumn("item_id", typeof(Int64)));
    dt.Columns.Add(new DataColumn("monster_class_id", typeof(int)));
    dt.Columns.Add(new DataColumn("zone_id", typeof(int)));
    dt.Columns.Add(new DataColumn("xpos", typeof(float)));
    dt.Columns.Add(new DataColumn("ypos", typeof(float)));
    dt.Columns.Add(new DataColumn("kill_time", typeof(DateTime)));

    for (int i = 0; i < MY_INSERT_SIZE; i++) {
        dt.Rows.Add(new object[] { item_id, monster_class_id, zone_id, xpos, ypos, DateTime.Now });
    }

    // Now we're going to do all the work with one connection!
    using (SqlConnection conn = new SqlConnection(my_connection_string)) {
        conn.Open();
        using (SqlCommand cmd = new SqlCommand("insert_item_drops_rev4", conn)) {
            cmd.CommandType = CommandType.StoredProcedure;

            // Adding a "structured" parameter allows you to insert tons of data with low overhead
            SqlParameter param = new SqlParameter("@mytable", SqlDbType.Structured);
            param.Value = dt;
            cmd.Parameters.Add(param);
            cmd.ExecuteNonQuery();
        }
    }

    And what does the performance of this logic look like for inserting 10,000 records? We’re now down to 0.218 seconds, an improvement of over 97% from our initial attempt.

    Conclusions

    Although this calculation isn’t perfect, I’m going to suggest the following estimates for the overhead we encountered in this test. Out of a total 97% speedup we obtained, I’m guessing the breakdown goes roughly as follows:

    • First priority: Reducing contention and removing unnecessary locks (a 51 percentage point gain)
    • Second priority: Reducing the general SQL statement processing overhead by combining multiple queries together (a 31 percentage point gain)
    • Third priority: Removing unnecessary constraints, VARCHARs, and SQL parser overhead (a 15 percentage point gain)

    As you can see, you don’t need to abandon SQL Server to get massive performance improvements. However, it’s easy to forget about SQL Server performance because the technology is so effective. We use database wrappers and database abstraction layers that speed our development time, but separate us from the fine details. If you want to get lots of high performance work out of SQL server, use the same process you’d do if you were tuning C++ code: profile it, track the most frequently called functions, optimize the heck out of them, and finally rethink your work so you call them less frequently.

    Finally let me pass some credit to Tom Gaulton and Rich Skorski who provided valuable feedback in the editing of this article.