#include

Original Author: Don Olmstead

There is nothing that can break a programmer’s concentration quite like a long compile time. As a project grows, a simple change can go from a one-second wait to a trip to the coffee pot; a complete rebuild becomes something that is only reasonable to do on the way out the door for lunch. And woe to those who forget a semicolon in a header before they leave, coming back to a “compilation stopped due to number of errors” message and another long wait.

From xkcd

When a file is included in a header it becomes a dependency. Any files that include that particular file then become dependent on its dependencies, and so on down the line. This is expected behavior, but if the file isn’t actually required then the compiler is doing more work than necessary and the build time suffers. Once compile times get unbearable there are tools to help, such as our very own Niklas Frykholm’s Header Hero. But rather than treating the disease after the fact, some preventative care can make all the difference in a code base. By following some simple guidelines compile times can be kept manageable.

Rule #1: Prefer forward declarations

To do its job the compiler needs to be aware of types. Some things it knows about implicitly, namely the built-in types such as integers, pointers, and arrays, while others it needs to be informed of, such as functions, structures, and classes. However, the compiler can be kept on a need-to-know basis. It only needs to know about a type when it actually has to generate code for it. This is where forward declarations come into play.

Assume we have two classes Foo and Bar. In this case Foo holds an instance of Bar within it.

#include <Bar.hpp>
 
class Foo
{
	public:
		Foo(const Bar& bar);
 
	private:
		Bar _bar;
};

This is all pretty standard stuff. The Foo class needs to include the declaration of Bar. So in terms of our compile times anything that needs access to Foo will end up including Bar, and anything Bar includes, and so on down the line.

Now if Foo contained a pointer to Bar internally we could rewrite the code to the following and use a forward declaration for Bar.

class Bar;
 
class Foo
{
	public:
		Foo(Bar* bar);
 
	private:
		Bar* _bar;
};

So now anyplace that includes Foo, just includes Foo. It doesn’t have to include Bar or any of Bar’s dependencies. We just move the include to the source file and this will compile without issue.

Now what if we rewrote the first bit of code to use a forward declaration?

class Bar;
 
class Foo
{
	public:
		Foo(const Bar& bar);
 
	private:
		Bar _bar;
};

If we compile this we’ll get an error. Visual Studio outputs the following.

error C2079: 'Foo::_bar' uses undefined class 'Bar'

Why this occurs goes back to what the compiler implicitly knows about. The compiler knows what a pointer is, but has no idea what Bar is unless it’s explicitly declared.

Now in this case we could wonder why this is actually an issue. Nothing is being implemented within the header so why does the compiler care?

Well, for one thing, what is the size of Foo? We have no idea, since we don’t know the size of Bar. This is why the compilation fails. The compiler does know the size of a pointer, though, so with a forward declaration there’s enough information to determine the number of bytes in an instance of Foo.

This is also the reason why a forward declaration cannot be used for a base class. The size of the derived class is dependent on the size of the base class. Because of this an include is required for any base classes.
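As a quick sketch, reusing the same Foo and Bar names, deriving from Bar means its full definition has to be visible:

#include <Bar.hpp> // a forward declaration is not enough for a base class
 
class Foo : public Bar
{
	public:
		Foo();
};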

There is also the matter of references. C++ added reference types in addition to the value and pointer types that C supports. If you ask a C programmer about references they’d probably say they were nothing more than syntactic sugar for a pointer, and to a degree they’re correct. Under the hood a reference type is a pointer that cannot be null and that is treated with value semantics. With regard to forward declarations, reference types function the same as pointer types.
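A reference member therefore only needs a forward declaration as well; a minimal sketch with the same classes:

class Bar;
 
class Foo
{
	public:
		Foo(const Bar& bar);
 
	private:
		const Bar& _bar; // fine: the compiler knows the size of a reference
};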

Templates also throw a wrench in the gears, but the same rules apply. If Bar were a template class, Bar<Imp>, and Foo contained an instance of Bar<BarImpl>, then both Bar and BarImpl would need to be included by Foo. If instead Foo held a reference to Bar<BarImpl>, we could use a forward declaration for both classes.

template <typename Imp>
class Bar;
 
class BarImpl;
 
class Foo
{
	public:
		Foo(const Bar<BarImpl>& bar);
 
	private:
		const Bar<BarImpl>& _bar;
};

The following table summarizes whether an include or forward declaration is required. The same rationale also applies to function and method declarations.

Case                              Include or forward declaration
Foo contains Bar                  Include Bar
Foo contains Bar*                 Declare Bar
Foo contains Bar&                 Declare Bar
Foo descends from Bar             Include Bar
Foo contains Bar<BarImpl>         Include Bar and BarImpl
Foo contains Bar<BarImpl*>        Include Bar and Declare BarImpl
Foo contains Bar<BarImpl>*        Declare Bar and BarImpl
Foo contains Bar<BarImpl>&        Declare Bar and BarImpl

Rationale

If there is one rule you should abide by it’s this one. By using forward declarations when possible the number of included files will go down resulting in reduced build times.

Rule #2: Include the associated header first

Header files should include all the files they depend on and nothing more. One way to ensure that all the dependencies are there is to always include the associated header first thing within the source file. By doing this the compiler will generate an error if the dependencies are not included.

If we go back to the example of the Foo class containing an instance of Bar there is a way to make the forward declaration work. The following code will compile without issue.

#include <Bar.hpp>
#include <Foo.hpp>
 
Foo::Foo(const Bar& bar)
: _bar(bar)
{ }
 
// Other method definitions

The reason this works is the include order. Since Bar.hpp is included first, the compiler already knows about Bar before it begins parsing Foo’s header file. While this works, the effort required to manage the dependencies this way quickly becomes a nightmare. Because of this, include the associated header first to make sure all the dependencies are referenced within the header.
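Here is a sketch of what the rule looks like in practice for our Foo and Bar (file contents assumed): the header includes everything it needs, and the source file includes its own header first so a missing dependency shows up immediately.

// Foo.hpp -- includes everything it depends on
#include <Bar.hpp>
 
class Foo
{
	public:
		Foo(const Bar& bar);
 
	private:
		Bar _bar;
};
 
// Foo.cpp -- the associated header comes first
#include <Foo.hpp>
 
Foo::Foo(const Bar& bar)
: _bar(bar)
{ }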

Rationale

This rule ensures that a header file always includes all its dependencies.

Rule #3: Separate declaration and implementation

In C/C++, source code is organized into header files, which contain declarations, and source files, which contain definitions. If we restrict ourselves to these two file types, .h/.hpp and .c/.cpp, there are ways that the definitions can start to creep into the declarations. This is especially true for templates, as the entirety of the implementation needs to be visible to the compiler, but also for code that is marked inline. We can add another file type to the mix to handle these cases, the inline implementation file.

An inline implementation file acts the same as a source file but contains only those functions and methods that need to be visible to the compiler. It also contains dependencies to other inline files. These files are then included within the source files.

So if we go back to our example of Foo containing an instance of Bar, and each of the classes had a corresponding inline implementation file, then Foo.ipp would look like the following.

#ifndef PROJECT_FOO_IPP_INCLUDED
#define PROJECT_FOO_IPP_INCLUDED
 
#include <Foo.hpp>
 
inline const Bar& Foo::getBar() const
{
	return _bar;
}
 
#endif

In terms of convention .inl and .ipp are typical file extensions for inline implementations. Another convention appends impl to the associated header file name, e.g. Foo_impl.hpp.
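A corresponding source file then pulls in the inline implementation file rather than the bare header, something like this sketch (file names assumed from the example above):

#include <Foo.ipp> // brings in Foo.hpp plus the inline definitions
 
Foo::Foo(const Bar& bar)
: _bar(bar)
{ }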

Rationale

This ensures the header file only contains declarations which will keep the file size down, which helps build time. It also has the added benefit of keeping interface and implementation completely separate.

Rule #4: Hide your includes

The include directive has two variants, #include <file> and #include "file", which have different behaviors. The form used specifies the order in which directories are searched by the preprocessor. Here’s what MSDN has to say about the behavior in Visual Studio.

Quoted form: The preprocessor searches for include files in the following order:

  1. In the same directory as the file that contains the #include statement.
  2. In the directories of any previously opened include files in the reverse order in which they were opened. The search starts from the directory of the include file that was opened last and continues through the directory of the include file that was opened first.
  3. Along the path specified by each /I compiler option.
  4. Along the paths specified by the INCLUDE environment variable.
Angle-bracket form: The preprocessor searches for include files in the following order:

  1. Along the path specified by each /I compiler option.
  2. When compiling from the command line, along the paths that are specified by the INCLUDE environment variable.

So if you use #include "file" the compiler will end up finding it no matter where the file lives. The issue isn’t the functionality; it’s the semantics.

By using quotes the include functions as an internal definition. An excellent example of this brand of structuring is Lua. If you were to package Lua 5.1.4 for distribution as a binary library you would need to expose four header files out of the twenty-three in the source code for it to be usable by the client. The rest of the header files correspond to internal functionality that doesn’t need to be accessed by the customer.

This plays into the pimpl, pointer to implementation, idiom. You may have noticed this idiom in use above during the forward declaration example, though it was never explicitly stated. To reiterate — and save us from scrolling up — if we had a class Foo that contained a pointer to Bar we only require a forward declaration. If Bar is only needed by Foo and is not accessed otherwise it can be made private. So the source file would resemble this.

#include <Foo.hpp>
#include "Bar.hpp"
 
Foo::Foo(Bar* bar)
: _bar(bar)
{ }
 
// Other method definitions

And the library would look like this.
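As a rough sketch (directory names assumed), the internal Bar.hpp lives next to the source files, while only Foo.hpp sits in the public include directory:

library/
	include/
		Foo.hpp     // public interface, only forward declares Bar
	source/
		Bar.hpp     // internal header, included with the quoted form
		Bar.cpp
		Foo.cpp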

In any case where code is only used internally, it should not be visible to external clients. If Bar were accessible and could be acted on by callers, then it would need to be moved into the include directory for the library.

Rationale

This rule has to do with the structure of the project. By storing the header file within the source directory itself we are hiding the implementation. In terms of compile times, if it can’t be accessed then it can’t be included accidentally.

Rule #5: Use precompiled headers

To combat compile times some vendors offer an optimization technique known as precompiled headers. The compiler converts the specified header files into an “intermediate format” that is easier to parse. Since the file is easier for the compiler to parse this can result in drastically reduced compile times.

“Intermediate format” is a bit of a misnomer as the format the precompiled header is transformed into is dependent on the compiler. Within Visual Studio a precompiled header creates memory mapped files, which aren’t very intermediate at all. In recent versions these files are stored in the ipch directory by default.

In terms of specifying a precompiled header it’s nothing more than a header file containing common header files within the project. This file is included first within each source file contained within the project by convention, which means the header file associated with the source file gets bumped to the number two slot.

When creating a Visual Studio project the option to use precompiled headers is presented. If selected, a file named stdafx.h is generated. For a Win32 project it contains the following.

// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently
//
 
#pragma once
 
#include "targetver.h"
 
#define WIN32_LEAN_AND_MEAN             // Exclude rarely-used stuff from Windows headers
// Windows Header Files:
#include <windows.h>
 
// C RunTime Header Files
#include <stdlib.h>
#include <malloc.h>
#include <memory.h>
#include <tchar.h>
 
// TODO: reference additional headers your program requires here

To get the best results from precompiled headers the care and feeding is very important. Misuse of the functionality can actually result in build times that are worse.

The most important guideline is to only specify files that never change, or change infrequently. If a file changes frequently then the compiler has to keep regenerating the precompiled header, which takes a considerable amount of time. By default windows.h is added to the precompiled header, which gives some insight into what qualifies. Really, any third-party library used within a project passes the litmus test for inclusion. This includes the STL, which is also a prime candidate assuming you’re using it.

The other guideline to follow is to specify files that are used frequently. If you were to include everything but the kitchen sink in the precompiled header then there’s that much more for the compiler to search through. Thought of like a cache, including too much pulls in more information than needed and causes misses, which makes performance tank.

It should be noted that not all compilers support precompiled headers. To work around this we can provide a guard around the headers to precompile. For compilers supporting precompiled headers, such as GCC and Visual Studio, we would just use a define to denote this, which would look like the following.

#ifndef PROJECT_PCH_HPP_INCLUDED
#define PROJECT_PCH_HPP_INCLUDED
 
#ifdef PROJECT_USE_PRECOMPILED_HEADER
 
// Insert include files
 
#endif
 
#endif
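A source file in the project would then, by convention, include the precompiled header before anything else; a sketch with assumed file names:

#include <ProjectPch.hpp> // precompiled header always comes first
#include <Foo.hpp>        // the associated header is bumped to the second slot
 
Foo::Foo(const Bar& bar)
: _bar(bar)
{ }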

Rationale

Precompiled headers are such a performance gain that they should be used on any platform that supports them.

Rule #6: Prefer include guards over pragma once

Compiler makers often add features to improve performance and allow more control of their output. Some features get adopted by competing compilers until they end up being “standard”, but without a stamp of approval from the language. The directive pragma once is one of those features. Its origins are murky, but legend speaks of it being a GCC addition that was then co-opted by other vendors.

The intent of pragma once is the same as an include guard: to ensure that a header is only brought in a single time. It has the added benefit of avoiding the naming collisions that can occur with include guards. As pragma once was created to help speed up compilation it also has a reputation of being faster than an include guard.
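For reference, the two forms look like this (guard name assumed):

// Include guard
#ifndef PROJECT_FOO_HPP_INCLUDED
#define PROJECT_FOO_HPP_INCLUDED
// ... declarations ...
#endif
 
// pragma once
#pragma once
// ... declarations ...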

Since its inception compilers have matured and the validity of pragma once improving compile times has come into question. GCC deprecated the directive, but later reneged on the decision. Intel claims no noticeable performance gains by using it. And in an experiment Noel Llopis found no difference between the two in Visual Studio, minus some extreme cases.

Rationale

With include guards offering the same performance as pragma once it is recommended to stick with the standard. This is especially true if your code will be used externally, as pragma once isn’t supported everywhere. Feel free to disregard this tip if your code is only used internally and it’s your experience that compile times are lessened via the pragma.

Rule #7: Don’t use redundant include guards

Another technique purported to speed up compilation is the redundant include guard. This looks exactly as one would expect, all includes are wrapped in an additional guard like the following.

#ifndef PROJECT_BAR_HPP_INCLUDED
#include <Bar.hpp>
#endif
 
class Foo
{
	public:
		Foo(const Bar& bar);
 
	private:
		Bar _bar;
};

The rationale for the redundant include guard is to prevent the compiler from opening the file to find out that it doesn’t need to open the file. Since the guard contains the same logic as the header file it can save on the number of include files the compiler needs to parse.

This is another case of compiler technology advancing enough that the technique isn’t the silver bullet it once was. The case has been optimized for and provides no additional speedup.

Rationale

Redundant include guards put an additional burden on the programmer and clutter code files. They also require the guards to be updated if the include guard for the file changes. As with pragma once, if your experience speaks otherwise feel free to use them, but modern compilers should perform the same with or without them.

Summary

Compile times can be managed before they become the bane of your existence as a programmer. Remember the following.

  1. Prefer forward declarations
  2. Include the associated header first
  3. Separate declaration and implementation
  4. Hide your includes
  5. Use precompiled headers
  6. Prefer include guards over pragma once
  7. Don’t use redundant include guards

By following these simple rules the amount of time spent waiting for the compiler will be minimized and build times will decrease. There is also the matter of the linker with regard to build times, but we’ll save that discussion for another time.

Also posted to my personal blog.

Top Ten Technologies of 2011

Original Author: Bruce Dawson

It seemed wrong to do a ‘normal’ blog post on December 23rd, and my programming themed “Night Before Christmas” rap medley never quite came together, so instead I’m doing a top-ten list.

Well, not really a “top-ten” list, but a list of ten indispensable programming technologies I’ve used this year. I’ve avoided numbering my top-ten list because I couldn’t come up with any sensible way to compare these disparate programs. They have all been crucial. I’ve blogged about most of them, and if you’re not using them or some equally good equivalent you should fix that. I use every one of these technologies both at work (on projects with dozens of developers and industrial strength infrastructure) and at home (on projects with one developer and with the ‘build machine’ being a different enlistment on my laptop).

I develop on Windows so there is a significant bias towards Microsoft products. The good news is that six of the seven Microsoft products on this list are available for free (and even the seventh can be free), and two of the three non-Microsoft products are also free (and even the third is free for limited use).

Without further ado, here is my list:

  • Visual Studio 2010 – I’m enjoying using parts of C++0x including move constructors, static_assert, and the ‘auto’ and ‘override’ keywords. The “Navigate To” functionality (Ctrl+,) lets me explore source code more effectively, and there are other hidden improvements as well.
  • /Analyze – I’ve written about this feature a number of times already, but this is a good place to summarize its effect. By using /analyze for static code analysis I have found and fixed over a thousand serious bugs this year, and it now monitors our code to prevent entire classes of bugs from ever returning.
  • App Verifier – I’ve written about this as well and, while it’s only helped me find a few dozen bugs, these were all serious (race conditions and memory corruption) and would have been painfully difficult or impossible to find without such a tool.
  • Symbol server – having symbols show up when I need them is magically essential. I now even put locally built symbols into a local symbol server so that I never again invalidate a crash dump by rebuilding the DLLs that it depends on.
  • Source indexing – every day I save time because the source file that I need shows up automatically. I love the fact that, no matter what product, branch, or version the code is from, the debugger will automatically find the file instantly. If you ever debug code that is not built on your machine then you need this.
  • Xperf – knowing how to use xperf is like having x-ray vision while everyone else is blindfolded. This tool has let me quickly find and fix innumerable performance problems, in our software and in the tools that we use.
  • WinDbg – the debugger I love to hate. Its UI is execrable but it does a couple of things that Visual Studio’s debugger doesn’t, so I keep it around for the clarifying second opinion that is occasionally crucial. More on this in the new year.
  • Perforce – change lists, labels, file history that I can believe in, I love this version control software. I haven’t used anything else for more than a decade, so I can’t compare it to any of its worthy competitors, but it is excellent.
  • sysinternals – for exploring or monitoring of processes these tools are tough to beat. Many problems can be solved by properly applying one of these.
  • Python – a good scripting language is indispensable and it’s good to be at a company where Python is one of the main choices. Indenting for scope makes me smile.

I thought about having a top 10 list with xperf and /analyze but a top 1010 list seemed more fun.

What’s missing from this list? What deserves to be removed? Let me know.

Static Code Analysis

Original Author: John-Carmack

The most important thing I have done as a programmer in recent years is to aggressively pursue static code analysis.  Even more valuable than the hundreds of serious bugs I have prevented with it is the change in mindset about the way I view software reliability and code quality.

It is important to say right up front that quality isn’t everything, and acknowledging it isn’t some sort of moral failing.  Value is what you are trying to produce, and quality is only one aspect of it, intermixed with cost, features, and other factors.  There have been plenty of hugely successful and highly regarded titles that were filled with bugs and crashed a lot; pursuing a Space Shuttle style code development process for game development would be idiotic.  Still, quality does matter.

I have always cared about writing good code; one of my important internal motivations is that of the craftsman, and I always want to improve.  I have read piles of books with dry chapter titles like “Policies, Standards, and Quality Plans”, and my work with Armadillo Aerospace has put me in touch with the very different world of safety critical software development.

Over a decade ago, during the development of Quake 3, I bought a license for PC-Lint and tried using it – the idea of automatically pointing out flaws in my code sounded great.  However, running it as a command line tool and sifting through the reams of commentary that it produced didn’t wind up winning me over, and I abandoned it fairly quickly.

Both programmer count and codebase size have grown by an order of magnitude since then, and the implementation language has moved from C to C++, all of which contribute to a much more fertile ground for software errors.  A few years ago, after reading a number of research papers about modern static code analysis, I decided to see how things had changed in the decade since I had tried PC-Lint.

At this point, we had been compiling at warning level 4 with only a very few specific warnings disabled, and warnings-as-errors forced programmers to abide by it.  While there were some dusty reaches of the code that had years of accumulated cruft, most of the code was fairly modern.  We thought we had a pretty good codebase.

Coverity

Initially, I contacted Coverity and signed up for a demo run.  This is serious software, with the licensing cost based on total lines of code, and we wound up with a quote well into five figures.  When they presented their analysis, they commented that our codebase was one of the cleanest of its size they had seen (maybe they tell all customers that to make them feel good), but they presented a set of about a hundred issues that were identified.  This was very different than the old PC-Lint run.  It was very high signal to noise ratio – most of the issues highlighted were clearly incorrect code that could have serious consequences.

This was eye opening, but the cost was high enough that it gave us pause.  Maybe we wouldn’t introduce that many new errors for it to catch before we ship.

Microsoft /analyze

I probably would have talked myself into paying Coverity eventually, but while I was still debating it, Microsoft preempted the debate by incorporating their /analyze functionality into the 360 SDK.  /Analyze was previously available as part of the top-end, ridiculously expensive version of Visual Studio, but it was now available to every 360 developer at no extra charge.  I read into this that Microsoft feels that game quality on the 360 impacts them more than application quality on Windows does. 🙂

Technically, the Microsoft tool only performs local analysis, so it should be inferior to Coverity’s global analysis, but enabling it poured out mountains of errors, far more than Coverity reported.  True, there were lots of false positives, but there was also a lot of scary, scary stuff.

I started slowly working my way through the code, fixing up first my personal code, then the rest of the system code, then the game code.  I would work on it during odd bits of free time, so the entire process stretched over a couple months.  One of the side benefits of having it stretch out was that it conclusively showed that it was pointing out some very important things – during that time there was an epic multi-programmer, multi-day bug hunt that wound up being traced to something that /analyze had flagged, but I hadn’t fixed yet.  There were several other, less dramatic cases where debugging led directly to something already flagged by /analyze.  These were real issues.

Eventually, I had all the code used to build the 360 executable compiling without warnings with /analyze enabled, so I checked it in as the default behavior for 360 builds.  Every programmer working on the 360 was then getting the code analyzed every time they built, so they would notice the errors themselves as they were making them, rather than having me silently fix them at a later time.  This did slow down compiles somewhat, but /analyze is by far the fastest analysis tool I have worked with, and it is oh so worth it.

We had a period where one of the projects accidentally got the static analysis option turned off for a few months, and when I noticed and re-enabled it, there were piles of new errors that had been introduced in the interim.  Similarly, programmers working just on the PC or PS3 would check in faulty code and not realize it until they got a “broken 360 build” email report.  These were demonstrations that the normal development operations were continuously producing these classes of errors, and /analyze was effectively shielding us from a lot of them.

Bruce Dawson has blogged about working with /analyze a number of times.

PVS-Studio

Because we were only using /analyze on the 360 code, we still had a lot of code not covered by analysis – the PC and PS3 specific platform code, and all the utilities that only ran on the PC.

The next tool I looked at was PVS-Studio. It has good integration with Visual Studio, and a convenient demo mode (try it!). Compared to /analyze, PVS-Studio is painfully slow, but it pointed out a number of additional important errors, even on code that was already completely clean to /analyze. In addition to pointing out things that are logically errors, PVS-Studio also points out a number of things that are common patterns of programmer error, even if it is still completely sensible code. This is almost guaranteed to produce some false positives, but damned if we didn’t have instances of those common error patterns that needed fixing.

There are a number of good articles on the PVS-Studio site, most with code examples drawn from open source projects demonstrating exactly what types of things are found. I considered adding some representative code analysis warnings to this article, but there are already better documented examples present there. Go look at them, and don’t smirk and think “I would never write that!”

PC-Lint

Finally, I went back to PC-Lint, coupled with Visual Lint for IDE integration.  In the grand Unix tradition, it can be configured to do just about anything, but it isn’t very friendly, and generally doesn’t “just work”.  I bought a five-pack of licenses, but it has been problematic enough that I think all the other developers that tried it gave up on it.  The flexibility does have benefits – I was able to configure it to analyze all of our PS3 platform specific code, but that was a tedious bit of work.

Once again, even in code that had been cleaned by both /analyze and PVS-Studio, new errors of significance were found.  I made a real effort to get our codebase lint clean, but I didn’t succeed.  I made it through all the system code, but I ran out of steam when faced with all the reports in the game code.  I triaged it by hitting the classes of reports that I worried most about, and ignored the bulk of the reports that were more stylistic or potential concerns.

Trying to retrofit a substantial codebase to be clean at maximum levels in PC-Lint is probably futile.  I did some “green field” programming where I slavishly made every picky lint comment go away, but it is more of an adjustment than most experienced C/C++ programmers are going to want to make.  I still need to spend some time trying to determine the right set of warnings to enable to let us get the most benefit from PC-Lint.

Discussion

I learned a lot going through this process.  I fear that some of it may not be easily transferable, that without personally going through hundreds of reports in a short amount of time and getting that sinking feeling in the pit of your stomach over and over again, “we’re doing OK” or “it’s not so bad” will be the default responses.

The first step is fully admitting that the code you write is riddled with errors.  That is a bitter pill to swallow for a lot of people, but without it, most suggestions for change will be viewed with irritation or outright hostility.  You have to want criticism of your code.

Automation is necessary.  It is common to take a sort of smug satisfaction in reports of colossal failures of automatic systems, but for every failure of automation, the failures of humans are legion.  Exhortations to “write better code”, plans for more code reviews, pair programming, and so on just don’t cut it, especially in an environment with dozens of programmers under a lot of time pressure.  The value in catching even the small subset of errors that are tractable to static analysis every single time is huge.

I noticed that each time PVS-Studio was updated, it found something in our codebase with the new rules.  This seems to imply that if you have a large enough codebase, any class of error that is syntactically legal probably exists there.  In a large project, code quality is every bit as statistical as physical material properties – flaws exist all over the place, you can only hope to minimize the impact they have on your users.

The analysis tools are working with one hand tied behind their back, being forced to infer information from languages that don’t necessarily provide what they want, and generally making very conservative assumptions.  You should cooperate as much as possible – favor indexing over pointer arithmetic, try to keep your call graph inside a single source file, use explicit annotations, etc.  Anything that isn’t crystal clear to a static analysis tool probably isn’t clear to your fellow programmers, either.  The classic hacker disdain for “bondage and discipline languages” is short sighted – the needs of large, long-lived, multi-programmer projects are just different than the quick work you do for yourself.

NULL pointers are the biggest problem in C/C++, at least in our code.  The dual use of a single value as both a flag and an address causes an incredible number of fatal issues.  C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL.  Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter.  There are a lot of deeply ingrained game programming patterns that are just dangerous, but I’m not sure how to gently migrate away from all the NULL checking.
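As a minimal sketch of that boundary check (the Entity type and function names here are hypothetical):

// the reference parameter carries an implicit not-NULL contract
void Process( Entity& entity );
 
void Update( Entity* entity )
{
	if ( entity == NULL ) {
		return;                 // check once, where the pointer becomes a reference
	}
	Process( *entity );         // no further NULL checks needed downstream
}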

Printf format string errors were the second biggest issue in our codebase, heightened by the fact that passing an idStr instead of idStr::c_str() almost always results in a crash, but annotating all our variadic functions with /analyze annotations so they are properly type checked kills this problem dead.  There were dozens of these hiding in informative warning messages that would turn into crashes when some odd condition triggered the code path, which is also a comment about how the code coverage of our general testing was lacking.
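With Visual Studio, one way to wire up that checking is SAL’s format string annotation; a rough sketch, where Warning is a stand-in for a project’s variadic logging function:

#include <sal.h>
#include <string>
 
// annotated so /analyze type-checks the arguments against the format string
void Warning( _Printf_format_string_ const char* fmt, ... );
 
void Test()
{
	std::string name( "player1" );
	Warning( "bad name %s", name.c_str() );  // fine
	Warning( "bad name %s", name );          // flagged by /analyze instead of crashing at runtime
}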

A lot of the serious reported errors are due to modifications of code long after it was written.  An incredibly common error pattern is to have some perfectly good code that checks for NULL before doing an operation, but a later code modification changes it so that the pointer is used again without checking.  Examined in isolation, this is a comment on code path complexity, but when you look back at the history, it is clear that it was more a failure to communicate preconditions clearly to the programmer modifying the code.

By definition, you can’t focus on everything, so focus on the code that is going to ship to customers, rather than the code that will be used internally.  Aggressively migrate code from shipping to isolated development projects.  There was a paper recently that noted that all of the various code quality metrics correlated at least as strongly with code size as error rate, making code size alone give essentially the same error predicting ability.  Shrink your important code.

If you aren’t deeply frightened about all the additional issues raised by concurrency, you aren’t thinking about it hard enough.

It is impossible to do a true control test in software development, but I feel the success that we have had with code analysis has been clear enough that I will say plainly it is irresponsible to not use it.  There is objective data in automatic console crash reports showing that Rage, despite being bleeding edge in many ways, is remarkably more robust than most contemporary titles.  The PC launch of Rage was unfortunately tragically flawed due to driver problems — I’ll wager AMD does not use static code analysis on their graphics drivers.

The takeaway action should be:  If your version of Visual Studio has /analyze available, turn it on and give it a try.  If I had to pick one tool, I would choose the Microsoft option.  Everyone else working in Visual Studio, at least give the PVS-Studio demo a try.  If you are developing commercial software, buying static analysis tools is money well spent.

A final parting comment from twitter:

Dave Revell @dave_revell  The more I push code through static analysis, the more I’m amazed that computers boot at all.

SOPA and the Entertainment Software Association

Original Author: Kevin Gadd

A lot of people have been talking about a bill called the Stop Online Piracy Act, or SOPA, and about where the Entertainment Software Association, or ESA, stands on it.

The ESA is a lobbying association that acts on behalf of game developers, game publishers, and other industry groups, and you may be familiar with the convention they organize – E3. I won’t tell you what you should think about SOPA, but I will say this: If game development is an important part of your life, you should let the ESA know what you think about SOPA and whether you think they should support it on your behalf. An easy way to do so would be to email them at their contact address, esa@theesa.com. Doing so will make it easier for the ESA to act on our behalf in an informed fashion and help make sure justice is served when the House decides whether to pass the bill.

I now return you to your regularly scheduled ADBAD programming. Thanks for reading.



Compulsion Loops in the Short, Medium and Long-term

Original Author: Pete Collier

Compulsion loops in the short, medium and long-term, if visualised, would resemble helm chain

Introduction:

Compulsion loop is a very fancy term, isn’t it? It’s a recently coined term, perhaps a couple of years old; I’m unsure of its exact origin. There is no official definition of it, so it’s often bandied around with slight nervousness lest someone ask you what it actually is. This of course is the point at which you run away screaming, mumbling crazily something about Zynga and the horrors of manipulation, death and pestilence or whatever.

My definition of a compulsion loop is that it is a construct designed to keep someone engaged by marrying their actions with an appropriate level of reward. In essence it has been around forever; we just didn’t have a name for it. The greatest creators of entertainment have had an innate understanding of it since we all sat around the fire scratching our heads as Neanderthals. It’s nothing new. Different mediums achieve it in their own way, but all essentially reward attention and effort, and when this equilibrium falters the loop dissipates.

Here is the rub though; reward for effort in the short-term is very different to what we want in the long-term. Whack-a-mole forever would be hell on earth right? At some point we want a deeper sense of reward from an activity beyond the short-term thrills and if we don’t get it, we move on. It is diminishing returns.

So we need to examine what keeps people compelled beyond the here and now. To this end I believe there need to be three compulsion loops engaging us at the same time, tapping into our desire for different kinds of gratification in the short, medium and long-term. The most compelling experiences keep on giving through all these timeframes, allowing people to form a deep and lasting attachment, which is of course a quality that society and all of us individually cherish in an experience.

So how do we aim to achieve this with our games? To answer this I think we need to look at what drives us in each of these timeframes, then ask how that can help inform the way we structure our games. Of course there have been special games we all hold dear that have already demonstrated a masterful understanding of this topic and I don my hat to them. Hopefully then, these examples may pop into your head as you read further.

 

Short-term: Preoccupation and instant gratification

This is, of course, familiar territory to us all; we live it moment to moment. It is this glorious realm in which instant gratification rules supreme and we revel in our actions having direct consequence and reward.

Of all the timeframes this is perhaps the easiest to reduce the human being to monkey-mash-button-for-banana (I’m not going to mention the games guilty of this!). It’s here where the player’s actions should have a tangible effect on the game world and be sufficient as to preoccupy the player.

Preoccupation is important in the short-term because it is also the point at which we humans are most likely to take flight and find the next pretty flower to buzz to. Your grip on the player is at its most vulnerable, so you need to be sticky and your compulsion loops tight to keep the player occupied.

The aim here is for the player to form a sense of attachment. Preoccupying the player can achieve this because it means the player will sink time into the experience. This is significant because investment inevitably seeks a return and the player will naturally develop a growing curiosity on what form this may take.

The point at which the player begins to care is crucial as it ushers in the opportunity to loosen the compulsion loops a bit and build toward something with a bit more substance in the longer term.

Bee a sticky flower

 

Medium-term: Construction and deferred gratification

People are constantly looking for reasons to give up on something. If we don’t feel we’re getting back what we put in, we’ve become very adept at recognising the signals and moving on.

Our lives are now crammed with many opportunities promising instant reward and yes many of them are false and empty, but their potential for distraction is very real. You could indeed argue many people are permanently distracted by this chase (I’m looking at you western culture).

Now more than ever your game needs to give compelling reasons for the player to return. This needs to involve long-term attachment because short-term compulsion has such a weak hold, even when done well. A deeper hook is needed to keep the player caught.

The transition point between the short and long-term is therefore a critical juncture. The fragile attachment developed by players in the short-term can easily shatter and curiosity wane. That deeper hook needs to be alluded to in the medium-term by giving context to the player’s actions. Framing short-term activity as building blocks contributing toward a greater structure should be a key goal for the designer.

It is the job of compulsion loops in the medium-term to convert the player from folly and a dazed state of preoccupation to a sense of clear purpose. Action needs to be rewarded with a sense of contribution toward something, rather than being disparate, unconnected and meaningless.

If you still have the player in the medium-term you are affording more leeway with your compulsion loops but it is crucial that they are leveraged as much as possible. The medium-term is about deferred gratification because the action here is not about offering a direct reward but the promise of a greater one if you continue building.

Frame the contribution toward a greater structure…

 

Long-term: Legacy and reflective gratification

As in life, players want their short and medium term activities to amount to something in the long-term. We all seek meaning; if the short-term can bring out the worst in us, the long-term can bring out the most noble in our pursuit of it.

Building toward something is therefore a hugely compelling force. Through leaving something behind we add definition to the universe and therefore meaning to others. Perhaps legacy is too lofty a concept for games to achieve or indeed for game designers to dare talk about in those terms. But actually, it is through play that we express our understanding of this world and our part in it. Games more than deserve their seat at that particular hallowed table.

So in practical terms what do we need to achieve in this timeframe? What can repay the player devotion exercised up to this point? The answer actually is that the journey repays itself, it is the reward; the long-term structures we allow the player to build toward are simple reflections and distillations of this. Yes it is corny, yes it’s a platitude uttered in many a Disney film but dammit it’s true!

Long-term compulsion loops then, are afforded the greatest size but it is imperative that the structure reflects accomplishment, allows for demonstration of mastery and serves as a meaningful embodiment of the player’s journey. In an ironic final twist the long-term compulsion loop is but a mere apparition. It never fulfils its loop, but it doesn’t matter.

What does the player leave behind?

 

Conclusion:

We are compelled to draw more from life, as we are in a game, the more we’re engaged by it. It is our conviction of purpose in the short and medium-term that ultimately justify our achievements in the long-term and keep us motivated to keep pushing forward. It’s when we lose this that things stagnate and we seek change.

Compulsion loops are often cited as the worst of game development, that it is somehow bringing too much science and manipulation into the art of making games. In my opinion it is actually the opposite and quite beautiful. It lies at the very beating heart of human nature and what we want from our experiences. I say, what better way is there, in fact, to frame how we should approach the way in which we make our games?

 

This article was originally posted on my blog: Compulsion Loops in the Short, Medium and Long-term


C / C++ Low Level Curriculum Part 4: More Stack

Original Author: Alex Darby

Welcome to the 4th part of my C/C++ Low Level Curriculum – more Stack!

The last post was a mammoth and took me ages so this post is going to be significantly shorter, and will consequently cover less ground. Specifically we’re going to look at how more than one parameter is passed in compiler generated unoptimised x86 assembler using the stdcall calling convention.

This post assumes that you have read the previous post on the Stack (or know how the Stack works in “vanilla” x86 assembler already).

If you missed the previous posts here are backlinks:

  1. /2011/11/09/a-low-level-curriculum-for-c-and-c/
  2. /2011/11/24/c-c-low-level-curriculum-part-2-data-types/
  3. /2011/12/14/c-c-low-level-curriculum-part-3-the-stack/

I’ve also dropped in a good link to some IBM resources on the PowerPC ABI which explain in detail (and at the assembler level) how the Stack is used on PowerPC based CPUs as it is actually more different from x86 than I remembered. You may find this particularly useful if you work on Current Gen consoles and want to understand how they use the Stack, call functions, and pass parameters.

 

More than one function parameter

As before, I’m using a win32 console application made by the “new project” wizard in VS2010 with the default options. The disassembly we’ll be looking at is from the debug build configuration, which generates vanilla unoptimised stdcall x86 code.

The only change I make is to turn off “Basic Runtime Checks” to make the generated assembler more legible (and significantly faster…) see the previous post for details on how to do this.

We’re going to update the very simple program used for the last article so that the function it calls requires 3 parameters.

int SumOf( int iParamOne, int iParamTwo, int iParamThree )
{
    int iLocal = iParamOne + iParamTwo + iParamThree;
    return iLocal;
}
 
int main( int argc, char** argv )
{
    int iValOne   = 1;
    int iValTwo   = 2;
    int iValThree = 4;
    int iResult   = SumOf( iValOne, iValTwo, iValThree );
    return 0;
}

and here’s the assembler it generates for main() (as before the addresses of the instructions will almost certainly differ for you):

     7: int main( int argc, char** argv )
 
       8: {
 
  00401280  push        ebp
 
  00401281  mov         ebp,esp
 
  00401283  sub         esp,50h
 
  00401286  push        ebx
 
  00401287  push        esi
 
  00401288  push        edi
 
       9:     int iValOne        = 1;
 
  00401289  mov         dword ptr [ebp-4],1
 
      10:     int iValTwo        = 2;
 
  00401290  mov         dword ptr [ebp-8],2
 
      11:     int iValThree    = 4;
 
  00401297  mov         dword ptr [ebp-0Ch],4
 
      12:     int iResult        = SumOf( iValOne, iValTwo, iValThree );
 
  0040129E  mov         eax,dword ptr [ebp-0Ch]
 
  004012A1  push        eax
 
  004012A2  mov         ecx,dword ptr [ebp-8]
 
  004012A5  push        ecx
 
  004012A6  mov         edx,dword ptr [ebp-4]
 
  004012A9  push        edx
 
  004012AA  call        00401127
 
  004012AF  add         esp,0Ch
 
  004012B2  mov         dword ptr [ebp-10h],eax
 
      13:     return 0;
 
  004012B5  xor         eax,eax
 
      14: }
 
  004012B7  pop         edi
 
  004012B8  pop         esi
 
  004012B9  pop         ebx
 
  004012BA  mov         esp,ebp
 
  004012BC  pop         ebp
 
  004012BD  ret

 

Calling SumOf()

As we saw in the last article, we know we can safely ignore the function preamble and postamble (lines 3-8, and lines 28-33; also known as the prologue and epilogue respectively) which set up and tear down the function’s Stack Frame as we know they’re not involved in passing the parameters to SumOf().

A quick look at the disassembly initialising the local variables tells us that iValOne, iValTwo, and iValThree are stored at [ebp-4], [ebp-8], and [ebp-0Ch] respectively.

The disassembly relevant to the function call and the assignment of its return value is this part:

    12:     int iResult        = SumOf( iValOne, iValTwo, iValThree );
 
  0040129E  mov         eax,dword ptr [ebp-0Ch]
 
  004012A1  push        eax
 
  004012A2  mov         ecx,dword ptr [ebp-8]
 
  004012A5  push        ecx
 
  004012A6  mov         edx,dword ptr [ebp-4]
 
  004012A9  push        edx
 
  004012AA  call        00401127
 
  004012AF  add         esp,0Ch
 
  004012B2  mov         dword ptr [ebp-10h],eax

As in the case with a single argument, copies of the function parameters’ values are pushed onto the Stack – but note that they are pushed on in the opposite order to the order in which the function’s parameter list expects them in the C++ code.

The final thing to note is that following the call instruction on line 22 (i.e. immediately before the assembler for SumOf() is executed) the copy of iValOne that was pushed onto the stack in line 21 is at [esp+4], because call pushes the return address onto the Stack.

Just in case, here’s what the stack looks like immediately after line 22 is executed, but before any code in SumOf() is executed:

Stack after line 22, before any of SumOf() is executed
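In rough text form (addresses and values taken from the listing above, highest address at the top):

[esp+0Ch]  copy of iValThree (4)    <- pushed first, via eax
[esp+8]    copy of iValTwo   (2)    <- pushed second, via ecx
[esp+4]    copy of iValOne   (1)    <- pushed last, via edx
[esp]      return address (004012AF), pushed by the call instruction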

 

Accessing the parameters

Here’s the disassembly for SumOf():

     1: int SumOf( int iParamOne, int iParamTwo, int iParamThree )
 
       2: {
 
  00401250  push        ebp
 
  00401251  mov         ebp,esp
 
  00401253  sub         esp,44h
 
  00401256  push        ebx
 
  00401257  push        esi
 
  00401258  push        edi
 
       3:     int iLocal = iParamOne + iParamTwo + iParamThree;
 
  00401259  mov         eax,dword ptr [ebp+8]
 
  0040125C  add         eax,dword ptr [ebp+0Ch]
 
  0040125F  add         eax,dword ptr [ebp+10h]
 
  00401262  mov         dword ptr [ebp-4],eax
 
       4:     return iLocal;
 
  00401265  mov         eax,dword ptr [ebp-4]
 
       5: }
 
  00401268  pop         edi
 
  00401269  pop         esi
 
  0040126A  pop         ebx
 
  0040126B  mov         esp,ebp
 
  0040126D  pop         ebp
 
  0040126E  ret

We can see that the function prologue code pushes ebp, which moves esp on another 4 bytes, then copies esp into ebp – so after line 4 the copy of iValOne’s value is now at [ebp+8].

Here’s another Stack snapshot showing the state after the function prologue (i.e. after line 8):


state of the Stack after the prologue of SumOf()
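Again in rough text form (offsets taken from the disassembly of SumOf() above; ebx, esi, and edi are saved below the locals area):

[ebp+10h]  copy of iValThree (iParamThree)
[ebp+0Ch]  copy of iValTwo   (iParamTwo)
[ebp+8]    copy of iValOne   (iParamOne)
[ebp+4]    return address pushed by the call
[ebp]      main()'s saved ebp
[ebp-4]    iLocal (within the 44h bytes reserved by sub esp,44h)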

Looking at lines 10-12 we can see that the assembler is accessing the function parameters as follows:

  • iParamOne (iValOne) from [ebp+8]
  • iParamTwo (iValTwo) from [ebp+0Ch]
  • iParamThree (iValThree) from [ebp+10h]

Which, unsurprisingly, is exactly where the values main() pushed onto the Stack before calling this function ended up after the function prologue.

Now we can see why the function parameters are pushed onto the stack in reverse order by main() – because called functions expect them to be stored in the Stack in parameter list order, starting from [ebp+8] and increasing in offset from ebp for each parameter.

As before the return value (iLocal, stored at [ebp-4]) is moved into eax before the function’s epilogue code in order to return it to main(), and since we know how the epilogue and return work from the last article we’re done with vanilla stdcall with multiple parameters. Joy.

 

Summary

We’ve looked in some detail at how the Stack is used to call functions in vanilla unoptimised compiler generated stdcall x86 assembler. This should leave you in a pretty good place to go mooching about in disassembly windows, with a fair idea of which parts of the disassembly for each function are most likely to be relevant.

For extra information, and to show you how different the Stack use can be (whilst still being basically the same in principle), here’s a link to the 4th in a series of articles on the IBM Technical Library site dealing with PowerPC assembler, and in particular with the 64 bit PowerPC ABI:

http://www.ibm.com/developerworks/linux/library/l-powasm4/index.html

In all likelihood you’ll need to read the first three articles to make sense of the 4th, but the 4th one is where most of the juicy info is 🙂

 

Next Time

Next time, I’m going to look at the x86 thiscall calling convention used when calling C++ member functions (where the ‘this’ pointer is passed in ecx), and we’ll also have a look in overview at how the exciting sounding ‘fastcall’ x86 calling convention uses the Stack.

Oh, and Merry Christmas!

 

Beyond intrinsics to code idiom

Original Author: Andy Thomason 


A brief history of SIMD

I first encountered SIMD architectures back in the early 1990s when I was writing MPEG and JPEG codecs for a variety of CPUs.

I was asked by a couple of chip manufacturers to come up with some extra instructions to help with the DCT (discrete cosine transform) calculation.

I had been using the 8087 FPU to do several calculations in parallel by dividing the 63-bit mantissa into groups of bits and doing three DCT calculations in the time of one, a practice now called “structure of arrays”.

[zeros] [Y0] [zeros] [Y1] [zeros] [Y2]

The numbers were biased to make the result positive.

We could add these values and multiply by small constants, but had to constantly re-bias the numbers to prevent the YUV values from leaking into adjacent numbers.

For our SIMD design, we added a mode that disabled the carry between parts of the accumulator creating what David May, designer of the transputer, called a “split accumulator”.

SIMD architectures later appeared on other CPUs, eventually making it into the x86 architecture in the form of 3DNow! and SSE. Now pretty much every flavour of CPU has a SIMD architecture; VMX/Altivec, NEON, and paired floats are examples, and SPUs and GPUs are exclusively SIMD.

SIMD now!

Today, most programmers either use inline assembler or intrinsics to implement vector and matrix classes.

Inline assembler is very much frowned upon these days, partly because of maintainability, partly because it prevents the compiler from optimizing around the asm statements and so intrinsics tend to be the norm.

Here is an example from an SSE vector class:

#include <xmmintrin.h> // SSE intrinsics and the __m128 type
 
class mat4;
 
class vec4 {
    __m128 v128;
  public:
    vec4() {}
    vec4(__m128 value) { v128 = value; }
    vec4(float x, float y, float z, float w) { v128 = _mm_setr_ps(x, y, z, w); }
    float &operator[](int i) { return v128.m128_f32[i]; }             // MSVC-specific union member
    const float &operator[](int i) const { return v128.m128_f32[i]; }
    vec4 operator*(float r) const { return vec4(_mm_mul_ps(v128, _mm_set1_ps(r))); }
    vec4 operator*(const mat4 &r) const;
    vec4 operator+(const vec4 &r) const { return vec4(_mm_add_ps(v128, r.v128)); }
};

The _mm_add_ps and other intrinsics generate the required vector operations, working on all four lanes of the vector value at once. On some compilers these then participate in optimization, combining multiplies and adds into fused operations, for example.

The advantages of this are obvious, we can perform four times the work of the scalar FPU in the same number of cycles.

Beware the SIMD of doom!

There are a very large number of caveats, however, and it is in fact very rare to get any acceleration at all from SIMD classes, despite this obvious advantage.

First, in order to get scalar values into the right place in vectors, we need to execute “permutes” and copies to move scalar data to SIMD registers. So unless your compiler is able to convert FPU instructions to SIMD instructions behind the scenes, you will end up with slower code than just writing scalar code.

Second, FPU and SIMD code does not mix, with a pipeline flush usually required between FPU and SIMD operations. This mode switch will usually cripple performance unless the SIMD code is run in a very clean environment.

Also, SIMD code requires the presence of an expert to debug it and must be re-written for every platform.

Moving on to idiom

To address these and other issues, compiler developers are moving towards the concept of “idiom” to represent corners of the target instruction set that can’t be reached by regular C code.

A code idiom is a pattern of C code that will be translated into a specific machine instruction, if it is cost effective to do so. Often the patterns are quite complex, reflecting the complex behaviour of the underlying architecture.

Often CPUs have complex instructions that can do many things at once. For example, most X86 instructions have a combination of address calculation, load, arithmetic and store. We are used to these being assembled correctly by the compiler.

But on some architectures, for example, there are instructions to load and byte swap or instructions to compute population counts and reverse bit patterns and most importantly, there are instructions to operate on vector data types.

class vec4 __attribute__( ( aligned( 16 ) ) ) {
    float v[4];

  public:

    ...

    vec4 operator+(const vec4 &r) const {
      return vec4(v[0]+r.v[0], v[1]+r.v[1], v[2]+r.v[2], v[3]+r.v[3]);
    }
};

Here we have some code that will run on any platform that supports GCC-style alignment. Apart from the alignment of 16 bytes, this code is just regular C++ code.

The alignment is crucial for performance. In fact alignment in general is a useful optimization to avoid data structures falling across cache line boundaries, but that is another issue.

In this case having an aligned structure allows the compiler to combine all the loads of the four elements into one vector load.

On the ARM platform, any consecutive loads are candidates for load-multiple instructions, both integer and floating point, up to a very large limit. As over 50% of the instructions executed in a typical game are loads, this is a considerable advantage. We also hope that in future architectures scatter-gather DMA will be available for little cost in userland, as it is on the Cell SPU and many GPUs.

Once loaded, it is a no-brainer to combine the four adds into a SIMD add and likewise the four stores. So by expressing our code in straight C code, we have achieved the same performance as writing platform specific intrinsics.

Here it is in a formal grammar.

{ p[0], p[1], p[2], p[3] }                      -> sse load

{ a[0]+b[0], a[1]+b[1], a[2]+b[2], a[3]+b[3] }  -> sse add

Inside the compiler, the combiner is perpetually searching for patterns that it knows it can reduce without penalty to a simpler set of instructions.

Given a target architecture, we try to provide combines that can generate every instruction available such as byte-swapping loads, combined shift and add, bitfield insertion and so on.

In an ideal compiler, we would not need to have intrinsics for any target instruction as we would simply have to publish the idiom.

Examples of idioms:

rotate    = ( x << n) | ( x >> (32-n) );
byte_swap = ( x << 24 ) | ( x << 8 ) & 0xff0000 | ( x >> 8 ) & 0xff00 | ( x >> 24 );
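To give one more example (my own illustrative addition, in the same spirit as the list above), a constant-position bitfield insert that back ends with instructions such as ARM's BFI or PowerPC's rlwimi can often collapse into a single instruction:

bitfield_insert = ( word & ~(((1u << n) - 1) << pos) ) | ( (field & ((1u << n) - 1)) << pos );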

Infix vector operations

In addition to idioms, many compilers, such as Clang and GCC, support infix vector operations.

float __attribute__ ((vector_size (16))) a, b, c;

c = a + b;

Here we have used the GCC vector_size notation to define a platform independent vector type.

We can add these types to avoid using intrinsics.

typedef float v4f __attribute__ ((vector_size (16)));

v4f a, b, c;

c = (v4f){ a[0], a[1], b[2], b[3] };

This is an example of a platform-independent permute. Although GCC supports the syntax, it may not generate efficient code for it; Clang gets it right.

Clang also allows vectors to be of arbitrary length, such as 64 elements.
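For example (a sketch of mine; note that vector_size is given in bytes, so 64 floats is 256 bytes), an oversized vector is simply legalized by the compiler into however many native registers the target provides:

typedef float v64f __attribute__ ((vector_size (256)));   // 64 floats per "vector"

v64f a, b, c;

c = a + b;   // split by the compiler into 4-wide (SSE), 8-wide (AVX), etc. chunks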

Designing a good idiom

The ideal idiom for a CPU instruction should be one that is frequently observed in real code examples and ideally that represents an efficient alternative to that instruction, should the CPU not support it.

For example, although we could convert a loop that counts bits into a population count instruction, we would prefer to convert the shift-and-mask sequence of the log(n) solution. Either should work, however, and the analysis is key to making this work.
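As a concrete illustration (a sketch using the well-known shift-and-mask form, not code from the article), the log(n) population count that an idiom detector can map to a single popcount instruction looks like this:

// Classic log(n) population count of a 32-bit value. A back end that
// recognises the idiom can replace the whole body with one popcount
// instruction; targets without one keep this as an efficient fallback.
unsigned popcount32(unsigned x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (x * 0x01010101u) >> 24;
}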

Idioms are often disrupted by other optimizations, such as strength reduction, so a robust idiom detector is essential: one that, for example, examines the origins of every bit of a result, searching for patterns.

LLVM provides some simple pattern matching in its instruction selection phase, and will be able to generate a wide variety of instructions. However, phase-based compilers, although easy to construct and maintain, are notorious for missing opportunities and often do not have a good cost model. The next generation of AI-driven cost-sensitive technology will hopefully perform much better than human generated heuristic models.

It is also worth noting that in many cases, using a specific machine instruction is not the most efficient thing to do. Although chip makers constantly extend the instruction set, they often provide little support for older instructions which get relegated to long pipeline delays.

Some we just can’t do

Some machine instructions are so deep that we just can’t represent them in C code.

Examples are access to special machine registers and pipeline synchronization instructions such as data memory barriers.

Atomics are also difficult to represent in the language, although many attempts have been made to add transacted memory blocks and atomic variables to the language, like Java's volatile.

One thing is certain: we don't want to add layer upon layer of library code to the language to represent corner cases; that just makes the language harder to understand for beginners and presents barriers to optimization.

Example of infix vector notation

The following shows two ways of generating a vector cross product function, one with generic vector operations and one with machine-specific intrinsics.

I hasten to add that I haven’t tried the latter as I couldn’t find any working documentation on GCC’s SSE intrinsics.

The former compiles on Clang and another compiler; the latter shows what is required from intrinsics. I wanted to show how to do this using ARM's intrinsics, but after twenty minutes or so of looking up functions, I gave up.

#ifdef GENERIC_VECTOR

typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t;

float32x4_t cross_product(float32x4_t x, float32x4_t y) {
  // yz - zy, zx - xz, xy - yx, ??
  float32x4_t x1200 = (float32x4_t){ x[1], x[2], x[0], x[0] };
  float32x4_t y1200 = (float32x4_t){ y[1], y[2], y[0], y[0] };
  float32x4_t x2010 = (float32x4_t){ x[2], x[0], x[1], x[0] };
  float32x4_t y2010 = (float32x4_t){ y[2], y[0], y[1], y[0] };
  return x1200 * y2010 - x2010 * y1200;
}

#else

typedef __m128 float32x4_t;

float32x4_t cross_product(float32x4_t x, float32x4_t y) {
  float32x4_t x1200 = _mm_shuffle_ps(x, x, 9);
  float32x4_t y1200 = _mm_shuffle_ps(y, y, 9);
  float32x4_t x2010 = _mm_shuffle_ps(x, x, 18);
  float32x4_t y2010 = _mm_shuffle_ps(y, y, 18);
  float32x4_t a = _mm_mul_ps(x1200, y2010);
  float32x4_t b = _mm_mul_ps(y1200, x2010);
  return _mm_sub_ps(a, b);
}

#endif

Example of rotate idiom on two compilers

unsigned rotate(unsigned value, unsigned amount) {
    return (value << amount) | (value >> (32-amount));
}

This code generates the following from these two compilers:

Microsoft:

        mov     eax, DWORD PTR _value$[esp-4]
        mov     ecx, DWORD PTR _amount$[esp-4]
        rol     eax, cl

Clang:

        movb    8(%esp), %cl
        movl    4(%esp), %eax
        roll


Walking in the Void

Original Author: Michael A. Carr-Robb-John

Building systems that allow A.I. characters to move intelligently around an environment is a common challenge encountered by gameplay programmers. One of the main data requirements for doing this is a level map of some kind; it might be a grid of squares or, more commonly these days, a navigation mesh. But when this information is missing or unavailable, as can happen in a streaming world, how can you make A.I. characters walk around in places that are not currently loaded?

Let’s consider an example scenario, here is our demonstration world:

Streamed zones are indicated by the named sections, and the only zones that are loaded in memory are those that are directly connected to the player's current location.

Let’s say that the player is standing in the blue zone “Central Court”.

The currently loaded zones are Central Court, Guard Room, Drawbridge control room, Kitchen, Entrance hall and Barracks.

An enemy A.I. Character is guarding the grounds of the castle by walking a patrol from the “Clerk’s Office” to the “Audience Chamber” and then to the “Guard Room”.

  • How do we know when the guard should enter a loaded zone and from which door?
  • If the player moves to an adjacent zone causing a zone to be loaded that the A.I. Character is currently in, how do we know where in the room he should be placed?

Navigation

Solving this scenario, given that parts of the navigational mesh might be missing or at times completely non-existent, requires, for lack of a better phrase, a mini-map. The mini-map records which zones are connected together and, apart from some additional data I will talk about later, it would look very similar to the demonstration map above. It always exists in memory and hence is always available regardless of which zones are currently loaded.
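As a rough sketch of what that might look like in code (the names and layout here are mine, not the article's), the mini-map can be little more than a graph of zones and the doorways that join them:

#include <vector>

// One entry / exit point (a door) in a zone of the always-resident mini-map.
struct ZonePortal {
    int   to_zone;                 // index of the zone this door leads to
    int   to_portal;               // matching door index inside that zone
    float position[3];             // world-space position of the door
    std::vector<float> distance;   // piloting distance to every other portal in this zone
};

// One streamed zone as the mini-map sees it.
struct MiniMapZone {
    const char*             name;     // e.g. "Clerk's Office"
    std::vector<ZonePortal> portals;  // all entry / exit points of the zone
};

struct MiniMap {
    std::vector<MiniMapZone> zones;   // one entry per zone, loaded or not
};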

In a traditional navigation system you would provide the x, y, z coordinates of where the character currently is and specify the x, y, z of where you want to go. The returned route would be a list of optimized waypoints that the character’s piloting system can use.

Plotting a route from Start position to Goal on a navigation mesh.

Navigation Route:
        Waypoint 1 - [X, Y, Z]
        Waypoint 2 - [X, Y, Z]
        Goal - [X, Y, Z]

With a streaming world we now require an additional parameter for each coordinate, the zone identification.

Navigation changes slightly: it now becomes a two-phase process. The first phase calculates a high-level route based upon the mini-map. In order to do this the mini-map needs a few more bits of information than simply which rooms are connected together: it needs to know the location of all entry / exit points for each zone, as well as the piloting distance from each point to every other point in that zone.

The more accurate the distance calculation is, the better the results you will get from this system, which is why the piloting distance is best; the worst-case value is the direct distance between the two points. There are reasons that might prevent accurate piloting distances from being worked out, e.g. the zone design doesn't allow a route to be plotted because of moving platforms or barriers, etc.

The diagram below indicates the sort of information that is required for each zone: its entrances / exits and the distances from each point to every other point.

With all the data stored in the mini-map it now becomes possible to calculate the quickest route from one room to another using your favourite search algorithm.
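For instance, a bare-bones first phase using Dijkstra over the MiniMap structure sketched earlier (again my own sketch; a production version would also record the portals visited so the route can be expanded later):

#include <limits>
#include <queue>
#include <utility>
#include <vector>

// First-phase search: a node is a (zone, portal) pair. Walking between two
// portals of the same zone costs the stored piloting distance; stepping
// through a door into the neighbouring zone costs nothing. Returns true and
// the total distance if the goal portal can be reached.
inline bool zone_route_cost(const MiniMap& map,
                            int start_zone, int start_portal,
                            int goal_zone,  int goal_portal,
                            float* out_cost)
{
    // Flatten (zone, portal) into a single index so costs fit in one array.
    std::vector<int> first;
    int count = 0;
    for (int z = 0; z < (int)map.zones.size(); ++z) {
        first.push_back(count);
        count += (int)map.zones[z].portals.size();
    }

    const float INF = std::numeric_limits<float>::max();
    std::vector<float> best(count, INF);

    typedef std::pair<float, std::pair<int, int> > Item;   // (cost, (zone, portal))
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > open;

    best[first[start_zone] + start_portal] = 0.0f;
    open.push(Item(0.0f, std::make_pair(start_zone, start_portal)));

    while (!open.empty()) {
        Item top = open.top(); open.pop();
        float cost = top.first;
        int zone = top.second.first, portal = top.second.second;
        if (cost > best[first[zone] + portal])
            continue;                                       // stale queue entry
        if (zone == goal_zone && portal == goal_portal) {
            *out_cost = cost;
            return true;
        }
        const ZonePortal& p = map.zones[zone].portals[portal];

        // Walk to any other portal of the current zone.
        for (int j = 0; j < (int)map.zones[zone].portals.size(); ++j) {
            float c = cost + p.distance[j];
            if (c < best[first[zone] + j]) {
                best[first[zone] + j] = c;
                open.push(Item(c, std::make_pair(zone, j)));
            }
        }
        // Step through this door into the zone on the other side.
        if (cost < best[first[p.to_zone] + p.to_portal]) {
            best[first[p.to_zone] + p.to_portal] = cost;
            open.push(Item(cost, std::make_pair(p.to_zone, p.to_portal)));
        }
    }
    return false;   // no connection between the two zones
}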

Looking at the scenario example of a guard traveling from “Clerk’s Office” [sX, sY, sZ] to the “Audience Chamber” [dX, dY, dZ] the first phase navigation would generate the following path:

Navigation Route:
        Clerk’s Office – 60m
        Great Hall – 40m
        Entrance Hall – 30m
        Waiting Room – 25m
        Audience Chamber
        Goal - [ X, Y, Z ]

The second phase takes the high level route and if any zone is actually resident in memory it will convert that part into a normal navigation route. The route in our example would be expanded like so:

Navigation Route:
        Clerk’s Office – 60m
        Great Hall – 40m
        Entrance Hall - Enter [ X, Y, Z ] (These coordinates come from the mini-map)
        Waypoint 1 - [ X, Y, Z ]
        Waypoint 2 - [ X, Y, Z ]
        Waypoint 3 - [ X, Y, Z ]
        Entrance Hall – Exit [ X, Y, Z ] (These coordinates come from the mini-map)
        Waiting Room – 25m
        Audience Chamber
        Goal - [ X, Y, Z ]

Of course, as zones come and go this can have a knock-on effect on the generated routes, so expect to have a mechanism in place that triggers an update/refresh of the generated routes if a zone gets loaded or unloaded.

Piloting

Piloting is the process of actually moving a character along the generated navigation route, usually applying some form of dynamic obstacle avoidance as it goes. In many respects piloting through zones that are not loaded is very simple: just wait for a certain amount of time to expire. The duration depends upon how fast your character moves; if he or she moves at 1m/s it will take 15 seconds to walk the 15 meters from zone entrance to zone exit, at which point you move on to the next zone.

If the next zone happens to be loaded you simply relocate the character into the world at the door position and use your normal piloting techniques to move the character to the exit door, where the character disappears (leaves the zone) and continues travelling the unloaded zones again.

Now what happens if the character is walking through an unloaded zone when it gets loaded? Let's say the character is 7.5 seconds into a zone that was supposed to take 15 seconds to cross. That is halfway along the path, so build the waypoints for the room from entrance point to exit point and snap the character to the halfway point of that path. It's not an exact process, but you will find it is a pretty good estimation. The only time this might fail is if there is no valid path from the entrance to the exit, in which case you need a fail-safe option, and that will depend upon your actual game.
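A sketch of that bookkeeping (names invented for illustration, not from the article):

// Progress of a character crossing a zone that is not currently in memory.
struct OffscreenTravel {
    float seconds_elapsed;   // time spent inside the unloaded zone so far
    float seconds_total;     // portal-to-portal distance divided by walk speed
};

// Ticked every frame while the zone stays unloaded; once the time is up the
// character has "arrived" at the exit portal and moves on to the next zone.
inline bool update_offscreen(OffscreenTravel& t, float dt) {
    t.seconds_elapsed += dt;
    return t.seconds_elapsed >= t.seconds_total;
}

// If the zone streams in mid-crossing, place the character the same fraction
// of the way along the freshly built entrance-to-exit path (7.5s of 15s -> 0.5).
inline float snap_fraction(const OffscreenTravel& t) {
    return t.seconds_elapsed / t.seconds_total;
}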

Generating the Mini-map

In order to auto-generate the mini-map I have a game entity called a Zone Marker. One of these markers is placed at each entrance / exit point in every zone, and the marker contains information about which zone it leads to.

When each zone is built I take the Zone Markers and do a traditional A* from each marker to every other marker on the zone navigation mesh. This gives me the distances between each entrance / exit point, which I can then write into the mini-map.
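In code, the build step might look something like this (a sketch; ZoneMarker, NavMesh and navmesh_astar_distance stand in for whatever marker type, navigation mesh and pathfinder the game already has, and MiniMapZone is the structure sketched earlier):

#include <vector>

// Hypothetical stand-ins for existing game types.
struct ZoneMarker { float position[3]; int leads_to_zone; };
struct NavMesh;
float navmesh_astar_distance(const NavMesh& mesh, const float* from, const float* to);

// Run once per zone at build time: A* between every pair of zone markers on
// the zone's own navigation mesh, storing the piloting distances in the mini-map.
void build_minimap_zone(MiniMapZone& out, const NavMesh& mesh,
                        const std::vector<ZoneMarker>& markers)
{
    out.portals.resize(markers.size());
    for (int i = 0; i < (int)markers.size(); ++i) {
        out.portals[i].to_zone = markers[i].leads_to_zone;
        out.portals[i].distance.resize(markers.size());
        for (int j = 0; j < (int)markers.size(); ++j)
            out.portals[i].distance[j] =
                navmesh_astar_distance(mesh, markers[i].position, markers[j].position);
    }
}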

Additional data you might want to include in the mini-map is whether certain routes can become blocked, for example because of a passage collapse, a locked door, etc. As long as the state of the connection can be verified without loading a specific zone, you can apply it to the first-phase route generation. I usually have a section in my global data manager that stores the locked / unlocked state of all doors and connection states in the world for this purpose.

As I mentioned earlier, it is possible when calculating the piloting distance from one entry / exit point to another that there is no path available. There are various solutions I have used in the past, but it all really depends on what is actually contained in the zone. I usually set a flag on each zone marker on how to deal with these scenarios since that allows me the most customization; the options I have supported are:

  • Use the direct distance between the two zone markers.
  • Same as above but with an additional 25%, 50%, 75%, 100% and 200% distance modifier.
  • Disable the route if no valid path can be generated.
  • Disable the route regardless of the path generation.

A Traditional Solution

Another way that this can be achieved of course is by doing it manually using event timers and scripts to trigger character appearances, etc. Sometimes when you don’t have the time or resources available this can be a good solution, even more so if you only have one or two cases where characters need to appear to achieve world movement.

Platform Specific Resources

Original Author: Niklas Frykholm

I recently added a new feature to the BitSquid tool chain – support for source and destination platforms in the data compiler. What it means is that you can take the data for one platform (the source) and compile it to run on a different platform (the destination). So you can take the data for the mobile version of a game (with all its content optimizations) and compile it so that it runs on your development PC.

This is nice for two reasons. First, access to target hardware can be limited. In a perfect world, every artist would have a dev kit for every target platform. In practice, this might not be economically possible. It might not even be electrically possible (those main fuses can only take so much). Being able to preview and play console/handheld content on PC is better than nothing, in this less-than-perfect world.

Second, since all our editors use the engine for visualization, if we have specified a handheld device as our source platform, all the editors will automatically show the resources as they will appear on that device.

This new feature gives me a chance to talk a little bit about how we have implemented support for platform specific resources, something I haven’t touched on before in this blog.

The BitSquid Tech uses the regular file system for its source data. A resource is identified by its name and type, both of which are determined from the path to the source file:

Note that even though the name is a path, it is not treated as one, but as a unique identifier. It is hashed to a 64-bit integer by the engine and to refer to a resource you must always specify its full name (and get the same hash result). In the compiled data, the raw names don’t even exist anymore, the files are stored in flat directories indexed by the hash values.
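A sketch of the idea (64-bit FNV-1a stands in here for whatever hash the engine actually uses, and the example path is hypothetical):

#include <cstdint>
#include <string>

// 64-bit FNV-1a, standing in for the engine's real hash function.
inline uint64_t hash64(const std::string& s) {
    uint64_t h = 14695981039346656037ull;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ull;
    }
    return h;
}

// "textures/ui/buttons.texture" -> name "textures/ui/buttons", type "texture".
// Compiled data is then indexed by hash64(name), never by the raw path.
inline void split_resource_path(const std::string& path, std::string& name, std::string& type) {
    std::string::size_type first_dot = path.find('.');   // properties and type follow the first dot
    std::string::size_type last_dot  = path.rfind('.');
    name = path.substr(0, first_dot);
    type = (last_dot == std::string::npos) ? std::string() : path.substr(last_dot + 1);
}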

In addition to name and type a resource can also have a number of properties. Properties are dot-separated strings that appear before the type in the file name:

Properties are used to indicate different variants of the same resource. So all these files represent variants of the same resource:

buttons.texture
buttons.ps3.texture
buttons.en.x360.texture
buttons.fr.x360.texture

The two most important forms of properties are platforms and languages.

Platform properties (x360, ps3, android, win32, etc) are used to provide platform specific versions of resources. This can be used for platform optimized versions of units and levels. Another use is for controller and button images that differ from platform to platform. Since BitSquid is scripted in Lua and Lua files are just a resource like any other, this can also be used for platform specific gameplay code:

PlayerController.android.lua

Language properties (en, fr, jp, it, sv, etc) are used for localization. Since all resources have properties, all resources can be localized.

But the property system is not limited to platforms and languages. A developer can make up whatever properties she needs and use them to provide different variants of resources:

bullet_hit.noblood.particle_effect
foilage.withkittens.texture

Properties can be resolved either at data compile time or at runtime.

Platform properties are resolved at compile time. When we compile for PS3 and a resource has ps3 specific variants, only those variants are included in the compiled data. (If the resource doesn’t have any ps3 variants, we include all variants that do not have a specified platform.)

Language properties and other custom properties are resolved at runtime. All variants are compiled to the runtime data. When running, the game can specify what resource variants it wants with a property preference order. The property preference order specifies the variants it wants to use, in order of preference.

Application.set_property_preference_order {"withkittens", "noblood", "fr"}

This means that the game would prefer to get a resource that has lots of kittens, no blood and is in French. But if it can’t get all that, it will rather have something that is kitten-full than blood-free. And it prefers a bloodless English resource to a bloody French one.

In other words, if we requested the resource buttons.texture with these settings, the engine would look for variants in the order:

buttons.withkittens.noblood.fr.texture
buttons.withkittens.noblood.texture
buttons.withkittens.fr.texture
buttons.withkittens.texture
buttons.noblood.fr.texture
buttons.noblood.texture
buttons.fr.texture
buttons.texture
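The lookup order above can be generated mechanically. Here is a sketch in C++ (the engine's real implementation is not shown in the post; this just reproduces the same ordering by treating the preference list as bits, most preferred first):

#include <string>
#include <vector>

// Candidate variant names for a resource, most preferred first, e.g.
// generate_variants("buttons", "texture", {"withkittens", "noblood", "fr"}).
// The engine would use the first candidate that exists in the compiled data.
std::vector<std::string> generate_variants(const std::string& name,
                                           const std::string& type,
                                           const std::vector<std::string>& prefs)
{
    std::vector<std::string> out;
    int n = (int)prefs.size();
    for (int mask = (1 << n) - 1; mask >= 0; --mask) {
        std::string variant = name;
        for (int i = 0; i < n; ++i)
            if (mask & (1 << (n - 1 - i)))       // property 0 is the most preferred
                variant += "." + prefs[i];
        out.push_back(variant + "." + type);
    }
    return out;
}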

To add support for different source and destination platforms to this system all I had to do was to add a feature that lets the data compiler use one platform for resolving properties and a different platform as the format for the runtime files it produces.

(This has also been posted to the BitSquid blog.)