PixelJunk Shooter 2 lighting : My one (so far) regret

Original Author: Jaymin Kessler

Disclaimer: I am not on the PixelJunk team, so if you liked the game I probably had zero to do with it! Conversely, if there were parts you disliked, I’m probably not the person to talk to 🙂

By now, everyone on the planet should have played PixelJunk Shooter 2, but in case the PSN outage stopped you from downloading it, it looks something like this:

One of the more interesting things in the sequel was the introduction of dark stages, something I think added an interesting new dimension to the game. I won’t give an in-depth description of how the SPU lighting in those stages works, but basically we calculate a real-time dynamic distance field that is also used for fluid particle collisions, and use that to get the length of light rays for occlusion and shadow casting. The lighting itself consists of three stages: get ray length, rasterize out, merge with other lights. The second stage was by far the slowest due to inefficient memory accesses, but I will save my ideas on that for another day. It’s the first stage I want to talk about today, but first we need some background.

Distance Field: it slices, it dices, it makes thousands of julienne fries!

Distance fields are one of those things that, to me at least, seem to have endless interesting uses. They are related to the skeleton transform, which I believe is the process in which girls become models. Let’s start off with a simple 2D world in which there are two objects: an oval and a square. You start with a “binary” image (that doesn’t have to be binary at all) where certain pixel values denote objects and others free space. The end result of the distance transform should be another image where each pixel’s value gives the distance to the closest object. As you get closer and closer to an object, the distance gets smaller and smaller, but what happens inside an object? Well, that depends. In an unsigned distance field, pixels that represent an object tend to all be zero, since the distance to the closest object (itself) is zero. In a signed distance transform, the distances become zero at object edges and go negative as you move towards the center of an object. That’s actually quite useful when, for example, you want to know how far an object penetrates a wall and need to figure out how much to back it up.

There are many methods used to calculate them on CPUs, GPUs, and some mythical fake-sounding possibly theoretical machines like SIMD Hypercube Computers (Parallel Computation of the 3-D Euclidean Distance Transform on the SIMD Hypercube Computer), LARPBS (Fast and Scalable Algorithms for the Euclidean Distance Transform on the LARPBS), and EREW PRAM (Parallel Computation of the Euclidean Distance Transform on a Three-Dimensional Image Array). GPU algorithms tend to be multi-pass and slow to converge, while CPU algorithms tend to be very parallel-unfriendly, and the algorithms that are parallel tend to be for the weird architectures mentioned above.

I’ll now briefly go over the Chamfer signed distance transform. For a handy reference, be sure to check out the excellent “The Dead Reckoning Signed Distance Transform” by George J. Grevera. First (obviously) comes initialization. Go through every pixel in your texture: if that pixel is inside an object, give it a value of -1.f, otherwise give it a value of FLT_MAX. Then pass over the image one more time looking for object edges, and assign those pixels values of zero. The second pass is a forward sweep from left to right and top to bottom. You have a sliding window that looks something like this


√2  1 √2
 1  C  -
 -  -  -

where C is the center of the window (and the pixel whose value we want to fill in). So for each of the marked neighbors, we take its distance value and add the corresponding offset from the window (1 for the pixels directly above and to the left, √2 for the pixels to the upper left and upper right, skipping pixels marked with a -). Out of those four candidate values (keeping C’s current value if it’s already smaller), find the min and make that the distance for C. You can see we are starting with known distances to objects and then propagating. The third pass is almost identical, except we start at the bottom right corner and go right to left, bottom to top. This time the window looks something like this


 -  -  -
 -  C  1
√2  1 √2

That’s it. By now you’ll have a lovely approximation of the distance from any pixel in your map to the closest object. If you check out Grevera’s paper you can see the results of experimenting with different window sizes and distance offsets, and read about dead reckoning, which is useful for keeping track of true distances.
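If you’d like to see the two sweeps in code, here’s a minimal unsigned version I sketched up for this post (not the Shooter 2 code), assuming a row-major float grid d of size w by h that has already been initialized as described above:

#include <algorithm>
#include <cmath>
#include <vector>

// Two 3x3 chamfer sweeps over an initialized grid: 0.0f on object edges,
// FLT_MAX in free space. After both passes, each cell approximates the
// distance to the nearest object edge.
void ChamferSweep(std::vector<float>& d, int w, int h)
{
    const float diag = std::sqrt(2.0f);

    // Forward pass: left to right, top to bottom. Only the already-visited
    // neighbors (left, upper-left, up, upper-right) are in the window.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
        {
            float& c = d[y * w + x];
            if (x > 0)              c = std::min(c, d[y * w + (x - 1)] + 1.0f);
            if (y > 0)              c = std::min(c, d[(y - 1) * w + x] + 1.0f);
            if (x > 0 && y > 0)     c = std::min(c, d[(y - 1) * w + (x - 1)] + diag);
            if (x < w - 1 && y > 0) c = std::min(c, d[(y - 1) * w + (x + 1)] + diag);
        }

    // Backward pass: the mirror image, starting at the bottom right corner.
    for (int y = h - 1; y >= 0; --y)
        for (int x = w - 1; x >= 0; --x)
        {
            float& c = d[y * w + x];
            if (x < w - 1)              c = std::min(c, d[y * w + (x + 1)] + 1.0f);
            if (y < h - 1)              c = std::min(c, d[(y + 1) * w + x] + 1.0f);
            if (x < w - 1 && y < h - 1) c = std::min(c, d[(y + 1) * w + (x + 1)] + diag);
            if (x > 0 && y < h - 1)     c = std::min(c, d[(y + 1) * w + (x - 1)] + diag);
        }
}

Two linear passes over the grid and you’re done, which is what makes the chamfer transform so attractive compared to brute-force nearest-object searches.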

That one regret

One fine day, Jerome (my boss) sent me a copy of a PDF he thought I’d be interested in. It was called “Rendering Worlds with Two Triangles with raytracing on the GPU in 4096 bytes” by Inigo Quilez (http://www.iquilezles.org/www/material/nvscene2008/rwwtt.pdf). It’s the paper that introduced me to raymarching, and it kicked off my obsession with procedurally generated, on-the-fly distance fields. The really obvious thing he mentions is that for any point in the distance field, it’s “guaranteed” that there won’t be any objects closer than the distance field value at that point. So if you’re marching along a ray, you no longer have to check every single pixel for occluders; you can just jump ahead by the distance to the closest object. It would have been absolutely perfect for the first pass of the Shooter 2 lighting system… if only I had actually used it! The only drawback is when you have a ray running parallel to a nearby wall. Because the closest object is always right next to you, you can’t jump ahead very far.
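For flavor, that jump-ahead loop really is only a few lines. Here’s a hedged sketch, where SampleField(x, y) is a hypothetical helper standing in for a bilinear lookup into the distance field:

// Hypothetical bilinear lookup into the 2D distance field (not shown).
float SampleField(float x, float y);

// March from (ox, oy) along the unit direction (dx, dy), jumping ahead by
// the field value at each step (aka sphere tracing). Returns the ray length
// at the first occluder, or maxLen if the ray escapes unblocked.
float MarchRay(float ox, float oy, float dx, float dy, float maxLen)
{
    const float hitEps = 0.5f; // within half a pixel of a surface counts as a hit
    float t = 0.0f;
    while (t < maxLen)
    {
        float d = SampleField(ox + dx * t, oy + dy * t);
        if (d < hitEps)
            return t; // occluded here
        t += d;       // safe skip: nothing can be closer than d to this point
    }
    return maxLen;
}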

The approach I took in Shooter 2 was slightly more… um… low level. I decided to calculate light ray length by loading between 4 and 16 pixels at a time into a vec_uchar16, and then checking in parallel for the earliest occluder, giving me the total ray length (see http://6cycles.maisonikkoku.com/6Cycles/6cycles/Entries/2010/4/17_n00b_tip__Psychic_computing.html). Of course I was too busy unrolling and software pipelining and microoptimizing to care about the insane cost of loading sparse pixels along a ray and packing them into a vector. Actually, that’s not entirely accurate. I put a lot of work into coming up with an offline-generated table of ordered indices that would minimize the redundant loads and packing, but the overall cost of the first stage was still dominated by inefficient (some would say unnecessary and avoidable) data transforms. (Note: I experimented with ways to get around this, like http://6cycles.maisonikkoku.com/6Cycles/6cycles/Entries/2010/11/27_n00b_tip__Combining_writes.html, but none ended up shipping with Shooter 2.)


testing light occlusion against the oval and box defined by the distance field

So, as a joke I decided to hack together a particle-free, Shooter 2-like lighting demo on a platform far less powerful than the PS3, and the results were pretty amazing. Not only was I able to get a large number of lights going, but I was also able to add reflection and refraction, something I must admit would have looked insanely sexy with the Shooter fluid system 🙂

There is no such thing in life as normal

Even if you’re a Johnny Marr fan, you have to admit Morrissey has a point. The geometry for the objects used to define my distance field doesn’t exist, and there are times I want the normals, for example when doing the reflection and refraction mentioned above. I thought back to basic calculus and remembered how to calculate gradients:

http://www.khanacademy.org/video/gradient-1?playlist=Calculus

http://www.khanacademy.org/video/gradient-of-a-scalar-field?playlist=Calculus
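In distance field terms, the gradient is just central differences on the field values, normalized. A small sketch (reusing the hypothetical SampleField from the ray marching sketch above), with a step parameter that the experiments further down widen from 1 to 2:

#include <cmath>

// Hypothetical bilinear lookup into the 2D distance field (not shown).
float SampleField(float x, float y);

// Central-difference gradient of the field, normalized into a 2D normal.
// step is the sampling distance in pixels; widening it smooths the normals
// at the cost of detail.
void FieldNormal(float x, float y, float step, float& nx, float& ny)
{
    float gx = SampleField(x + step, y) - SampleField(x - step, y);
    float gy = SampleField(x, y + step) - SampleField(x, y - step);
    float len = std::sqrt(gx * gx + gy * gy);
    if (len > 1e-6f) { nx = gx / len; ny = gy / len; }
    else             { nx = 0.0f; ny = 0.0f; } // flat spot: no meaningful normal
}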

Testing my newfound normals, I found something disturbing. When bouncing off the oval, there were certain points when the reflected ray would totally go nuts (see below where bouncing off two very close points gives two different results).

Interesting. I tried rendering some of the normals and suddenly the problem became clear

OK. So the distance field itself is a low-resolution, noisy approximation of the true distance, and the normals are a further approximation derived from the distance field, so I’d expect them to be crap, but we should be able to do better. I researched all kinds of interesting ways of improving the normals, things like edge curve fitting and bilinear filtering, but in the end I was able to get close enough while maintaining acceptable performance through a combination of blurring the distance field values and increasing the distance between the current pixel and the pixels sampled to compute the gradient. Below are some things I tried and the results


averaging the normals


averaging the normals and increasing the gradient distance from 1 to 2

Additional unrelated topic: moving raymarching into 3D

One last thing. Ray marching is an insanely cool technique that has uses in dimensions other than 2. It can also be used to do stuff in 3D! Since I’m not a graphics programmer and I suck at making stuff look good, I won’t waste too much time talking about the cute little guy I was able to make

It took me about 15 minutes to get that first little demo up and running. I’m still experimenting with procedural on the fly distance fields, and I might post again after a bit more math research. By the way, here is what it looks like when someone who knows what they are doing uses raymarching

Ready, Set, Allocate! (Part 6)

Original Author: Paul Laska


In this part I will cover three tests. The first test examines how long it takes to do a set number of allocations with a given size. The second test examines how long it takes to exhaust thirty-two megabytes of memory with four kilobyte allocations a set number of times. The third test examines how long it takes to allocate a set number of times in a worst case scenario. Each of the tests takes a file pointer as a parameter, so that it can write the results of the test out in comma separated value (CSV) format.

Due to my old laptop giving up the ghost I’m having to change the specs for the system I am using. I’ve done the best with what I have to keep the specs as close to the original ones as possible. The system now used for development and testing is an old desktop running Windows XP Pro Service Pack 3, with a Pentium-4 3.0GHz 32-bit processor, 800MHz FSB, and 512MB of RAM.

First, here are a few important details.

static const double MAXALLOCATIONS = 10000;
static const u32 STEPMULTIPLIER = 10;
static const u32 HEADERSIZE = 8; // sizeof(TALLOCATION_HEADER)
static const u32 NUMREPORTS = (u32)ceil(log(MAXALLOCATIONS)/log((d64)STEPMULTIPLIER)) + 1;

MAXALLOCATIONS is the maximum number of allocations to run for each test. STEPMULTIPLIER is the multiplier used when stepping up the number of allocations for each subsequent iteration (i.e. going from 100 allocations to 1000 allocations by multiplying 100 * STEPMULTIPLIER). I’ve pre-calculated HEADERSIZE as a convenience, so it doesn’t need to keep being calculated. Also, since TALLOCATION_HEADER is declared in a separate .cpp file, with no declaration in the corresponding header, I’ve gone ahead and hard-coded the value here. Last, NUMREPORTS represents the number of iterations that will be performed and reported for each test.

The structure for recording the times and reporting them later is a simple collection of values, with the size used for the allocations, and the hours, minutes and seconds for both allocating and freeing the memory. The seconds are stored as a double to keep the milliseconds.

struct Record
{
	s32 allocationSize;
	s32 allocHours;
	s32 allocMinutes;
	d64 allocSeconds;
	s32 freeHours;
	s32 freeMinutes;
	d64 freeSeconds;
};

Two other structures are used to associate the number of allocations, or repetitions of allocating / freeing the memory, performed at the given allocation size. The first structure is used with the first test and stores an array of records, so that differing allocation sizes (1Byte to 32MB, by multiples of 2) can be associated with the number of times the allocation is performed. The second structure is used with the second and third tests, associating a single record with the number of times the test is performed.

struct AllocReportSizes
{
	static const s32 NUMALLOCATIONSIZES = 25;

	s32 numAllocations;
	Record records[NUMALLOCATIONSIZES];
};

typedef struct AllocReportExhaustion
{
	s32 numRepetitions;
	Record record;
} AllocWorstCase;

1. How long does it take to allocate X times at size Y?

The first thing to do is set up an array of reporting structures, one for each number of allocations to be tested, along with an entry for each size to be reported (1 Byte to 32MB). The number of allocations increases by a multiple of STEPMULTIPLIER for each successive reporting structure.

// Allocate the given size, then free it, a specified number of times
bool TestSizes(FILE* pOutputFile)
{
	if (!pOutputFile)
		return false;

	AllocReportSizes* pReports = new AllocReportSizes[NUMREPORTS];

	for (u32 i = 0, numAllocs = 1; i < NUMREPORTS; ++i, numAllocs *= STEPMULTIPLIER)
	{
		pReports[i].numAllocations = numAllocs;
		pReports[i].records[0].allocationSize = MEM_1B;
		pReports[i].records[1].allocationSize = MEM_4B;
		pReports[i].records[2].allocationSize = MEM_8B;
		pReports[i].records[3].allocationSize = MEM_16B;
		pReports[i].records[4].allocationSize = MEM_32B;
		pReports[i].records[5].allocationSize = MEM_64B;
		pReports[i].records[6].allocationSize = MEM_128B;
		pReports[i].records[7].allocationSize = MEM_256B;
		pReports[i].records[8].allocationSize = MEM_512B;
		pReports[i].records[9].allocationSize = MEM_1KB;
		pReports[i].records[10].allocationSize = MEM_2KB; // Header space in a block doesn't become an issue until 4KB.
		pReports[i].records[11].allocationSize = MEM_4KB - HEADERSIZE;
		pReports[i].records[12].allocationSize = MEM_8KB - HEADERSIZE;
		pReports[i].records[13].allocationSize = MEM_16KB - HEADERSIZE;
		pReports[i].records[14].allocationSize = MEM_32KB - HEADERSIZE;
		pReports[i].records[15].allocationSize = MEM_64KB - HEADERSIZE;
		pReports[i].records[16].allocationSize = MEM_128KB - HEADERSIZE;
		pReports[i].records[17].allocationSize = MEM_256KB - HEADERSIZE;
		pReports[i].records[18].allocationSize = MEM_512KB - HEADERSIZE;
		pReports[i].records[19].allocationSize = MEM_1MB - HEADERSIZE;
		pReports[i].records[20].allocationSize = MEM_2MB - HEADERSIZE;
		pReports[i].records[21].allocationSize = MEM_4MB - HEADERSIZE;
		pReports[i].records[22].allocationSize = MEM_8MB - HEADERSIZE;
		pReports[i].records[23].allocationSize = MEM_16MB - HEADERSIZE;
		pReports[i].records[24].allocationSize = MEM_32MB - HEADERSIZE;
	}

	// Write the headers.
	fprintf(pOutputFile, "Action,Num Allocations,MEM_1B,MEM_4B,MEM_8B,MEM_16B,MEM_32B,MEM_64B,MEM_128B,MEM_256B,MEM_512B,MEM_1KB,MEM_2KB,MEM_4KB,MEM_8KB,MEM_16KB,MEM_32KB,MEM_64KB,MEM_128KB,MEM_256KB,MEM_512KB,MEM_1MB,MEM_2MB,MEM_4MB,MEM_8MB,MEM_16MB,MEM_32MB\n");

Next, iterate through each of the reporting structures to determine how long it takes to perform the specified number of allocations at each of the given sizes. Each allocation is performed, then the memory is released, so the next allocation doesn’t operate under different conditions. The time tracked is only the time spent in the allocation and free functions, and those two are tracked separately.

	for (u32 i = 0; i < NUMREPORTS; ++i)
	{
		s32 numAllocations = pReports[i].numAllocations;
		fprintf(pOutputFile, "Allocate,%i", numAllocations);
		for (s32 j = 0; j < AllocReportSizes::NUMALLOCATIONSIZES; ++j)
		{
			s32 allocationSize = pReports[i].records[j].allocationSize;

			CHighPerfTimer timerAlloc;
			CHighPerfTimer timerFree;

			void* pMem;
			for (s32 k = 0; k < numAllocations; ++k)
			{
				timerAlloc.Resume();
				pMem = malloc(allocationSize);
				timerAlloc.Pause();
				timerFree.Resume();
				free(pMem);
				timerFree.Pause();
			}

			timerAlloc.GetTime(pReports[i].records[j].allocHours, pReports[i].records[j].allocMinutes, pReports[i].records[j].allocSeconds);
			timerFree.GetTime(pReports[i].records[j].freeHours, pReports[i].records[j].freeMinutes, pReports[i].records[j].freeSeconds);

			fprintf(pOutputFile, ",%02i:%02i:%02.4f", pReports[i].records[j].allocHours, pReports[i].records[j].allocMinutes, pReports[i].records[j].allocSeconds);
		}
		fprintf(pOutputFile, "\n");
	}

	for (u32 i = 0; i < NUMREPORTS; ++i)
	{
		s32 numAllocations = pReports[i].numAllocations;
		fprintf(pOutputFile, "Free,%i", numAllocations);
		for (s32 j = 0; j < AllocReportSizes::NUMALLOCATIONSIZES; ++j)
		{
			fprintf(pOutputFile, ",%02i:%02i:%02.4f", pReports[i].records[j].freeHours, pReports[i].records[j].freeMinutes, pReports[i].records[j].freeSeconds);
		}
		fprintf(pOutputFile, "\n");
	}

	delete[] pReports; // release the reporting structures
	return true;
}

Allocating a bunch of objects of the same size can add up in time spent allocating, and if lots of allocations are performed at the same time, it can become a performance issue. The problem grows more pronounced with larger allocations, where the system has to search for larger blocks of contiguous memory.

When the test is run using the heap system here, allocation times range from less than a millisecond, for 1 Byte being allocated 1 time, to 27.3 milliseconds, for 32 MB being allocated 10000 times. However, when the test is run using the standard memory allocator, allocation times range from less than a millisecond, for 1 Byte being allocated 1 time, to 9 minutes 17 seconds 442.7 milliseconds, for 32 MB being allocated 10000 times. Now that’s a pretty sensational performance improvement, but something a bit more realistic in real world use might be 64KB allocated 1000 times, and with that the improvement still shows through with 0.7 milliseconds and 66.3 milliseconds respectively.

Short of implementing your own memory allocation system, Windows XP cannot be limited to allocating the requested memory from a specific range of addresses. This contributes to increased allocation times while the standard allocation system searches for available memory, and that is partly what is being demonstrated here.

2. How long does it take to allocate all the memory X times at 4KB per allocation?

Similar to the previous test, an array of reporting structures is set up, but this time it is for each number of repetitions used to exhaust the memory. Only the smallest size the system supports is tested, since larger allocations would reduce the number of allocations and the time taken.

// Allocate one block at a time, until memory is full, then free the memory, a specified number of times
bool TestExhaustion(FILE* pOutputFile)
{
	if (!pOutputFile)
		return false;

	u32 blockSize = _KB(4);
	AllocReportExhaustion* pReports = new AllocReportExhaustion[NUMREPORTS];

	for (u32 i = 0, numRepetitions = 1; i < NUMREPORTS; ++i, numRepetitions *= STEPMULTIPLIER)
	{
		pReports[i].numRepetitions = numRepetitions;
		pReports[i].record.allocationSize = blockSize - HEADERSIZE; // blockSize - sizeof(TALLOCATION_HEADER)
	}

	// Write the headers.
	fprintf(pOutputFile, "Action,Num Repetitions,Time to exhaust 32MB using 4KB per allocation\n");

Then the reporting structures are iterated over to determine how long it takes to allocate all the memory, at 4KB per allocation, for the specified number of repetitions. Each time the memory is exhausted it is released and the next repetition begins.

	u32 uTotalMem = _MB(32);
	for (u32 i = 0; i < NUMREPORTS; ++i)
	{
		s32 numRepetitions = pReports[i].numRepetitions;
		s32 allocationSize = pReports[i].record.allocationSize;

		CHighPerfTimer timerAlloc;
		CHighPerfTimer timerFree;

		void** ppMem = new void*[uTotalMem / blockSize];
		for (s32 j = 0; j < numRepetitions; ++j)
		{
			for (u32 k = 0; k < (uTotalMem / blockSize); ++k)
			{
				timerAlloc.Resume();
				ppMem[k] = malloc(allocationSize);
				timerAlloc.Pause();
			}
			for (u32 k = 0; k < (uTotalMem / blockSize); ++k)
			{
				timerFree.Resume();
				free(ppMem[k]);
				timerFree.Pause();
			}
		}
		timerAlloc.GetTime(pReports[i].record.allocHours, pReports[i].record.allocMinutes, pReports[i].record.allocSeconds);
		timerFree.GetTime(pReports[i].record.freeHours, pReports[i].record.freeMinutes, pReports[i].record.freeSeconds);

		fprintf(pOutputFile, "Allocate,%i,%02i:%02i:%02.4f\n", numRepetitions, pReports[i].record.allocHours, pReports[i].record.allocMinutes, pReports[i].record.allocSeconds);
		fprintf(pOutputFile, "Free,%i,%02i:%02i:%02.4f\n", numRepetitions, pReports[i].record.freeHours, pReports[i].record.freeMinutes, pReports[i].record.freeSeconds);

		delete[] ppMem; // don't leak the pointer table between iterations
	}

	delete[] pReports;
	return true;
}

Allocating all the memory at 4KB per allocation demonstrates some good and bad allocation conditions, though not the absolute worst (that’s saved for the last test). When no memory is previously allocated, available memory is discovered quickly, and because each subsequent allocation is contiguous, whole blocks eventually end up getting examined with a single check, reducing the time to discover available memory. Performing lots of small allocations means that more searches are performed than if larger allocations were made, because larger allocations would exhaust the memory sooner, and more searches add up to more time spent searching for available memory.

Exhausting 32 MB of memory at 4KB per allocation with the heap system here ranges in time from 10.1 milliseconds, to do it 1 time, to 1 minute 39 seconds 995 milliseconds, to do it 10000 times. When using the standard memory allocation system, the times range from 101.5 milliseconds, to exhaust 32MB of memory 1 time, to 15 minutes 52 seconds 593.3 milliseconds, to do it 10000 times. Even more interesting is the increase in performance of the Free method. To free the same memory that was allocated, the heap system here ranges from 6.1 milliseconds to 1 minute 1 second 891.4 milliseconds, while the standard memory allocation system clocks in a range from 613.6 milliseconds to 1 hour 43 minutes 35 seconds 161.5 milliseconds. At first I didn’t believe it took more than an hour and a half to free the memory, so I re-ran the test and came up with a similar time (1:41:58.453.5). While the idea of exhausting the memory 10000 times in a row may not be a real world problem, it still highlights the time gains that can be made over many allocations and de-allocations with a specialized system.

Allocating all the memory in Windows XP isn’t the same as allocating all the memory in the system described herein, since Windows XP will begin page-swapping once the physical memory has been exhausted. However, the time it takes Windows XP to allocate 32MB worth of memory at 4KB per allocation can still be compared against the time it takes the system described herein to do the same.

3. How long does it take to allocate 8KB of memory X times in the worst case?

As previously done, an array of reporting structures is created, but this one is used to report the time it takes to allocate an 8KB request in a worst case scenario.

// With every other block available, until the end of the memory where two blocks are available, allocate two blocks of memory, a specified number of times
bool TestWorstCase(FILE* pOutputFile)
{
	if (!pOutputFile)
		return false;

	u32 blockSize = _KB(4);
	AllocWorstCase* pReports = new AllocWorstCase[NUMREPORTS];

	for (u32 i = 0, numRepetitions = 1; i < NUMREPORTS; ++i, numRepetitions *= STEPMULTIPLIER)
	{
		pReports[i].numRepetitions = numRepetitions;
		pReports[i].record.allocationSize = (2 * blockSize) - HEADERSIZE; // (2 * blockSize) - sizeof(TALLOCATION_HEADER)
	}

	// Write the headers.
	fprintf(pOutputFile, "Action,Num Repetitions,Worst case allocation time for an 8KB block\n");

The worst case sets up the memory to be entirely allocated in 4KB blocks, then goes back through and frees up every other block. At the end of the memory the test ensures an 8KB block is available to fulfill the request. Any searches for free memory will discover there are free blocks within each page and will search through each page to check if enough contiguous blocks exist to fulfill the requested 8KB.

	u32 uTotalMem = _MB(32);
	void** ppMem = new void*[uTotalMem / blockSize];

	// Allocate all the memory, one block at a time, then free every other block, then free the next to last block,
	// creating two contiguous blocks at the end of the memory.
	for (u32 i = 0; i < uTotalMem / blockSize; ++i)
	{
		ppMem[i] = malloc(blockSize - HEADERSIZE);	// blockSize - sizeof(TALLOCATION_HEADER)
	}
	for (u32 i = 1; i < uTotalMem / blockSize; i += 2)
	{
		free(ppMem[i]);
		ppMem[i] = NULL;
	}
	free(ppMem[uTotalMem / blockSize - 2]);
	ppMem[uTotalMem / blockSize - 2] = NULL; // mark it freed so cleanup doesn't free it twice

The test iterates through the reporting structures, performing the 8KB request for the specified number of repetitions in each one. Every time the available 8KB is found at the end of the memory it is immediately deallocated so the test can be performed for the next iteration.

	// Allocate two blocks of memory, with the only two contiguous blocks of memory being the last two, then free it a given number of times.
	for (u32 i = 0; i < NUMREPORTS; ++i)
	{
		s32 numRepetitions = pReports[i].numRepetitions;
		s32 allocationSize = pReports[i].record.allocationSize;

		CHighPerfTimer timerAlloc;
		CHighPerfTimer timerFree;

		for (s32 j = 0; j < numRepetitions; ++j)
		{
			void* pMem;
			for (u32 k = 0; k < (uTotalMem / blockSize); ++k)
			{
				timerAlloc.Resume();
				pMem = malloc(allocationSize);
				timerAlloc.Pause();
				timerFree.Resume();
				free(pMem);
				timerFree.Pause();
			}
		}
		timerAlloc.GetTime(pReports[i].record.allocHours, pReports[i].record.allocMinutes, pReports[i].record.allocSeconds);
		timerFree.GetTime(pReports[i].record.freeHours, pReports[i].record.freeMinutes, pReports[i].record.freeSeconds);

		fprintf(pOutputFile, "Allocate,%i,%02i:%02i:%02.4f\n", numRepetitions, pReports[i].record.allocHours, pReports[i].record.allocMinutes, pReports[i].record.allocSeconds);
		fprintf(pOutputFile, "Free,%i,%02i:%02i:%02.4f\n", numRepetitions, pReports[i].record.freeHours, pReports[i].record.freeMinutes, pReports[i].record.freeSeconds);
		printf("\n");
	}

	// Free the memory used to set up the testcase
	for (u32 i = 0; i < uTotalMem / blockSize - 1; ++i)
	{
		if (ppMem[i] != NULL)
			free(ppMem[i]);
	}
	delete[] ppMem;
	delete[] pReports;

	return true;
}

The worst case test gives a sense of the time an allocation can take with this heap system at its worst, and shows how important it is to avoid fragmentation of the memory. Times to perform the test range from 390.5 milliseconds, to complete 1 time, to 1 hour 5 minutes 12 seconds 634.8 milliseconds, to complete 10000 times. In the real world one hopes never to get into a fragmentation predicament like the one set up here, and careful planning of heaps is usually done to help avert this kind of situation.

Since there is no way to limit where Windows XP will allocate memory, and because Windows XP will start page-swapping to the hard drive when it runs out of physical memory, this last test isn’t something that can be fairly compared, so it has been skipped for the standard memory allocator.

While this heap system has its advantages, it can be improved upon in a number of ways. Here are three ideas that spring to mind. First, the heap system could be made thread-safe. Second, small allocations, less than the block size, need to be handled at the sub-block level to avoid wasting space. One way that could be done is to allocate a heap containing a set number of blocks (e.g. 128 blocks (512KB)), then any time a small request comes in, it can be redirected to that heap (a rough sketch follows below). This method will require some overhead to track the sub-block allocations. Third, this heap system could be improved for debugging by incorporating tracking information to determine where an allocation was made.
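To illustrate that second idea, here’s a rough sketch of what a sub-block pool could look like. All names and sizes here are illustrative, not part of the heap system described in this series:

#include <cstdlib>

// Carve a dedicated 512KB region (128 blocks of 4KB) into fixed 64-byte
// cells and redirect small requests there, falling back to the general
// allocator for anything bigger.
class SmallAllocPool
{
public:
	static const size_t CELLSIZE  = 64;
	static const size_t POOLBYTES = 128 * 4096; // 128 blocks * 4KB = 512KB

	SmallAllocPool() : m_pool(static_cast<char*>(malloc(POOLBYTES))), m_free(0)
	{
		// Thread every cell onto an intrusive free list.
		for (size_t i = 0; i < POOLBYTES / CELLSIZE; ++i)
		{
			Cell* c = reinterpret_cast<Cell*>(m_pool + i * CELLSIZE);
			c->next = m_free;
			m_free = c;
		}
	}
	~SmallAllocPool() { free(m_pool); }

	void* Alloc(size_t size)
	{
		if (size > CELLSIZE || !m_free)
			return malloc(size); // too big, or pool exhausted: general heap
		Cell* c = m_free;
		m_free = c->next;
		return c;
	}

	void Free(void* p)
	{
		char* cp = static_cast<char*>(p);
		if (cp >= m_pool && cp < m_pool + POOLBYTES)
		{
			// Came from the pool: push the cell back onto the free list.
			Cell* c = reinterpret_cast<Cell*>(cp);
			c->next = m_free;
			m_free = c;
		}
		else
			free(p); // came from the general heap
	}

private:
	struct Cell { Cell* next; };
	char* m_pool;
	Cell* m_free;
};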

I hope this series has been helpful, and look forward to hearing all the great ideas other people have on how to improve upon what has been shown here.


Re-Education (Through Leaderboards)

Original Author: David Czarnecki

TL;DR

In the last 2 weeks I’ve ported my leaderboard library, originally written in Ruby, to PHP and Java. It has been an extremely fulfilling endeavor for a number of reasons.

99 PROBLEMS

Let’s face it, we’ve all been on epic e-mail threads where someone poses a problem, e-mail armchair development ensues, people argue back and forth about what will and what won’t work, and at the end of the thread the original author is left with more questions than answers, more what could be than what should be. That happened to me about two weeks ago. It all started with a Friday evening e-mail to engineering: “We have a vexing leaderboard issue I’m opening up for suggestions.” It was for a property we didn’t write, but one that was in dire need of help. The thread was 20 e-mails in before I responded at 9 AM the next day with (paraphrasing here): “I made this for you. Use it or don’t. At the very least, it can be tested right now as a potential solution.”

And so the php_leaderboard package was born.

If it had turned out, given the constraints, that php-leaderboard wasn’t going to be a viable solution, it really wasn’t going to faze me. I learned some PHP. I was like Jojo the idiot circus boy with a pretty new pet: “Now the pet is my possible solution. Oh, my pretty little pet, I love you. And then I stroke it, and I pet it, and I massage it. Hehe, I love it, I love my little naughty pet, you’re naughty! And then I take my naughty pet and I go …” Wait … what? The point is I found some inspiration to try something new and to present a reasonably complete solution to the problem at hand.

ONE MORE CUP OF COFFEE

You could say that after the PHP port, I developed a taste for porting: a Java version followed. Search for it and you’ll find it.

MY ANACONDA DON’T WANT NONE UNLESS …

The PHP port was also inspiration for two colleagues to release the python-leaderboard package.

COMFORTABLY NUMB

It felt really good to work in a new language and to re-connect with an old language. And for whatever reason, I need that mental disruption to continue. And so it shall. My current in-progress port of the leaderboard library is to Go. Porting the leaderboard library works for me because of a number of factors:

1. It is well-defined in terms of behavior/functionality.

2. It requires integration of a service and a service client library.

3. It requires some knowledge of control statements, types and higher-level data structures.

4. It has a well-defined set of tests.

I’ve had to become comfortable with being uncomfortable. New syntax, new packaging, new test framework, new release mechanism. Conformity be damned!

FIN

I would encourage anyone who feels they’re set in their ways or in need of a change to undertake a similar endeavor. I can imagine this applies to any number of disciplines in the video game industry. Find a well-defined, testable problem that you’ve solved again and again and solve it differently.

It’s that simple.

You can find more hilarity over on my Twitter account, @CzarneckiD.

Am I Doing Meaningful Game Design Work?

Original Author: Emmeline Pui Ling Dobson

Throughout my career I’ve experienced peaks and troughs. In the name of professionalism I always try to find something inspiring to grasp on to, a reason to do my task(s) that links the work to some internal motivation. I love to do work from my heart. When I’m not, for several months at a time, I start to question “Why?” Especially because, back in 2003, I became a game designer for the love, not the money. Being a reflective sort, I’ve examined a lot of the different work that I’ve done and considered what makes me happiest and, when I’m not naturally enthusiastic about a responsibility, which mind hacks best help me apply myself. I’d encourage anybody in a situation where you effectively take a pay cut because you care passionately about your work to ensure you really are doing what you love and, if not, to examine what you could do to change your situation.

This is an illustration I first mentioned during a presentation to students on the Games Cultures course at London Southbank University in 2009. It helped me explain the difference between the aspirations we have and what we imagine working in game design will be like and, once we get in, the actual day-to-day roles that junior or intermediate designers typically get assigned. The axes represent aspects of being creative, whether for work or play, that keep cropping up in my thoughts time and time again on my journey:

Creativity_space

Space representing two aspects of the application of creativity

The x-axis represents a spectrum from creating from the inside-out on the left and creating for an external audience on the right. When people say, “you should make the games you want to make” they are expressing that being in the left-hand side of the space is important to them.

On the other hand, one Game Designer I respected once said to me, “The difference between an artist and a designer is that a designer works to a brief.” The far right-hand side of the spectrum shown here is design for other people, working to a brief, keeping the audience constantly in mind to lead the design process.

The y-axis in the space represents high-level design work at the top – aesthetic content, the game world, theme and tone, characters. What the game is about. At the bottom is design work implementing the nuts-and-bolts workings of the game. Tweaking frame data in a fighting game, placing enemies into a level or playtesting and iterating the physics properties of a vehicle could be examples of this type of work.

High-level design work looks at the big picture; low-level design work is focused on the details. There’s no implication that game concept design is “above” implementation. Both are indispensable, and either could be the starting-point for a game project.

From reading what different disciplines write or say about their work, you might get the following impression of design roles in the AAA boxed retail games section of the industry:

Creativity_space_industry

Domains that different design department job brackets tend to work in

How can you use this as a tool?

Take stock of your career goals by mapping out the area you want to be working in. Then map where you are in your current role and think if you’re learning the skills and have the opportunities that will get you to where you want to be.

Much of my own games design experience has been implementing or overseeing the implementation of minute-to-minute gameplay, work on story and characters and contributions to pitches1:

Creativity_space_b_AAA

Work as a seasoned game designer – mostly implementation, some concept stuff, not often very well-integrated

However I enjoyed the chances to influence the bigger picture more; I wanted to apply myself to a project with a greater sense of ownership, something with my own stamp on it. Connecting with an audience is part of my creative make-up, too:

Creativity_space_b_dream

Is this your dream job?

It seems that many creative souls enter the games industry with dreams of working in this area.2 Then they find that somebody else is actually making these decisions, and in order to stave off frustration when their voice is stifled at the vision level of the project, they start to tell themselves that wanting to work with the game vision is a naïve desire to be grown out of. For me, I came to a point where I decided I needed to find my own angle on getting satisfaction out of my low-level, audience-focused work, and use other avenues for being artistic.

Kareem Ettouney of Media Molecule spoke at Playful 2009 about the importance of allowing your creative staff time and respect for their personal projects. I later saw that this applied to me: investing the energy to be creative into something, especially when it wasn’t finding an outlet at work, was key to feeling a better balance in life. As a result of taking inventory, I became more relaxed and focused on what I was doing in my job rather than anxious about what I wasn’t. I started self-directed projects at home, getting creative satisfaction from drawing and painting. I also made plans for areas of the chart I could see myself developing valuable expertise in. I’m now heading towards the bottom-right of the chart:

Creativity_space_b_user

Should most professional game designers be working in this area?

In recent roles I have been designing games for kids and designing learning for 16-19 year old college students. I think most people who enter the games industry, at least in the AAA boxed retail games sector, have ambitions to be doing high-level work with their own personal stamp on it; this is why 90% of articles filed under “game design” in Gamasutra or GameDev.net are about high-level design and vision, not about nuts-and-bolts game design or areas like pacing, player-character progression and designing core mechanics that occupy the middle territory. Designing for an audience and a focus on detail also lend themselves to more reliable quality measurements, helping me reflect on my own work and receive more useful feedback. There are also a lot of learning resources available in this area; cross-over with other design fields is stronger.

I would be unlikely to turn down an opportunity to work on new swords-and-sorcery AAA games, especially ones aimed at younger teenage girls and boys, like the Zelda series. I grew up on airships3, pegasus knights and vespene gas, and that’s an undeniable part of my DNA. But in the current global situation where work in the UK is trending towards motion control games and the march away from new IP continues, I believe it is adaptive to be exploring other ways of fulfilling those creative drives that brought me into games in the first place, while developing my existing design experience with skills that are sought-after both in the AAA sector and emerging markets for game design expertise.

What else could you use this chart for?

You could adapt this tool with any axes that are meaningful for you. Perhaps it’s more important to you to think about whether your company’s vision is for producing great products or for becoming an incredible workplace? (This is usually with the hope that the other will emerge as an effect of progress towards your primary target!) I find it interesting that I could map the Creative / Design / Development departments behind Magic: the Gathering as described extensively on their rich public website. Creative would be in the high-level territory, perhaps with a bit more creating from their own hearts and minds; Design covering a large area in the middle, but leaning more towards the audience; Development further towards the bottom-right of the chart. When building your design team, how about ensuring coverage of most of the space, with clearly-defined, smaller spaces occupied by each job role? Does this just work for design, or for technical art specialisms, programming and production, too?

Footnotes

1 I’ve also been heavily concerned with the meta-game design of how best to do team communication, particularly written specs ready for implementation in code and design change records.

2 When the game design syllabus I taught last Autumn says in its summary, “Game design is about daydreams”, it promotes high-concept as the whole of game design and potentially misleads students.

3 Unfortunately I did not literally grow up on an airship.


Your engine is deaf and your tool is dumb. What can you do about it? (Part 1)

Original Author: Nicolas Fournel

Paradoxically, most audio engines are deaf. They take decisions that will impact the audio output of a game without listening to what they are playing. The same can be said of sound tools: they will randomize, pitch up and down, loop and combine sounds but they don’t really have any knowledge about the assets they manipulate. To follow up on my previous post “Putting the audio back in ‘audio programmer’”, this series of articles will examine how audio analysis algorithms can be used to create “smarter” (hopefully ;-)) audio engines and tools.

Analyze this!

In game audio, when we talk – rarely – about audio analysis, we often refer to the RMS (Root Mean Square) which gives an indication of the average loudness of a signal, or the FFT (Fast Fourier Transform) which allows us to examine that signal in the spectral domain. Both may be used to debug the audio during the mixing stage and the FFT will sometimes also be used for music visualizers or as the basis of pitch detection algorithms in singing games (more on audio analysis for game design in a coming post).
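For reference, the RMS really is as simple as its name, read backwards: square, mean, root. A minimal sketch (the function name and float-buffer format are my own assumptions):

#include <cmath>
#include <cstddef>

// RMS of a buffer of samples in [-1, 1]: a rough loudness indicator,
// typically computed over short windows (e.g. 10-50 ms).
float ComputeRMS(const float* samples, size_t count)
{
    if (count == 0)
        return 0.0f;
    double sum = 0.0;
    for (size_t i = 0; i < count; ++i)
        sum += static_cast<double>(samples[i]) * samples[i];
    return static_cast<float>(std::sqrt(sum / count));
}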

But besides the RMS, there are many other useful features you can extract from audio signals. For example, you can calculate the spectral flux, rolloff, spread, flatness and centroid. You can detect transients, evaluate pitch and amplitude envelopes, extract resonant modes, separate a signal into source and filter with LPC (Linear Predictive Coding) and so on… You can also go beyond the FFT and use more appropriate analysis functions. For example, the Constant-Q Transform is especially well suited for music analysis (e.g. chord detection), the MFCCs (Mel Frequency Cepstrum Coefficients) will give you a condensed representation of a sound (very useful for comparing it with others), and the Goertzel algorithm is more efficient if you want to detect the presence of a single frequency with precision, etc…
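To give a taste of how lightweight some of these can be, here’s a sketch of the Goertzel algorithm mentioned above: a second-order filter that measures the energy at one target frequency without computing a whole FFT:

#include <cmath>
#include <cstddef>

// Goertzel algorithm: run a two-state recurrence over the buffer, then
// compute the squared magnitude at the target frequency.
float GoertzelPower(const float* x, size_t n, float targetHz, float sampleRate)
{
    const float w = 2.0f * 3.14159265f * targetHz / sampleRate;
    const float coeff = 2.0f * std::cos(w);
    float s1 = 0.0f, s2 = 0.0f;
    for (size_t i = 0; i < n; ++i)
    {
        float s0 = x[i] + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2; // squared magnitude
}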

This is why I developed an audio feature extraction library at work. Without going into too much detail (I probably can’t anyway), there is a core library of DSP functions, on top of which feature extraction plug-ins can be built. Developers can either use the low-level math functions, the provided plug-ins, or develop their own. There are also .Net controls allowing the visualization of waveforms, spectrograms, collections of resonant modes etc… The main goal is to help with the research and development of audio algorithms for the runtime (e.g. analysis of the signal coming from the microphone or from one of the busses of the audio engine, intelligent mixing) and to create “smart” audio tools.

On that topic, let’s start with an easy way you can improve the workflow of your sound designers…

A picture is worth a thousand clicks

Imagine you have all these sound effects to design for a new game. What do you think will be one of the most time-consuming tasks you will be performing? Scripting? Adding random variations? Creating an amplitude envelope? Setting an EQ? Think again… A lot less glamorous and creative than audio processing, it is simply browsing and selecting samples… Unless some material has been recorded or synthesized specifically for a given asset, sound designers will indeed spend an awful lot of time browsing their sound effects library, selecting samples and listening to them in order to find the best candidate for that particular sound, or the one which will be perfect to layer with another sample.

This can be quite a cumbersome process. Let’s say you are using a tool which is only offering the regular “open file” dialog box. This is what you will get:

A list of names, and that’s it. You could be selecting anything, from your tax return to pictures of your vacations on Oahu as long as it has a “.wav” extension. You will need to open a file, play it, close it, and then go to the next one etc… Hopefully your in-house audio tool or your middleware will offer a window similar to the ones you can find in a DAW (Digital Audio Workstation) or you will actually use your DAW. For example, here you can see the windows used to open a file in both SoundForge and Protools. More information about the audio format of the file is displayed, and an auto-play feature limits the number of clicks necessary to browse your entire collection…

Now, this is very good when you have 5 samples to choose from, but what if you are looking for the perfect animal growl among two hundred of them? You will have to click on all these files and listen to all of them before being able to make a decision or to find what you were looking for. Hardly a creative endeavour… If you are using a sample database such as Netmix, you might be able to search by keywords, but you will still be dependent on other sound designers to tag the sounds correctly. Also, you will not be able to find samples which have the same perceptual characteristics but totally different sources and therefore different keywords (more on that in the next post).

Leveraging our audio feature extraction system, I developed a few things to help with sample selection. One of them is a small file browser. We can use it as a regular “open file” dialog box from any C# tool. It can display a list of files like any conventional file selector, although you will notice it already indicates the duration, sample rate, resolution and number of channels of the files, and has an autoplay feature.

But you can also switch it to a thumbnail mode. In this mode, the waveforms of the various samples will be displayed, giving you an immediate insight about the type of sound: is it a single impact or are there a lot of audio events, how is the amplitude evolving? You can change the size of the waveforms to view more details, or on the contrary get an overview of a whole folder.

More importantly – and that’s where audio analysis comes into play – you can also select any feature extraction plug-in to colour the waveforms (values are normalized at the folder level and results of the analyses and waveform peaks are cached). Below you can see an example using the pitch detection plug-in. However, it could be the spectral flux or any other feature of interest. For example you could be browsing your music loops and select the beat detection plug-in, in which case a loop with a low tempo would appear darker and a faster one would appear lighter.

In the case of the picture above, the palette goes from black / dark blue for low pitch to yellow / white for high pitch. Therefore it is very easy – just by looking at your whole folder – to find a sound that corresponds to what you are looking for (e.g. a sample that starts with a high pitch which slowly decreases, a vocalization with a lot of vibrato), without having to listen to dozens or even hundreds of sample files…

Here you can see that “Camel_groan.wav” is lower in pitch than “Cat_angry_meow.wav”, which is itself lower than “Coyote_howl.wav”. As expected after watching countless spaghetti westerns, the pitch of the latter goes up quickly, stays relatively stable for a while, and goes down again slowly. No animals were harmed during the analysis, by the way ;-).

To be continued…

This opens the door to a lot of interesting things. For example we could add a sorting button to rearrange the files based on some meta-feature such as “average pitch” or “pitch variation”. We could also add a way to query sounds with a slightly higher tempo, or with a longer attack than the one currently selected etc… Knowing more about your assets offers a lot of new opportunities to improve workflow, even when it is just about selecting samples. More about that and how to make sound effects databases more helpful (using neural networks) in the next post…


The Quest for the Ginkgo GUI

Original Author: Martin Pichlmair

This post is part one of a short series of posts about my progress in searching a viable GUI solution for our in-game/live editor. It is live coverage of my research and implementation. I welcome any and all feedback.

We’ve built live-editing on top of AntTweakBar. Now that the system has outgrown what AntTweakBar has on offer, I’ve started the epic quest of finding a suitable OpenGL-based GUI solution. Our requirements are straightforward:

  • Platform independence (i.e. Windows, OSX, possibly Linux and maybe even iOS)
  • Unicode support
  • Simple skinning
  • Either scriptable or easy to abstract
  • Copy&paste and Undo supported, or easy to add

I’ve spent the last two months on and off evaluating GUI libraries and toolkits in order to find the perfect one. Sadly, all of them have proven to have their downsides. Let me summarize what I’ve found out so far.

A failed experiment but you can see AntTweakBar in the top right corner.

What’s Out There

Of course I started my research by looking at projects that already have a working live editor and trying to find out what tools they’re using. Two names that kept coming up were Awesomium, which is based on WebKit, and Berkelium, which uses Google’s Chromium (itself a modified WebKit, as far as I can tell) instead. Both offer the benefit of being easy to skin and script since they are windowless browsers rendering HTML and executing JavaScript. On the downside, they are pretty big, adding 80MB of foreign code to your project. Since we’re eager to keep control over our source code and would love to maintain portability, even for the in-game editor, both of those sound like a less than perfect solution for us. Of course, one could substitute Chromium/WebKit with EA’s open-sourced portable WebKit, but that looks like a major engineering task when all I wanted was a window with some sliders and buttons.

A somewhat similar solution to the above is libRocket, the poor man’s ScaleForm. It’s a freshly implemented wrapper for a subset of XML that approximates a subset of HTML. In other words: you can build your GUI in HTML and the library is less than 80MB large. Sadly, I found it unnecessarily difficult to get the library running, and it is more suited to in-game GUIs than to procedurally generated editor panes.

On the other side of the spectrum lies Capcom Game Studio Vancouver’s tool chain for Dead Rising 2, which they have explained in great detail elsewhere. It reminds me of Pure Data, the open source audio programming language / live instrument I used to work on years ago. You’ve got a server (the game, the audio engine, …) and a client GUI that connects to the server. All communication between the two goes over sockets. The upside of this approach is that you can live-edit your game even on a console, with the editor running on your workstation. The downside is that this approach is less suitable for in-game editing (where the editor lives in a window in your game context). Every coin has two sides.
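As a toy illustration of that client/server idea, here’s a minimal sketch using POSIX sockets; the port, the one-line “set” protocol and the tweak table are all invented for the example:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <map>
#include <string>

// Toy "tweak server": the game listens on a port, an external editor GUI
// connects and sends lines like "set player.speed 4.5". Error handling is
// omitted for brevity.
int main()
{
    std::map<std::string, float> tweaks; // stands in for real engine state

    int server = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(5555);
    bind(server, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(server, 1);

    int client = accept(server, NULL, NULL);
    char line[256];
    ssize_t got;
    while ((got = recv(client, line, sizeof(line) - 1, 0)) > 0)
    {
        line[got] = '\0';
        char name[128];
        float value;
        // One command per packet keeps the toy parser trivial.
        if (sscanf(line, "set %127s %f", name, &value) == 2)
        {
            tweaks[name] = value; // live-edit the value
            printf("%s = %f\n", name, value);
        }
    }
    close(client);
    close(server);
    return 0;
}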

Apart from these two approaches there are numerous GUI libraries out there of varying quality and simplicity. CEGUI seems behind the curve nowadays; guichan and Gwen are two packages of straight and simple C++ libraries I’ve recently had a look at.

A Quick Rundown Of What I’ve Found So Far

I’ve looked at a number of GUI libraries and attached a couple of them to our renderer – or at least configured their OpenGL rendering and hooked them up with our resource manager. Here’s a quick rundown of the pros and cons of those libraries:

  • AntTweakBar: Perfectly suited for live-tweaking values. I’ve rarely seen source code as dense. Reminds me of Perl more than of C++. Due to that it is very hard to extend and not suited for anything else but tweaking strongly hierarchically displayed data systems.
  • Awesomium: A comfortable WebKit-based package for OSX and Windows. Closed source, unless you pay $$$. Can’t say how hard it would be to port it to Linux and/or consoles. 80MB large, which is hefty for a downloadable game. Easy to script the GUI in HTML5 and JavaScript. Simple skinning through CSS.
  • Berkelium: Same as Awesomium plus Open Source. Linux support available, yet similarly unfathomably hard to port.
  • libRocket: Does its own rendering but there’s no JavaScript support. I found it impossible to hook the system up with our resource system. But it sounds intriguing. Simple skinning with CSS. Lively community.
  • Qt: Replaces your application framework like so many good tools out there. Visual designer available. Lively community. All bells and whistles, but we’ve built our game around the easily substitutable and portable SFML instead.
  • Cocoa: OSX-only but, boy, this library is good. Visual editor, beautiful and well-working widgets, easy-to-integrate bindings. Objective-C-based and closed-source, so not portable to other platforms.
  • MFC: Insert random joke here.
  • guichan: A small and portable GUI lib. Makes heavy use of exceptions but features quite good code quality. Widget set is rather small and there’s no copy&paste or undo management.
  • Gwen: Like guichan, Gwen is a straight C++ GUI lib. It does support copy&paste and does not require exceptions. Very good code quality from what I can tell. Made by the guys who make Garry’s Mod.
  • Tkinter: The standard python GUI. While the ugly standard flow layout might not be the perfect fit for everyone, scripting a GUI is the only viable replacement for a visual GUI editor.

Browsing my object hierarchy over telnet.

What I’m Doing Next

This week and the coming week I’ll implement a socket-based interface to our game engine. Then I’ll design a GUI that hooks up with that. What I’m planning is more of a browser than a full-fledged level editor. That way I hope to maintain the live-editing features of AntTweakBar but make everything far easier to access. I’ll keep you posted on how it goes.


Do we live for what our games give to people?

Original Author: Ann-Cudworth

As some of you may know, I have recently developed an interest in making the games that I design for Second Life and other virtual worlds as accessible to disabled players as possible. Over the last 10 days, one of them has been shown at SL8B, the laggy, cacophonous, eye-candy-filled street fair that appears each Spring on the islands of Second Life to celebrate another year of their survival. It was time, I felt, to introduce the beginning concept of our next big game effort, called “the gods that walk among us”, and what better way than to bring in some of that to be experienced by the wandering and curious populations who, having tired of eating virtual cake, need a riddle or two to cleanse their palates?

So, hmm, how do you actually make a game that can be experienced equally by a person with low vision and a person with hearing issues?  Furthermore, how do you do that when you only have 10 sec low-fi sound clips, and limited space and geometry?

How do you make it redundant enough that they understand the need to use a private channel for each local chat answer, and yet not be distracted from hearing the clues, which are only presented once? What about sound bleed, and the confusion that will cause for a visitor in a tiny area?

And lastly, how do you keep the space “in architectural character”, an environment that says, “come in, all of you, you can play here”?

Tough questions, and a significant design challenge for two weeks of building time. Did I succeed?  On some levels perhaps, and on others, things took an interesting turn or two.

Our next game “the gods that walk among us” is about mental and emotional transformation, about how everyone can have “godlike” powers in virtual worlds, and about how the search for these qualities will carry us across the Metaverse.

I used the visual metaphor of masks, one for each of the ancient elements- Earth, Air, Fire and Water as the key ingredient. Images of them were displayed on the exhibit, and you would win each mask by figuring out the riddles presented to you by their elemental icons distributed around the exhibit.

Having some experience with Second Life game players, I expected to see some folks hang around for long periods while they played the game, and to get sent commentary on their progress, if they happened to notice me hanging around, which I did frequently.

What I did not expect was for them to take the elemental masks and make their own games out of them, and to be honest, this thrilled me. It also got me wondering: why do we like to make games so much? Some deep-seated need to control people? Or perhaps just the same kind of joy that anyone feels when they watch someone else play.

I was given artwork with images of my masks in it.  I was involved in a prolonged photo session with a small red panda.  People took the masks to parties, and wore them home.  It interests me to note that although the avatar is one kind of mask, putting a mask on top of that elicits yet another kind of behavior, as if it frees the avatar, just as a mask frees a real person at a party.  I also noted that designing for accessibility eventually became just a support structure, not the focus, which is something I am pleased with. Seamless access: that is a good goal.

Watching people play: that is my reward.

Images are on my Flickr site here: 

Running a games company start-up – what I’ve learnt so far

Original Author: Pete Collier

It has been almost 6 months now since I co-founded our games company Hogrocket. I’ve learnt a lot over this period and it feels like the right time to write some of that down and share it. I hope that it’ll be of interest to you. Some of the points may be obvious; all that I’ll say is that I set up the company to learn and I certainly don’t claim to be any kind of authority. The following constitutes the most important discoveries and lessons I’ve learnt so far:

Making decisions is critical for momentum: Indecision and constant discussion are a comfort trap and a big time sink. Make decisions with conviction and then learn your lessons. You’ll learn far more than you would by endlessly debating the perfect course of action (which doesn’t exist anyway).

You’ll swiftly learn your strengths and weaknesses: Which is fantastic, because knowing them is a strength in its own right! Working in a small team is like a boiling pot: things bubble to the surface quickly. This tends to give you a heightened sense of everyone’s skills and character traits, including your own, which is a welcome side-effect of the volatility.

You’ve got to be in the same room together: Working from home seemed like a good idea; we would save on fuel, use Skype and avoid distractions. However, creative collaboration needs face-to-face communication to effectively exchange ideas and the energy behind them. Even more important is focus: when working remotely it’s all too easy for team members to start pulling away and heading in their own directions. Working remotely becomes an exercise in reining things in rather than getting stuff done.

Running your own company is a different kind of stress: Note that I didn’t say more stress; I can certainly attest to the stresses of working for someone else. The difference is the increase in stress you place upon yourself. There is more internal pressure than the usual external kind, because things matter much more to you personally. As a result I’ve gained more self-discipline, more appreciation for the benefits of exercise and, just as critically, for the need for rest.

Let your experts be experts: If you’re contracting people to do work for you, give them clear direction on what you want to achieve and then get out of the way. Just because it’s your company or your project doesn’t make you an expert on everything. Sometimes sticking your oar in serves only to muddy the water. You’re paying them for a reason: because they can do something you can’t do yourself.

Make money: You may think this attitude is not becoming of an ‘indie’ studio doing things for the love. And although I care a lot about what we’re creating, ‘love’ is not guaranteed to pay the bills, and I’m getting kind of tired of the romanticism surrounding being ‘indie’. I want to create a viable business; let’s not fool ourselves here: whether you’re making games or crackers, if you’re not selling then you’re screwed. Seth Godin always talks about the fear of shipping, and he is absolutely spot-on. I think we’ve created a great first game and hopefully it will make money, but I’ll be a lot more comfortable once we’ve shipped it, and next time I don’t plan for us to take nearly as long.

Humility: I’ve gained a much greater admiration for those who manage to run successful companies and projects. This has been particularly apparent to me because, after writing my blog for a year now, I’ve realised quite how much of a difference there is between saying and doing. I’ll be held accountable for what I have written by the quality of our games, and quite honestly this scares me to death! But there’ll be no greater judge on those accounts than myself.

That final point is a good place to conclude this review, because it ties in with a broader lesson that I’ve learnt: the need to make mistakes for yourself. It’s one thing to be told, but quite another to learn the hard way. Lessons tend to sink in more when they affect you directly! So in this regard the life experience has been invaluable and worth taking the risk for.

I would love to hear about your own experiences, feel free to contact me or comment below.

This article was originally posted on my blog: Running a games company start-up – what I’ve learnt so far

RTFM – TL;DR

Original Author: James Podesta

When I was young, I used to read all manuals like I read my fantasy novels. Take them into my bedroom for the weekend and read from the first page to the last, in order, skipping nothing. I would devour every piece of information in the order they chose to present it to me, and miss nothing.  By the end I would have a clear understanding of all the pieces of the puzzle and how they fit together.

 

These days I just code with the knowledge I have, guess how things might be and, when I get stuck, jump into a reference manual or onto google to find the bit of information I’m missing. As a consequence, I only just found out a few days ago, purely through an accidental web link, that the STL provided a bitset template.  I can understand perfectly why I didn’t know this existed. It’s not that I haven’t had use for a bitset template in the past; it’s just that it’s not something complex enough that I would ever have needed to google for information on it.
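
In case anyone else missed it too, here is the kind of thing std::bitset gives you for free (a quick illustrative snippet; the size and bit positions are arbitrary):

#include <bitset>
#include <cstdio>

int main()
{
    std::bitset<32> flags;    // 32 bits, all initially clear
    flags.set(3);             // set bit 3
    flags.set(5);             // set bit 5
    flags.flip(3);            // toggle bit 3 back off

    // count() gives the number of set bits, test() queries a single bit
    std::printf("%u bits set, bit 5 is %d\n",
                (unsigned)flags.count(), (int)flags.test(5));
    return 0;
}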

 

The combined knowledge of the universe is now at my fingertips, kindly indexed by keywords. I know the answer to the ultimate question of life, the universe and everything is in cyberspace, if only I can find the right question.  But does this mean I don’t need to store information in my head anymore, and can just retask my brain to be a master of composing the correct keywords for a problem?  Is it really OK to be crippled if I have no internet connection?

 

And it’s not just knowledge that has been replaced by keyword access. When I was young, my pirate friends (not me, I would never) would trade games via mail (notice no ‘e’ prefix) on cassettes (young readers should google this if you don’t know what a cassette is). Now you just type a keyword into Vuze on your Mac (again, not me, I don’t even know what Vuze is) and you get a list of every song, TV show, movie or game, and within minutes it’s yours. Well, not yours, but you have it.  Actually, even googling is too much effort.  Just highlight the word in your browser and Apture will find the information for you.

 

Admittedly, I appear to have a lot less time than I used to when I was young. I’ve done numerous experiments, including watching clocks in 70s videos, and it certainly appears that time hasn’t sped up, at least in no way I have been able to measure. Maybe it’s facebook, twitter, dayjob, family and nightjob; I certainly didn’t have all of these when I was younger. I could just take a book into the bedroom for the weekend and emerge 3 days later a little bit smarter. Yeah, it was a sad childhood really.

 

So are our new iBrains really a good thing?  Time to put on the opinion cap.  Yes, it’s awesome having access to blogs and white papers and answers to everyone’s questions, but we need to temper that with discipline. The most common forum post these days is “read the manual”, because most questions are clearly answered in the manual but no-one bothers reading it now. It’s easier and quicker to jump on a forum and ask if anyone else knows the answer, maybe someone who has actually read the manual.  (Actually, I have no idea what the most common forum post is; I totally made that up to support my argument.)

 

So am I going to return to reading manuals front-to-back?  I was all ready to say “absolutely, I will make a point of always reading manuals properly from now on!”  But to be honest, I doubt it.  There’s a simple rule of the universe that all things follow the path of least resistance.  While humans are capable of using intellect to break this rule, they generally don’t bother.  By the way, this is also a great rule to follow if you’re ever designing tools or interfaces that you want other people to use: make sure your stuff is the path of least resistance…

So what’s the moral to all this?

TL;DR

 


Platform Abstraction with C++ Templates

Original Author: Michael Tedder

In a post a few months ago here on #AltDevBlogADay, Aras Pranckevičius discussed three approaches to eliminating the virtual function call overhead when using different class implementations on different platforms.

One approach that wasn’t introduced in that post was one which utilizes templates.  Although there was some discussion in the comments about making use of templates, some questions remained unanswered and no code was given, so I figured I’d post about how I use templates in my engine and hopefully fill in any missing information.

I’ll also show how to keep the code within a source file, avoiding the need to expose the implementations from a header — a common complaint with templates.  To keep things simple, I’ll just be showing how to implement a multiplatform debug print() function which sends a string to the debugger or console.

#define Your Platforms

The first step is to give each of the platforms you support a unique ID, adding each as a #define to a globally-#included header file.  If you’re already doing multiplatform development, you most likely already have such a list.  For example, something like the following will suffice:

#define PLATFORM_WINDOWS    1
#define PLATFORM_LINUX      2
#define PLATFORM_MACOS      3
#define PLATFORM_ANDROID    4
#define PLATFORM_IOS        5

Next, you’ll need to do whatever is necessary to detect the platform you are compiling for and #define it to one of the above values:

#if defined(_WIN32)
    #define PLATFORM_ID     PLATFORM_WINDOWS
#elif defined(__ANDROID__)  // must come before __linux__ as Android also #defines __linux__
    #define PLATFORM_ID     PLATFORM_ANDROID
#elif defined(__linux__)
    #define PLATFORM_ID     PLATFORM_LINUX
#elif defined(__MACH__)
    #include <TargetConditionals.h>
    #if (TARGET_OS_IPHONE == 1)
        #define PLATFORM_ID PLATFORM_IOS
    #else
        #define PLATFORM_ID PLATFORM_MACOS
    #endif
#endif

How It Works

Instead of declaring a base interface class with virtual functions and then deriving a different implementation for each platform, we declare a class with one template parameter (a platform ID), then specialize it to provide a different implementation for each platform.  The template class is then typedef‘d to expose the specialization for the platform ID being compiled, allowing the implementation to be used without any virtual functions and also allowing its functions to be inlined.

One of the more interesting features of using a template-based approach is that if a specialization for a specific platform isn’t defined, then the initial template definition can be used as a ‘default’ or generic implementation.  This allows for:

  1. Platforms that do not require any specific implementation can use the generic implementation, cutting down on the amount of code necessary, and
  2. Easier porting to different platforms (as the generic implementation can be used until a specialization is provided), avoiding a mass of compilation errors when a new platform is added.

The default template definition and specialized declarations are placed in the header file.  We’ll put this code in a separate namespace to allow us to use the same class name for both the specializations and the interface exposed to the application:

#include <cstdio>

namespace Private
{
    // generic declaration (the base interface class)
    template <int PlatformID>
    class Debug
    {
    public:
        static void print(const char *str);
    };

    // specialization for the Windows platform (the derived class for Windows)
    template <>
    class Debug<PLATFORM_WINDOWS>
    {
    public:
        static void print(const char *str);
    };

    // specialization for the Android platform (the derived class for Android)
    template <>
    class Debug<PLATFORM_ANDROID>
    {
    public:
        static void print(const char *str);
    };

    // generic platform (base interface class) implementation
    template <int PlatformID>
    void Debug<PlatformID>::print(const char *str)
    {
        ::puts(str);
    }
}

In the code above, we declare two specializations: one for Windows and another for Android.  The generic implementation, which will be used on all other platforms (Linux, MacOS, and iOS), is also provided, and is defined to simply chain to the C library’s puts() function.

Note that code for the generic implementation is required to be in the header.  This is so the compiler can supply a function for the template which does not have an explicit specialization.  For example, when compiling on Linux, the linker will look for a function named Private::Debug<2>::print(const char *).  If that function isn’t defined in any of the source files, and there isn’t any generic implementation, then the linker will give you a nice error message.

Next, we’ll add the typedef in the header to allow the application to utilize the proper implementation for the platform being compiled:

typedef Private::Debug<PLATFORM_ID> Debug;

We also need to provide the specializations for both Windows and Android platforms.  These specializations can be placed in a source file, and defined by simply using our Debug class definition, since we typedef‘d it in our header above:

#include "debug.h"

#if (PLATFORM_ID == PLATFORM_WINDOWS)

#include <windows.h>

// implementation for Windows
void Debug::print(const char *str)
{
    ::OutputDebugStringA(str);
    ::OutputDebugStringA("\n");
}

#elif (PLATFORM_ID == PLATFORM_ANDROID)

#include <android/log.h>

// implementation for Android
// (str is passed as a format argument, not as the format string itself)
void Debug::print(const char *str)
{
    ::__android_log_print(ANDROID_LOG_INFO, "MyApp", "%s", str);
}

#endif

Finally, a simple main() function to show how it’s used:

#include "debug.h"

int main()
{
    Debug::print("hello world!");
    return 0;
}

… and you now have a cross-platform debug print function.

But Wait, This Wasn’t About Graphics!

Indeed, the post Aras Pranckevičius made earlier discussed abstracting a graphics device based on the platform.  The same logic can be applied here as well, but only to those platforms which have only one kind of device.  For example, a PS3 might only ever use a GCM device, and could be a likely candidate for this type of abstraction.  For a PC, however, it is possible to have both OpenGL and DirectX rendering support (or software rendering, anyone?), and switching between these interfaces at runtime is something that cannot be done with templates.
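
For contrast, the runtime-switchable case needs something like the classic virtual interface from Aras’s post. This is a bare-bones illustration of the idea, not his actual code:

// base interface class with virtual functions
class GraphicsDevice
{
public:
    virtual ~GraphicsDevice() {}
    virtual void clear() = 0;
};

class GLDevice : public GraphicsDevice
{
public:
    virtual void clear() { /* glClear(...) */ }
};

class D3DDevice : public GraphicsDevice
{
public:
    virtual void clear() { /* pDevice->Clear(...) */ }
};

// the concrete device is chosen while the program runs, e.g. from a
// config file or the command line -- something a compile-time template
// parameter cannot do
GraphicsDevice *createDevice(bool useGL)
{
    if (useGL)
        return new GLDevice;
    return new D3DDevice;
}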

Some good candidates for this template-based abstraction are OS low-level support classes, such as events, mutexes, fibers, threads, and clock/timer handling code: classes which need to be efficient and have a one-to-one mapping with the platform.  A vector math/SIMD class could also be a good candidate, as the member functions can be inlined.  For higher-level classes that provide graphics and sound support, an interface class with virtual methods is sufficient.
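
As an example, here is what a mutex might look like with this pattern.  This is a hypothetical sketch reusing the PLATFORM_* machinery from above (the Win32 and pthreads calls are real; the Mutex class itself is mine, not from any particular engine).  Because the member data differs per platform, each specialization sits behind the platform #if, and the payoff is that lock() and unlock() can be fully inlined:

// mutex.h -- a hypothetical sketch following the same pattern as Debug
#if (PLATFORM_ID == PLATFORM_WINDOWS)
 #include <windows.h>
#else
 #include <pthread.h>
#endif

namespace Private
{
    // no generic implementation this time: a mutex has no sensible
    // default, so an unported platform fails at compile time
    template <int PlatformID>
    class Mutex;

#if (PLATFORM_ID == PLATFORM_WINDOWS)
    template <>
    class Mutex<PLATFORM_WINDOWS>
    {
    public:
        Mutex()       { ::InitializeCriticalSection(&m_cs); }
        ~Mutex()      { ::DeleteCriticalSection(&m_cs); }
        void lock()   { ::EnterCriticalSection(&m_cs); }   // inlined: no virtual call
        void unlock() { ::LeaveCriticalSection(&m_cs); }
    private:
        CRITICAL_SECTION m_cs;
    };
#else
    // pthreads covers Linux, MacOS, Android, and iOS here
    template <>
    class Mutex<PLATFORM_ID>
    {
    public:
        Mutex()       { ::pthread_mutex_init(&m_mutex, 0); }
        ~Mutex()      { ::pthread_mutex_destroy(&m_mutex); }
        void lock()   { ::pthread_mutex_lock(&m_mutex); }
        void unlock() { ::pthread_mutex_unlock(&m_mutex); }
    private:
        pthread_mutex_t m_mutex;
    };
#endif
}

typedef Private::Mutex<PLATFORM_ID> Mutex;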

The point to be made is that it is important to use the right tool for the job.  The template abstraction method presented here is not intended to solve all of your problems; it is just one method of many to assist in cross-platform development.