Since the Xbox360 version of my game Jigsaw Guru was released on the marketplace a week ago, I thought I would write a post about the main changes I had to make to go from a mobile device to a living room console. Those changes fall into three categories: screen, input, and storage.

Screen

A TV screen is obviously bigger than the screen of a phone, but more importantly it has a different resolution and aspect ratio. The difference can be huge if the game runs in portrait mode on the phone:

Jigsaw Guru on Xbox360, 1280*720

Jigsaw Guru Free on WP7, 480*800

Fortunately for me, the initial paid version of Jigsaw Guru runs in landscape mode on the phone, but going from 800*480 to 1280*720 still means all the graphics elements have to be scaled and positioned according to the width and height of the screen, which I do with code looking like this:

successRect.X = (int)(Renderer.Instance.DeviceWidth * 0.1f);
successRect.Width = (int)(Renderer.Instance.DeviceWidth * 0.8f);
successRect.Y = (int)(Renderer.Instance.DeviceHeight * 0.4f);
successRect.Height = (int)(successRect.Width * completedText.Height / completedText.Width);
spriteBatch.Draw(completedText, successRect, Color.White);

These lines display the "Puzzle Completed" texture. successRect is an XNA Rectangle, Renderer.Instance is a singleton I use for all my rendering code, and as you can see, every dimension I need is expressed as a fraction of the width or height of the screen. This is a pretty convenient system: for example, tweaking the value returned by the DeviceHeight property was enough to make room at the bottom of the screen for the ad banner of the free version, and force the game to only use the top 90%. This won't solve every single case though. Sometimes scaling the UI is not enough and a different layout is needed. But even if you're only working on one platform, it's better not to hardcode your sizes and positions in pixels, in case you want to change the resolution or support several aspect ratios (such as 4:3 versus widescreen) later on.
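The same idea can be wrapped in a small helper. This is only a sketch that runs without XNA: PixelRect is a hypothetical stand-in for the XNA Rectangle, and Layout.Place is a name made up for this example, not something from the game's code.

```csharp
using System;

// Hypothetical stand-in for XNA's Rectangle so the sketch compiles on its own.
public struct PixelRect
{
    public int X, Y, Width, Height;
}

public static class Layout
{
    // Places an image at fractions of the screen size. The height is derived
    // from the image's own proportions so the image is never stretched.
    public static PixelRect Place(int screenWidth, int screenHeight,
        float xFraction, float yFraction, float widthFraction,
        int imageWidth, int imageHeight)
    {
        PixelRect rect;
        rect.X = (int)(screenWidth * xFraction);
        rect.Y = (int)(screenHeight * yFraction);
        rect.Width = (int)(screenWidth * widthFraction);
        rect.Height = rect.Width * imageHeight / imageWidth;
        return rect;
    }
}
```

Calling Layout.Place(1280, 720, 0.1f, 0.4f, 0.8f, 512, 128) and Layout.Place(800, 480, 0.1f, 0.4f, 0.8f, 512, 128) gives rectangles that occupy the same relative area on both screens, which is the whole point of expressing positions as fractions.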

Televisions have a very annoying property: they don't display the whole image your console sends them. Depending on the model, up to 10% of the screen can be missing on each side, so you should only draw important information in the remaining 80% zone in the middle, which is known as the "title safe area". This is a certification requirement: if you don't follow it, your game will not be approved on Xbox360. This took a big portion of the time I spent porting Jigsaw Guru to the console. I couldn't use the previous trick to make the whole game fit in the title safe area, because 1) you don't want a big black border on televisions that can display the whole image or most of it, and 2) you don't want to use only 80% of a phone screen that's small enough already.

For this particular problem, I still tried to avoid having too much platform-specific code everywhere I draw text and icons, by using a mix of small functions and constants like the following ones:

#if XBOX
public const float textTop = 0.10f;
public const float textBottom = 0.90f;
#else
public const float textTop = 0.07f;
public const float textBottom = 0.93f;
#endif

For example, the "Select Game to Load" title in the previous screenshots is positioned like this:

textPosition.Y = Renderer.Instance.DeviceHeight * textTop;

instead of directly using DeviceHeight * 0.07f, as I could if I were only developing for the phone. This makes sure the text is low enough to be entirely contained in the title safe area on the Xbox360. Also, to verify everything is correct on all the screens, I have a boolean I can toggle in debug mode to draw the title safe area limit as a red line.
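For reference, the limits of that area are easy to compute by hand. The sketch below assumes a 10% margin on every side, as described above; SafeArea and TitleSafe are hypothetical names used only here. As far as I know, XNA 4.0 also exposes a ready-made rectangle through GraphicsDevice.Viewport.TitleSafeArea, which is not used in this sketch so that it stays runnable on its own.

```csharp
using System;

// Hypothetical type for this sketch: the edges of the title safe area in pixels.
public struct SafeArea
{
    public int Left, Top, Right, Bottom;
}

public static class TitleSafe
{
    // Crops `margin` (a fraction of the screen size) from every side.
    public static SafeArea Compute(int screenWidth, int screenHeight, float margin)
    {
        int marginX = (int)(screenWidth * margin);
        int marginY = (int)(screenHeight * margin);

        SafeArea area;
        area.Left = marginX;
        area.Top = marginY;
        area.Right = screenWidth - marginX;
        area.Bottom = screenHeight - marginY;
        return area;
    }

    // True if the pixel (x, y) lies inside the safe area.
    public static bool IsInside(SafeArea area, int x, int y)
    {
        return x >= area.Left && x < area.Right
            && y >= area.Top && y < area.Bottom;
    }
}
```

On a 1280*720 screen, text whose top edge is at DeviceHeight * 0.10f passes the IsInside check, while the phone value DeviceHeight * 0.07f does not, which is why the constant differs per platform.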

Input

Playing with a gamepad is of course totally different from playing with a touch screen, no surprise here. The main input-related change in the Xbox360 game is therefore that the player moves a cursor on the screen to drag and drop puzzle pieces, instead of moving her finger. Some icons also became unnecessary: for example, you don't tap a pause icon to go to the pause menu, you just press the Start button on the gamepad. But the modifications don't stop there.

The console can have up to 4 controllers connected to it, and a player must be able to pick any of them to play. That's why every Xbox360 game has a "start screen" before its main menu, where the controller used to press Start is detected and remembered for the rest of the session (supporting a change of controller mid-game is not a requirement).

In menus, you have to keep track of a "currently selected" item: on the phone, any icon can be tapped at any time, but on the console the player can only navigate to the previous or next item one step at a time.

Finally, console games usually have a "controls screen", and I didn't have anything like it on WP7.
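The start screen and the "currently selected" item described above can be sketched as follows. This is an illustration under assumptions, not the game's code: isStartPressed is a hypothetical callback (with XNA it could wrap GamePad.GetState((PlayerIndex)i).Buttons.Start == ButtonState.Pressed), and wrapping around at the ends of the menu is a design choice made for this example.

```csharp
using System;

public static class StartScreen
{
    public const int MaxControllers = 4;

    // Returns the index (0 to 3) of the first controller that has Start
    // pressed, or -1 if none has. The caller remembers the returned index
    // and only reads that controller for the rest of the session.
    public static int FindActiveController(Func<int, bool> isStartPressed)
    {
        for (int i = 0; i < MaxControllers; i++)
        {
            if (isStartPressed(i))
                return i;
        }
        return -1;
    }
}

// Tracks which menu item is selected when navigating with a gamepad.
public sealed class MenuSelection
{
    private readonly int itemCount;

    public int SelectedIndex { get; private set; }

    public MenuSelection(int itemCount)
    {
        if (itemCount <= 0)
            throw new ArgumentOutOfRangeException("itemCount");
        this.itemCount = itemCount;
    }

    // Moves one step down, wrapping from the last item to the first.
    public void MoveNext()
    {
        SelectedIndex = (SelectedIndex + 1) % itemCount;
    }

    // Moves one step up, wrapping from the first item to the last.
    public void MovePrevious()
    {
        SelectedIndex = (SelectedIndex + itemCount - 1) % itemCount;
    }
}
```

In practice MoveNext and MovePrevious should be triggered on a button press transition (released on the previous frame, pressed on this one), otherwise holding the stick scrolls through the whole menu in a few frames.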

Storage

Saving settings and games to isolated storage and reloading them is fairly easy on WP7. On Xbox360, however, it is a different story: players may or may not have memory units, and if they do, they need to be able to select which storage device (including the hard drive) they want to use. The game should also never crash, even if the selected device is removed at any time (while playing, loading, saving, you name it). In my opinion, this is the most annoying requirement to fulfill. It's not particularly difficult to implement, but it's hard to think of all the different cases and places where something might go wrong, and to test them. Here are two things you can do to make your life easier, and hopefully avoid failing during peer review.

Use the EasyStorage library. Lots of Xbox Live Indie Games developers do it, and that's what I did too, even though I wouldn't have needed it for the WP7 version. I don't use the asynchronous functions because the files I read and write are very small and don't require any kind of progress bar, but I know they're there in case I want them in a future project.

Protect all your file accesses with try/catch blocks. Even checking if a file exists can fail if the storage device was removed, so don't be shy: better safe than sorry. Try to give some feedback to the players if you can, so that they don't assume their game was saved when it wasn't. They would hate you for that the next time they play (I know I would).
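The try/catch advice can be sketched like this. To keep the example runnable anywhere, it uses plain System.IO calls as a stand-in for the Xbox360 storage API and for EasyStorage, whose functions are not shown here; SafeStorage, TrySave and TryLoad are hypothetical names. The returned flag and error message are what lets the game tell the player that a save failed.

```csharp
using System;
using System.IO;

public static class SafeStorage
{
    // Writes the file and reports failure instead of letting the exception
    // escape, so the caller can show a "save failed" message to the player.
    public static bool TrySave(string path, string contents, out string error)
    {
        try
        {
            File.WriteAllText(path, contents);
            error = null;
            return true;
        }
        catch (IOException e)
        {
            error = e.Message;
            return false;
        }
        catch (UnauthorizedAccessException e)
        {
            error = e.Message;
            return false;
        }
    }

    // Even the existence check is inside the try block, since the device
    // may disappear between two calls.
    public static bool TryLoad(string path, out string contents)
    {
        contents = null;
        try
        {
            if (!File.Exists(path))
                return false;
            contents = File.ReadAllText(path);
            return true;
        }
        catch (IOException)
        {
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```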

The other way around

What about going the other way around, and porting an Xbox360 game to the WP7 platform? Well, I haven't done that so I don't have much experience with it, but off the top of my head there are two phone-specific things you'll have to pay attention to, in addition to what I have mentioned in this article: tombstoning, and the use of the Back button. Those are actually two of the main reasons for failing submission on WP7, even when you're writing a game directly for that platform, so don't wait until the last minute to take care of them. The earlier you know what modifications they will require in your project, the better.

Quest for speed

Last December, I experimented for a while with SPH (Smoothed Particle Hydrodynamics), using C# and XNA as usual. Of course, the more particles you can simulate the better, so I tried to optimize each part of the method for speed as much as possible. This even led me to use an SOA (Structure of Arrays) layout rather than the more natural AOS (Array of Structures) for my data, since I ended up being limited by the speed of memory accesses and by cache misses. Anyway, I thought I had done my best and had no plans to multithread this code, until I ran into an explanation of Parallel.For and Parallel.ForEach on the internet and decided to give it a try.
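For readers who haven't met the two layouts, here is a minimal sketch of the difference. The types and field names are made up for this example and are not those of my particle system. With AOS, a loop that only needs the density still pulls every other field of each particle through the cache; with SOA, it walks one tightly packed array.

```csharp
using System;

// Array of Structures: one array whose elements each hold every field.
public struct ParticleAos
{
    public float X, Y, ForceX, ForceY, Density;
}

// Structure of Arrays: one array per field, all indexed by particle number.
public sealed class ParticlesSoa
{
    public readonly float[] X;
    public readonly float[] Y;
    public readonly float[] Density;

    public ParticlesSoa(int count)
    {
        X = new float[count];
        Y = new float[count];
        Density = new float[count];
    }
}

public static class DensitySum
{
    // Reads one float out of every 20-byte element.
    public static float SumAos(ParticleAos[] particles)
    {
        float sum = 0f;
        for (int i = 0; i < particles.Length; i++)
            sum += particles[i].Density;
        return sum;
    }

    // Reads consecutive floats from a single array.
    public static float SumSoa(ParticlesSoa particles)
    {
        float sum = 0f;
        float[] densities = particles.Density;
        for (int i = 0; i < densities.Length; i++)
            sum += densities[i];
        return sum;
    }
}
```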

The original code

Let's start with a simple function:

private void MoveParticlesSOA(float deltaTime)
{
    Vector2[] positions = particleSystemSOA.Position;
    Vector2[] forces = particleSystemSOA.Force;
    float xMax = dimensions.X;
    float yMax = dimensions.Y;

    for (int p = 0; p < particleSystemSOA.NbParticles; p++)
    {
        // Keep the particle inside the simulation rectangle.
        Vector2 position = positions[p];
        if (position.X < 0)
            position.X = dimensionEpsilon;
        else if (position.X > xMax)
            position.X = xMax - dimensionEpsilon;
        if (position.Y < 0)
            position.Y = dimensionEpsilon;
        else if (position.Y > yMax)
            position.Y = yMax - dimensionEpsilon;
        positions[p] = position;

        // Move the particle, then clear its forces for the next frame.
        particleSystemSOA.UpdateParticle(p, deltaTime);
        forces[p] = Vector2.Zero;
    }
}

For each particle, this code first makes sure the position is inside a 2D rectangle (these are the if/else lines), then asks the particle system to move the particle, and finally resets the forces applied to the particle to zero (for the next frame). Since there are potentially a lot of particles to deal with, how would we go about multithreading this function, so that it can take advantage of 2, 4, or more cores?

Parallel.For

This is what it looks like with Parallel.For, after I added some #if/#else/#endif statements to easily enable or disable multithreading:

private void MoveParticlesSOA(float deltaTime)
{
    Vector2[] positions = particleSystemSOA.Position;
    Vector2[] forces = particleSystemSOA.Force;
    float xMax = dimensions.X;
    float yMax = dimensions.Y;

#if PARALLEL
    Parallel.For(0, particleSystemSOA.NbParticles, p =>
#else
    for (int p = 0; p < particleSystemSOA.NbParticles; p++)
#endif
    {
        Vector2 position = positions[p];
        if (position.X < 0)
            position.X = dimensionEpsilon;
        else if (position.X > xMax)
            position.X = xMax - dimensionEpsilon;
        if (position.Y < 0)
            position.Y = dimensionEpsilon;
        else if (position.Y > yMax)
            position.Y = yMax - dimensionEpsilon;
        positions[p] = position;

        particleSystemSOA.UpdateParticle(p, deltaTime);
        forces[p] = Vector2.Zero;
    }
#if PARALLEL
    );
#endif
}

Simple, isn't it? The 'for' line is replaced with Parallel.For, and what used to be the loop's body is now a delegate that can be called on several threads simultaneously for different particles. Splitting the iterations between threads and waiting for all of them to finish is taken care of automatically, which is pretty awesome if you ask me!
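If you want to try the pattern outside of a game project, here is a minimal self-contained sketch. ClampDemo is a made-up example rather than part of my SPH code; it only relies on Parallel.For from System.Threading.Tasks, which requires .NET 4 or later.

```csharp
using System;
using System.Threading.Tasks;

public static class ClampDemo
{
    // Clamps every element of `values` to [min, max], one element per iteration.
    public static void ClampAll(float[] values, float min, float max)
    {
        Parallel.For(0, values.Length, i =>
        {
            // Each iteration reads and writes only values[i], so the
            // iterations are independent and need no locking.
            values[i] = Math.Min(Math.Max(values[i], min), max);
        });
        // Parallel.For returns only once every iteration has completed.
    }
}
```

This works without any extra care because no two iterations touch the same array element, which is the same property that makes the particle loop above safe to parallelize as far as the clamping and the force reset are concerned.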

Shared memory

Parallel.For and the other methods of the Task Parallel Library don't do anything special regarding shared memory: it's the programmer's job to protect it from data races, that is, from unsynchronized accesses by different threads. Let's see another function from my SPH class as an example (sorry it's a bit long, there's no need to read it line by line as I will explain the important parts right after the code):

private void UpdateDensityAndPressureSOA()
{
    Vector2[] positions = particleSystemSOA.Position;
    float[] densities = particleSystemSOA.Density;
    float[] pressures = particleSystemSOA.Pressure;
    short[] gridIndex = particleSystemSOA.GridIndex;

    for (int p = 0; p < particleSystemSOA.NbParticles; p++)
    {
        densities[p] = 0f;
        neighborCount[p] = 0;
    }

#if PARALLEL
    ParallelOptions options = new ParallelOptions();
    options.MaxDegreeOfParallelism = -1;
    Parallel.For(0, particleSystemSOA.NbParticles, options, p =>
#else
    for (int p = 0; p < particleSystemSOA.NbParticles; p++)
#endif
    {
        int nbNeighbors;
        int[] neighbors = grid.GetNeighbors(particleSystemSOA, gridIndex[p], out nbNeighbors); // NOT GOOD FOR MULTITHREADING

        for (int n = 0; n < nbNeighbors; n++)
        {
            int neighborIndex = neighbors[n];
            if (neighborIndex < p)
                continue;
            if (neighborIndex == p)
            {
                densities[p] += selfDensity;
                continue;
            }

            float deltaX = positions[p].X - positions[neighborIndex].X;
            float deltaY = positions[p].Y - positions[neighborIndex].Y;
            float r2 = deltaX * deltaX + deltaY * deltaY;
            if (r2 < h2)
            {
                float diff2 = h2 - r2;
                float density = poly6FactorMass * diff2 * diff2 * diff2;
                densities[p] += density;
                densities[neighborIndex] += density;

                float r = (float)Math.Sqrt(r2);
                if (neighborCount[p] < maxNeighbors)
                {
                    int tableIndex = p * maxNeighbors + neighborCount[p];
                    neighborTable[tableIndex] = (short)neighborIndex;
                    neighborDist[tableIndex] = r;
                    neighborCount[p]++;
                }
                if (neighborCount[neighborIndex] < maxNeighbors)
                {
                    int tableIndex = neighborIndex * maxNeighbors + neighborCount[neighborIndex];
                    neighborTable[tableIndex] = (short)p;
                    neighborDist[tableIndex] = r;
                    neighborCount[neighborIndex]++;
                }
            }
        }

        const float restDensity = 100f;
        const float gasConstant = 0.1f;
        pressures[p] = gasConstant * (densities[p] - restDensity);
    }
#if PARALLEL
    );
#endif
}

I won't go into the details of this function since they're not relevant to the current discussion. As the name indicates, it calculates the density and pressure of the fluid at each particle's position. But there are two interesting things we haven't seen yet that I want to discuss.

The first one is just after the first #if PARALLEL line: as you can see, options can be passed to Parallel.For, such as MaxDegreeOfParallelism, the maximum number of iterations allowed to run concurrently. Here I specified -1, which is the default and means no limit is imposed, so the scheduler is free to use all the available cores, but you could pass 2, 4, etc.
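To see the effect of that option, the following sketch measures how many iterations actually overlap. It is a standalone illustration, not part of the SPH code: PeakConcurrency is a hypothetical helper, and the SpinWait call only simulates some work so that iterations have a chance to run at the same time.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelismDemo
{
    // Runs `iterations` short work items and returns the highest number
    // that were ever running at the same time.
    public static int PeakConcurrency(int iterations, int maxDegree)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        int running = 0;
        int peak = 0;
        object gate = new object();

        Parallel.For(0, iterations, options, i =>
        {
            // The counters are shared between threads, so they are
            // only touched while holding the lock.
            lock (gate)
            {
                running++;
                if (running > peak)
                    peak = running;
            }

            Thread.SpinWait(10000); // simulated work

            lock (gate)
            {
                running--;
            }
        });

        return peak;
    }
}
```

PeakConcurrency(100, 2) never returns more than 2, while PeakConcurrency(100, -1) can go as high as the scheduler sees fit on the machine it runs on.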

The second topic I need to talk about is the line with the NOT GOOD FOR MULTITHREADING comment. This line in itself has no problem, but it calls a function of a grid that's used to find the neighbors of a particle (that is, other particles that are within a given radius of the current one) without having to test all the particles each time. Since a particle usually has a bunch of neighbors, the GetNeighbors function would initially store their indices in an array that was part of the grid object, and return a reference to that array. Once this function can be called by several threads at the same time, this approach doesn't work anymore: one thread overwrites the shared array while another is still reading its own results from it, and you can trust me when I tell you the SPH simulation became totally unstable as a result.

How do we fix that? Well, since I was only trying Parallel.For for the sake of trying it, I did the simplest thing I could think of, just to verify that it would fix the simulation and that there wasn't any other problem: GetNeighbors now allocates and returns a new array each time it is called. Each thread therefore gets its own piece of memory every time it calls the function (that's the annoying part, since one thread could perfectly well reuse the same array for all the particles it handles). That's a lot of allocations per frame, it sounds crazy, and I don't recommend doing it in real code, although we're going to see in the next section that it wasn't so bad after all.
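One way to let each worker reuse a single array, which I did not try in my project, is the Parallel.For overload that takes a thread-local state: localInit creates the state once per worker task, the body receives it and hands it back, and localFinally runs when the task is done. The sketch below is an illustration under assumptions: FillNeighbors is a hypothetical stand-in for GetNeighbors that writes into a caller-supplied buffer, on a simple line of particles where the neighbors of p are p - 1 and p + 1.

```csharp
using System;
using System.Threading.Tasks;

public static class ScratchBufferDemo
{
    // Hypothetical stand-in for GetNeighbors: writes the neighbors of p
    // into `buffer` and returns how many entries were written.
    private static int FillNeighbors(int p, int count, int[] buffer)
    {
        int n = 0;
        if (p > 0)
            buffer[n++] = p - 1;
        if (p < count - 1)
            buffer[n++] = p + 1;
        return n;
    }

    // For every particle, sums the indices of its neighbors.
    public static int[] SumNeighborIndices(int count)
    {
        int[] result = new int[count];

        Parallel.For<int[]>(0, count,
            () => new int[2],            // one scratch buffer per worker task
            (p, loopState, buffer) =>
            {
                int nbNeighbors = FillNeighbors(p, count, buffer);
                int sum = 0;
                for (int n = 0; n < nbNeighbors; n++)
                    sum += buffer[n];
                result[p] = sum;         // each iteration writes only its own slot
                return buffer;           // reused by the next iteration of this task
            },
            buffer => { });              // nothing to merge or release

        return result;
    }
}
```

Note that this only takes care of the scratch array. Writes to another particle's slot, such as densities[neighborIndex] += density in the function above, are still shared between threads and would need their own protection.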

Performance

So far we've seen that Parallel.For is very easy to use, definitely much easier than managing a bunch of threads manually, managing a bunch of BackgroundWorker objects, or using the ThreadPool class (even if I haven't shown these methods in action, and I'm not saying they don't have valid use cases). But what about performance?

Unfortunately, I don't have numbers to give you since, as I mentioned before, my algorithm was already memory bound when running on one core, and processing power was not the issue. But let me say this: despite the memory bottleneck and all the extra allocations made in the GetNeighbors function, the multithreaded version of my SPH simulation still managed to run faster than the single-threaded one, which is a great sign in my opinion. It also suggests that allocating memory is very fast in C#, and that the garbage collector does a good job of quickly getting rid of short-lived objects, at least on PC.

I really encourage you to read more of the documentation about the Task Parallel Library and to try it for yourself on some existing code that could use more multithreading. I think you'll be pleasantly surprised.

Sunday, January 30, 2011

The main differences when porting a game from Windows Phone 7 to Xbox360


Monday, January 17, 2011

Parallel.For: easy multithreading in C#
