06 April 2018

Trashing the cache: C++ vs C#

In my previous post I showed how data locality can improve your program's performance. Since then I have been wondering whether these same optimizations are valid in C# too. (If you haven't read the previous post you'd better do that first, or this won't make much sense)

Integer array

Thus again, we loop over an array of integers with increasing step size, inspired by this blogpost. The goal is to illustrate the impact of cache-fetches, where we need to fetch data from memory into the cache instead of having a cache-hit. In C# we have almost identical code to the C++ version:

const int length = 64 * 1024 * 1024;
var arr = new int[length];
for (int k = 1; k <= 1024; k = k * 2)
{
  for (int i = 0; i < length; i += k)
  arr[i] = i;
}

We measure how long it takes to go through the array and perform an operation on each value, with an increasing step size of powers of two. That gave me these results:

That's quasi identical to the C++ version:

In other words, we're just as memory-fetch-time bound in C# as we are in C++. This should not come as a surprise.

GameObjects

We create a similar GameObject class as in the C++ version:

class GameObject3D
{
  public float[] transform = {
    1,0,0,0,
    0,1,0,0,
    0,0,1,0,
    0,0,0,1
  };
  public int ID;
}

Here we stumble upon a difference between C++ and C#: creating an array of these objects in C++ allocates the memory and initializes it with the default constructor on every instance. In C# we must write this ourselves:

GameObject3D* arr = new GameObject3D[length];

vs

GameObject3D[] arr = new GameObject3D[length];
for (int i = 0; i < length; i++)
  arr[i] = new GameObject3D();

The timings are very similar. Again, we are bound by cache speed:

Here's my alternative again:

class Transform
{
  public float[] values = {
    1,0,0,0,
    0,1,0,0,
    0,0,1,0,
    0,0,0,1
  };
}

class GameObject3DAlt
{
  public Transform transform;
  public int ID;
};

Which yields this graphic:

Wait what? What happens here? The chart has a different shape than the C++ version and is almost twice as high? C# clearly introduces some overhead here. The talk of Joachim Ante on the new Entity-Component system of Unity gave me the idea to try a struct version of our classes; because we are creating the game object instances one by one they're not guaranteed to be located next to each other in memory. So I tried this instead:

struct SGameObject3DAlt
{
  public Transform transform;
  public int ID;
};

Unlike arrays of classes, arrays of structs are co-allocated in memory and initialized, causing the same speeds as we had in C++ (even a little bit faster!):

Ok, that's it. We re-created all tests in C# and we achieved quasi identical results. In conclusion: C# is just as memory-fetch-bound as C++, and in C# you carefully need to think whether or not you want to use structs instead of classes. But the fact that we achieve at least the same timings also shows that C# is as fast as C++ in these rather contrived tests.

No comments: