Zero Initialisation for Classes

Original Author: Thomas Young

(Also posted to a series of posts about Vectors and Vector based containers.)

This is a response to comments on a previous post, roll your own vector, and has also been rewritten and updated fairly significantly since first posted.

In roll your own vector I talked about a change we made to the initialisation semantics for PathEngine’s custom vector class. In my first followup post I looked more closely at possibilities for replacing resize() with reserve() (which can avoid the initialisation issue in many cases), but so far I’ve been concentrating pretty much exclusively on zero initialisation for built-in types. In this post I come back to look at the issue of initialisation semantics for class element types.

Placement new subtleties

At its root, the change in initialisation semantics for our vector comes down to a single (quite subtle) change in the way we write one of the placement new expressions.

It’s all about the placement new call for element default construction. This is required when elements need to be initialised, but no value is provided for initialisation by the calling code, for example in a call to vector resize() with no fill value argument.

As shown in my previous post, the standard way to implement this placement new is with the following syntax:

       new((void*)begin) T();

but we chose to replace this with the following, subtly different placement new syntax:

       new((void*)begin) T;

So we left out a pair of brackets.

Note that this missing pair of brackets is what I’m talking about when I refer to ‘changed initialisation semantics’. (Our custom vector class does not omit initialisation completely!)

What those brackets do

So what do those brackets do, and what happens when we remove them?

Well, this is all about ‘zero initialisation’.

In certain cases the memory for the object of type T being constructed will get zero initialised in the first version of the placement new call (‘new((void*)begin) T()’), but not in the second version (‘new((void*)begin) T’).

You can find these two initialisation types documented on cppreference.com, in this stackoverflow answer, and in the related links.

This makes a difference during element construction for built-in types (as we saw with the buffer initialisation overhead in my previous post), but also for certain types of classes and structs, and this is what I’ll be looking at in this post.
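For built-in element types, the difference is easy to check directly. The following is a minimal sketch and not PathEngine’s actual code. ConstructRange_WithBrackets and ConstructRange_WithoutBrackets are hypothetical helpers that stand in for the construction loop something like resize() would run. The only difference between them is the pair of brackets.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical helper: default-constructs the elements in [begin, end)
// using the bracketed placement new form.
template <class T> void
ConstructRange_WithBrackets(T* begin, T* end)
{
    for(; begin != end; ++begin)
    {
        new((void*)begin) T();   // built-in types end up zeroed
    }
}

// Hypothetical helper: the same loop with the brackets left out.
template <class T> void
ConstructRange_WithoutBrackets(T* begin, T* end)
{
    for(; begin != end; ++begin)
    {
        new((void*)begin) T;     // built-in types are left uninitialised
    }
}

// Fills an int buffer with non-zero bytes, runs the bracketed loop over it,
// and reports whether every element came out as zero.
bool
BracketedFormZeroesInts()
{
    const int count = 8;
    int* buffer = static_cast<int*>(std::malloc(count * sizeof(int)));
    if(!buffer)
    {
        return false;
    }
    std::memset(buffer, 0xff, count * sizeof(int));
    ConstructRange_WithBrackets(buffer, buffer + count);
    bool allZero = true;
    for(int i = 0; i != count; ++i)
    {
        allZero = allZero && buffer[i] == 0;
    }
    std::free(buffer);
    return allZero;
}
```

After ConstructRange_WithoutBrackets the int values are indeterminate, so they need to be written before they are read.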

Initialisation of built in types

It’s quite well known that initialisation for built-in types works differently for global variables (which are usually created as part of the program’s address space) and local variables (which are allocated on the program stack).

If we start with the following:

#include <cassert>

int
main(int argc, char* argv[])
{
    int i;
    assert(i == 0);
    return 0;
}

This runs through quite happily with the debug build, but if I turn assertions on in the release build then this assertion gets triggered (strictly speaking, reading i here is undefined behaviour). That’s not really surprising. This kind of uninitialised local variable is a well known gotcha and I think most people with a reasonable amount of experience in C++ have come across something like this.

But the point is that the local variable initialisation here is using ‘default initialisation’, as opposed to ‘zero initialisation’.

And if we change i from a local to a global variable the situation changes:

#include <cassert>

int i;

int
main(int argc, char* argv[])
{
    assert(i == 0);
    return 0;
}

This time the variable gets zero initialised, and the program runs through without assertion in both release and debug builds.

The reason for this is that global variables can be initialised in the linked binary for your program, at no cost (or else very cheaply at program startup), but local variables get instantiated on the program stack and initialising these explicitly to zero would add a bit of extra run time overhead to your program.

Since uninitialised data is a big potential source of error, many other (more modern) languages choose to always initialise data, but this inevitably adds some overhead, and part of the appeal of C++ is that it lets us get ‘close to the metal’ and avoid this kind of overhead.
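Here is a small sketch of these rules for int. The function and variable names are made up for illustration. Anything with static storage duration is zero initialised, and that includes static locals as well as globals. An ordinary local is only zeroed if we explicitly ask for value initialisation, for example with int() or, in C++11, empty braces.

```cpp
#include <cassert>

int g_counter;                 // static storage duration: zero initialised

int
ReadGlobal()
{
    return g_counter;
}

int
ReadStaticLocal()
{
    static int s_value;        // also static storage duration: zero initialised
    return s_value;
}

int
ReadExplicitlyInitialisedLocals()
{
    int a = int();             // value initialisation: 0
    int b{};                   // C++11 empty braces: also 0
    // int c;                  // default initialisation: indeterminate value
    return a + b;
}
```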

Zero initialisation and ‘value’ classes

What’s less well known (I think) is that this can also apply to classes, in certain cases. This is something you’ll come across most commonly, I think, in the form of classes that are written to act as a kind of ‘value type’, and to behave in a similar way to the C++ built in types.

More specifically, it’s all about classes where internal state is not initialised during class construction, and for which you could choose to omit the class default constructor.

In PathEngine we have a number of classes like this. One example looks something like this:

class cMeshElement
{
public:
    enum eType
    {
        FACE,
        EDGE,
        VERTEX,
    };

    //.. class methods

private:
    eType _type;
    tSigned32 _index;
};

Default construction of value classes

What should happen on default construction of a cMeshElement instance?

The safest thing to do will be to initialise _type and _index to some fixed, deterministic values, to eliminate the possibility of program execution being dependent on uninitialised data.

In PathEngine, however, we may need to set up some fairly large buffers with elements of this type. We don’t want to limit ourselves to only ever building these buffers through a purely iterator based paradigm (as discussed in my previous post), and we sometimes want to just create big uninitialised vectors of cMeshElement type directly, without buffer initialisation overhead. So we leave the data members in this class uninitialised.

Empty default constructor or no default constructor?

So we don’t want to do anything on default construction.

There are two ways this can be implemented in our value type class. We can omit the class default constructor completely, or we can add an empty default constructor.

Omitting the constructor seems nice, insofar as it avoids a bit of apparently unnecessary and extraneous code, but it turns out there’s some unexpected complexity in the rules for C++ object construction with respect to this choice, and to whether an object is being constructed with ‘zero initialisation’ or ‘default initialisation’.

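Note that the two terms actually refer to two different sets of object construction semantics, each defining a set of rules for what happens to memory during construction (depending on the exact construction situation), and ‘zero initialisation’ does not always result in an actual zero initialisation step. (Strictly speaking, the C++ standard calls the bracketed form ‘value initialisation’, and zeroing the object is just one of the steps that value initialisation performs for some types.)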

We can test what happens in the context of our custom vector, and ‘value type’ elements, with the following code:

#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <new>

class cInitialisationReporter
{
    int i;
public:
    ~cInitialisationReporter()
    {
        // (reading i when it was left uninitialised is technically undefined
        // behaviour, but it serves as a diagnostic here)
        std::cout << "cInitialisationReporter::i is " << i << '\n';
    }
};

class cInitialisationReporter2
{
    int i;
public:
    cInitialisationReporter2() {}
    ~cInitialisationReporter2()
    {
        std::cout << "cInitialisationReporter2::i is " << i << '\n';
    }
};

// Fill memory with non-zero bytes, then construct with brackets
template <class T> void
SetMemAndPlacementConstruct_ZeroInitialisation()
{
    T* allocated = static_cast<T*>(std::malloc(sizeof(T)));
    signed char* asCharPtr = reinterpret_cast<signed char*>(allocated);
    for(std::size_t i = 0; i != sizeof(T); ++i)
    {
        asCharPtr[i] = -1;
    }
    new((void*)allocated) T();
    allocated->~T();
    std::free(allocated);
}

// Fill memory with non-zero bytes, then construct without brackets
template <class T> void
SetMemAndPlacementConstruct_DefaultInitialisation()
{
    T* allocated = static_cast<T*>(std::malloc(sizeof(T)));
    signed char* asCharPtr = reinterpret_cast<signed char*>(allocated);
    for(std::size_t i = 0; i != sizeof(T); ++i)
    {
        asCharPtr[i] = -1;
    }
    new((void*)allocated) T;
    allocated->~T();
    std::free(allocated);
}

int
main(int argc, char* argv[])
{
    SetMemAndPlacementConstruct_ZeroInitialisation<cInitialisationReporter>();
    SetMemAndPlacementConstruct_ZeroInitialisation<cInitialisationReporter2>();
    SetMemAndPlacementConstruct_DefaultInitialisation<cInitialisationReporter>();
    SetMemAndPlacementConstruct_DefaultInitialisation<cInitialisationReporter2>();
    return 0;
}

This gives the following results:

cInitialisationReporter::i is 0
cInitialisationReporter2::i is -1
cInitialisationReporter::i is -1
cInitialisationReporter2::i is -1

In short:

  • If our vector uses the ‘zero initialisation’ form (placement new with brackets), and the value type has its default constructor omitted, then the compiler will add code to zero element memory on construction.
  • If our vector uses the ‘zero initialisation’ form (placement new with brackets), and the value type has an empty default constructor, then the compiler will leave element memory uninitialised on construction.
  • If the vector uses the ‘default initialisation’ form (placement new without brackets), then the compiler will leave element memory uninitialised regardless of whether or not there is a default constructor.
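One more case seems worth checking, assuming a C++11 compiler. A default constructor declared with = default inside the class is not ‘user-provided’. As far as I can tell from the value initialisation rules, it therefore behaves like an omitted constructor, and the bracketed form still zeroes element memory. It’s specifically a constructor with a body (even an empty one) that suppresses the zeroing. cDefaulted and cEmptyBody below are hypothetical test classes.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

struct cDefaulted
{
    int i;
    cDefaulted() = default;    // defaulted on first declaration: not user-provided
};

struct cEmptyBody
{
    int i;
    cEmptyBody() {}            // user-provided, even though the body is empty
};

static_assert(std::is_trivially_default_constructible<cDefaulted>::value,
              "= default keeps the default constructor trivial");
static_assert(!std::is_trivially_default_constructible<cEmptyBody>::value,
              "an empty body still counts as user-provided");

// Constructs a cDefaulted over non-zero bytes using the bracketed form,
// and returns the resulting member value.
int
BracketedConstructDefaulted()
{
    void* memory = std::malloc(sizeof(cDefaulted));
    if(!memory)
    {
        return -1;
    }
    std::memset(memory, 0xff, sizeof(cDefaulted));
    cDefaulted* object = new(memory) cDefaulted();
    int result = object->i;    // zero initialised, so safe to read
    object->~cDefaulted();
    std::free(memory);
    return result;
}
```

Note that declaring the constructor in the class and then defaulting it outside the class body (cDefaulted::cDefaulted() = default;) counts as user-provided. That version would behave like the empty constructor instead.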

Zero initialisation in std::vector

The std::vector implementations I’ve looked at also all perform ‘zero initialisation’ (and as far as I can tell this is required by the standard: with the default allocator, elements added by resize() are value initialised). We can test this by supplying the following custom allocator:

#include <cstddef>
#include <cstdlib>
#include <new>

template <class T>
class cNonZeroedAllocator
{
public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;
    typedef value_type& reference;
    typedef const value_type& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template <class tTarget>
    struct rebind
    {
        typedef cNonZeroedAllocator<tTarget> other;
    };

    cNonZeroedAllocator() {}
    ~cNonZeroedAllocator() {}
    template <class T2>
    cNonZeroedAllocator(cNonZeroedAllocator<T2> const&)
    {
    }

    pointer
    address(reference ref) const
    {
        return &ref;
    }
    const_pointer
    address(const_reference ref) const
    {
        return &ref;
    }

    // The important bit: hand out memory with every byte set to non-zero
    pointer
    allocate(size_type count, const void* = 0)
    {
        size_type byteSize = count * sizeof(T);
        void* result = std::malloc(byteSize);
        if(!result && byteSize != 0)
        {
            throw std::bad_alloc();
        }
        signed char* asCharPtr = reinterpret_cast<signed char*>(result);
        for(size_type i = 0; i != byteSize; ++i)
        {
            asCharPtr[i] = -1;
        }
        return reinterpret_cast<pointer>(result);
    }
    void deallocate(pointer ptr, size_type)
    {
        std::free(ptr);
    }

    size_type
    max_size() const
    {
        return 0xffffffffUL / sizeof(T);
    }

    void
    construct(pointer ptr, const T& t)
    {
        new((void*)ptr) T(t);
    }
    void
    destroy(pointer ptr)
    {
        ptr->~T();
    }

    template <class T2> bool
    operator==(cNonZeroedAllocator<T2> const&) const
    {
        return true;
    }
    template <class T2> bool
    operator!=(cNonZeroedAllocator<T2> const&) const
    {
        return false;
    }
};

Oh, by the way, did I mention that I don’t like STL allocators? (Not yet, I will in my next post!) This is a bog standard STL allocator with the allocate method hacked to set all the bytes in the allocated memory block to non-zero values. The important bit is the implementation of the allocate and deallocate methods. The rest is just boilerplate.

To apply this in our test code:

#include <vector>

int
main(int argc, char* argv[])
{
    std::vector<cInitialisationReporter,
        cNonZeroedAllocator<cInitialisationReporter> > v1;
    v1.resize(1);
    std::vector<cInitialisationReporter2,
        cNonZeroedAllocator<cInitialisationReporter2> > v2;
    v2.resize(1);
    return 0;
}

And this gives:

cInitialisationReporter::i is 0
cInitialisationReporter2::i is -1
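As an aside, with a C++11 standard library most of the allocator boilerplate above can be dropped. std::allocator_traits fills in defaults for rebind, construct, destroy, max_size and the pointer typedefs. Here’s a minimal sketch of the same non-zeroing allocator under that assumption. cMinimalNonZeroedAllocator and cNoCtorElement are hypothetical names. The sketch should show the same zeroing for an element type without a default constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical C++11 version of the allocator above: std::allocator_traits
// supplies rebind, construct, destroy, max_size and the pointer typedefs.
template <class T>
class cMinimalNonZeroedAllocator
{
public:
    typedef T value_type;

    cMinimalNonZeroedAllocator() {}
    template <class T2>
    cMinimalNonZeroedAllocator(cMinimalNonZeroedAllocator<T2> const&) {}

    T*
    allocate(std::size_t count)
    {
        std::size_t byteSize = count * sizeof(T);
        void* result = std::malloc(byteSize == 0 ? 1 : byteSize);
        if(!result)
        {
            throw std::bad_alloc();
        }
        std::memset(result, 0xff, byteSize);   // non-zero fill, as before
        return static_cast<T*>(result);
    }

    void
    deallocate(T* ptr, std::size_t)
    {
        std::free(ptr);
    }
};

template <class T, class T2> bool
operator==(cMinimalNonZeroedAllocator<T> const&, cMinimalNonZeroedAllocator<T2> const&)
{
    return true;
}

template <class T, class T2> bool
operator!=(cMinimalNonZeroedAllocator<T> const&, cMinimalNonZeroedAllocator<T2> const&)
{
    return false;
}

// Hypothetical value type with no default constructor.
class cNoCtorElement
{
    int i;
public:
    int Get() const { return i; }
};

// resize() value initialises the new element, so this should return 0
// even though the allocator handed back non-zero memory.
int
ResizeAndRead()
{
    std::vector<cNoCtorElement, cMinimalNonZeroedAllocator<cNoCtorElement> > v;
    v.resize(1);
    return v[0].Get();
}
```

This allocator has no construct() member. When resize() adds elements, allocator_traits therefore falls back to placement new with brackets, which is the ‘zero initialisation’ form.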

Class with no default constructor + std::vector = initialisation overhead

So if I implement a ‘value class’ without a default constructor, and then construct a std::vector with elements of this type, I get initialisation overhead. This accounts for part of the speedups we saw when switching to a custom vector implementation (together with the corresponding issue for built-in types).

But, based on the above, there’s now a clear workaround for this issue. To use std::vector but avoid initialisation overhead for value type elements, we just need to make sure that each of our value type classes has an empty default constructor.
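Applied to the cMeshElement class from earlier, the workaround might look something like the following sketch. Here std::int32_t stands in for PathEngine’s tSigned32. The second constructor, the accessors and BuildEdgeElements() are hypothetical additions so that the example can be run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

class cMeshElement
{
public:
    enum eType
    {
        FACE,
        EDGE,
        VERTEX,
    };

    // The workaround: an empty default constructor. Value initialisation
    // now just calls this, so the members are not zeroed.
    cMeshElement() {}
    cMeshElement(eType type, std::int32_t index) : _type(type), _index(index) {}

    eType getType() const { return _type; }
    std::int32_t getIndex() const { return _index; }

private:
    eType _type;
    std::int32_t _index;     // stands in for PathEngine's tSigned32
};

static_assert(!std::is_trivially_default_constructible<cMeshElement>::value,
              "the empty constructor is user-provided, so not trivial");

// Hypothetical usage: size the buffer first, then write every element.
std::vector<cMeshElement>
BuildEdgeElements(std::size_t count)
{
    std::vector<cMeshElement> elements;
    elements.resize(count);  // calls the empty constructor, no zeroing step
    for(std::size_t i = 0; i != count; ++i)
    {
        elements[i] = cMeshElement(cMeshElement::EDGE, static_cast<std::int32_t>(i));
    }
    return elements;
}
```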

Extending to a wrapper for working around zero initialisation for built-in types

In the comments (commenting on the original version of this post!) Marek Knápek suggests using the following wrapper to avoid zero initialisation, in the context of built-in types:

template<typename T>
// assuming T is int, short, long, std::uint64_t, ...
// TODO: add static assert
class MyInt{
public:
    MyInt()
    // m_int is "garbage-initialized" here
    {}
public:
    T m_int;
};

And sure enough, this works (because of the empty default constructor in the wrapper class). But I really don’t like using this kind of wrapper in practice, as I think that this complicates (and slightly obfuscates!) each vector definition.
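For reference, the TODO in that wrapper could be filled in with a C++11 static_assert, as in the sketch below. The is_integral check and the SumOfIndices() usage function are my own illustrative additions. The usage function also shows the extra .m_int noise that the wrapper adds wherever the elements are accessed.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

template <typename T>
class MyInt
{
    // One way to fill in the TODO: only allow built-in integer types.
    static_assert(std::is_integral<T>::value, "MyInt<T> expects an integer type");
public:
    MyInt()
    // m_int is "garbage-initialized" here
    {}
public:
    T m_int;
};

// Every element access now needs ".m_int".
long long
SumOfIndices(std::size_t count)
{
    std::vector<MyInt<int> > values;
    values.resize(count);          // empty constructor: no zeroing
    long long sum = 0;
    for(std::size_t i = 0; i != count; ++i)
    {
        values[i].m_int = static_cast<int>(i);
        sum += values[i].m_int;
    }
    return sum;
}
```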

Using default initialisation semantics for our custom vector avoids the need for this kind of workaround. And, more generally, if we take each of the possible construction semantics on their merits (ignoring the fact that one of these is the behaviour of the standard vector implementation), I prefer ‘default initialisation’ semantics, since:

  • these semantics seem more consistent and avoid surprises based on whether or not an empty default constructor is included in a class, and
  • value type classes shouldn’t depend on zero initialisation, anyway (since they may be instantiated as local variables)

Type specialisation

One thing to be aware of with this workaround is that adding an empty default constructor can have implications for type specialisation, and that the type traits available for checking this depend on your compiler version.

In C++11 the standard trait for this is is_trivially_default_constructible, so I tried the following to compare cInitialisationReporter and cInitialisationReporter2 (with clang 3.2.1):

  cout
      << "is_trivially_default_constructible<cInitialisationReporter>: "
      << is_trivially_default_constructible<cInitialisationReporter>::value
      << '\n';
  cout
      << "is_trivially_default_constructible<cInitialisationReporter2>: "
      << is_trivially_default_constructible<cInitialisationReporter2>::value
      << '\n';

I get:

error: no template named 'is_trivially_default_constructible' in namespace 'std'; did you mean 'has_trivial_default_constructor'?

and then when I try with ‘has_trivial_default_constructor’:

  cout
      << "has_trivial_default_constructor<cInitialisationReporter>: "
      << has_trivial_default_constructor<cInitialisationReporter>::value
      << '\n';
  cout
      << "has_trivial_default_constructor<cInitialisationReporter2>: "
      << has_trivial_default_constructor<cInitialisationReporter2>::value
      << '\n';

I get:

has_trivial_default_constructor<cInitialisationReporter>: 1
has_trivial_default_constructor<cInitialisationReporter2>: 0

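This difference is actually what the standard specifies. A default constructor only counts as trivial if it is not user-provided, and an empty constructor body is still user-provided. Code that specialises on trivial default construction will therefore treat the two classes differently. This doesn’t matter for PathEngine, since we still use an ‘old school’ type specialisation setup (to support older compilers), but it could be something to look out for, nevertheless.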
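Here is a hypothetical illustration of the kind of specialisation that’s affected; it is not PathEngine’s setup. A container might dispatch on the C++11 trait to skip the construction loop for types with a trivial default constructor. The sketch uses tag dispatch so that it doesn’t need C++17. Skipping construction like this relies on compilers treating raw storage as usable for trivial types, which C++20 formalises as implicit object creation. Adding an empty constructor pushes a class onto the looping path.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

struct cNoCtor    { int i; };                  // trivial default constructor
struct cEmptyCtor { int i; cEmptyCtor() {} };  // user-provided, so non-trivial

// Trivial path: nothing to run, so skip the loop entirely.
template <class T> std::size_t
ConstructRange(T*, T*, std::true_type)
{
    return 0;
}

// Non-trivial path: run the default constructor for each element.
template <class T> std::size_t
ConstructRange(T* begin, T* end, std::false_type)
{
    std::size_t constructed = 0;
    for(; begin != end; ++begin, ++constructed)
    {
        new((void*)begin) T;
    }
    return constructed;
}

// Hypothetical entry point: dispatches on the C++11 trait and returns how
// many placement new calls were made.
template <class T> std::size_t
ConstructRange(T* begin, T* end)
{
    return ConstructRange(begin, end,
        std::integral_constant<bool,
            std::is_trivially_default_constructible<T>::value>());
}
```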

Conclusion

The overhead for zero initialisation in std::vector is something that has been an issue for us historically, but it turns out that for std::vector of value type classes, zero initialisation can be avoided without resorting to a custom vector implementation.

It’s interesting to see the implications of this kind of implementation detail. Watch out for how you implement ‘value type’ classes if they’re going to be used as elements in large buffers, and maximum performance is desired!