2017-02-28

How to avoid unnecessary copies when appending to a C++ vector

This blog post explains how to avoid unnecessary copies when appending to a C++ std::vector, and recommends the fast_vector_append helper library, which eliminates most copies automatically.

TL;DR If you are using C++11, and your element classes have an efficient move constructor defined, then just use push_back to append, it won't do any unnecessary copies. In addition to that, if you are constructing the to-be-appended element, you can use emplace_back to append, which even avoids the (otherwise fast) call to the move constructor.

Copying is slow and needs a lot of (temporary memory) if the object contains lots of data. Such an object is a long std::string: the entire array of characters get copied to a new array. This hurts performance if the copy is unnecessary, e.g. if only a temporary copy is made. For example:

std::string create_long_string(int);

std::vector<std::string> v;
{
  // Case A.
  std::string s1 = create_long_string(1);
  std::string s2 = create_long_string(2);
  std::string s3 = create_long_string(3);
  // Case B.
  v.push_back(s1);
  std::cout << s1;
}
// Case C.
v.push_back("foo");
// Case D, from C++11.
v.emplace_back("foo");
// Case E.
v.push_back(create_long_string(4));
// Case F.
v.push_back(std::string()); v.back().swap(s2);
// Case G, from C++11.
v.push_back(std::move(s3));

In Case A, return value optimization prevents the unnecessary copying: the string built in the function body of create_long_string is placed directly to s1.

In Case B, a copy has to be made (there is no way around it), because v is still valid after s1 is destroyed, thus it cannot reuse the data in s1.

Case C could work without a copy, but in C++98 an unnecessary copy is made. So first std::string("foo") is called (which makes a copy of the data), and then the copy constructor of std::string is called to create a new string (with a 2nd copy of the data), which gets added to v.

Case D avoids the 2nd (unnecessary) copy, but it works only from C++11. In earlier versions of C++ (such as C++98), std::vector doesn't have the emplace_back method.

In Case E, there is an unnecessary copy in C++98: create_long_string creates an std::string, and it gets copied to a new std::string within v. It would be better if create_long_string could create the std::string to its final location.

Case F shows the workaround in C++98 of adding s2 to an std::vector without a copy. It's a workaround because it's a bit ugly and it still involves some copying. Fortunately this copying is fast: it copies only the empty string. As a side effect, the value of s2 is lost, it will then be the empty string.

Case G shows the C++11 way of adding s3 to an std::vector without a copy. It doesn't work in C++98 (there is no std::move in C++98). The std::move(s3) visibly documents that the old value of s3 is lost.

C++11 (the version of C++ after C++98) introduces rvalue references, move constructors and move semantics to avoid unnecessary copies. This will fix both Case C and Case E. For this to work, new code needs to be added to the element class (in our case std::string) and to the container class (in our case std::vector) as well. Fortunately, the callers (including our code above and the body of create_long_string) can be kept unchanged. The following code has been added to the C++ standard library (STL) in C++11:

class string {
  ...
  // Copy constructor. C++98, C++11.
  string(const string& old) { ... }
  // Move constructor. Not in C++98, added in C++11.
  string(string&& old) { ... } ... }
};

template<typename T, ...>
class vector {
  ...
  // Takes a const reference. C++98, C++11.
  void push_back(const T& t);
  // Takes an rvalue reference. Not in C++98, added in C++11.
  void push_back(T&& t);
};

As soon as both of these are added, when v.push_back(...) will attempt to call the 2nd method (which takes the rvalue reference), which will call to move constructor of std::string instead of the copy constructor. This gives us the benefit of no copying, because typically the move constructor is fast, because it doesn't copy data. In general, the move constructor creates the new object with the data of the old object, and it can leave the old old object in an arbitrary but valid state. For std::string, it just copies the pointer to the data (which is fast, because it doesn't copy the data itself), and sets the pointer in the old std::string to nullptr. Thus Case C and Case E become fast in C++11. Case B is not affected (it still copies), and that's good, because we want to print s1 to cout below, so we want that data there. This happens automatically, because in the call v.push_back(s1), s1 is not an rvalue reference, thus the cost-reference push_back will be called, which does a copy. For more details about the magic to select the proper push_back, see this tutorial or this tutorial.

Guidelines to avoid unnecessary copies

Define your (element) classes like this:

  • Define the default constructor (C() { ... }).
  • Define the destructor (~C() { ... }).
  • Define the copy constructor (C(const C& c) { ... }).
  • It's a good practice to define operator=, but not needed here.
  • For C++11 classes, define a move constructor (e.g. C(C&& c) { ... }).
  • For C++11 classes, don't define a member swap method. If you must define it, then also define a method void shrink_to_fit() { ... }. It doesn't matter what the method does, you can just declare it. The fast_vector_append library detects shrink_to_fit, and will use the move constructor instead of the swap method, the former being slightly faster, although neither copies the data.
  • For C++98 classes, don't define a move constructor. In fact, C++98 doesn't support move constructors.
  • For C++98 classes, define a member swap method.

To append a new element to an std::vector without unnecessary copying, as fast as possible, follow this advice from top to bottom:

  • If it's C++11 mode, and the object is being constructed (not returned by a function!), use emplace_back without the element class name.
  • If it's C++11 mode, and the class has a move constructor, use push_back.
  • If it's C++11 mode, and the class has the member swap method, use: { C c(42); v.resize(v.size() + 1); v.back().swap(c); }
  • If the class has the member swap method, use: { C c(42); v.push_back(C()); v.back().swap(c); }
  • Use push_back. (This is the only case with a slow copy.)

Automating the avoidance of unnecessary copies when appending to a vector

It would be awesome if the compiler could guess the programmer's intentions, e.g. it would pick emplace_back if it is faster than push_back, and it will avoid the copy even in C++98 code, e.g. it will use swap if it's available, but the move constructor isn't. This is important because sometimes it's inconvenient to modify old parts of a codebase defining the element class, and it already has swap.

For automation, use fast_vector_append(v, ...) in the fast_vector_append library to append elements to an std::vector. It works in both C++98 and C++11, but it can avoid more copies in C++11. The example above looks like:

#include "fast_vector_append.h"
std::string create_long_string(int);

std::vector<std::string> v;
{
  // Case A. No copy.
  std::string s1 = create_long_string(1);
  std::string s2 = create_long_string(2);
  std::string s3 = create_long_string(3);
  // Case B. Copied.
  fast_vector_append(v, s1);
  std::cout << s1;
}
// Case C. Not copied.
fast_vector_append(v, "foo");
// Case D. Not copied.
fast_vector_append(v, "foo");
// Case E. Copied in C++98.
fast_vector_append(v, create_long_string(4));
{ std::string s4 = create_long_string(4);
  // Case E2. Not copied.
  fast_vector_append_move(v, s4);
}
// Case F. Not copied.
fast_vector_append_move(v, s2);
// Case G. Not copied.
fast_vector_append_move(v, s3);
// Case H. Copied in C++98.
fast_vector_append(v, std::string("foo"));

Autodetection of class features with SFINAE

The library fast_vector_append does some interesting SFINAE tricks to autodetect the features of the element class, so that it will be able to use the fastest way of appending supported by the class.

For example, this is how it detects whether to use the member swap method:

// Use swap iff: has swap, does't have std::get, doesn't have shrink_to_fit,
// doesn't have emplace, doesn't have remove_suffix. By doing so we match
// all C++11, C++14 and C++17 STL templates except for std::optional and
// std::any. Not matching a few of them is not a problem because then member
// .swap will be used on them, and that's good enough.
//
// Based on HAS_MEM_FUNC in http://stackoverflow.com/a/264088/97248 .  
// Based on decltype(...) in http://stackoverflow.com/a/6324863/97248 .
template<typename T>   
struct __aph_use_swap {
  template <typename U, U> struct type_check;
  // This also checks the return type of swap (void). The checks with
  // decltype below don't check the return type.
  template <typename B> static char (&chk_swap(type_check<void(B::*)(B&), &B::swap>*))[2];
  template <typename  > static char (&chk_swap(...))[1];
  template <typename B> static char (&chk_get(decltype(std::get<0>(*(B*)0), 0)))[1];  
// ^^^ C++11 only: std::pair, std::tuple, std::variant, std::array. template <typename > static char (&chk_get(...))[2]; template <typename B> static char (&chk_s2f(decltype(((B*)0)->shrink_to_fit(), 0)))[1];
// ^^^ C++11 only: std::vector, std::deque, std::string, std::basic_string. template <typename > static char (&chk_s2f(...))[2]; template <typename B> static char (&chk_empl(decltype(((B*)0)->emplace(), 0)))[1];
// ^^^ C++11 only: std::vector, std::deque, std::set, std::multiset, std::map, std::multimap, std::unordered_multiset, std::unordered_map, std::unordered_multimap, std::stack, std::queue, std::priority_queue. template <typename > static char (&chk_empl(...))[2]; template <typename B> static char (&chk_rsuf(decltype(((B*)0)->remove_suffix(0), 0)))[1];
// ^^^ C++17 only: std::string_view, std::basic_string_view. template <typename > static char (&chk_rsuf(...))[2]; static bool const value = sizeof(chk_swap<T>(0)) == 2 && sizeof(chk_get<T>(0)) == 2
&& sizeof(chk_s2f<T>(0)) == 2 && sizeof(chk_empl<T>(0)) == 2
&& sizeof(chk_rsuf<T>(0)) == 2; };

The autodetection is used like this, to select one of the 2 implementations (either with v.back().swap(t) or v.push_back(std::move(t))):

template<typename V, typename T> static inline
typename std::enable_if<std::is_same<typename V::value_type, T>::value &&
    __aph_use_swap<typename V::value_type>::value, void>::type
fast_vector_append(V& v, T&& t) { v.resize(v.size() + 1); v.back().swap(t); }                               

template<typename V, typename T> static inline
typename std::enable_if<std::is_same<typename V::value_type, T>::value &&
    !__aph_use_swap<typename V::value_type>::value, void>::type
fast_vector_append(V& v, T&& t) { v.push_back(std::move(t)); }

No comments: