2015-03-03

How to avoid data copies with move semantics in C++11

This blog post explains how to avoid data copies in assignment from temporary values in C++. The idea is to define a move assignment operator (a feature introduced in C++11) for the class. It then gets called instead of the copy assignment operator, and the copy is avoided.

Let's consider std::string, a type whose values are expensive to copy (assuming that the implementation copies the entire string data, not just a pointer to the buffer). Both the copy constructor and the copy assignment operator (operator=) copy the data of the source object into the destination object. A simplified copy assignment operator could look like this:

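#include <string.h>
namespace std {
// Simplified sketch of the member function; the real one lives in the library.
string &string::operator=(const string &other) {
  if (this != &other) {  // memcpy must not be given overlapping buffers.
    resize(other.size());  // Makes room, reallocating if needed.
    memcpy(&(*this)[0], other.data(), other.size());  // Copies every character.
  }
  return *this;
}
}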

Let's assume that we have a function which returns a string: std::string GetUserName();. We can call this function and save the result to a variable: std::string user_name = GetUserName();. (It works the same way with const at the beginning.) How many times does the value have to be copied until it lands in the variable user_name? Most modern compilers apply the return value optimization here, which avoids all copies (so neither the copy constructor nor the copy assignment operator runs).

But if the variable already exists (std::string user_name;) and we want the assignment user_name = GetUserName(); to avoid copies, then the class needs a move assignment operator, which takes an rvalue reference (&&) argument instead of a const reference (const&) argument. The assignment above will then use the move assignment operator, which is faster than the copy assignment operator because it can steal the resources (e.g. the heap buffer) of the source, a temporary that is about to be destroyed anyway. (In C++11, std::string already comes with a move assignment operator.) A simplified example implementation:

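namespace std {
// Simplified sketch; assumes that string keeps its characters in a heap
// buffer data_ allocated with new[], plus size_ and capacity_ members.
string &string::operator=(string &&other) {
  if (this != &other) {
    delete[] data_;  // Releases our old buffer, otherwise it would leak.
    capacity_ = other.capacity_;
    size_ = other.size_;
    data_ = other.data_;  // Copies just the pointer.
    other.capacity_ = other.size_ = 0;
    other.data_ = nullptr;  // The source no longer owns the buffer.
  }
  return *this;
}
}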

There is also a corresponding move constructor, which is called instead of the copy constructor to avoid the copy. It helps even when the return value optimization cannot be applied (e.g. when the function body contains both return a; and return b; for two different local variables).

Let's see a more detailed example which has all four of these special member functions, each printing a short tag when it is called:

  • copy constructor (*C): C(const C&)
  • move constructor (&C): C(C&&) (only for C++11)
  • copy assignment operator (=C): C &operator=(const C&)
  • move assignment operator (#C): C &operator=(C&&) (only for C++11)

#include <stdio.h>

class C {
 public:
  explicit C(int v) { printf("+C %d\n", v); }
  ~C() { puts("~C"); }
  C(const C&) { puts("*C"); }
  C &operator=(const C&) { puts("=C"); return *this; }
#if __GXX_EXPERIMENTAL_CXX0X__ || __cplusplus >= 201100
  C(C&&) { puts("&C"); }
  C &operator=(C &&) { puts("#C"); return *this; }
#endif
};

static inline C C10(int v) {
  return C(v * 10);
}

int main(int argc, char **argv) {
  (void)argc; (void)argv;
  C ca = C10(11);
  puts("---R1");
  C cb(22);
  puts("---R2");
  cb = C10(33);
  puts("---R3");
  return 0;
}

We can compile it for C++98 (an older C++ standard than C++11) and run it:

$ g++ -W -Wall -Wextra -Werror -s -O2 -ansi -pedantic test_assign_with_move_semantics.cc && ./a.out
+C 110
---R1
+C 22
---R2
+C 330
=C
~C
---R3
~C
~C

And for C++11:

$ g++ -W -Wall -Wextra -Werror -s -O2 -std=c++0x -pedantic test_assign_with_move_semantics.cc && ./a.out
+C 110
---R1
+C 22
---R2
+C 330
#C
~C
---R3
~C
~C

The only difference is that =C has changed to #C when C++11 features are enabled. That's because the body of the #if in the code is compiled only for C++11 and above, and this body contains the move assignment operator. If there is a move assignment operator (as in our C++11 version), then the line cb = C10(33); uses it; otherwise (as in our C++98 version) that line uses the copy assignment operator.

Where do the actual data copies occur? In the copy assignment operator (=C) and in the copy constructor (*C, which is not called at all in the example). By defining a move assignment operator in C++11, we prevent the copy assignment operator from being called when assigning from a temporary (rvalue), and thus we avoid a copy.

Please note that the return value optimization avoids the copy in the C ca = C10(11); statement. This works even in C++98, without the move constructor.

In C++98, a copy can be avoided by using swap at the call site. If the caller replaces user_name = GetUserName(); with { std::string tmp = GetUserName(); user_name.swap(tmp); }, then the copy is avoided: the definition of tmp takes advantage of the return value optimization, and swap exchanges only the pointers and the sizes, not the actual data.
