cout, buffering, and premature pessimizations.

The other day I was in a discussion with some C++ developers, where one of them stated, most definitively, that “cout is not buffered”.  Now I have to admit that I was flabbergasted by how wrong this assertion was, and my first instinct was to question the capabilities of the developer in question.  As I look back on the vast majority of the code that I have worked with, however, it’s pretty clear that most C++ developers are either unaware that cout is indeed buffered, unaware of the side effect of std::endl, or just don’t think about the impact it causes.  Consider the following two lines of code:

  std::cout << "some text" << std::endl;
  std::cout << "some text\n";

Now, neither of these lines of code is more readable than the other.  Neither is more maintainable than the other.   The endl variant can however take significantly more time to execute than the \n variant. Why? Because std::endl has two effects:

  1.  It inserts a ‘\n’.
  2.  It then flushes the stream’s buffer, exactly as inserting std::flush would.

If you are using std::endl where a ‘\n’ will do (i.e. you do not need to explicitly flush the buffer), you are creating what Sutter and Alexandrescu call a “premature pessimization” in their excellent book “C++ Coding Standards”.  Despite this, the endl variant is much more common.  Whenever I ask anyone why they are using endl all over the place instead of \n, the typical answer is “well, it’s more the C++ way to do things”.  That’s just not true: meaningless resource consumption is not the C++ way.

Rule 1 is “don’t optimize prematurely.”  This means you should not make your code less readable, more complex, or less maintainable for the sake of dubious performance benefits.  A corollary to this rule, however, is “don’t pessimize prematurely”.  If two variants are equally readable, equally clean, and equally maintainable, prefer the more efficient variant.  This is just a question of correcting ignorance and forming good habits.

So you might be curious if this performance difference is measurable, and the answer is of course it is.  You can test it yourself with the following benchmark:

#include <iostream>
#include <chrono>

namespace chrono = std::chrono;

int main ()
{

	constexpr unsigned int numLines = 100000;
	auto start = chrono::high_resolution_clock::now();
	for (unsigned int i =0; i< numLines; ++i)
	{
		std::cout << "This is a prematurely pessimized line" << std::endl;
	}
	auto pess = chrono::high_resolution_clock::now();
	for (unsigned int i=0; i<numLines; ++i)
	{
		std::cout << "This is not a prematurely pessimized line\n";
	}
	std::cout << std::endl;   // flush the buffer so the comparison is
							  // only biased in favor of pessimized
							  // code.
	auto np = chrono::high_resolution_clock::now();
	double durp = chrono::duration_cast<chrono::milliseconds> (pess-start).count();
	double durnp = chrono::duration_cast<chrono::milliseconds> (np - pess).count();
	// Use cerr for benchmark results, so we can redirect the noise.
	std::cerr << "\n==============================\n"
			  << "pessimized code took: " << durp << "ms.\n"
			  << "unpessimized  took  : " << durnp << "ms.\n"
			  << "Buffering saved: " << durp-durnp << "ms., or " << 100* (durp-durnp)/durp
			  << "% speedup." << std::endl;

}

Compiling on gcc with -O2, I get the following results:

Output to terminal:

==============================
pessimized code took: 5370ms.
unpessimized  took  : 4806ms.
Buffering saved: 564ms., or 10.5028% speedup.
~/Personal/Miscelaneous[master]$ ./a.out > /dev/null

Output to /dev/null:
==============================
pessimized code took: 45ms.
unpessimized  took  : 6ms.
Buffering saved: 39ms., or 86.6667% speedup.
~/Personal/Miscelaneous[master]$ ./a.out > tmp

Output to file:

==============================
pessimized code took: 365ms.
unpessimized  took  : 79ms.
Buffering saved: 286ms., or 78.3562% speedup.

If you are writing code that uses output streams a lot, such as logging or file output, this can make a huge difference to your resource consumption, and you’ll never see the needless waste in a profiler.  So form good habits.  Unless you need to flush the buffer for some reason (which in fact is a rare need unless you’re dealing with concurrency issues), prefer the \n construct.

 
