It is natural to expect that a modern dual-processor or dual-core PC would improve performance, if not twofold, then at least 1.7 times compared to a single-CPU PC. To our surprise, after we split the calculations across several parallel worker threads, the application ran about 1.3 times slower than with one thread. The second CPU actually made it slower.
At first it appeared that the source of the slowdown was thread synchronization, but further debugging showed that the worker threads almost never block each other in our source code. After searching the Internet, it became clear that the problem was coming from the run-time library's memory manager. A quick look at the source code of the C++ CRT library revealed that all memory allocations and deallocations end up in the same memory heap, which is of course single-threaded for reliability and stability purposes.
Because of that, parallel threads in any application must wait for each other until the memory manager is released. This is not a big problem on a single-processor system or in a single-threaded application, because only one thread can execute at any given moment anyway. But it becomes a real issue on a dual-processor or dual-core system.
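The effect described above can be checked with a small timing harness. The following is only a minimal sketch, not the code of our application: the names `alloc_worker`, `timed_run` and `print_comparison`, the batch size, the 64-byte block size and the iteration count are all assumptions chosen for illustration. It splits a fixed number of allocate/free operations across a given number of threads and reports the elapsed wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical worker: allocates and frees small blocks in batches, so most of
// its time is spent inside the run-time library's memory manager.
void alloc_worker(std::size_t iterations, std::size_t block_size) {
    constexpr std::size_t kBatch = 64;  // assumed batch size, illustration only
    char* blocks[kBatch];
    for (std::size_t done = 0; done < iterations; done += kBatch) {
        for (std::size_t i = 0; i < kBatch; ++i) {
            blocks[i] = new char[block_size];
            blocks[i][0] = static_cast<char>(i);  // touch the block
        }
        for (std::size_t i = 0; i < kBatch; ++i) {
            delete[] blocks[i];
        }
    }
}

// Splits total_iterations evenly across thread_count threads and returns the
// elapsed wall-clock time in seconds.
double timed_run(unsigned thread_count, std::size_t total_iterations,
                 std::size_t block_size) {
    assert(thread_count > 0 && block_size > 0);
    const std::size_t per_thread = total_iterations / thread_count;
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    workers.reserve(thread_count);
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back(alloc_worker, per_thread, block_size);
    }
    for (auto& w : workers) {
        w.join();
    }

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Prints the same total workload timed with one thread and with two threads.
void print_comparison() {
    const std::size_t total = 2000000;  // assumed workload size
    const double one = timed_run(1, total, 64);
    const double two = timed_run(2, total, 64);
    std::printf("1 thread: %.3f s, 2 threads: %.3f s\n", one, two);
}
```

Calling `print_comparison()` from `main` on a multi-core machine shows whether the second thread helps or hurts with the allocator in use. If the two-thread time is not clearly lower than the one-thread time, the threads are most likely waiting on the shared heap. The actual numbers depend heavily on the compiler, the run-time library version and the operating system, so they should be measured rather than assumed.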
After we tried several memory allocators available on the Internet, the performance of our application improved. But testing then revealed a few problems. All vendors of the alternative allocators publish benchmarks of their products, but they never mention that memory allocated from the OS is almost never returned during those benchmarks, and allocating memory from the OS and releasing it back is the most time-consuming task.
We at ClearBytes Soft needed an allocator that performs very fast memory allocations and deallocations and actually returns memory to the OS when it is no longer used. Many people may wonder why one should bother returning memory to the OS if the application may reuse it at some later point of execution.
The simple answer is that the OS can certainly manage memory better if it knows that the memory is not in use. Second, your application may link to third-party DLLs or COM objects that do their own memory management or are written in a different programming language. So you may end up in a situation where a request for memory is denied even though plenty of it is still available, but held in some private memory allocation heap. Third, your application may need additional memory for file or I/O buffers, which must be allocated by the OS.