We've done all sorts of C++-style dynamic memory allocation on tight embedded systems in the past. You just have to follow a few rules and be careful about mixing short- and long-term buffers. First, memory pools are your friend, as the article says.
Also, for all the small (<64 bytes) allocations that C++ loves to make for things like pairs and control structures, a unit allocation scheme is essential, not only for fragmentation control but also for performance. A unit allocator preallocates a number of identically sized units of memory (say, 64 bytes each) and places them on a free stack. Each allocation pops a unit off the free stack and returns it, and each free pushes the unit back. Because all the units are the same size, the only fragmentation is internal (the unused tail of a unit), bounded by the unit size. Because freed units never have to be joined with their neighbors, both allocation and freeing are O(1).
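Here's a minimal sketch of a unit allocator along those lines, assuming a single-threaded context and a pool carved out of one statically sized array. The class name UnitAllocator and its members are made up for illustration, not taken from any library. The free stack is intrusive: each free unit stores the link to the next free unit inside its own bytes, so the bookkeeping costs no extra memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical unit allocator: UnitCount units of UnitSize bytes each, kept
// on an intrusive free stack. Allocation pops and freeing pushes, so both
// are O(1) and no joining of neighboring blocks is ever needed.
template <std::size_t UnitSize, std::size_t UnitCount>
class UnitAllocator {
    static_assert(UnitSize >= sizeof(void*), "a free unit must hold a link");
    static_assert(UnitSize % alignof(std::max_align_t) == 0,
                  "keep every unit maximally aligned");

public:
    UnitAllocator() {
        // Put every unit on the free stack once, up front.
        for (std::size_t i = 0; i < UnitCount; ++i) push(storage_ + i * UnitSize);
    }
    UnitAllocator(const UnitAllocator&) = delete;             // links point into storage_
    UnitAllocator& operator=(const UnitAllocator&) = delete;

    // Pop a unit, or return nullptr if the request is too big or the pool is empty.
    void* allocate(std::size_t size) {
        if (size > UnitSize || head_ == nullptr) return nullptr;
        FreeNode* node = head_;
        head_ = node->next;
        --free_count_;
        return node;
    }

    // Push the unit back onto the free stack.
    void deallocate(void* p) {
        if (p == nullptr) return;
        assert(owns(p) && "pointer did not come from this pool");
        push(static_cast<unsigned char*>(p));
    }

    // True if p is the start of one of this pool's units.
    bool owns(const void* p) const {
        const auto addr = reinterpret_cast<std::uintptr_t>(p);
        const auto base = reinterpret_cast<std::uintptr_t>(storage_);
        return addr >= base && addr < base + sizeof(storage_) &&
               (addr - base) % UnitSize == 0;
    }

    std::size_t free_units() const { return free_count_; }

private:
    struct FreeNode { FreeNode* next; };

    void push(unsigned char* unit) {
        // The free unit's own bytes hold the link to the next free unit.
        head_ = ::new (static_cast<void*>(unit)) FreeNode{head_};
        ++free_count_;
    }

    alignas(std::max_align_t) unsigned char storage_[UnitSize * UnitCount];
    FreeNode* head_ = nullptr;
    std::size_t free_count_ = 0;
};
```

allocate() returns nullptr when the request exceeds the unit size or the pool is empty; a real system would typically try the next size class or a general heap at that point.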
Some other rules: if you need a dynamic allocation that will be long-lived, don't make any short-term allocations before it. Allocate the big buffer first, then the little ones, so memory isn't scattered. Another approach is to place long-term allocations at the back of the heap and short-term ones at the front. We've had success with that as well.
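The front/back placement can be sketched as a double-ended arena. This is a simplification, not a general-purpose heap: both ends are plain bump allocators, short-term blocks are released all at once with release_short() rather than individually, and long-term blocks are never freed. The class TwoEndedArena is hypothetical. The point is only that the two lifetimes grow toward each other from opposite ends and never interleave.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical double-ended arena: short-term blocks bump upward from the
// front, long-term blocks bump downward from the back, so the two lifetimes
// never interleave in memory.
class TwoEndedArena {
public:
    TwoEndedArena(void* buffer, std::size_t size)
        : begin_(static_cast<unsigned char*>(buffer)), front_(0), back_(size) {}

    // Short-term allocation from the front.
    void* allocate_short(std::size_t size,
                         std::size_t align = alignof(std::max_align_t)) {
        const std::size_t start = front_ + padding_up(begin_ + front_, align);
        if (start > back_ || size > back_ - start) return nullptr;  // would hit long-term space
        front_ = start + size;
        return begin_ + start;
    }

    // Long-term allocation from the back.
    void* allocate_long(std::size_t size,
                        std::size_t align = alignof(std::max_align_t)) {
        if (size > back_ - front_) return nullptr;
        std::size_t start = back_ - size;                  // >= front_ by the check above
        const std::size_t pad = padding_down(begin_ + start, align);
        if (pad > start - front_) return nullptr;          // alignment would overlap short-term space
        start -= pad;
        back_ = start;
        return begin_ + start;
    }

    // Drop every short-term block at once (e.g. at the end of a frame or request).
    void release_short() { front_ = 0; }

    std::size_t bytes_free() const { return back_ - front_; }

private:
    // Bytes needed to move p up to the next multiple of align.
    static std::size_t padding_up(const unsigned char* p, std::size_t align) {
        const auto a = reinterpret_cast<std::uintptr_t>(p);
        return static_cast<std::size_t>((align - a % align) % align);
    }
    // Bytes needed to move p down to the previous multiple of align.
    static std::size_t padding_down(const unsigned char* p, std::size_t align) {
        return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % align);
    }

    unsigned char* begin_;
    std::size_t front_;  // offset of the first free byte after the short-term blocks
    std::size_t back_;   // offset of the first byte of the long-term region
};
```

The gap between front_ and back_ is always one contiguous free region, which is exactly what the front/back split is meant to preserve.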
You can also use multiple heaps (pools) to segregate different types of allocations. If one section of the code creates a whole bunch of short-term allocations while another section follows a different pattern, give each its own heap.
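One way to get per-subsystem heaps in C++17 is std::pmr: give each subsystem its own fixed buffer wrapped in a std::pmr::monotonic_buffer_resource, with std::pmr::null_memory_resource() as the upstream so that exhausting one heap throws std::bad_alloc instead of quietly spilling into the global heap. The SubsystemHeaps struct, the buffer sizes, and handle_packet() are hypothetical; the split (bursty short-term networking vs. slowly growing long-term logging) just mirrors the two patterns described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical per-subsystem heaps. Each subsystem gets its own fixed buffer,
// so one subsystem's allocation pattern cannot fragment the other's memory.
struct SubsystemHeaps {
    // Storage is declared before the resources so it is constructed first.
    alignas(std::max_align_t) unsigned char net_storage[4096];
    alignas(std::max_align_t) unsigned char log_storage[8192];

    // Networking: bursts of short-lived scratch buffers, discarded per packet.
    // null_memory_resource() upstream: exhaustion throws std::bad_alloc
    // instead of falling back to the global heap.
    std::pmr::monotonic_buffer_resource net_heap{
        net_storage, sizeof net_storage, std::pmr::null_memory_resource()};

    // Logging: long-lived records that accumulate and are rarely freed.
    std::pmr::monotonic_buffer_resource log_heap{
        log_storage, sizeof log_storage, std::pmr::null_memory_resource()};
};

// Process one packet using only the network heap, then reclaim it in one step.
std::size_t handle_packet(SubsystemHeaps& heaps, std::size_t payload_size) {
    std::size_t checksum = 0;
    {
        std::pmr::vector<unsigned char> scratch(
            payload_size, static_cast<unsigned char>(0xAB), &heaps.net_heap);
        for (unsigned char byte : scratch) checksum += byte;
    }  // scratch is destroyed here; a monotonic resource ignores individual frees
    heaps.net_heap.release();  // rewind the whole network heap at once
    return checksum;
}
```

A logging container would be built the same way against its own resource, e.g. std::pmr::vector<int> records(&heaps.log_heap), so the two subsystems never share free space. monotonic_buffer_resource suits the burst-then-discard pattern; a subsystem with mixed lifetimes would more likely want something like std::pmr::unsynchronized_pool_resource layered over its own buffer.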
All of the above, if followed carefully, will either prevent or limit fragmentation. Another solution is a relocatable memory allocation system (clients hold handles rather than raw pointers, since blocks can move), where a low-priority thread reorders memory over time to keep it contiguous. I've seen that done a few times as well, trading a little performance for zero long-term fragmentation.
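A relocatable heap only works if clients go through handles, because any block may move. The sketch below assumes that model: allocation bumps from the top, freeing just marks a handle dead, and compact() slides the live blocks down to close the gaps. RelocatableHeap and its methods are hypothetical, std::vector stands in for a static buffer and handle table for brevity, and the locking a real low-priority compaction thread would need against concurrent resolve() callers is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical handle-based relocatable heap. Clients keep handles, never raw
// pointers, so compact() is free to move blocks and close the gaps.
class RelocatableHeap {
public:
    using Handle = std::size_t;
    static constexpr Handle kInvalid = static_cast<Handle>(-1);

    // std::vector stands in for a static buffer/table here, for brevity only.
    explicit RelocatableHeap(std::size_t capacity) : storage_(capacity) {}

    // Bump-allocate from the top. Freed space is only reclaimed by compact().
    Handle allocate(std::size_t size) {
        size = round_up(size);
        if (size > storage_.size() - top_) return kInvalid;  // caller may compact() and retry
        const Block block{top_, size, true};
        top_ += size;
        for (Handle h = 0; h < blocks_.size(); ++h) {
            if (!blocks_[h].live) { blocks_[h] = block; return h; }  // reuse a dead slot
        }
        blocks_.push_back(block);
        return blocks_.size() - 1;
    }

    // Freeing only marks the block dead; its bytes stay put until compaction.
    void release(Handle h) { blocks_[h].live = false; }

    // Current address of a block; invalid after the next compact().
    unsigned char* resolve(Handle h) { return storage_.data() + blocks_[h].offset; }

    // Slide every live block toward offset 0, in address order, closing gaps.
    void compact() {
        std::vector<Handle> live;
        for (Handle h = 0; h < blocks_.size(); ++h) {
            if (blocks_[h].live) live.push_back(h);
        }
        std::sort(live.begin(), live.end(), [this](Handle a, Handle b) {
            return blocks_[a].offset < blocks_[b].offset;
        });
        std::size_t dest = 0;
        for (Handle h : live) {
            Block& b = blocks_[h];
            // dest <= b.offset, so memmove handles any overlap correctly.
            if (b.offset != dest) {
                std::memmove(storage_.data() + dest, storage_.data() + b.offset, b.size);
            }
            b.offset = dest;
            dest += b.size;
        }
        top_ = dest;
    }

    std::size_t free_bytes() const { return storage_.size() - top_; }

private:
    struct Block { std::size_t offset; std::size_t size; bool live; };

    // Round sizes up so block offsets stay multiples of alignof(std::max_align_t).
    static std::size_t round_up(std::size_t n) {
        constexpr std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::vector<unsigned char> storage_;
    std::vector<Block> blocks_;
    std::size_t top_ = 0;
};
```

Any pointer from resolve() is only good until the next compact(), so code has to re-resolve after anything that might compact; that extra indirection is part of the performance cost mentioned above.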
alloca can also help, but if you aren't following fragmentation-prevention practices, you'll just end up scattering your stack as well, and since stack space tends to be an even more valuable resource in embedded land, that may not be a good idea.