One contiguous buffer will probably give you better performance (at the very least, it's not worse!).
How big the performance difference is will depend on a large number of factors. Of course, if you allocate a bunch of 32-byte blocks one after another, you are quite likely to get "close-together" lumps of memory, so most of the caching benefit will still be there. The worst case is if every block ends up in a different 4KB segment of memory; if there are just a few bytes of "empty space" between each block, it's not that big a deal.
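To see how close together separately allocated blocks actually land on a given system, you can look at the distance between consecutive addresses. This is only a rough sketch: `allocation_gaps` is a made-up helper name, the 32-byte default comes from the block size discussed above, and the result depends entirely on your allocator and on what else the program has allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical helper: allocate `count` blocks of `block_size` bytes one at a
// time and return the signed distance in bytes between consecutive addresses.
std::vector<std::ptrdiff_t> allocation_gaps(std::size_t count,
                                            std::size_t block_size = 32) {
    assert(block_size > 0);  // malloc(0) is allowed to return a null pointer

    std::vector<void*> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        void* p = std::malloc(block_size);
        if (p == nullptr) {
            for (void* q : blocks) std::free(q);
            throw std::bad_alloc();
        }
        blocks.push_back(p);
    }

    std::vector<std::ptrdiff_t> gaps;
    for (std::size_t i = 1; i < blocks.size(); ++i) {
        // Compare the addresses as integers; the blocks are unrelated objects,
        // so plain pointer subtraction would not be valid here.
        auto prev = reinterpret_cast<std::uintptr_t>(blocks[i - 1]);
        auto cur = reinterpret_cast<std::uintptr_t>(blocks[i]);
        gaps.push_back(static_cast<std::ptrdiff_t>(cur - prev));
    }

    for (void* p : blocks) std::free(p);
    return gaps;
}
```

A gap only slightly larger than the block size means the blocks sit next to each other with a little allocator bookkeeping in between. Large or negative gaps mean the blocks are scattered, which is more likely in a long-running program with a fragmented heap than in a fresh test run.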
Like so many other performance questions, it has quite a lot to do with the exact details of what the code does, the memory type, the processor type, etc. The only way to REALLY find out is to benchmark the different options...
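As a starting point for such a benchmark, here is a minimal sketch that fills both layouts with the same bytes and times a simple pass over each. The names `run_benchmark`, `BenchResult` and `kBlockSize` are made up for illustration, and summing every byte is just a stand-in for whatever your real code does with the data.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

constexpr std::size_t kBlockSize = 32;  // assumed block size, as discussed above

struct BenchResult {
    double contiguous_ms;    // time to sum the single buffer
    double scattered_ms;     // time to sum the separately allocated blocks
    std::uint64_t checksum;  // returned so the compiler cannot drop the loops
};

BenchResult run_benchmark(std::size_t count) {
    // Layout 1: all blocks back to back in one buffer.
    std::vector<unsigned char> buffer(count * kBlockSize);
    // Layout 2: one heap allocation per block.
    std::vector<std::unique_ptr<unsigned char[]>> blocks;
    blocks.reserve(count);

    for (std::size_t i = 0; i < count; ++i) {
        blocks.push_back(std::make_unique<unsigned char[]>(kBlockSize));
        for (std::size_t j = 0; j < kBlockSize; ++j) {
            auto v = static_cast<unsigned char>((i + j) & 0xFF);
            buffer[i * kBlockSize + j] = v;
            blocks[i][j] = v;
        }
    }

    using Clock = std::chrono::steady_clock;

    auto t0 = Clock::now();
    std::uint64_t sum_contiguous = 0;
    for (unsigned char b : buffer) sum_contiguous += b;

    auto t1 = Clock::now();
    std::uint64_t sum_scattered = 0;
    for (const auto& blk : blocks)
        for (std::size_t j = 0; j < kBlockSize; ++j) sum_scattered += blk[j];
    auto t2 = Clock::now();

    // Both layouts hold the same bytes, so the sums must agree.
    assert(sum_contiguous == sum_scattered);

    std::chrono::duration<double, std::milli> contiguous = t1 - t0;
    std::chrono::duration<double, std::milli> scattered = t2 - t1;
    return {contiguous.count(), scattered.count(), sum_contiguous};
}
```

Call it from `main` with a block count large enough to exceed your cache (for example `run_benchmark(1 << 20)`), run it several times, and compile with optimizations on. Keep in mind that blocks allocated in a tight loop like this tend to land next to each other, so this setup probably understates the difference you would see with a fragmented heap.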