Question

I want to "simulate" MapReduce for a software assignment using TBB. The pipeline paradigm seems like a good way to do it, since serial filters could handle the I/O and parallel filters could implement Map and Reduce. However, these filter functions receive and return a single element each. That is fine for Map when exactly one tuple is generated per input, but what about something like word counting, which needs to emit multiple outputs per input? And Reduce simply aggregates into a global hash map without actually returning anything.
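For concreteness, the shape I have in mind is roughly the sketch below (just an illustration: the file name, the pair type, and the classic-TBB tbb::filter modes are placeholders; newer oneTBB spells the modes tbb::filter_mode instead):

    #include <tbb/pipeline.h>
    #include <fstream>
    #include <map>
    #include <string>
    #include <utility>

    int main() {
        std::ifstream in("input.txt");            // hypothetical input file
        std::map<std::string, int> global_counts; // the global hash map Reduce writes into

        tbb::parallel_pipeline(
            /*max_number_of_live_tokens=*/8,
            // serial input filter: reads one line per invocation
            tbb::make_filter<void, std::string>(
                tbb::filter::serial_in_order,
                [&](tbb::flow_control& fc) -> std::string {
                    std::string line;
                    if (!std::getline(in, line)) { fc.stop(); return std::string(); }
                    return line;
                })
            &
            // "Map" filter: forced to emit exactly ONE pair per input line,
            // even though word counting really needs one pair per word
            tbb::make_filter<std::string, std::pair<std::string, int>>(
                tbb::filter::parallel,
                [](const std::string& line) {
                    return std::make_pair(line, 1);
                })
            &
            // "Reduce" filter: aggregates into the global map and returns nothing
            tbb::make_filter<std::pair<std::string, int>, void>(
                tbb::filter::serial_in_order,
                [&](const std::pair<std::string, int>& kv) {
                    global_counts[kv.first] += kv.second;
                }));
    }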

Is there a way to use pipeline for this purpose, or should I use something like parallel_while/parallel_for instead?

Thanks!


Solution

The parallel pipeline generally does not scale as well as parallel_for, so I would be inclined to use parallel_for or some parallel recursive scheme instead. I recommend looking at parallel sort algorithms for guidance, since map-reduce is quite similar to a sort, except that duplicate keys are merged. For small core counts, something similar to parallel sample sort seems like good inspiration (see http://parallelbook.com/sites/parallelbook.com/files/code20131121.zip for an implementation in TBB). For large core counts, something similar to parallel merge sort might be better (see https://software.intel.com/en-us/articles/a-parallel-stable-sort-using-c11-for-tbb-cilk-plus-and-openmp for a discussion and code).
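A minimal sketch of the "merge duplicate keys" idea, assuming the tokens have already been read into memory: each chunk builds its own local table (the Map step), and tables are merged pairwise, summing counts for duplicate keys (the Reduce step). The count_words helper and the std::map-based table are illustrative choices here, not the sample-sort code linked above.

    #include <tbb/parallel_reduce.h>
    #include <tbb/blocked_range.h>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using Counts = std::map<std::string, std::size_t>;

    Counts count_words(const std::vector<std::string>& words) {
        return tbb::parallel_reduce(
            tbb::blocked_range<std::size_t>(0, words.size()),
            Counts{},                                            // identity value
            // "Map" phase: each chunk tallies words into its own local table,
            // so there is no shared global hash map and no locking
            [&](const tbb::blocked_range<std::size_t>& r, Counts local) {
                for (std::size_t i = r.begin(); i != r.end(); ++i)
                    ++local[words[i]];
                return local;
            },
            // "Reduce" phase: merge two partial tables, summing counts for duplicate keys
            [](Counts a, const Counts& b) {
                for (const auto& kv : b)
                    a[kv.first] += kv.second;
                return a;
            });
    }

    int main() {
        std::vector<std::string> words = {"map", "reduce", "map", "tbb", "map"};
        for (const auto& kv : count_words(words))
            std::cout << kv.first << ": " << kv.second << "\n";
    }

The reading of the input would stay serial (as in the pipeline's first filter), but the counting and merging then scale with parallel_reduce rather than being serialized behind a single output filter.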

Licensed under: CC-BY-SA with attribution