Question

We are writing an image processing algorithm targeting some Intel hardware. Generally we prefer generic C implementations, but we have identified an algorithm that at its core does a ton of Discrete Cosine Transforms (DCT's) that works extremely well. Unfortunately, our throughput requirements are such that a generic C implementation is about 2 orders of magnitude too slow. I can get one order of magnitude through some other tricks, so if I can improve my DCT's by about an order of magnitude I have a path towards success.

Is the Intel MMX a way to get at hardware acceleration to do these DCT's? Is there other intel specific libraries and/or hardware that I can exploit to speed these bad boys up?

Where do I start to look? This is a new job for me, and my first time digging hard into Intel hardware, so any pointers would be most appreciated.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top