Question

We are writing an image processing algorithm targeting some Intel hardware. Generally we prefer generic C implementations, but we have identified an algorithm that at its core does a ton of Discrete Cosine Transforms (DCT's) that works extremely well. Unfortunately, our throughput requirements are such that a generic C implementation is about 2 orders of magnitude too slow. I can get one order of magnitude through some other tricks, so if I can improve my DCT's by about an order of magnitude I have a path towards success.

Is the Intel MMX a way to get at hardware acceleration to do these DCT's? Is there other intel specific libraries and/or hardware that I can exploit to speed these bad boys up?

Where do I start to look? This is a new job for me, and my first time digging hard into Intel hardware, so any pointers would be most appreciated.

Was it helpful?

Solution

Take a look at Intel's Integrated Performance Primitives library. It contains a wealth of routines that are optimized heavily to take use of the Intel architecture, specifically MMX and SSE. Among many other things, IPP also contains routines for the DCT (documentation here).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top