We are writing an image processing algorithm targeting Intel hardware. Generally we prefer generic C implementations, but we have identified an algorithm that works extremely well and that, at its core, performs a ton of Discrete Cosine Transforms (DCTs). Unfortunately, our throughput requirements are such that a generic C implementation is about two orders of magnitude too slow. I can gain one order of magnitude through other tricks, so if I can speed up my DCTs by about an order of magnitude, I have a path to success.

Is Intel MMX a way to get hardware acceleration for these DCTs? Are there other Intel-specific libraries and/or hardware features that I can exploit to speed these bad boys up?

Where should I start looking? This is a new job for me, and my first time digging deep into Intel hardware, so any pointers would be much appreciated.


Solution

Take a look at Intel's Integrated Performance Primitives (IPP) library. It contains a wealth of routines that are heavily optimized to take advantage of the Intel architecture, specifically MMX and SSE. Among many other things, IPP also contains routines for the DCT (documentation here).

Licensed under: CC-BY-SA with attribution