Should I rewrite my DSP routines in C/C++, or am I fine with C# unsafe pointers?

StackOverflow https://stackoverflow.com/questions/261591

Problem

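I'm currently writing a C# application that does a lot of digital signal processing, which involves many small, finely tuned memory transfer operations. I wrote these routines using unsafe pointers, and they seem to perform much better than I first thought. However, I want the app to be as fast as possible.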

Would I get any performance benefit from rewriting these routines in C or C++, or should I stick with unsafe pointers? I'd like to know what unsafe pointers bring to the table in terms of performance, compared to C/C++.

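EDIT: I'm not doing anything special inside these routines, just normal DSP stuff: cache-friendly data transfers from one array to another, with a lot of multiplications, additions, bit shifts, etc. I'd expect the C/C++ routines to look pretty much the same as their C# counterparts.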
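To make "pretty much the same" concrete, here is a hypothetical example of such a routine in C++: it copies 16-bit samples from one array to another while applying a Q15 fixed-point gain with a multiply, a rounding add and a bit shift. The function name and the Q15 format are assumptions for illustration, not the asker's code; an unsafe C# version over short* pointers would have essentially the same loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical routine: dst[i] = saturate(src[i] * gain), gain in Q15 format
// (16384 == 0.5, INT16_MIN == -1.0). Same shape as an unsafe C# loop.
void apply_gain_q15(const std::int16_t* src, std::int16_t* dst,
                    std::size_t n, std::int16_t gain_q15)
{
    assert(n == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        // Widen to 32 bits so the product (at most 2^30) cannot overflow.
        std::int32_t acc = static_cast<std::int32_t>(src[i]) * gain_q15;
        acc += 1 << 14;  // add 0.5 in Q15 so the shift rounds
        acc >>= 15;      // back to Q15 (arithmetic shift; guaranteed since C++20)
        // Saturate to the 16-bit range instead of wrapping around.
        if (acc > INT16_MAX) acc = INT16_MAX;
        if (acc < INT16_MIN) acc = INT16_MIN;
        dst[i] = static_cast<std::int16_t>(acc);
    }
}
```

The saturation step matters: multiplying -1.0 by -1.0 in Q15 gives +1.0, which is not representable and would otherwise wrap to -1.0.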

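EDIT: Thanks a lot to everyone for all the clever answers. What I've learned is that a direct port alone won't give a big performance boost unless some kind of SSE optimization takes place. Assuming all modern C/C++ compilers can take advantage of it, I'm looking forward to trying it. If anyone is interested in the results, let me know and I'll post them somewhere (it may take a while, though).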


Solution

I've actually done pretty much exactly what you're asking, only in the image processing area. I started off with C# unsafe pointers, then moved to C++/CLI, and now I code everything in C++. In fact, from there I went from plain pointers in C++ to SSE processor instructions, so I've gone all the way. I haven't reached assembler yet, and I don't know if I need to: I saw an article on CodeProject showing that SSE can be as fast as inline assembler. I can find it if you want.

What happened as I went along was that my algorithm went from around 1.5-2 frames per second in C# with unsafe pointers to 40 frames per second now. C# and C++/CLI were definitely slower than C++, even with pointers; I haven't been able to get above 10 frames per second with those languages. As soon as I switched to C++, I got something like 15-20 frames per second instantly. A few more clever changes and SSE got me up to 40 frames per second. So yes, in my experience it is worth going lower-level if you want speed. There is a clear performance gain.
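The answer does not show its SSE code, so the following is only a minimal sketch of what moving from a plain pointer loop to SSE intrinsics looks like, assuming an x86/x86-64 target and a simple hypothetical operation out[i] = gain * x[i] + y[i]. Each intrinsic call processes four floats at once, and a scalar loop handles the leftover elements.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Hypothetical routine: out[i] = gain * x[i] + y[i], four floats per step.
// Unaligned loads/stores are used so the buffers need no special alignment.
void scale_add_sse(const float* x, const float* y, float* out,
                   std::size_t n, float gain)
{
    assert(n == 0 || (x != nullptr && y != nullptr && out != nullptr));
    const __m128 g = _mm_set1_ps(gain);  // broadcast gain into all 4 lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(vx, g), vy));
    }
    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i)
        out[i] = gain * x[i] + y[i];
}
```

Modern compilers can often auto-vectorize a loop this simple on their own, so it is worth checking the generated code before hand-writing intrinsics.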

Other tips

Another way to optimize DSP code is to make it cache friendly. If you have a lot of filters to apply to your signal you should apply all the filters to each point, i.e. your innermost loop should be over the filters and not over data, e.g.:

for each n do t'[n] = h(g(f(t[n])))

This way you will thrash the cache a lot less and will most likely gain a good speed increase.
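As a sketch of the loop fusion described above, assume three hypothetical per-sample stages f (gain), g (clip) and h (DC offset). The multi-pass version streams the whole buffer through memory once per stage; the fused version reads each sample once, applies all three stages while the value is in a register, and writes it once. Both produce identical output.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-sample stages standing in for f, g and h.
inline float f(float v) { return 0.5f * v; }                                   // gain
inline float g(float v) { return v > 1.0f ? 1.0f : (v < -1.0f ? -1.0f : v); }  // clip
inline float h(float v) { return v + 0.25f; }                                  // offset

// Multi-pass: three full sweeps over the buffer, one per stage.
void process_multipass(float* t, std::size_t n)
{
    assert(n == 0 || t != nullptr);
    for (std::size_t i = 0; i < n; ++i) t[i] = f(t[i]);
    for (std::size_t i = 0; i < n; ++i) t[i] = g(t[i]);
    for (std::size_t i = 0; i < n; ++i) t[i] = h(t[i]);
}

// Fused: one sweep; the inner work is over the stages, not over the data.
void process_fused(const float* t, float* out, std::size_t n)
{
    assert(n == 0 || (t != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        out[i] = h(g(f(t[i])));
}
```

The difference only shows up once the buffer is larger than the cache; for small blocks both versions stay cache-resident.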

I think you should write your DSP routines either in C++ (managed or unmanaged) or in C#, using a solid design but without trying to optimize everything from the start, and then you should profile your code and find the bottlenecks and try to optimize those away.

Trying to produce "optimal" code from the start is going to distract you from writing working code in the first place. Remember that 80% of your optimization is only going to affect 20% of your code as in a lot of cases only 10% of your code is responsible for 90% of your CPU time. (YMMV, as it depends on the type of application)
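Profiling comes first. A real profiler gives much more detail, but for a quick before/after comparison of a single routine a small timing helper is often enough. This is a minimal sketch using std::chrono::steady_clock; the helper name mean_ms is hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Runs fn() `reps` times and returns the mean wall-clock time per call
// in milliseconds. steady_clock is used because it never jumps backwards.
template <class F>
double mean_ms(F&& fn, int reps)
{
    assert(reps > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < reps; ++r)
        fn();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / reps;
}
```

When benchmarking an optimized build, make sure the result of the timed work is actually used afterwards; otherwise the compiler may remove the computation entirely and report an impossibly fast time.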

When I was trying to optimize our use of alpha blending in our graphics toolkit, I first tried to use SIMD the "bare metal" way: inline assembler. Soon I found out that it's better to use SIMD intrinsics rather than pure assembly, since the compiler can further optimize readable C++ with intrinsics by rearranging the individual opcodes and maximizing the use of the different processing units in the CPU.

Don't underestimate the power of your compiler!
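The toolkit code from this answer isn't shown, so here is a hypothetical sketch of alpha blending with SSE2 intrinsics, assuming 8-bit channels and a constant alpha a in [0, 256]: out = (src * a + dst * (256 - a)) >> 8. Using 256 as the full scale (rather than 255) is an assumption that lets a shift replace the division. The compiler is free to schedule these intrinsics, which is the advantage the answer describes over inline assembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

// Blends 16 bytes per iteration: out = (src*a + dst*(256-a)) >> 8.
void blend_sse2(const std::uint8_t* src, const std::uint8_t* dst,
                std::uint8_t* out, std::size_t n, int a)
{
    assert(a >= 0 && a <= 256);
    const __m128i zero = _mm_setzero_si128();
    const __m128i va = _mm_set1_epi16(static_cast<short>(a));
    const __m128i vb = _mm_set1_epi16(static_cast<short>(256 - a));
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        // Widen bytes to 16-bit lanes; the weighted sum is at most 255*256.
        __m128i s_lo = _mm_unpacklo_epi8(s, zero), s_hi = _mm_unpackhi_epi8(s, zero);
        __m128i d_lo = _mm_unpacklo_epi8(d, zero), d_hi = _mm_unpackhi_epi8(d, zero);
        __m128i lo = _mm_add_epi16(_mm_mullo_epi16(s_lo, va), _mm_mullo_epi16(d_lo, vb));
        __m128i hi = _mm_add_epi16(_mm_mullo_epi16(s_hi, va), _mm_mullo_epi16(d_hi, vb));
        // Divide by 256 with a logical shift, then pack back to 16 bytes.
        lo = _mm_srli_epi16(lo, 8);
        hi = _mm_srli_epi16(hi, 8);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_packus_epi16(lo, hi));
    }
    // Scalar tail with the same formula.
    for (; i < n; ++i)
        out[i] = static_cast<std::uint8_t>((src[i] * a + dst[i] * (256 - a)) >> 8);
}
```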

Would I get any performance benefit from rewriting these routines in C/C++ or should I stick to unsafe pointers?

In theory it wouldn't matter - a perfect compiler will optimize the code, whether C or C++, into the best possible assembler.

In practice, however, C is almost always faster, especially for pointer type algorithms - It's as close as you can get to machine code without coding in assembly.

C++ doesn't bring anything to the table in terms of performance - it is built as an object oriented version of C, with a lot more capability and ease of use for the programmer. While for some things it will perform better because a given application will benefit from an object oriented point of view, it wasn't meant to perform better - it was meant to provide another level of abstraction so that programming complex applications was easier.

So, no, you will likely not see a performance increase by switching to C++.

However, it is probably more valuable for you to find out than to avoid spending time on it; I think it would be a worthwhile exercise to port the code over and analyze it. It is quite possible that if your processor has certain instructions for C++ or Java usage, and the compiler knows about them, it may be able to take advantage of features unavailable in C. Unlikely, but possible.

However, DSP processors are notoriously complex beasts, and the closer you get to assembly, the better performance you can get (i.e., the more hand-tuned your code is). C is much closer to assembly than C++.

-Adam

First let me answer the question about "safe" vs "unsafe": you said in your post "I want the app to be as fast as possible", and that means you don't want to mess with "safe" or "managed" pointers (don't even mention garbage collection).

Regarding your choice of languages: C/C++ lets you work with the underlying data much much more easily without any of the overhead associated with the fancy containers that everyone is using these days. Yes it is nice to be cuddled by containers that prevent you from seg-faulting... but the higher-level of abstraction associated with containers RUINS your performance.

At my job our code has to run fast. An example is our polyphase resamplers, which play with pointers and masking operations and fixed-point DSP filtering... none of these clever tricks are really possible without low-level control of the memory and bit manipulations ==> so I say stick with C/C++.
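The resampler code itself isn't shown, but the kind of fixed-point filtering this answer refers to can be sketched as a Q15 FIR filter that walks the input with a pointer. The function name, the Q15 format and the "valid outputs only" convention are assumptions for illustration; a polyphase resampler would run several such filters over subsets of the coefficients.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Q15 FIR filter, "valid" outputs only:
//   y[i] = sum_k h[k] * x[i + taps - 1 - k],  i = 0 .. n - taps.
// y must have room for n - taps + 1 samples.
void fir_q15(const std::int16_t* x, std::size_t n,
             const std::int16_t* h, std::size_t taps,
             std::int16_t* y)
{
    assert(taps > 0 && n >= taps);
    for (std::size_t i = 0; i + taps <= n; ++i) {
        const std::int16_t* xp = x + i + taps - 1;  // newest sample in the window
        std::int64_t acc = 0;                       // wide accumulator, no overflow
        for (std::size_t k = 0; k < taps; ++k)
            acc += static_cast<std::int32_t>(h[k]) * *(xp - k);
        acc = (acc + (1 << 14)) >> 15;              // round, Q30 -> Q15
        if (acc > INT16_MAX) acc = INT16_MAX;       // saturate
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[i] = static_cast<std::int16_t>(acc);
    }
}
```

With two taps of 16384 (0.5 each) this is a two-point moving average: inputs {100, 200, 300, -400} give {150, 250, -50}.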

If you really want to be smart, write all your DSP code in low-level C, and then intermingle it with the safer containers/managed pointers... when it comes to speed, you need to take off the training wheels... they slow you down too much.

(FYI, regarding taking the training wheels off: you need to test your C DSP code extra thoroughly offline to make sure its pointer usage is correct... otherwise it will seg fault.)

EDIT: P.S. "Seg faulting" is a LUXURY for all you PC/x86 developers. When you are writing embedded code... a seg fault just means your processor will go off into the weeds and can only be recovered by power cycling ;).

In order to know how you would get a performance gain, it's good to know the portions of code that could cause bottlenecks.

Since you're talking about small memory transfers, I assume all data will fit in the CPU's cache. In that case, the only gain you can achieve would be by knowing how to work the CPU's intrinsics. Typically, the compiler most familiar with the CPU's intrinsics is a C compiler. So here, I think you may improve performance by porting.

Another bottleneck will be on the path between CPU and memory - cache misses due to the big number of memory transfers in your application. The biggest gain will then lie in minimizing cache misses, which depend on the platform you use, and on the layout of your data (is it local or spread out through memory?).
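To illustrate how data layout and access order affect cache misses, here is a minimal sketch assuming a row-major matrix (element (r, c) stored at data[r * cols + c]). Both functions compute the same sum; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cache-friendly: the inner loop walks consecutive addresses.
double sum_row_order(const std::vector<double>& data, std::size_t rows, std::size_t cols)
{
    assert(data.size() == rows * cols);
    double s = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += data[r * cols + c];
    return s;
}

// Cache-unfriendly: the inner loop jumps `cols` elements at a time, so for
// large matrices almost every access lands on a different cache line.
double sum_column_order(const std::vector<double>& data, std::size_t rows, std::size_t cols)
{
    assert(data.size() == rows * cols);
    double s = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += data[r * cols + c];
    return s;
}
```

With floating-point data the two orders can differ in the last bits because the additions happen in a different sequence; the speed difference only appears when the matrix no longer fits in the cache.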

But since you're already using unsafe pointers, you have that bit under your own control, so my guess is: on that aspect, you won't benefit much from a port to C (or C++).

Concluding: you may want to port small portions of your application into C.

Seeing that you're already writing unsafe code, I presume it would be relatively easy to convert it to a C DLL and call it from within C#. Do this after you have identified the slowest parts of your program, and then replace them with C.
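A minimal sketch of the native side of such a DLL, written as C++ with C linkage so the exported name is not mangled. The function name dsp_scale_block, the library name "dspnative" and the export macro are hypothetical. The routine takes a whole buffer per call, so the C# side crosses the managed/native boundary once per block rather than once per sample, since those transitions are relatively expensive.

```cpp
#include <cassert>

// Export macro: dllexport on Windows, default symbol visibility on GCC/Clang.
#if defined(_WIN32)
#  define DSP_API extern "C" __declspec(dllexport)
#else
#  define DSP_API extern "C" __attribute__((visibility("default")))
#endif

// Hypothetical exported routine: out[i] = in[i] * gain for a whole block.
DSP_API void dsp_scale_block(const float* in, float* out, int count, float gain)
{
    assert(count >= 0);
    for (int i = 0; i < count; ++i)
        out[i] = in[i] * gain;
}

// Possible matching C# declaration (library name "dspnative" is assumed):
//   [DllImport("dspnative", CallingConvention = CallingConvention.Cdecl)]
//   static extern unsafe void dsp_scale_block(float* input, float* output,
//                                             int count, float gain);
```

On the C# side the buffers can be pinned with a fixed statement and passed straight through, which keeps the existing unsafe-pointer code structure.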

Your question is largely philosophical. The answer is this: don't optimize until you profile.

You ask whether you'll gain improvement. Okay, you will gain improvement by N percent. If that's enough (like you need code that executes 200 times in 20 milliseconds on some embedded system) you're fine. But what if it's not enough?

You have to measure first and then find out whether some parts of the code could be rewritten in the same language but faster. Maybe you can redesign data structures to avoid unnecessary computations. Maybe you can skip some memory reallocation. Maybe something is done with quadratic complexity when it could be done with linear complexity. You won't see it until you've measured it. This is usually much less of a waste of time than rewriting everything in another language.
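A common DSP instance of the quadratic-versus-linear point is a moving average. This sketch (function names hypothetical) contrasts re-summing every window, O(N * W), with a running sum that adds the sample entering the window and subtracts the one leaving it, O(N). No change of language is involved; the gain comes from the algorithm alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive: re-sums the whole window for every output sample, O(N * W).
std::vector<double> moving_avg_naive(const std::vector<double>& x, std::size_t w)
{
    assert(w > 0 && w <= x.size());
    std::vector<double> y(x.size() - w + 1);
    for (std::size_t i = 0; i < y.size(); ++i) {
        double s = 0.0;
        for (std::size_t k = 0; k < w; ++k) s += x[i + k];
        y[i] = s / w;
    }
    return y;
}

// Running sum: one add and one subtract per output sample, O(N).
std::vector<double> moving_avg_running(const std::vector<double>& x, std::size_t w)
{
    assert(w > 0 && w <= x.size());
    std::vector<double> y(x.size() - w + 1);
    double s = 0.0;
    for (std::size_t k = 0; k < w; ++k) s += x[k];
    y[0] = s / w;
    for (std::size_t i = 1; i < y.size(); ++i) {
        s += x[i + w - 1] - x[i - 1];  // sample entering minus sample leaving
        y[i] = s / w;
    }
    return y;
}
```

Over very long floating-point signals the running sum can slowly accumulate rounding error; periodically recomputing it from scratch, or using integer samples, avoids that.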

C# has no support for SSE (yet; there is a Mono project for SSE operations). Therefore C/C++ with SSE would definitely be faster.

You must, however, be careful with managed-to-native and native-to-managed transitions, as they are quite expensive. Stay as long in either world as possible.

Do you really want the app to be as fast as possible or simply fast enough? That tells you what you should do next.

If you're insistent on sticking with your hand-roll, without hand-optimising in assembler or similar, the C# should be fine. Unfortunately, this is the kind of question that can only really be answered experimentally. You're already in unmanaged pointer space, so my gut feel is that a direct port to C++ would not see a significant difference in speed.

I should say, though, that I had a similar issue recently, and we ended up throwing away the hand-roll after trying the Intel Integrated Performance Primitives library. The performance improvements we saw there were very impressive.

Mono 2.2 now has SIMD support. With it you can have the best of both worlds: managed code and raw speed.

You might also want to have a look at "Using SSE in C#, is it possible?"

I would suggest that if you have any algorithms in your DSP code that need to be optimised then you should really be writing them in assembly, not C or C++.

In general, with modern processors and hardware, there aren't that many scenarios that require or warrant the effort involved in optimisation. Have you actually identified any performance issues? If not then it's probably best to stick with what you have. Unsafe C# is unlikely to be significantly slower than C/C++ in most cases of simple arithmetic.

Have you considered C++/CLI? You could have the best of both worlds then. It would even allow you to use inline assembler if required.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow