Question

Q1. What are best practices for writing code that does not hog the CPU but still achieves good performance? The question is deliberately generic. What I am looking for is a list of practices used in different environments, plus debugging tips beyond Process Monitor/Task Manager.

EDIT: I am not speaking of I/O-bound processes; I am speaking of CPU-bound processes. But I do not want my process to keep hogging the CPU. If I have a 4-core machine and run four simple loops within a process, CPU consumption shoots up to 400% for as long as the application/process is running.

I am looking for experience on a topic everyone has faced at some time or other. For example, I once debugged an application that was hogging the CPU on Windows because it was looping continuously, searching for a non-existent file.

How can I write my program so that two different CPU-bound applications run smoothly (stay responsive)?

UPDATE: Suggestions:

  1. Write good, clean code, then profile your application, and then optimize. (Thanks, ted, for the tip.)

  2. It is easier to rewrite/redesign/refactor code than to profile and fix it.

  3. Use a profiler to debug your application

  4. Don't use spinlocks for threads with long waits (see the sketch after this list).

  5. Algorithm choice

These suggestions go a long way toward helping a beginner understand the concepts.
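For suggestion 4, here is a minimal sketch (assuming C++ and the standard library only; the flag names and functions are made up for illustration) contrasting a spin-wait with a condition-variable wait:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Spin-wait: burns a full core for the entire duration of the wait.
std::atomic<bool> ready{false};

void spin_wait() {
    while (!ready.load(std::memory_order_acquire)) {
        // tight loop, ~100% CPU until 'ready' becomes true
    }
}

// Condition-variable wait: the thread sleeps and uses no CPU
// until another thread calls notify_one()/notify_all().
std::mutex m;
std::condition_variable cv;
bool ready_cv = false;

void blocking_wait() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return ready_cv; });  // releases the lock while asleep
}

void signal_ready() {
    {
        std::lock_guard<std::mutex> lock(m);
        ready_cv = true;
    }
    cv.notify_all();  // wakes any waiting threads
}
```

Spinning only pays off when the expected wait is shorter than the cost of putting a thread to sleep and waking it again; for long waits, a blocking primitive keeps CPU usage near zero.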

Was it helpful?

Solution

First, write good clean code. Do things the simplest way possible. After that, do the following repeatedly until you are satisfied with the speed of the program:

  1. Profile its execution.
  2. Find the parts where it is spending the most time.
  3. Speed up those parts.

Do not fall into the trap of perverting your code up front in the name of optimization.

Remember Amdahl's Law. You aren't going to get noticeable improvements by speeding up something that is already only consuming 1% of your program's time. You get the best bang for your optimization buck by speeding up the part your program spends most of its time doing.
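Stated as a formula (spelling out the law the answer refers to): if a fraction $p$ of the total runtime is sped up by a factor $s$, the overall speedup is

$$S_{\text{overall}} = \frac{1}{(1 - p) + \frac{p}{s}}$$

so even an infinite speedup of a part that accounts for only 1% of the runtime ($p = 0.01$) gives at most roughly a 1.01x overall improvement.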

OTHER TIPS

Do as little work as possible.


Since you have edited your original question, I'll add some more thoughts here to address the specific situation you have described.

Assuming that you don't know where your process is spending its time (since you were asking for debugging tips), you can start by simply pausing the debugger. This stops the application in whatever it is doing, and from there you can inspect the current location of every thread and see whether any of them are stuck in a tight loop.

Secondly, any decent profiler will easily catch situations like this. Attach the profiler, run the application to the problem point, and look at the calls that are taking a dramatically larger percentage of your total runtime. From there you can work your way back out to find the offending loop.

Once you have located the problem, ideally rethink the algorithm to avoid the situation completely. If this isn't possible, then introduce sleep calls on the thread. This allows other threads to get onto the CPU and increases the responsiveness of the application and the OS as a whole, at the expense of increasing the operation's runtime. The trick to multicore programming is ensuring that all your threads strike a compromise between performance and consideration for the other waiting tasks.
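As a rough sketch of the sleep approach (C++ used for illustration; `do_one_unit_of_work` and the chunk size are hypothetical):

```cpp
#include <chrono>
#include <thread>

// Hypothetical placeholder for the real per-iteration work.
void do_one_unit_of_work(long /*i*/) { /* ... */ }

// A CPU-bound loop that periodically gives up the core so the scheduler
// can run other threads/processes. Trades some throughput for the
// responsiveness of the rest of the system.
void crunch(long iterations) {
    for (long i = 0; i < iterations; ++i) {
        do_one_unit_of_work(i);

        if (i % 10000 == 0) {                 // every N iterations...
            std::this_thread::sleep_for(      // ...briefly yield the CPU
                std::chrono::milliseconds(1));
        }
    }
}
```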

Without knowing the specific language or operating system you're targeting, I can't advise on the best debugger/profiler combination for the problem, but I'd imagine there are good solutions out there for most mature languages.

  • Use a profiler religiously. Don't rely on common sense when searching for bottlenecks.
  • Learn big-O notation and remember the big-O of common algorithms.
  • Avoid busy-wait loops at all costs.
  • In embedded work, learn how to make code fit in the code cache; that can sometimes give a tenfold speedup on tight loops.
  • When doing high-level layered development, learn how to cache data efficiently (for example, to minimize the number of DB statements); a sketch follows this list.
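As a rough illustration of that last point (a sketch only; `load_user_from_db` and the cache shape are hypothetical, and real code would also need eviction and invalidation):

```cpp
#include <string>
#include <unordered_map>

struct User { std::string name; /* ... */ };

// Hypothetical expensive call that issues one DB statement per lookup.
User load_user_from_db(int id);

class UserCache {
public:
    const User& get(int id) {
        auto it = cache_.find(id);
        if (it == cache_.end()) {
            // Only hit the database on a cache miss.
            it = cache_.emplace(id, load_user_from_db(id)).first;
        }
        return it->second;
    }

private:
    std::unordered_map<int, User> cache_;
};
```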

If it's purely a CPU usage gain you're after, then it's big-O notation you need. You need to work out how to make your algorithms do the least computation possible.
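For example (a toy illustration, not taken from the answer), replacing a quadratic scan with a hash-based lookup changes the growth rate rather than just the constant factor:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// O(n^2): compare every pair of elements.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// O(n) expected: one hash lookup per element.
bool has_duplicate_linear(const std::vector<int>& v) {
    std::unordered_set<int> seen;
    for (int x : v)
        if (!seen.insert(x).second) return true;  // insert failed -> already seen
    return false;
}
```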

However, as for general performance, I find CPU usage to be one of the lesser bottlenecks.

Some more important things to look at for performance are:

Data binding: do you fetch all the data up front, or only as it is required? Choosing between these approaches can be key to the performance of your app.
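A rough sketch of the two approaches (all names hypothetical; which one wins depends on how much of the data is actually used):

```cpp
#include <optional>
#include <string>
#include <vector>

struct Report { std::string body; };

// Hypothetical expensive fetch (DB round trip, file parse, ...).
Report build_report(const std::string& name);

// Up front: pay the full cost immediately, even for reports never viewed.
struct EagerReports {
    explicit EagerReports(const std::vector<std::string>& names) {
        for (const auto& n : names) reports.push_back(build_report(n));
    }
    std::vector<Report> reports;
};

// As required: each report is built only on first access, then kept.
class LazyReport {
public:
    explicit LazyReport(std::string name) : name_(std::move(name)) {}
    const Report& get() {
        if (!report_) report_ = build_report(name_);  // first access only
        return *report_;
    }
private:
    std::string name_;
    std::optional<Report> report_;
};
```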

Can you reduce the data you're working on? If you can get it all to fit in memory easily, you can gain performance here. On a side note, putting too much in memory can have the opposite effect.

I think to sum up, there is no generic solution to performance. Write your code (with some intelligence), then look at where it is struggling.

I am speaking of CPU-bound processes. But I do not want my process to keep hogging the CPU. If I have a 4-core machine and run four simple loops within a process, CPU consumption shoots up to 400% for as long as the application/process is running.

You will probably want to look into throttling mechanisms to reduce your CPU utilization when in idle state:

Unless you take care, your code may consume CPU cycles even when it has nothing to do (i.e. "busy wait").

For example, an empty infinite loop will simply run as fast as it possibly can, if it doesn't have to do anything else.

However, in many circumstances you don't want to busy wait, and on some platforms you may want to avoid it altogether.

One established way of doing this is to use sleep calls when idling, so that the system scheduler can reschedule all running threads. Similarly, you could use timers to determine your function's actual update rate and simply avoid calling your code if it doesn't have to be run (this is a mechanism that is sometimes used by games or simulators).
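A sketch of the timer idea, in the style of a game/simulation loop (the ~60 Hz rate and the `update` function are made up):

```cpp
#include <chrono>
#include <thread>

// Hypothetical per-tick work (game logic, simulation step, ...).
void update();

void run_loop(bool& running) {
    using clock = std::chrono::steady_clock;
    const auto tick = std::chrono::milliseconds(16);  // ~60 updates per second
    auto next = clock::now();

    while (running) {
        update();
        next += tick;
        // Sleep until the next scheduled tick instead of spinning;
        // the thread uses no CPU while it waits.
        std::this_thread::sleep_until(next);
    }
}
```

If `update()` overruns the tick, `sleep_until` on a time point that is already in the past returns immediately, so the loop degrades to running flat out rather than drifting further behind.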

In general, you'll want to avoid polling and instead use intelligent data structures (for example, a job queue) that make it possible to adjust your runtime behavior automatically, without having to check the data structure itself constantly.
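A minimal blocking job queue along those lines (a sketch only; production code would also need shutdown handling and exception safety):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

// Workers calling pop() consume no CPU while the queue is empty;
// they are woken only when push() adds a job.
class JobQueue {
public:
    void push(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    std::function<void()> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !jobs_.empty(); });  // blocks, no polling
        auto job = std::move(jobs_.front());
        jobs_.pop();
        return job;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
};
```

Because the worker blocks inside `cv_.wait`, an idle pool shows roughly 0% CPU instead of spinning on an empty queue.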

  • Use lazy values and/or other kinds of caching.
  • Choose your algorithms with care.
  • Follow code optimization techniques.

  • Calculate the memory cost of your operations.

  • Calculate the time cost of each operation (big-O notation).

It's not quite clear to me whether you're looking for ways to make the most efficient use of CPU, or ways to avoid bogging down a machine when you've got a lot of CPU-intensive stuff to do.

These are incompatible.

For the former, you ideally want an OS that will simply let you take over the CPU(s) completely for as long as you like, so you don't have to waste CPU cycles on the OS itself, not to mention any other processes that might be running.

For the latter, well, I've been writing some code lately that uses poorly designed CPU-bound algorithms, and the new Intel i7 processor has been my saviour. Given four cores each able to run two threads, I merely try to limit my OS thread usage to five or six per application, and I've still got CPU available to switch to another window to run the kill command. At least until I drive the system into swap with space leaks.
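One portable way to apply that idea (a sketch; the work partitioning and `work_on_chunk` are hypothetical) is to size the worker count from the hardware and deliberately leave a core free:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical chunk of CPU-bound work.
void work_on_chunk(unsigned chunk_index);

void run_in_parallel(unsigned total_chunks) {
    // hardware_concurrency() may return 0 if it cannot be determined.
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 2;

    // Leave one hardware thread for the OS and other applications.
    unsigned workers = std::max(1u, hw - 1);
    workers = std::min(workers, total_chunks);

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < workers; ++t) {
        threads.emplace_back([t, workers, total_chunks] {
            for (unsigned c = t; c < total_chunks; c += workers)
                work_on_chunk(c);  // strided partition of the work
        });
    }
    for (auto& th : threads) th.join();
}
```

On a machine where `hardware_concurrency()` reports 8 (for example a four-core, two-way SMT part like the i7 mentioned above), this launches seven workers and leaves headroom for the OS and other applications.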

Good suggestions here. The simpler you write the code, the more time you will save yourself.

I look at performance tuning as an extension of debugging. People say measure, measure, measure, but I don't. I just let the program tell me exactly what the problem is, by dropping in on it unannounced, several times if needed. It's usually a surprise, and it's never wrong.

The usual history of this, depending on how big the program is, is finding and fixing a series of performance problems, each giving anywhere from a 10% to 50% speedup (more if there is a really bad problem). This gives an overall speedup of maybe 10 times.

Eventually the samples tell me exactly what the program is doing, but I can't think of a way to fix it without a basic redesign, and I realize that if I had done the design differently in the first place, it would have been a lot faster to start with.

Supposing I can do the redesign, then I can do a few more rounds of performance find-and-fix. After this, when I hit the point of diminishing returns, I know it is about as fast as physically possible, and I can single-step it at the assembly level and watch every instruction "pulling its weight" in getting to the answer.

There is real satisfaction in getting to that point.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow