Question

I am attempting to profile a long-running Python script. The script performs spatial analysis on a raster GIS dataset using the GDAL module. It is currently split across three files: the main script, find_pixel_pairs.py, which loops over the raster pixels; a simple cache in lrucache.py; and some miscellaneous classes in utils.py. I have profiled the code on a moderately sized dataset. pstats returns:

   p.sort_stats('cumulative').print_stats(20)
   Thu May  6 19:16:50 2010    phes.profile

   355483738 function calls in 11644.421 CPU seconds

   Ordered by: cumulative time
   List reduced from 86 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.008    0.008 11644.421 11644.421 <string>:1(<module>)
        1 11064.926 11064.926 11644.413 11644.413 find_pixel_pairs.py:49(phes)
340135349  544.143    0.000  572.481    0.000 utils.py:173(extent_iterator)
  8831020   18.492    0.000   18.492    0.000 {range}
   231922    3.414    0.000    8.128    0.000 utils.py:152(get_block_in_bands)
   142739    1.303    0.000    4.173    0.000 utils.py:97(search_extent_rect)
   745181    1.936    0.000    2.500    0.000 find_pixel_pairs.py:40(is_no_data)
   285478    1.801    0.000    2.271    0.000 utils.py:98(intify)
   231922    1.198    0.000    2.013    0.000 utils.py:116(block_to_pixel_extent)
   695766    1.990    0.000    1.990    0.000 lrucache.py:42(get)
  1213166    1.265    0.000    1.265    0.000 {min}
  1031737    1.034    0.000    1.034    0.000 {isinstance}
   142740    0.563    0.000    0.909    0.000 utils.py:122(find_block_extent)
   463844    0.611    0.000    0.611    0.000 utils.py:112(block_to_pixel_coord)
   745274    0.565    0.000    0.565    0.000 {method 'append' of 'list' objects}
   285478    0.346    0.000    0.346    0.000 {max}
   285480    0.346    0.000    0.346    0.000 utils.py:109(pixel_coord_to_block_coord)
      324    0.002    0.000    0.188    0.001 utils.py:27(__init__)
      324    0.016    0.000    0.186    0.001 gdal.py:848(ReadAsArray)
        1    0.000    0.000    0.160    0.160 utils.py:50(__init__)

The top two calls contain the main loop - the entire analysis. The remaining calls sum to less than 625 of the 11,644 seconds. Where are the remaining 11,000 seconds spent? Is it all within the main loop of find_pixel_pairs.py? If so, can I find out which lines of code are taking most of the time?
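For reference, a profile like the one above can be produced and inspected along these lines (a sketch; phes(dataset) is a hypothetical stand-in for the actual entry-point call):

    import cProfile
    import pstats

    # Profile the entry point and save the raw stats to disk.
    # phes(dataset) is a hypothetical stand-in for the real call.
    cProfile.run('phes(dataset)', 'phes.profile')

    # Load the saved stats and print the 20 most expensive entries by
    # cumulative time; this produces a listing like the one above.
    p = pstats.Stats('phes.profile')
    p.sort_stats('cumulative').print_stats(20)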


Solution

You are right that most of the time is being spent in the phes function on line 49 of find_pixel_pairs.py. To find out more, you need to break phes up into smaller subfunctions and then reprofile.
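For example, a split along these lines (a sketch; the helper names are hypothetical, and your loop body will differ) lets cProfile attribute time to each stage of the loop instead of lumping it all into phes:

    # Each stage of the pixel loop becomes a named function, so the
    # profile reports tottime/cumtime per stage rather than one big
    # number for phes. All helper names below are hypothetical.
    def phes(dataset):
        for extent in extent_iterator(dataset):
            block = read_block(extent)
            pairs = find_pairs(block)
            record_pairs(pairs)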

OTHER TIPS

Forget functions and measuring. Use this technique: run the script under a debugger and hit Ctrl-C a few times while it runs. Each time, the call stack will show you exactly which lines of code are responsible for the time.

Added: For example, pause it 10 times. If, as EOL says, 10,400 seconds out of 11,000 are being spent directly in phes, then on about 9 of those pauses it will stop right there. If, on the other hand, it is spending a large fraction of time in some subroutine called from phes, then you will not only see where it is in that subroutine, you will also see the lines that call it, which are equally responsible for the time, and so on up the call stack.

Don't measure. Capture.
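If repeatedly interrupting a long batch job under a debugger is awkward, a rough stand-in for this technique (a sketch, not part of the original answer) is to print the call stack whenever the process receives a signal:

    import signal
    import sys
    import traceback

    def dump_stack(signum, frame):
        # One "pause" sample: print where the program is right now.
        traceback.print_stack(frame, file=sys.stderr)

    # On most Unix terminals, Ctrl-\ delivers SIGQUIT to the foreground
    # process; each press then prints a stack sample instead of quitting.
    signal.signal(signal.SIGQUIT, dump_stack)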

The time spent executing the code of each function or method itself is in the tottime column. The cumtime column is tottime plus the time spent in the functions it calls.

In your listing, you can see that the 11,000 seconds you are looking for are spent directly by the phes function itself; the functions it calls account for only about 600 seconds.
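You can make that ranking explicit by sorting the same stats object by internal time:

    # 'time' sorts by tottime (Python 3 also accepts 'tottime').
    p.sort_stats('time').print_stats(10)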

Thus, you want to find out what takes time inside phes by breaking it up into subfunctions and reprofiling, as unutbu suggested.

If you've identified potential bottlenecks in the phes function in find_pixel_pairs.py, you can use line_profiler to get line-by-line profiling numbers like these (copied from another question):

Timer unit: 1e-06 s

Total time: 9e-06 s
File: <ipython-input-4-dae73707787c>
Function: do_other_stuff at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def do_other_stuff(numbers):
     5         1            9      9.0    100.0      s = sum(numbers)

Total time: 0.000694 s
File: <ipython-input-4-dae73707787c>
Function: do_stuff at line 7

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           def do_stuff(numbers):
     8         1           12     12.0      1.7      do_other_stuff(numbers)
     9         1          208    208.0     30.0      l = [numbers[i]/43 for i in range(len(numbers))]
    10         1          474    474.0     68.3      m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

With this information, you may not need to break phes up into multiple subfunctions at all, because you can see precisely which lines have the highest execution time.

Since you mention that your script is long-running, I'd recommend using line_profiler on as few functions as possible, because while any profiling adds overhead, line profiling can add more.
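For reference, a typical line_profiler workflow looks like this (a sketch; kernprof injects the @profile decorator at runtime, so the script needs no extra import when run this way):

    # In find_pixel_pairs.py: decorate only the function you want
    # line-by-line timings for, to keep the overhead down.
    @profile
    def phes(dataset):
        ...

Then run the script under kernprof; -l enables line-by-line mode and -v prints the per-line timings when the run finishes:

    $ kernprof -l -v find_pixel_pairs.py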

Licensed under: CC-BY-SA with attribution