Question

I'd like to be able to test some guesses about memory complexity of various command line utilities.

Taking as a simple example

grep pattern file

I'd like to see how memory usage varies with the size of pattern and the size of file.

For time complexity, I'd make a guess, then run

time grep pattern file

on various sized inputs to see if my guess seems to be borne out in reality, but I don't know how to do this for memory.
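For instance, the time-complexity check can be scripted as a loop over generated inputs of increasing size (a sketch; the pattern, sizes, and filenames are arbitrary placeholders):

```shell
# Time grep on inputs of increasing size.
for n in 1000 10000 100000; do
  # Build a test file of n identical lines containing the pattern.
  yes "filler text with a needle in it" | head -n "$n" > "input_$n.txt"
  printf 'lines: %s\n' "$n"
  time grep needle "input_$n.txt" > /dev/null
done
```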

One possibility would be a wrapper script that initiates the job and samples memory usage periodically, but this seems inelegant and unlikely to give the real high watermark.
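For concreteness, a crude version of that sampling wrapper might look like this (it polls the resident set size via ps, so it can easily miss short-lived peaks, which is exactly the weakness I'm worried about; the test input is made up):

```shell
# Crude sampler: poll RSS via ps while a command runs in the background.
yes "needle haystack" | head -n 200000 > sample.txt   # throwaway test input
grep needle sample.txt > /dev/null & pid=$!
peak=0
while kill -0 "$pid" 2>/dev/null; do
  # ps reports RSS in kilobytes; trim the leading whitespace it emits.
  rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
  [ "${rss:-0}" -gt "$peak" ] && peak="${rss:-0}"
  sleep 0.05
done
echo "sampled peak RSS: ${peak} KB"
```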

I've seen time -v suggested, but that flag isn't available on my machine (running bash on OSX), and I don't know where to find a version of time that supports it.

I've also seen that on Linux this information is available through the proc filesystem, but again, it's not available to me in my context.

I'm wondering if dtrace might be an appropriate tool, but again I'm concerned that a simple sample-based figure might not capture the true high-water mark.

Does anyone know of a tool or approach that would be appropriate on OSX?

Edit

I removed two mentions of disk usage, which were just asides and perhaps distracted from the main thrust of the question.


Solution

Your question is interesting because, without the application source code, you need to make a few assumptions about what constitutes memory use. Even if you were to use procfs, the results will be misleading: both the resident set size and the total virtual address space will be over-estimates since they will include extraneous data such as the program text.

Particularly for small commands, it would be easier to track individual allocations, although even there you need to be sure to include all the possible sources. In addition to malloc() etc., a process can extend its heap with brk() or obtain anonymous memory using mmap().

Here's a DTrace script that traces malloc(); you can extend it to include other allocating functions. Note that it isn't suitable for multi-threaded programs as it uses some non-atomic variables.

bash-3.2# cat hwm.d
/* find the maximum outstanding allocation provided by malloc() */
size_t total, high;

pid$target::malloc:entry
{
    self->size = arg0;
}

pid$target::malloc:return
/arg1/
{
    total += self->size;
    allocation[arg1] = self->size;
    high = (total > high) ? total : high;
    self->size = 0;    /* zero the thread-local variable to release it */
}

pid$target::free:entry
/allocation[arg0]/
{
    total -= allocation[arg0];
    allocation[arg0] = 0;
}

END
{
    printf("High water mark was %d bytes.\n", high);
}
bash-3.2# dtrace -x evaltime=exec -qs hwm.d -c 'grep maximum hwm.d'
/* find the maximum outstanding allocation provided by malloc() */
High water mark was 62485 bytes.

bash-3.2#

A much more comprehensive discussion of memory allocators is contained in this article by Brendan Gregg; it provides a much better answer to your question than my own. In particular, it includes a link to a script called memleak.d. Modify that script to include timestamps for the allocations and deallocations so that you can sort its output by time. Then, perhaps using the accompanying script as an example, use Perl to track the current outstanding total allocation and the high water mark. Such a DTrace/Perl combination would be suitable for tracing multi-threaded processes.
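As a sketch of that post-processing step, suppose you have reduced the trace to time-ordered lines of the (hypothetical) form "alloc ADDR SIZE" and "free ADDR"; then an awk script can do the same bookkeeping as the D script above:

```shell
# Sample trace in the hypothetical "alloc ADDR SIZE" / "free ADDR" format.
printf 'alloc 0x1 100\nalloc 0x2 50\nfree 0x1\nalloc 0x3 30\n' > trace.txt

# Track the outstanding total allocation and its high water mark.
awk '
  $1 == "alloc" { size[$2] = $3; total += $3; if (total > high) high = total }
  $1 == "free"  { total -= size[$2]; delete size[$2] }
  END { printf "high water mark: %d bytes\n", high }
' trace.txt
# prints: high water mark: 150 bytes
```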

OTHER TIPS

You can use /usr/bin/time -l (which is not the time builtin of your shell on macOS) and read the "maximum resident set size". That figure is not precisely the high water mark, but it might give you some idea.

$ /usr/bin/time -l ls
...
        0.00 real         0.00 user         0.00 sys
    925696  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       239  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         3  voluntary context switches
         1  involuntary context switches

The meaning of this field is explained here.

I tried getrusage(): inaccurate results. I tried Instruments: a pain to work with.

Best solution by far: valgrind + massif.

  • Command-line based: easy to run, script, and automate; no apps to open or menus to click; it can run in the background.
  • Provides a visual graph, in your terminal, of memory usage over time.

    valgrind --tool=massif /path/to/my_program arg1 ...
    ms_print `ls -t massif.out.* | head -1` | grep Detailed -B50

To view more details, run ms_print `ls -t massif.out.* | head -1` (ls -t | head -1 picks the most recently written massif output file).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow