Question

I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server and it isn't simple to associate disk I/O back to a particular back-end and query.

I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity.

It's easy to record block I/O requests and completions with, eg:

sudo perf record -g -T -u postgres -e 'block:block_rq_*'

and the user-space pid is recorded, but there's no kernel or user-space stack captured, or ability to snapshot bits of the user-space process's heap (say, query text) etc. So while you have the pid, you don't know what the process was doing at that point. Just perf script output like:

postgres  7462 [002] 301125.113632: block:block_rq_issue: 8,0 W 0 () 208078848 + 1024 [postgres]

If I add the -g flag to perf record it'll take snapshots of the kernel stack, but doesn't capture user-space state for perf events captured in the kernel. The user-space stack only goes up to the entry-point from userspace, like LWLockRelease, LWLockAcquire, memcpy (mmap'd IO), __GI___libc_write, etc.

So. Any tips? Being able to capture a snapshot of the user-space stack in response to kernel events would be ideal.

I'm on Fedora 19, 3.11.3-201.fc19.x86_64, Schrödinger’s Cat, with perf version 3.10.9-200.fc19.x86_64.

Was it helpful?

Solution

OK, looks like there are several parts to this:

  • I'm on x86_64, where most distros build with -fomit-frame-pointer by default, and perf can't follow the stack without frame pointers;

  • .... unless it's a newer version built with libunwind support, in which case it supports perf record -g dwarf.

See:

I'm on Fedora 18, but the same issue applies. So if you're profiling code you're working on (as is likely on Stack Overflow), rebuild with -fno-omit-frame-pointer and -ggdb.

I landed up rebuilding perf because I wanted to be able to compare to the stock RPMs:

  • sudo yum build-dep perf
  • sudo yum install yum-utils rpmdevtools libunwind-devel
  • yumdownloader --source perf or download the appropriate kernel-.....src.rpm srpm
  • rpmdev-setuptree
  • rpm -Uvh kernel-*.src.rpm
  • cd $HOME/rpmbuild/SPECS
  • rpmbuild -bp --target=$(uname -m) kernel.spec

At this point you can just build a new perf if you want:

  • cd $HOME/rpmbuild/BUILD/kernel-*/linux-*/tools/perf
  • make

... which I did and tested that the updated perf does in fact capture a useful stack if built with libunwind available.

You can also build a new rpm:

  • edit kernel.spec, uncomment the line %define buildid ..., change buildid to something like .perfunwind. Note it's %define not % define.

  • In the same spec file, find:

    %global perf_make \
    make %{?_smp_mflags} -C tools/perf -s V=1 WERROR=0 NO_LIBUNWIND=1 HAVE_CPLUS_DEMANGLE=1 NO_GTK2=1 NO_LIBNUMA=1 NO_STRLCPY=1 prefix=%{_prefix}
    

    and delete NO_LIBUNWIND=1

  • rpmbuild -bb --without up --without mp --without pae --without debug --without doc --without headers --without debuginfo --without bootwrapper --without with_vdso_install --with perf kernel.spec to produce new perf RPMs without building the whole kernel. Or if you want, omit the --without for the kernel flavour you want, in which case you'll also want to build headers, debuginfo, etc.

  • sudo rpm -Uvh $HOME/rpmbuild/RPMS/x86_64/perf-*.fc19.x86_64.rpm

See the fedora project guide on building a custom kernel.

I've reported the issue to Fedora; they shouldn't be using NO_LIBUNWIND=1. See bug 1025603.

Once you have a rebuilt perf you can use perf record -g dwarf to get full stacks.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top