Installing and using qprof

Installing qprof

To install qprof, unpack the distribution and change to the resulting qprof- directory. Then:
  1. (Optional) Make sure that the PREFIX variable in Makefile is set to the appropriate installation directory. Files will be installed in $(PREFIX)/lib/qprof-version, $(PREFIX)/include/qprof-version, and $(PREFIX)/doc/qprof-version. The above directories will be linked to $(PREFIX)/lib/qprof, $(PREFIX)/include/qprof, and $(PREFIX)/doc/qprof.
  2. (Optional) Unpack a copy of libunwind in the source directory, or create a symbolic link from libunwind- to the identically named source directory elsewhere. (You will need at least version 0.93. For version 0.93 on x86, apply the included libunwind-0.93.patch.) If this step is performed, a very basic version of call-stack profiling will become available.
  3. (Optional) Type "make" (or "make check" to also run tests).
  4. (Needs permission to write to PREFIX directory, if set.) Type "make install".
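
After "make install", the three unversioned links named in step 1 should exist under PREFIX. The following is a minimal sketch of a check for that, assuming a Bourne-style shell. The helper name qprof_check_install is hypothetical and is not part of the qprof distribution.

```sh
# Hypothetical helper (not shipped with qprof): report whether the
# unversioned lib/include/doc links exist under the given PREFIX.
qprof_check_install() {
    prefix=$1
    status=0
    for sub in lib include doc; do
        if [ -e "$prefix/$sub/qprof" ]; then
            echo "ok:      $prefix/$sub/qprof"
        else
            echo "missing: $prefix/$sub/qprof"
            status=1
        fi
    done
    return $status
}
```

For example, "qprof_check_install /usr/local" checks an installation under /usr/local (an illustrative PREFIX, not a default taken from the Makefile).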

To start profiling all programs run from a particular shell:

  1. Run "source <PREFIX from above>/lib/qprof/alias.csh" or ". <PREFIX from above>/lib/qprof/alias.sh", depending on your shell. If you skipped step one above, PREFIX is <build directory from above>/installed. Alternatively, run the identical scripts from the build directory. For regular use, put one of the above commands into your .bashrc or .cshrc file.
  2. (Optional) In an ANSI color-capable terminal window (e.g. most xterm variants), set the environment variable QPROF_COLOR to, for example, "green" to distinguish profiling output from normal command output.
  3. Run qprof_start to start profiling.
  4. Run commands to be profiled.
  5. Run qprof_stop to stop profiling.
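
The steps above can be sketched for a Bourne-style shell roughly as follows. The helper qprof_load_aliases is hypothetical. It only guards the ". .../alias.sh" command from step 1 so that a wrong PREFIX gives a clear message. Because qprof_start and qprof_stop are aliases, they are meant to be typed at the interactive prompt after the script has been loaded, so they appear here only as comments.

```sh
# Hypothetical helper (not shipped with qprof): source alias.sh from the
# given PREFIX, or complain if it is not there.
qprof_load_aliases() {
    script="$1/lib/qprof/alias.sh"
    if [ -r "$script" ]; then
        . "$script"
    else
        echo "cannot read $script" >&2
        return 1
    fi
}

# Typical interactive session (PREFIX shown is illustrative):
#   qprof_load_aliases /usr/local
#   QPROF_COLOR=green; export QPROF_COLOR    # optional
#   qprof_start
#   ...commands to be profiled...
#   qprof_stop
```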

Assumptions made by the above:

  1. The LD_PRELOAD environment variable is not already set for other reasons. (If you don't know what it's used for, you're probably OK. If you do know what this means, you can probably fix up the qprof_start and qprof_stop aliases to make things work with another preload library.)
  2. You are running only dynamically linked executables. If you don't know what this means, you can ignore it. (Statically linked programs can be profiled by calling the prof_utils.h routines directly from the application to be profiled.)
  3. There are no doubt some library version dependencies. RedHat 7, 8, and 9 should work, as should other Linux distributions from the same era.
  4. Nothing done by the process interferes with profiling. Empirically, this works fine for nearly all applications. But since the profiler runs as part of the application process, obscure kinds of interference are probably possible.
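
The first assumption is easy to check before running qprof_start. Below is a small sketch, assuming a Bourne-style shell. The function name qprof_preload_ok is hypothetical. It inspects the current LD_PRELOAD value, or an explicit value passed as its argument.

```sh
# Hypothetical helper (not shipped with qprof): succeed only if
# LD_PRELOAD (or the value passed as $1) is empty.
qprof_preload_ok() {
    if [ $# -gt 0 ]; then
        value=$1
    else
        value=${LD_PRELOAD:-}
    fi
    if [ -n "$value" ]; then
        echo "LD_PRELOAD is already set to: $value" >&2
        return 1
    fi
    return 0
}
```

If the check fails, the qprof_start and qprof_stop aliases would need to be adjusted by hand to coexist with the other preload library, as noted in item 1.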

Interpreting the results:

Adjusting profiling output:

The output produced by qprof depends on several environment variables. In particular, QPROF_GRANULARITY can be set to one of function, line, or instruction to control whether samples are summed for each function, line, or instruction. Setting QPROF_REAL causes the profiler to sample based on wall-clock time, and should thus point out where processes are waiting. If libunwind was available during installation, setting QPROF_STACK causes time spent in called functions to be included in the caller's (parent's) counts. Other relevant environment variables are described here.
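
As an illustration, these variables can be set from a Bourne-style shell before running qprof_start. The sketch below wraps QPROF_GRANULARITY in a hypothetical helper, qprof_set_granularity, that rejects values other than the three listed above. The values shown for QPROF_REAL and QPROF_STACK are an assumption: the text only says the variables are "set", so any non-empty value is presumed to work.

```sh
# Hypothetical helper (not shipped with qprof): export QPROF_GRANULARITY
# only if the value is one of the three documented choices.
qprof_set_granularity() {
    case $1 in
        function|line|instruction)
            QPROF_GRANULARITY=$1
            export QPROF_GRANULARITY
            ;;
        *)
            echo "QPROF_GRANULARITY must be function, line, or instruction" >&2
            return 1
            ;;
    esac
}

# Assumed usage; the value 1 is a guess, the text does not specify one:
#   QPROF_REAL=1;  export QPROF_REAL     # sample on wall-clock time
#   QPROF_STACK=1; export QPROF_STACK    # needs a libunwind-enabled build
```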

To profile using hardware event counters:

(This currently works only on Itanium.)
  1. Install a supported underlying event counter library. (Currently this is Itanium perfmon.)
  2. Add -DHW_EVENT_SUPPORT to CFLAGS in Makefile, then build as above. (Perfmon must be installed for the profiler to build. If it is missing at runtime, qprof will still run, but without hardware event support. If you are using libpfm3 on a 2.6 kernel, replace prof_utils.c in the distribution with prof_utils.c.libpfm3.)
  3. Run pfmon -l to find the appropriate event name.
  4. Set the environment variable QPROF_HW_EVENT to the event name. Profile as above. (QPROF_INTERVAL can be set to a number n to indicate that the program counter should be sampled every nth event. By default n is 10,000.)
  5. Note that the program counter is sampled when the process is notified of the event. This may be a few cycles after the event occurred. For example, cache miss events are likely to be attributed to an instruction that uses the resulting value, or even a slightly later instruction. You should be able to determine which loop is causing cache misses, but it will take a little guesswork to identify the actual load or store instruction.
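
Steps 3 and 4 can be sketched for a Bourne-style shell as shown below. Both helper names are hypothetical, and SOME_EVENT in the usage comment is a placeholder for a name taken from the "pfmon -l" listing, not a real event name. Since the program counter is sampled every nth event, a sample count multiplied by n gives a rough estimate of the number of events. The second helper does only that arithmetic.

```sh
# Hypothetical helper (not shipped with qprof): export QPROF_HW_EVENT and
# QPROF_INTERVAL; the interval falls back to the documented default 10000.
qprof_hw_setup() {
    event=$1
    interval=${2:-10000}
    if [ -z "$event" ]; then
        echo "event name required (see pfmon -l)" >&2
        return 1
    fi
    case $interval in
        ''|*[!0-9]*|0)
            echo "interval must be a positive integer" >&2
            return 1
            ;;
    esac
    QPROF_HW_EVENT=$event
    QPROF_INTERVAL=$interval
    export QPROF_HW_EVENT QPROF_INTERVAL
}

# Rough event count represented by a number of samples: samples * interval.
qprof_estimated_events() {
    samples=$1
    interval=${2:-10000}
    echo $(($samples * $interval))
}

# Usage (placeholder event name):
#   qprof_hw_setup SOME_EVENT 5000
#   qprof_start
```

With the default interval, "qprof_estimated_events 250" prints 2500000, meaning that 250 samples correspond to roughly 2.5 million events.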