In high performance computing,
performance is kind of a big deal.
And the first step in performance analysis
and performance improvement
is
profiling.
High performance computing almost always entails some
form of parallelism.
And parallel programs are plain hard. They’re harder to write,
harder to debug, and harder to profile.
gprof
gprof
is pretty great. Just compile your code with -pg
, and -g
,
$ gcc -pg -g -O0 hello.c bye.c -o hibye.exe
run your code as usual,
and you’ll see gmon.out
. Now,
$ gprof hibye.exe gmon.out
should summarize the performance of your code.
Beware, gprof
will not
pick up on any calls to shared library functions.
OK, that’s a downer, and
there’s lots more. But it’s easy to use, and gives me quick results.
With the legacy code I work with, where there are no shared library calls,
gprof
is pretty awesome.
gprof + MPI
gprof
isn’t designed to work with MPI code.
But, as is generally the case with these things,
it’s possible with sufficient abuse:
First, set the environment variable GMON_OUT_PREFIX
:
$ export GMON_OUT_PREFIX=gmon.out-
Then, the usual business:
$ mpicc -pg -g -O0 hello.c bye.c -o hibye.exe
$ mpiexec -n 32 hibye.exe
You should see 32 (or however many processes) files,
with names gmon.out-<pid>
.
This is an undocumented feature of glibc
,
and it really shouldn’t be - it’s massively useful.
Now you have a separate gmon.out
file for every
MPI process. Awesome. Sum them:
$ gprof -s hibye.exe gmon.out-*
And use the resulting gmon.sum
to generate
gprof
output:
$ gprof hibye.exe gmon.sum
Now, I haven’t figured out how to replace the pid
with the MPI rank -
this could be exponentially more useful to some users.
And the method mentioned in the source doesn’t really
seem to be working.
But I’m sure this is possible with some ingenuity.
mpiP
mpiP is a neat little
tool for profiling MPI applications.
In particular, it’s extremely useful in figuring out
how much your application is spending time communicating
relative to computing.
The documentation for setting up and using mpiP
is complete (good), but small (better).
Once you have mpiP
set up, profiling your code is
as easy as linking it with the mpiP
library and some
other stuff it needs:
$ mpicc -g -O0 hello.c bye.c -o hibye.exe -lmpiP -liberty -lbfd -lunwind
Running your code (mpiexec
) will produce mpiP
output.
I’ve found that while gprof
and mpiP
are great tools
that do different things, using them both gives
me a very good idea of where my programs are spending time
and where I should focus optimization efforts.