In high performance computing, performance is kind of a big deal. And the first step in performance analysis and performance improvement is profiling.
High performance computing almost always entails some form of parallelism. And parallel programs are plain hard. They're harder to write, harder to debug, and harder to profile.
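gprof is pretty great. Just compile your code with the -pg flag, run your code as usual, and you'll see a gmon.out file appear in the working directory. Running gprof on your executable and that gmon.out should summarize the performance of your code.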
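As a rough sketch of that workflow, assuming GCC and a throwaway test program (hello.c and its work function are made up for illustration, not taken from any real code base):

```shell
# Write a small, deliberately slow test program (hypothetical example).
cat > hello.c <<'EOF'
#include <stdio.h>

static double work(long n) {
    double s = 0.0;
    for (long i = 0; i < n; i++) s += i * 0.5;
    return s;
}

int main(void) {
    printf("%f\n", work(50000000L));
    return 0;
}
EOF

gcc -pg -O0 -o hello hello.c          # -pg adds the profiling instrumentation
./hello                               # run as usual; gmon.out is written at exit
gprof ./hello gmon.out > profile.txt  # flat profile and call graph
head -n 15 profile.txt
```

The flat profile at the top of profile.txt lists functions by the share of run time spent in each, which is usually the first thing worth reading.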
gprof will not pick up on any calls to shared library functions. OK, that's a downer, and there's lots more. But it's easy to use, and gives me quick results. With the legacy code I work with, where there are no shared library calls, gprof is pretty awesome.
gprof + MPI
gprof isn't designed to work with MPI code. But, as is generally the case with these things, it's possible with sufficient abuse:
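First, set the environment variable GMON_OUT_PREFIX to a file-name prefix of your choice. Then, the usual business: compile with -pg and launch the program under mpiexec as you normally would. You should see 32 (or however many processes you launched) files, each named with that prefix followed by the process ID. This is an undocumented feature of glibc, and it really shouldn't be - it's massively useful.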
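A sketch of those two steps, assuming an MPI installation that provides mpicc and mpiexec. The program mpi_sum.c, the gmon.out- prefix, and the 4-process launch are illustrative choices, not details from the original setup:

```shell
# A tiny MPI program to profile (hypothetical example).
cat > mpi_sum.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    double local = 0.0, total = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (long i = 0; i < 20000000L; i++) local += i * 1e-9;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("total = %f\n", total);
    MPI_Finalize();
    return 0;
}
EOF

export GMON_OUT_PREFIX=gmon.out-   # glibc appends ".<pid>" to this prefix
mpicc -pg -O0 -o mpi_sum mpi_sum.c # instrument exactly as for serial code
mpiexec -n 4 ./mpi_sum
ls gmon.out-*                      # expect one file per MPI process
```

On multi-node runs the variable has to reach every rank. Some launchers forward the environment automatically, while Open MPI's mpiexec has the -x option (for example -x GMON_OUT_PREFIX) for this.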
Now you have a separate gmon.out file for every MPI process. Awesome. Sum them with gprof's -s option, then use the resulting gmon.sum to generate the usual gprof report. Credit where it's due.
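The summing and reporting steps might look like the following sketch. It assumes an instrumented executable called ./mpi_sum (a placeholder) and per-process profile files matching gmon.out-* in the current directory:

```shell
EXE=./mpi_sum        # placeholder: your -pg instrumented MPI executable
PREFIX=gmon.out-     # placeholder: the GMON_OUT_PREFIX used for the run

gprof -s "$EXE" "$PREFIX"*                      # merge all per-process files into gmon.sum
gprof "$EXE" gmon.sum > profile_all_ranks.txt   # report on the merged data
```

Keep in mind that the merged profile adds all processes together, so it shows where time goes overall but hides any imbalance between ranks.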
Now, I haven't figured out how to replace the process ID in the file names with the MPI rank - this could be far more useful to some users. And the method mentioned in the source doesn't really seem to be working. But I'm sure this is possible with some ingenuity.
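One idea, offered as an untested sketch rather than a known fix: most launchers export each process's rank in an environment variable (OMPI_COMM_WORLD_RANK under Open MPI, PMI_RANK under MPICH-style launchers), so a small wrapper could fold the rank into GMON_OUT_PREFIX before starting the real program. glibc would still append the process ID, so the rank ends up in the name alongside the PID instead of replacing it. The function names below are made up for illustration:

```shell
# gmon_rank_prefix: print a GMON_OUT_PREFIX value that embeds the MPI rank.
gmon_rank_prefix() {
    # OMPI_COMM_WORLD_RANK is set by Open MPI, PMI_RANK by MPICH-style launchers.
    # Falls back to "unknown" when neither is present (assumption: no other launcher).
    rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-unknown}}"
    printf 'gmon.out-rank%s\n' "$rank"
}

# with_rank_prefix: run the given command with GMON_OUT_PREFIX set for this rank.
with_rank_prefix() {
    GMON_OUT_PREFIX="$(gmon_rank_prefix)" "$@"
}
```

To use it, the two functions would go into an executable script (say rank_gprof.sh, a hypothetical name) whose last line is with_rank_prefix "$@", launched as mpiexec -n 32 ./rank_gprof.sh ./my_app. Rank 3 would then write something like gmon.out-rank3.<pid>.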
mpiP is a neat little tool for profiling MPI applications. In particular, it's extremely useful in figuring out how much time your application spends communicating relative to computing.
The documentation for setting up and using mpiP is complete (good), but small (better).
Once you have mpiP set up, profiling your code is as easy as linking it with the mpiP library and the few other libraries it depends on.
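For illustration, a link line along these lines is typical. MPIP_DIR and my_app.c are placeholders, and the exact set of support libraries (here libbfd, libiberty, and libunwind) depends on how mpiP was configured on your system, so treat the list as an assumption and check it against the mpiP documentation:

```shell
MPIP_DIR="${MPIP_DIR:-$HOME/mpiP}"   # placeholder: where mpiP is installed
MPIP_LIBS="-L$MPIP_DIR/lib -lmpiP -lm -lbfd -liberty -lunwind"

# -g keeps debug info so mpiP can map call sites back to source lines.
mpicc -g -o my_app my_app.c $MPIP_LIBS
mpiexec -n 4 ./my_app                # mpiP writes its report when MPI_Finalize runs
```

No source changes are needed: mpiP intercepts MPI calls through the library you link in, so the same my_app.c builds with or without profiling.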
Running your code (with mpiexec, as usual) will produce a plain-text report file ending in .mpiP.
I've found that while gprof and mpiP are great tools that do different things, using them both gives me a very good idea of where my programs are spending time and where I should focus optimization efforts.