IO Profiling of Applications: MPI Apps

In the last article we looked at using strace to examine the IO patterns of simple serial applications. In the High Performance Computing (HPC) world, many applications use MPI (Message Passing Interface) to run in parallel. This time around we discuss how to attack parallel applications with strace.

Tuesday, March 16th, 2010

Strace is one of those all-purpose tools that can be used for debugging problems on your system(s). It can also be used to dig into the IO profile of applications, even if you don’t have the source code (but with Linux you should always have access to the source). In the last article it was shown how strace can be used to gather a great deal of information about the IO behavior of applications.

Strace is useful because, on Linux, the vast majority of applications perform IO through library calls that end up as system calls (syscalls) into the kernel. Strace records each of these syscalls, along with its arguments and return value, in a form that is very useful for analysis.

This article discusses using strace with MPI (Message Passing Interface) applications, which are common in HPC. Along the way we’ll learn a bit more about using strace.

MPI Overview

This article is not specifically about MPI but for those that may not be familiar with it let’s do a quick 50,000 foot fly-by.

MPI is an API that allows programs on different systems to communicate with one another and send data back and forth. MPI is a standard set of functions that allow you to send information from one system to another (point-to-point) or to send data from a single system to many systems or vice-versa (collective operations). The systems can be on the same physical hardware (e.g. SMP) or they can be distributed (distinct hardware). As long as the programs can open a communication connection of some type, they can share data.

The basic concepts in MPI are fairly simple. For example, if you have an application running on one system and it needs to exchange data with an application running on a different system, then MPI can be used for exchanging data. The “sending” application calls the function “MPI_Send” to send the data to the target system. The target system uses a function, “MPI_Recv”, to receive the data. Typically, the application on the sending system and the one on the receiving system are actually the same binary running on different systems, with some code that determines which system is the “sender” and which is the “receiver”.

There are many tutorials that will teach you how to write MPI code. In addition, there are some very good MPI libraries, such as Open MPI, MPICH2, and MVAPICH, that provide the needed functions for a variety of communication protocols and networks. These include TCP/IP, InfiniBand, and Myrinet MX to name just a few.

MPI applications are executed in several ways. Probably the most common method is that the application is executed once for every core on a system. So a quad-core system would have four instances of the application started. If we have three systems, each with four cores, then we could start 12 instances of the application. When the various instances of the application start they communicate with each other to establish who’s who and where everyone is located, etc. There can be some synchronization between applications as well to make sure they are all in lock-step. Then the applications start computing and sending/receiving data back and forth until the overall application is finished.
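With Open MPI, the list of machines, and how many processes each one may run, can be given to mpirun in a hostfile. A minimal sketch for the three quad-core systems above, assuming hypothetical host names node01 through node03 and Open MPI’s `slots=` hostfile keyword, writes such a file and adds up the slots to get the 12 processes:

```bash
#!/bin/bash
# Hypothetical host names; each quad-core node offers four slots.
cat > MACHINEFILE <<'EOF'
node01 slots=4
node02 slots=4
node03 slots=4
EOF

# Total MPI processes = sum of the numbers after "slots=".
NP=$(awk -F'slots=' '{ total += $2 } END { print total }' MACHINEFILE)
echo "Launching $NP processes"   # prints: Launching 12 processes
```

Handing this file to mpirun as the machine file lets it place the 12 processes four per node.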

Using strace with MPI codes

MPI codes, while a bit more complicated than serial codes, don’t have to be difficult to use with strace. Ideally, we would like to have one strace output for every MPI process (assuming there are no forks or vforks in the code). This includes having one output for each process even on the same system. So if we had four cores on a node, we would want four strace output files per node. The reason we want one output file per MPI process is so we can tell which MPI process is performing I/O, how much I/O it performs, and its performance.
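To make that concrete, here is a small sketch that reports, for each per-process strace file, how many write() calls completed and how many bytes they returned. It assumes the strace.out.<PID> file naming and the “-ttt -T” output format used later in this article; the function name sum_write_bytes is hypothetical, and it deliberately ignores other write-like syscalls such as pwrite64 and writev.

```bash
#!/bin/bash
# sum_write_bytes (hypothetical helper): for each strace file given, count the
# write() calls that succeeded and total the bytes they returned.
# Expects "strace -ttt -T" lines such as:
#   1268752345.123456 write(3, "..."..., 1048576) = 1048576 <0.000210>
sum_write_bytes() {
  for f in "$@"; do
    awk -v file="$f" '
      # write( lines with a non-negative return value followed by the -T time
      / write\(/ && / = [0-9]+ </ {
        n = split($0, parts, " = ")   # the return value follows the last " = "
        split(parts[n], ret, " ")
        total += ret[1]
        calls++
      }
      END { printf "%s %d writes %d bytes\n", file, calls, total }
    ' "$f"
  done
}
```

For example, running `sum_write_bytes /tmp/strace.out.*` on a node prints one line per MPI process that ran there.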

Usually MPI codes are launched by using mpirun or mpiexec or something equivalent that comes with the MPI library. The problem is that if you simply put strace in front of this command, you end up getting the strace of mpirun or mpiexec itself, not the strace of the actual application, which is what you want. So we need a way to run strace on each process and keep a separate output file for each one.

For the example below, I’ll be using Open MPI. Open MPI has a utility to start codes called mpirun. A typical Open MPI command line for running an MPI code gives mpirun a MACHINEFILE, the name of the file containing a list of the machines (host names) being used; the <path-to-code>, where the executable is located; the <executable>, the name of the actual executable; and the <code-options>, any command-line arguments to the executable.
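A minimal sketch of such a command line, assuming Open MPI’s `-machinefile` and `-np` options; the paths, the process count, and arg1 to arg3 are hypothetical stand-ins for the placeholders above:

```bash
#!/bin/bash
# Hypothetical stand-ins for the placeholders in the text.
MACHINEFILE=./MACHINEFILE      # file listing the host names to use
NP=4                           # number of MPI processes to start
CODE_PATH=/home/user/bin       # <path-to-code>
EXECUTABLE=code1               # <executable>

# Anything after the executable (<code-options>) is passed to it unchanged.
mpirun -machinefile "$MACHINEFILE" -np "$NP" "$CODE_PATH/$EXECUTABLE" arg1 arg2 arg3
```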

To use strace with an MPI application, the first thing people might try is to put strace in front of mpirun on the command line, but all this does is run strace against mpirun, not against the executable as we want. How do we fix this?

The way I run strace against an MPI binary is to convert the single command line into two scripts. The first script is for the mpirun command and the second script is for the actual MPI executable. The first script, which I’ve named “main.sh”, is fairly simple.
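A sketch of main.sh along these lines, with a hypothetical script path and process count; the only change from a plain launch is that mpirun starts code1.sh instead of the executable:

```bash
#!/bin/bash
# main.sh: launch code1.sh (which wraps the executable in strace) under mpirun.
# All values are hypothetical; substitute your own.
MACHINEFILE=./MACHINEFILE
NP=4
SCRIPT=/home/user/scripts/code1.sh   # the second script, not the executable

# Arguments after the script path reach code1.sh, which forwards them via "$@".
mpirun -machinefile "$MACHINEFILE" -np "$NP" "$SCRIPT" arg1 arg2 arg3
```

Because mpirun starts code1.sh on every node, the script has to exist at that same path on each node (for example on a shared file system) and be executable.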

It’s not too different from the mpirun command line previously presented, except that rather than specifying the executable, I specify a script, “code1.sh”, along with the path to this second script. The second script, which I’ve named code1.sh in this example, runs the actual MPI executable under strace.

All of the strace action takes place in this second script. As with the serial code, I use the “-ttt” option to get timestamps with microsecond resolution expressed as seconds since the epoch, the “-T” option to get the elapsed time of each syscall, and the -o option to specify the strace output file. In this case, I’m sending the output to /tmp and naming it strace.out.$$. The $$ after strace.out is a special bash variable that contains the process ID (PID) of the script. Since each copy of the script running on a node gets its own PID, we end up with a separate strace file for each MPI process.
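A sketch of code1.sh under those choices; the executable path is hypothetical, and writing to /tmp assumes /tmp is local to each node, so files from different nodes land on different disks:

```bash
#!/bin/bash
# code1.sh: run the real MPI executable under strace.
EXECUTABLE=/home/user/bin/code1      # hypothetical <path-to-code>/<executable>

# $$ is this script's PID, so every MPI process on a node writes its own file.
OUTFILE=/tmp/strace.out.$$

# -ttt  timestamps in seconds since the epoch, with microseconds
# -T    time spent in each system call
# -o    write the trace to OUTFILE instead of stderr
# "$@"  every argument given to code1.sh, passed on unchanged
strace -ttt -T -o "$OUTFILE" "$EXECUTABLE" "$@"
```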

The second bit of bash knowledge is the variable $@ at the end of the script. This is a predefined bash variable that expands to all of the command-line arguments passed to the script code1.sh. These are the command-line arguments for the actual executable: if code1.sh is called with arg1, arg2, arg3, and so on, then $@ contains arg1, arg2, arg3, and so on. It’s important to make sure you understand how to use $@, so let’s look at a really quick example.
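A stand-alone way to see what "$@" does, and why it is usually written in double quotes, is a pair of small shell functions standing in for code1.sh (all function names here are hypothetical). Quoted, "$@" hands each original argument through unchanged; unquoted, arguments containing spaces are split apart:

```bash
#!/bin/bash
# show_args (hypothetical): print each argument it receives, one per line.
show_args() {
  for a in "$@"; do
    printf '[%s]\n' "$a"
  done
}

# Stand-ins for code1.sh handing its arguments to the executable.
wrapper() {
  show_args "$@"      # quoted: each original argument stays intact
}
wrapper_unquoted() {
  show_args $@        # unquoted: arguments are split again on whitespace
}

wrapper -b 25m -o "/scratch/my file"
# prints:
# [-b]
# [25m]
# [-o]
# [/scratch/my file]
```

Calling wrapper_unquoted with the same arguments prints [/scratch/my] and [file] as two separate arguments, which is why the quoted form is the safer one to use in code1.sh.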

There is an I/O benchmark called IOR, from Lawrence Livermore National Laboratory, that accepts a number of arguments describing the details of how to run the benchmark. Here’s an example:

IOR -r -w -a MPIIO -b 25m -N 4 -s 25 -t 10m -v -o <file location>

where IOR is the name of the executable. Don’t worry about what all of the options mean, but let me point out one option. The option “-N 4” tells the code to use four MPI processes. You can change the value of 4 to correspond to what the scheduler defines. Now how do we pass these arguments to the script that actually runs the code?

Sticking with the IOR example, the IOR options go into the main.sh script, after the path to code1.sh.
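A sketch of the IOR version of main.sh; the paths are hypothetical and TESTFILE stands in for <file location>. Using one variable for both mpirun’s -np and IOR’s -N is a small design choice that keeps the two process counts in step:

```bash
#!/bin/bash
# main.sh for IOR; every value here is hypothetical.
MACHINEFILE=./MACHINEFILE
NP=4                                  # used for both mpirun -np and IOR -N
SCRIPT=/home/user/scripts/code1.sh
TESTFILE=/path/to/ior/testfile        # stands in for <file location>

# The IOR options follow the script path; code1.sh passes them to IOR via "$@".
mpirun -machinefile "$MACHINEFILE" -np "$NP" "$SCRIPT" \
  -r -w -a MPIIO -b 25m -N "$NP" -s 25 -t 10m -v -o "$TESTFILE"
```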

Notice how I’ve taken the command-line arguments and put them in the main.sh script. Because the code script (code1.sh) ends with the predefined bash variable $@, the options from the main script are passed through the code script to the executable. The code script doesn’t change at all, except for the name of the binary.
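A sketch of the matching code1.sh, assuming IOR is installed at a hypothetical path:

```bash
#!/bin/bash
# code1.sh for IOR: same as before, only the binary changes.
EXECUTABLE=/home/user/bin/IOR         # hypothetical location of the IOR binary
OUTFILE=/tmp/strace.out.$$            # one trace file per MPI process

strace -ttt -T -o "$OUTFILE" "$EXECUTABLE" "$@"
```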

The only thing that changed was the name of the binary from code1 to IOR. So if you want to change the arguments to a code you have to modify the main script. Even if your code doesn’t have any command-line arguments I would recommend just leaving $@ in the code for future reference.

Just a quick note here: Brian Mueller from Panasas was the bash script expert who taught me the “bash-fu” (thanks Brian!).
