Debugging IBM MPI (PE) Applications

You can debug IBM MPI Parallel Environment (PE) applications on the IBM RS/6000 and SP platforms.

To take advantage of TotalView's automatic process acquisition capabilities, you must be running release 2.2 or later of the Parallel Environment for AIX. If you are running release 2.1 instead, you can still run TotalView, provided that you also load PTF 15.

For information on displaying message queues, see "Displaying the Message Queue Graph."

Preparing to Debug a PE Application

The following sections describe what you must do before TotalView can debug a PE application.

Switch-Based Communication

If you are using switch-based communications (either "IP over the switch" or "user space") on an SP machine, you must configure your PE debugging session so that TotalView can use "IP over the switch" to communicate with the TotalView Debugger Server (tvdsvr). To do this, set adaptor_use to shared and cpu_use to multiple, as follows:

  • If you are using a PE host file, add shared multiple after all host names or pool IDs in the host file.
     
  • Always use the following arguments on the poe command line:

        -adaptor_use shared -cpu_use multiple

    If you do not want to set these arguments on the poe command line, set the following environment variables before starting poe:

        setenv MP_ADAPTOR_USE shared
        setenv MP_CPU_USE multiple

    With "IP over the switch," adapter use usually defaults to shared and CPU use to multiple. To be safe, set both explicitly by using one of these techniques.
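
If you use a Bourne-compatible shell (sh, ksh, or bash) rather than csh, the two techniques above might look like the following sketch. The host names sp2n01 and sp2n02 and the file name host.list are placeholders, not names from your system, and the one-host-per-line layout of the host file is an assumption to check against your PE documentation.

```shell
# Placeholder host file; replace sp2n01 and sp2n02 with your own SP node names.
hostfile=./host.list
cat > "$hostfile" <<'EOF'
sp2n01 shared multiple
sp2n02 shared multiple
EOF

# Bourne-compatible equivalent of the csh setenv lines.
MP_ADAPTOR_USE=shared
MP_CPU_USE=multiple
export MP_ADAPTOR_USE MP_CPU_USE
```

Because the variables are exported, a poe process started later from the same shell inherits them.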

When you are using switch-based communications, you must run TotalView on one of the SP or SP2 nodes. Because TotalView uses "IP over the switch" in this case, you cannot run it on an RS/6000 workstation.

Remote Login

You must be able to perform a remote login using the rsh command. To enable remote logins, add the host name of the remote node to the /etc/hosts.equiv file or to your .rhosts file.

When the program is using switch-based communications, TotalView tries to start the TotalView Debugger Server by using the rsh command with the switch host name of the node.
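
As a rough way to confirm that a node is listed before you start a debugging session, a small Bourne-shell helper such as the one below can be used. The function name host_listed and the node name sp2sw01 are hypothetical. The check only compares the first field of each entry and ignores "+" wildcards, netgroups, and "-" deny entries, so treat it as a sketch rather than a full parser for these files.

```shell
# Return 0 if HOST is the first field of some entry in FILE, 1 otherwise.
# Sketch only: does not interpret "+" wildcards, netgroups, or "-" entries.
host_listed() {
    host=$1
    file=$2
    # An unreadable or missing file counts as "not listed".
    [ -r "$file" ] || return 1
    awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$file"
}

# Example use (sp2sw01 is a placeholder switch host name):
#   host_listed sp2sw01 "$HOME/.rhosts" || echo "sp2sw01 is not in ~/.rhosts"
```

Because TotalView uses the switch host name of the node when it starts the server, that is the name to look for in the file.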

Timeout

TotalView automatically sets the timeout value to 600 seconds. If you receive communications timeouts, you can set the value higher. For example:

    setenv MP_TIMEOUT 1200

Note: You cannot set the timeout value on the poe command line.
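
The setenv example uses csh syntax. In a Bourne-compatible shell (sh, ksh, or bash), the equivalent would presumably be the following; the value 1200 is the same illustrative number used in the csh example, not a recommended setting.

```shell
# Bourne-compatible equivalent of "setenv MP_TIMEOUT 1200".
MP_TIMEOUT=1200
export MP_TIMEOUT
```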

support@etnus.com
Copyright © 2001, Etnus, LLC. All rights reserved.
Version 5.0