MPI Debugging Troubleshooting

If you cannot successfully start TotalView on MPI programs, check the following:

  • Can you start MPICH programs without TotalView? The MPICH distribution includes scripts that verify that you can start remote processes on all of the machines in your machines file. (See tstmachines in mpich/util.)
     
  • TotalView cannot show the message queue display if it issues the following warning:

    The symbols and types in the MPICH library used by TotalView to extract the message queues are not as expected in the image <<your image name>>. This is probably an MPICH version or configuration problem.

    Check that you are using MPICH 1.1.0 or later and that MPICH was configured with the -debug option. (You can confirm this by looking in the config.status file at the root of the MPICH directory tree.)

  • Does the TotalView Debugger Server (tvdsvr) fail to start? tvdsvr must be on the PATH that your shell startup files set when you log in. Remember that the server is started with rsh, which does not pass your current environment to the remote process.
     
  • Make sure you have the correct MPI version and have applied any required patches. See the TotalView Release Notes for up-to-date information.
     
  • Under some circumstances, MPICH kills TotalView with the SIGINT signal. You may see this when you restart an MPICH job with the Group > Delete command. If TotalView terminates abnormally with a Killed message, try starting it with the -ignore_control_c command-line option. When you launch the job with mpirun -tv, you can pass the option through the TOTALVIEW environment variable. For example, in the C shell:
    setenv TOTALVIEW "totalview -ignore_control_c"
    mpirun -tv /users/smith/mympichexe
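
The first check above can be partly scripted. The following sketch is written for /bin/sh so it behaves the same whatever your login shell is. It reads a machines file and runs a trivial remote command on each host, which exercises rsh, the mechanism MPICH's ch_p4 device typically uses to start remote processes. It assumes one host name per line, an optional :ncpus suffix, and # comments; the function name check_hosts and the RSH override variable are hypothetical, not part of MPICH or TotalView. It tests only that rsh works, so tstmachines remains the more complete check.

```shell
#!/bin/sh
# Sketch: confirm that every host in an MPICH machines file accepts a remote
# command. Assumed file format (hypothetical, adjust for your site): one host
# per line, optional ":ncpus" suffix, '#' starts a comment.
# RSH defaults to rsh; set RSH to another command if your site uses one.

check_hosts() {
    machines=$1
    failed=0
    # "|| [ -n "$line" ]" also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        host=${line%%#*}                                  # drop comments
        host=${host%%:*}                                  # drop ":ncpus"
        host=$(printf '%s' "$host" | tr -d '[:space:]')   # trim whitespace
        [ -z "$host" ] && continue
        # Redirect stdin so rsh cannot swallow the rest of the machines file.
        # rsh's exit status reports connection failures; traditional rsh does
        # not pass back the remote command's own status.
        if "${RSH:-rsh}" "$host" true </dev/null >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAIL: $host"
            failed=1
        fi
    done < "$machines"
    return "$failed"
}
```

For example, save the function in a file (the name check_hosts.sh is only illustrative), source it with `. ./check_hosts.sh`, and run `check_hosts /path/to/machines`. The function returns 1 if any host fails, so it can also gate a larger startup script.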
 
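The config.status check in the second item can be done with grep. This sketch assumes the configure command line is recorded in config.status, as autoconf-generated config.status files normally do in a comment near the top; the pattern -debug also matches any longer option that contains -debug. The function name mpich_has_debug is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether an MPICH tree was configured with -debug.
# $1 is the root of the MPICH directory tree (defaults to the current directory).
mpich_has_debug() {
    status_file=${1:-.}/config.status
    if [ ! -r "$status_file" ]; then
        echo "cannot read $status_file" >&2
        return 2
    fi
    # -e stops grep from treating the leading dash as one of its own options.
    # Exit status: 0 if -debug appears (matching lines are printed), 1 if not.
    grep -e '-debug' "$status_file"
}
```

Run it as `mpich_has_debug /path/to/mpich`. If it prints nothing and returns 1, reconfigure and rebuild MPICH with -debug before expecting a message queue display.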
 
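For the tvdsvr item, what matters is the PATH that a remote, rsh-started shell sees, not the PATH of your interactive shell. This sketch runs which tvdsvr through rsh on one host. Because traditional rsh does not return the remote command's exit status, it judges success by whether which printed an absolute path. It assumes a which command exists on the remote host; the function name check_tvdsvr and the RSH override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: check whether tvdsvr is on the PATH of an rsh-started remote shell.
# RSH defaults to rsh; both the function and RSH are illustrative names.
check_tvdsvr() {
    host=$1
    # The remote login shell runs "which tvdsvr" non-interactively, so this
    # sees the PATH from its startup files, not your current environment.
    where=$("${RSH:-rsh}" "$host" which tvdsvr </dev/null 2>/dev/null)
    # rsh does not report the remote exit status, so inspect the output:
    # a successful which prints an absolute path.
    case $where in
        /*) echo "$host: $where" ;;
        *)  echo "$host: tvdsvr not on remote PATH" >&2; return 1 ;;
    esac
}
```

For example, `check_tvdsvr node1` (node1 is a placeholder host name). If the check fails and your login shell is csh, set PATH in .cshrc rather than .login, because a shell that rsh starts to run a command is not a login shell and does not read .login.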
 
 
support@etnus.com
Copyright © 2001, Etnus, LLC. All rights reserved.
Version 5.0