

Compute resource management with LSF

1. Overview

2. Commands

3. Common problems


1. Overview

The batch system Load Sharing Facility (LSF) by Platform Computing provides unified access to all GWDG compute resources. It is available on all compute resource frontends, namely gwdu102, gwdu104, gwdu105, gwds1, gwdg-wk20, and gwdk081. The different architectures available at the GWDG are mapped to different queues, while special resource requirements such as hard disk space or memory are handled by additional parameters to the job submission command. Hosts belonging to the same architecture are subsumed in host groups, with subgroups representing different configurations within that architecture. An overview of the LSF configuration is given in the following table.

LSF configuration (values in parentheses apply to the -short or -long variant of a queue)

Queue                 System                Host group    Number of slots   Maximum walltime (hours)
gwdg-x64par(-short)   Xeon64 cluster        hgroupx64     592 (64¹)         48 (4)
gwdg-ia64(-long)      SGI Altix 4700        gwds1         508 (32²)         48 (120)
gwdg-oppar            Opteron cluster       hgroupoppar   60                48
gwdg-opser(-long)     Opteron workstations  hgroupopser   32 (16)           48 (336³)
gwdg-pcser(-long)     Xeon64 workstations   hgroupwk      25                48 (120³)
gwdg-p690             IBM p690              hgroupp690    30                48

¹ 16 of these slots are usable in queue gwdg-x64par-short only.
² 32 slots have 6 GB of memory instead of 3 GB. Please select the memory per CPU slot with the -M parameter of bsub.
³ The maximum walltime is 336 h; the maximum CPU time is 336 h for opser and 120 h for pcser, respectively. Therefore, the gwdg-pcser-long queue is suited for jobs with unpredictable walltimes.

This document describes only the parts of LSF most relevant to GWDG users, as well as the characteristics of the GWDG setup. You can find further information in the official user documentation and the LSF manpages.


2. Commands

2.1. Submitting jobs: bsub

2.2. Watching jobs: bjobs

2.3. Terminating jobs: bkill

2.4. Host information: bhosts


2.1. Submitting jobs: bsub

a) Syntax

bsub -q queuename -a wrapper -n nproc -M mem_in_kb -W hh:mm -m host -R resourcestring jobcommand
bsub < jobscript

b) Comments

With bsub, new jobs are submitted to LSF. Options can be given on the command line or in a shell script, directly after the interpreter line. In a script, each option has to be prefixed with #BSUB (see examples). The most important options are:

-q queuename
Available queue names can be found in the configuration table. This table also shows the architecture corresponding to each queue.
-a wrapper
Wrappers are additional scripts used for the submission of parallel jobs. The following wrappers are currently available:
  • openmp for SMP jobs
  • scampi for MPI jobs in queue gwdg-oppar
  • mvapich_gc for MPI jobs in queue gwdg-x64par (MVAPICH compiled with the GNU compiler)
  • mvapich_ic for MPI jobs in queue gwdg-x64par (MVAPICH compiled with the Intel compiler)
  • mvapich2_gc for MPI jobs in queue gwdg-x64par (MVAPICH2 compiled with the GNU compiler)
  • mvapich2_ic for MPI jobs in queue gwdg-x64par (MVAPICH2 compiled with the Intel compiler)
  • openmpi_gc for MPI jobs in queue gwdg-x64par (OpenMPI compiled with the GNU compiler)
  • poe-p690 for MPI jobs in queue gwdg-p690
If serial commands need to be executed before the parallel program starts, the pam command has to be called directly instead of using a wrapper with -a. An exception is the queue gwdg-x64par, where this is not necessary (see examples). The corresponding pam calls are:
  • "pam -g openmp_wrapper" for OpenMP jobs
  • "pam -g sca_mpimon_wrapper" for MPI jobs in queue gwdg-oppar
  • "pam -g mvapich_gc_wrapper" for MPI jobs in queue gwdg-x64par (MVAPICH compiled with the GNU compiler)
  • "pam -g mvapich_ic_wrapper" for MPI jobs in queue gwdg-x64par (MVAPICH compiled with the Intel compiler)
  • "pam -g mvapich2_gc_wrapper" for MPI jobs in queue gwdg-x64par (MVAPICH2 compiled with the GNU compiler)
  • "pam -g mvapich2_ic_wrapper" for MPI jobs in queue gwdg-x64par (MVAPICH2 compiled with the Intel compiler)
  • "pam -g openmpi_gc_wrapper" for MPI jobs in queue gwdg-x64par (OpenMPI compiled with the GNU compiler)
  • "pam -g 1 poe-p690" for MPI jobs in queue gwdg-p690
  • "pam -g 1 poe-p690-hybrid" for special SMP/MPI hybrid jobs in queue gwdg-p690
For OpenMP and SMP/MPI hybrid jobs, one also needs to export "LSF_PAM_HOSTLIST_USE=unique".
-n nprocmin,nprocmax
This option requests the number of processors. A job starts when at least nprocmin processors are available and then uses all available processors up to nprocmax. If only one value is given, it is used for both nprocmin and nprocmax. Please note that this value refers to the number of processors, not the number of nodes, even in the case of SMP and hybrid jobs.
-M mem_in_kb
With this option one sets minimum memory requirements for the execution hosts of a job. This is useful on the Altix (queue gwdg-ia64), where nodes with 3 GB as well as 6 GB of RAM are available. The value is given in KB per CPU. Please note that stating a memory requirement with -M is not the same as making a memory reservation with -R, which is necessary in queue gwdg-p690.
-W hh:mm
This is the walltime, i.e., the maximum time the job can have the status "running", in hours and minutes. The maximum possible values in the different queues are listed in the configuration table.
-m host
This option can be used to choose the hosts allowed to execute a job. This is useful if not all hosts belonging to a certain queue are equally well suited for the job's requirements. A list of hosts can be given in quotation marks, using spaces as separators ("host1 host2 host3"). Alternatively you can use the host groups listed in the configuration table. Applications of this option are:
  • Using a host with more than 1GB memory in gwdg-p690 with -m gwdk081
If available, hosts and host groups with faster processors within the same architecture are also given in the configuration table. They can alternatively be selected by using the appropriate resource string (see below).
-R resourcestring
A resourcestring can be used to reserve special resources for a job. Multiple reservation requirements have to be separated by spaces and enclosed by quotation marks. The syntax for a reservation requirement is
section[string], with section being select, order, rusage, span, or same. Possible applications in the GWDG environment are:
select
With a selectstring a faster CPU can be reserved in queue gwdg-p690 ("select[model=Power417]").
rusage
In queue gwdg-p690 "rusage[resmem=sizeofmem]" must be used to reserve memory (sizeofmem MB per processor). If you reserve more than 1000MB per processor the job will run on host gwdk081.
span
"span[ptile=npn]" is used to distribute the job using npn processors per host. In order to use only one host for a parallel job, npn needs to be equal to nproc (see option -n). In order to use a host exclusively, npn needs to be equal to the number of job slots on the host. The -x option for exclusive host usage is not supported at GWDG.

c) Examples

Jobs on the Opteron workstations (queue gwdg-opser)

  1. Running the program serprog for 24h on one CPU
         bsub -q gwdg-opser -W 24:00 serprog
        
  2. Running an SMP application, for example parallel Gaussian, on up to four processors of one host. The openmp wrapper (-a option) starts the program only once. -R span[ptile=4] allocates all four processors (-n 4) on one host. This is the default setting in queue gwdg-opser, so it is actually not needed here.
         bsub -q gwdg-opser -W 24:00 -a openmp -n 4 -R span[ptile=4] smpprog
        
  3. As in 2, but using a job script jobscript. This time an OpenMP program is started, which means that the variable OMP_NUM_THREADS has to be exported.
         bsub < jobscript
        
    jobscript contains:
         #!/bin/sh
         #BSUB -q gwdg-opser
         #BSUB -W 24:00
         #BSUB -n 4
         #BSUB -R span[ptile=4]
         
         export OMP_NUM_THREADS=4
         pam -g openmp_wrapper ompprog  
        
  4. Using the same syntax, one can also let a serial program use a node exclusively. This is mostly done in order to use the entire main memory of a node. The -R span[ptile=4] is omitted in this example, as it is the default for this queue.
         bsub < jobscript
        
    jobscript contains:
         #!/bin/sh
         #BSUB -q gwdg-opser
         #BSUB -W 24:00
         #BSUB -n 4
         
         serprog  
        
  5. A Gaussian03 job on one CPU can also be submitted using a script (in this case: g03lsf):
          bsub < g03lsf
        
    g03lsf contains:
          #!/bin/ksh
          #BSUB -q gwdg-opser
          #BSUB -W 48:00
    
          export g03root="/usr/product/gaussian"
          . $g03root/g03/bsd/g03.profile
          export GAUSS_SCRDIR="/scratch"
          g03 < input.com > output.log
        
    g03root can also be set to "/usr/product/gaussian64" in the script:
          ...
          export g03root="/usr/product/gaussian64"
          ...
        
    That way the 64 bit version of Gaussian is chosen. In queues gwdg-x64par and gwdg-ia64, the 64 bit version is the default, and therefore available under "/usr/product/gaussian".

Jobs on the Opteron cluster (queue gwdg-oppar)

  1. Running the MPI program mpiprog for 24h on at least 16 and at most 32 processors. The scampi wrapper is used for starting mpiprog.
         bsub -q gwdg-oppar -W 24:00 -a scampi -n 16,32 mpiprog
        
  2. Like 1, but using a job script jobscript
         bsub < jobscript
        
    jobscript contains:
         #!/bin/sh
         #BSUB -q gwdg-oppar
         #BSUB -W 24:00
         #BSUB -n 16,32
    
         pam -g sca_mpimon_wrapper mpiprog  
        

Jobs on the Xeon64/Woodcrest cluster (queue gwdg-x64par)

  1. Running the MPI program mpiprog for 24h on at least 16 and up to 32 processors. The MPI wrapper mvapich_gc is used to choose the GNU-compiled MVAPICH library for this job. In contrast to the older architectures, a second wrapper, mpirun.lsf, is required.
         bsub -q gwdg-x64par -W 24:00 -n 16,32 -a mvapich_gc mpirun.lsf ./mpiprog
        
  2. As in 1, but using a job script jobscript
         bsub < jobscript
        
    jobscript contains:
         #!/bin/sh
         #BSUB -q gwdg-x64par
         #BSUB -W 24:00
         #BSUB -n 16,32
         #BSUB -a mvapich_gc
    
         mpirun.lsf ./mpiprog  
        
  3. As in 2, but with a direct call to pam, as is done on the older architectures.
         bsub < jobscript
        
    jobscript contains:
         #!/bin/sh
         #BSUB -q gwdg-x64par
         #BSUB -W 24:00
         #BSUB -n 16,32
    
         pam -g mvapich_gc_wrapper ./mpiprog
        

Jobs on the SGI Altix 4700 (queue gwdg-ia64)

Due to the integration of Altix-specific tools into LSF, job submission differs slightly from the other architectures. Also, only parallel jobs requesting at least 4 processors are allowed.
  1. Running the MPI program mpiprog for 24h on at least 16 and up to 32 processors. Pam is called directly, no wrapper is used.
         bsub -q gwdg-ia64 -W 24:00 -n 16,32 pam -mpi -auto_place ./mpiprog
        
  2. As in 1, but using a job script jobscript. Additionally, nodes with 6GB per processor are required for this job.
         bsub < jobscript
        
    jobscript contains:
         #!/bin/sh
         #BSUB -q gwdg-ia64
         #BSUB -W 24:00
         #BSUB -n 16,32
         #BSUB -M 6000000
    
         pam -mpi -auto_place ./mpiprog  
        
  3. Using 64 processors (maximum is 256) for an SMP program. The wrapper openmp (-a option) ensures that the program is started only once. As ompprog is an OpenMP application, the variable OMP_NUM_THREADS has to be exported.
         bsub -q gwdg-ia64 -W 24:00 -a openmp -n 64 env OMP_NUM_THREADS=64 ./ompprog
        
  4. As in 3, but using a job script jobscript; this time, 32 CPUs with 6 GB per processor are requested.
         bsub < jobscript
        
    jobscript contains:
         #!/bin/sh
         #BSUB -q gwdg-ia64
         #BSUB -W 24:00
         #BSUB -n 32
         #BSUB -M 6000000
         #BSUB -a openmp
         
         export OMP_NUM_THREADS=32
         ./ompprog
        
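
Jobs on the IBM p690 (queue gwdg-p690)

The following sketches show how the p690-specific pieces described in section 2.1 (the poe-p690 wrappers, memory reservation with rusage, and LSF_PAM_HOSTLIST_USE for hybrid jobs) fit together. They are illustrations only, not official templates; mpiprog and hybridprog are placeholder program names.
  1. Running the MPI program mpiprog for 24h on 8 processors, reserving 2000 MB of memory per processor (which, as described above, places the job on gwdk081)
         bsub -q gwdg-p690 -W 24:00 -n 8 -R "rusage[resmem=2000]" pam -g 1 poe-p690 ./mpiprog
        
  2. An SMP/MPI hybrid job submitted with a job script jobscript; as noted in section 2.1, LSF_PAM_HOSTLIST_USE has to be exported for hybrid jobs
         bsub < jobscript
        
     jobscript contains:
         #!/bin/sh
         #BSUB -q gwdg-p690
         #BSUB -W 24:00
         #BSUB -n 8
         #BSUB -R rusage[resmem=2000]
         
         export LSF_PAM_HOSTLIST_USE=unique
         pam -g 1 poe-p690-hybrid ./hybridprog
        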

2.2. Watching jobs: bjobs

a) Syntax

bjobs -l -a -r -p -u uid -m host jobid

b) Comments

bjobs displays the job status. If no jobid is given, all jobs matching the job selection options are shown.

-l
Displays the detailed job status.
-a -r -p
-a displays running (RUN), pending (PEND), and recently finished jobs. By default only running and pending jobs are shown. -r shows only running jobs and -p only pending jobs plus the pending reason.
-u uid
Displays jobs of user uid. Without this option your own jobs are shown. uid=all displays the jobs of all users.
-m host
Displays jobs running on the given host or hostgroup. A list of hosts or hostgroups can be given in quotation marks (e.g. -m "gwdl001 hgroupwk")

c) Examples

  1. Show all jobs of user xyz running on the Woodcrest cluster
          bjobs -r -u xyz -m hgroupx64
        
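  2. Show the pending jobs of all users, together with the reason why they are pending (a further illustration of the options described above)
          bjobs -p -u all
        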

2.3. Terminating jobs: bkill

a) Syntax

bkill -r -u uid -m host -q queue jobid

b) Comments

bkill is used to terminate a running job, or to remove a waiting job from a queue. If no jobid is given, one of the options -u -m -q has to be used. In this case the last job matching all given criteria is removed. Using jobid=0 removes all jobs meeting these criteria.

-r
Removes a job from the batch system's database without waiting for confirmation that it actually has been terminated. This option should only be used when removal without it fails repeatedly.
-u uid -m host -q queue
These options determine the criteria by which jobs are selected for the bkill command. -u uid selects jobs of user uid (only administrators can remove jobs of other users), -m host selects jobs on the given host or hostgroup, and -q queue selects jobs in the given queue. Without a jobid the last matching job is selected for bkill; with jobid=0 all matching jobs are selected. If any other jobid is given, these options have no effect.

c) Examples

  1. Removing job 14444
          bkill 14444
        
  2. Removing all jobs in queue gwdg-x64par running on host gwdm111
          bkill -m gwdm111 -q gwdg-x64par 0
        
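  3. Removing all of your own jobs in queue gwdg-ia64 (another combination of the options described above; without -u, only your own jobs are affected)
          bkill -q gwdg-ia64 0
        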

2.4. Host information: bhosts

a) Syntax

bhosts -l -w host

b) Comments

bhosts shows information on the host or hostgroup host.

-l -w
-l and -w change the output format. -w shows more information than the default output, but still uses one line per host. -l results in a detailed, multiple line output for each host.

c) Examples

  1. Extended one line information on all hosts
          bhosts -w
        
  2. Detailed, multiple line information on the hosts gwdm001, gwdm002, and gwdm003
          bhosts -l gwdm001 gwdm002 gwdm003
        
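  3. Extended one line information on all hosts in host group hgroupx64, e.g. to check for free job slots before submitting (any host group from the configuration table can be used here)
          bhosts -w hgroupx64
        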

3. Common problems

Using LSF commands results in "command not found" or similar messages

The LSF environment variables were not set correctly. You can set them manually on the command line. For ksh or bash, simply type
      . /opt/hplsf/conf/profile.lsf
    
For csh or tcsh:
      source /opt/hplsf/conf/cshrc.lsf
    

The "#BSUB ..." lines are ignored when submitting a job script

Although the submitted job script itself is entirely correct, LSF ignores all options set with "#BSUB ...". For example, the job is added to the default queue gwdg-pcpar instead of the one chosen with "#BSUB -q". Most often this is caused by a missing input redirection character (<). The correct line must look like this:
      bsub < jobscript
    

Jobs fail to start and remain in status PEND

This behaviour can have, among others, the following reasons:
  • On the Linux based resources (pc(par/ser), op(par/ser), x64par, ia64) your GWDG account has to be activated before you can use them. Please contact support@gwdg.de.
  • You requested a nonexistent resource. This can happen, for example, when one adapts a script from gwdg-p690 to a different queue and forgets to remove the -R rusage[resmem=...] resource requirement.
  • The requested queue has been deactivated for maintenance. You can use the bqueues command to view the queue status.
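For example, the status of a single queue can be checked like this (gwdg-x64par serves only as an example queue name):
      bqueues gwdg-x64par
    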

Gaussian jobs abort immediately, and the output contains a "permission denied" message.

In order to use Gaussian your account has to be added to the Gaussian users group, in addition to being activated on the Linux resources. Please send an email with a few lines describing your project and an estimate of its scale (number of calculations, methods, number of basis functions, etc.) to: cboehme1@gwdg.de.

Missing libraries when submitting between different architectures

Submitting from one architecture to another - for example from gwdu102 (x86 Linux) into queue gwdg-ia64 (ia64 Linux) - sometimes results in error messages about missing libraries. In this case, please resubmit from a frontend of the same architecture as the one intended for job execution and inform us of the problem.

Program 'hangs' when using STDIN redirection

When using the < character to read a file into the STDIN stream of a program, the program stops responding. This problem occurs when submitting with the -a openmp option. In this case, please use cat to form a pipe, i.e.
      bsub -a openmp ... 'cat inputfile | smpprog'
    
instead of
      bsub -a openmp ... smpprog < inputfile
    
An analogous workaround can also be used in submission scripts.
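A minimal sketch of such a script, assuming a hypothetical SMP program smpprog reading from a file inputfile (the remaining options follow the earlier gwdg-opser examples):
      #!/bin/sh
      #BSUB -q gwdg-opser
      #BSUB -W 24:00
      #BSUB -a openmp
      #BSUB -n 4
      
      cat inputfile | smpprog
    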

Last change: 25/09/2008
Christian Boehme