The lsb.params file defines general parameters used by the LSF system. This file contains only one section, named Parameters. mbatchd uses lsb.params for initialization. The file is optional. If not present, the LSF-defined defaults are assumed.
Some of the parameters that can be defined in lsb.params control timing within the system. The default settings provide good throughput for long-running batch jobs while adding a minimum of processing overhead in the batch daemons.
This file is installed by default in LSB_CONFDIR/cluster_name/configdir.
Parameters Section
This section and all the keywords in this section are optional. If keywords are not present, the default values are assumed. The valid keywords for this section are:
ABS_RUNLIMIT
ABS_RUNLIMIT = y | Y
If set, the run time limit specified by the -W option of bsub, or the RUNLIMIT queue parameter in lsb.queues, is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted with a run limit.
Undefined. Run limit is normalized.
ACCT_ARCHIVE_AGE
ACCT_ARCHIVE_AGE = days
Enables automatic archiving of LSF accounting log files, and specifies the archive interval. LSF archives the current log file if the length of time from its creation date exceeds the specified number of days.
- ACCT_ARCHIVE_SIZE also enables automatic archiving.
- ACCT_ARCHIVE_TIME also enables automatic archiving.
- MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
Undefined (no limit to the age of lsb.acct).
ACCT_ARCHIVE_SIZE
ACCT_ARCHIVE_SIZE = kilobytes
Enables automatic archiving of LSF accounting log files, and specifies the archive threshold. LSF archives the current log file if its size exceeds the specified number of kilobytes.
- ACCT_ARCHIVE_AGE also enables automatic archiving.
- ACCT_ARCHIVE_TIME also enables automatic archiving.
- MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
Undefined (no limit to the size of lsb.acct).
ACCT_ARCHIVE_TIME
ACCT_ARCHIVE_TIME = hh:mm
Enables automatic archiving of the LSF accounting log file lsb.acct, and specifies the time of day to archive the current log file.
- ACCT_ARCHIVE_AGE also enables automatic archiving.
- ACCT_ARCHIVE_SIZE also enables automatic archiving.
- MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
Undefined (no time set for archiving lsb.acct).
CHUNK_JOB_DURATION
CHUNK_JOB_DURATION = minutes
Specifies the maximum CPU limit or run limit that a job submitted to a chunk job queue can have and still be chunked.
When CHUNK_JOB_DURATION is set, the CPU limit or run limit set in the queue (CPULIMIT or RUNLIMIT) or specified at job submission (-c or -W bsub options) must be less than or equal to CHUNK_JOB_DURATION for jobs to be chunked.
If CHUNK_JOB_DURATION is set, jobs are not chunked if:
- No CPU limit and no run limit are specified in the queue (CPULIMIT and RUNLIMIT) or at job submission (-c or -W bsub options), or
- A CPU limit or run limit is greater than the value of CHUNK_JOB_DURATION.
If CHUNK_JOB_DURATION is not set, chunk jobs are accepted regardless of the value of CPULIMIT or RUNLIMIT.
The value of CHUNK_JOB_DURATION is displayed by bparams -l.
Undefined
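The chunking rules above can be expressed as a small Python sketch. The function and argument names are hypothetical and are not part of LSF. Limits are assumed to be in minutes, and treating an undefined CHUNK_JOB_DURATION as imposing no duration check is an assumption. Other chunking conditions are not modeled.

```python
from typing import Optional


def is_chunkable(cpu_limit: Optional[float],
                 run_limit: Optional[float],
                 chunk_job_duration: Optional[float]) -> bool:
    """Hypothetical check: may a job in a chunk job queue be chunked?

    cpu_limit / run_limit: effective limits in minutes (CPULIMIT or RUNLIMIT
    in the queue, or bsub -c / -W), or None if not set.
    chunk_job_duration: CHUNK_JOB_DURATION in minutes, or None if undefined.
    """
    if chunk_job_duration is None:
        # Assumption: without CHUNK_JOB_DURATION, limits do not block chunking.
        return True
    limits = [limit for limit in (cpu_limit, run_limit) if limit is not None]
    if not limits:
        # Neither a CPU limit nor a run limit: the job is not chunked.
        return False
    # Every limit that is set must be <= CHUNK_JOB_DURATION.
    return all(limit <= chunk_job_duration for limit in limits)
```

For example, with CHUNK_JOB_DURATION=30, a job submitted with bsub -W 10 passes this check, while a job with no CPU or run limit does not.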
CLEAN_PERIOD
CLEAN_PERIOD = seconds
For non-repetitive jobs, the amount of time that records for jobs that have finished or have been killed are kept in mbatchd core memory after they have finished.
Within this period, users can still see finished jobs with the bjobs command. For jobs that finished more than CLEAN_PERIOD seconds ago, use the bhist command.
3600 (1 hour)
CPU_TIME_FACTOR
CPU_TIME_FACTOR=number
Used only with fairshare scheduling. CPU time weighting factor.
In the calculation of a user's dynamic share priority, this factor determines the relative importance of the cumulative CPU time used by a user's jobs.
0.7
COMMITTED_RUN_TIME_FACTOR
COMMITTED_RUN_TIME_FACTOR=number
Used only with fairshare scheduling. Committed run time weighting factor.
In the calculation of a user's dynamic priority, this factor determines the relative importance of the committed run time. If the -W option of bsub is not specified at job submission and a RUNLIMIT has not been set for the queue, the committed run time is not considered.
Specify any number from 0.0 to 1.0.
0.0
DEFAULT_HOST_SPEC
DEFAULT_HOST_SPEC=host_name | host_model
The default CPU time normalization host for the cluster.
The CPU factor of the specified host or host model will be used to normalize the CPU time limit of all jobs in the cluster, unless the CPU time normalization host is specified at the queue or job level.
Undefined
DEFAULT_PROJECT
DEFAULT_PROJECT=project_name
The name of the default project. Specify any string.
When you submit a job without specifying any project name, and the environment variable LSB_DEFAULTPROJECT is not set, LSF automatically assigns the job to this project.
default
DEFAULT_QUEUE
DEFAULT_QUEUE=queue_name ...
Space-separated list of candidate default queues (candidates must already be defined in lsb.queues).
When you submit a job to LSF without explicitly specifying a queue, and the environment variable LSB_DEFAULTQUEUE is not set, LSF puts the job in the first queue in this list that satisfies the job's specifications subject to other restrictions, such as requested hosts, queue status, etc.
Undefined. When a user submits a job to LSF without explicitly specifying a queue, and there are no candidate default queues defined (by this parameter or by the user's environment variable LSB_DEFAULTQUEUE), LSF automatically creates a new queue named default, using the default configuration, and submits the job to that queue.
DISABLE_UACCT_MAP
DISABLE_UACCT_MAP = y | Y
Specify y or Y to disable user-level account mapping.
Undefined
EADMIN_TRIGGER_DURATION
Defines how often LSF_SERVERDIR/eadmin is invoked once a job exception is detected. Used in conjunction with the job exception handling parameters JOB_OVERRUN and JOB_UNDERRUN in lsb.queues.
For example, EADMIN_TRIGGER_DURATION=20 sets this interval to 20 minutes.
5 minutes
ENABLE_HIST_RUN_TIME
ENABLE_HIST_RUN_TIME = y | Y
Used only with fairshare scheduling. If set, enables the use of historical run time in the calculation of fairshare scheduling priority.
Undefined
ENABLE_USER_RESUME
ENABLE_USER_RESUME = Y | N
Defines job resume permissions.
When this parameter is defined:
- If the value is Y, users can resume their own jobs that have been suspended by the administrator.
- If the value is N, jobs that are suspended by the administrator can only be resumed by the administrator or root; users do not have permission to resume a job suspended by another user or the administrator. Administrators can resume jobs suspended by users or administrators.
Undefined (users cannot resume jobs suspended by administrator)
EVENT_UPDATE_INTERVAL
EVENT_UPDATE_INTERVAL = seconds
Used with duplicate logging of event and accounting log files. LSB_LOCALDIR in lsf.conf must also be specified. Specifies how often to back up the data and synchronize the directories (LSB_SHAREDIR and LSB_LOCALDIR).
The directories are always synchronized when data is logged to the files, or when mbatchd is started on the first LSF master host.
Use this parameter if NFS traffic is too high and you want to reduce network traffic.
Specify a value from 1 to INFINIT_INT. INFINIT_INT is defined in lsf.h.
Undefined
See lsf.conf under LSB_LOCALDIR.
HIST_HOURS
HIST_HOURS = hours
Used only with fairshare scheduling. Determines a rate of decay for cumulative CPU time and historical run time.
To calculate dynamic user priority, LSF scales the actual CPU time using a decay factor, so that 1 hour of recently-used time is equivalent to 0.1 hours after the specified number of hours has elapsed.
To calculate dynamic user priority with historical run time, LSF scales the accumulated run time of finished jobs using the same decay factor, so that 1 hour of recently-used time is equivalent to 0.1 hours after the specified number of hours has elapsed.
When HIST_HOURS=0, CPU time accumulated by running jobs is not decayed.
5
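A minimal sketch of the decay rule, assuming smooth exponential decay that reduces used time to 10% after HIST_HOURS hours. This page does not describe when LSF applies the decay, so the real implementation may work in steps; decayed_hours is a hypothetical helper.

```python
def decayed_hours(used_hours: float, elapsed_hours: float,
                  hist_hours: float = 5) -> float:
    """Scale recently used CPU or run time by the HIST_HOURS decay.

    Assumes continuous exponential decay, so that usage shrinks to 10%
    after hist_hours have elapsed.
    """
    if hist_hours == 0:
        # HIST_HOURS=0: no decay is applied in this sketch (the page states
        # this for CPU time accumulated by running jobs).
        return used_hours
    return used_hours * 0.1 ** (elapsed_hours / hist_hours)
```

With the default HIST_HOURS=5, one hour used 5 hours ago counts as 0.1 hours, and one hour used 10 hours ago counts as about 0.01 hours.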
JOB_ACCEPT_INTERVAL
JOB_ACCEPT_INTERVAL=integer
The number you specify is multiplied by the value of the lsb.params parameter MBD_SLEEP_TIME (60 seconds by default). The result of the calculation is the number of seconds to wait after dispatching a job to a host, before dispatching a second job to the same host.
If 0 (zero), a host may accept more than one job. By default, there is no limit to the total number of jobs that can run on a host, so if this parameter is set to 0, a very large number of jobs might be dispatched to a host all at once. This can overload your system to the point that it will be unable to create any more processes. It is not recommended to set this parameter to 0.
JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params).
1
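The dispatch wait can be sketched as follows. The helper names are hypothetical. The defaults (JOB_ACCEPT_INTERVAL=1, MBD_SLEEP_TIME=60) come from this page, and the queue-level override follows the rule stated above.

```python
from typing import Optional

MBD_SLEEP_TIME = 60        # lsb.params default, in seconds
JOB_ACCEPT_INTERVAL = 1    # lsb.params default


def accept_wait_seconds(queue_interval: Optional[int] = None,
                        cluster_interval: int = JOB_ACCEPT_INTERVAL,
                        mbd_sleep_time: int = MBD_SLEEP_TIME) -> int:
    """Seconds to wait after dispatching a job to a host before the next one.

    A queue-level JOB_ACCEPT_INTERVAL (lsb.queues), if set, overrides the
    cluster-level value from lsb.params.
    """
    interval = cluster_interval if queue_interval is None else queue_interval
    return interval * mbd_sleep_time


def can_dispatch(now: float, last_dispatch: Optional[float], wait: int) -> bool:
    """True if enough time has passed since the last dispatch to this host."""
    return last_dispatch is None or now - last_dispatch >= wait
```

With the defaults, a host waits 60 seconds between dispatched jobs; setting the queue-level value to 3 raises this to 180 seconds.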
JOB_ATTA_DIR
JOB_ATTA_DIR = directory
The shared directory in which mbatchd saves the attached data of messages posted with the bpost command.
Use JOB_ATTA_DIR if you use bpost(1) and bread(1) to transfer large data files between jobs and want to avoid using space in LSB_SHAREDIR. By default, the bread(1) command reads attachment data from the JOB_ATTA_DIR directory.
JOB_ATTA_DIR should be shared by all hosts in the cluster, so that any potential LSF master host can reach it. Like LSB_SHAREDIR, the directory should be owned and writable by the primary LSF administrator. The directory must have at least 1 MB of free space.
The attached data will be stored under the directory in the format:
JOB_ATTA_DIR/timestamp.jobid.msgs/msg$msgindex
On UNIX, specify an absolute path. For example:
JOB_ATTA_DIR=/opt/share/lsf_work
On Windows, specify a UNC path or a path with a drive letter. For example:
JOB_ATTA_DIR=\\HostA\temp\lsf_work
or
JOB_ATTA_DIR=D:\temp\lsf_work
After adding JOB_ATTA_DIR to lsb.params, use badmin reconfig to reconfigure your cluster.
JOB_ATTA_DIR can be any valid UNIX or Windows path up to a maximum length of 256 characters.
Undefined
If JOB_ATTA_DIR is not specified, job message attachments are saved in LSB_SHAREDIR/info/.
JOB_DEP_LAST_SUB
Used only with job dependency scheduling.
If set to 1, whenever dependency conditions use a job name that belongs to multiple jobs, LSF evaluates only the most recently submitted job.
Otherwise, all the jobs with the specified name must satisfy the dependency condition.
Undefined
JOB_EXIT_RATE_DURATION
Defines how long LSF waits before checking the job exit rate for a host. Used in conjunction with EXIT_RATE in lsb.hosts for LSF host exception handling.
If the job exit rate is exceeded for the period specified by JOB_EXIT_RATE_DURATION, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.
For example, JOB_EXIT_RATE_DURATION=5 sets this period to 5 minutes.
10 minutes
JOB_PRIORITY_OVER_TIME
JOB_PRIORITY_OVER_TIME = increment/interval
JOB_PRIORITY_OVER_TIME enables automatic job priority escalation when MAX_USER_PRIORITY is also defined.
increment
Specifies the value used to increase job priority every interval minutes. Valid values are positive integers.
interval
Specifies the frequency, in minutes, to increment job priority. Valid values are positive integers.
Undefined
For example, JOB_PRIORITY_OVER_TIME=3/20 specifies that the job priority of pending jobs is increased by 3 every 20 minutes.
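A sketch of the escalation arithmetic, assuming priority increases once per complete interval of pending time. The helpers are hypothetical, the starting priority used below is arbitrary, and this page does not say whether escalated priority is capped, so no cap is applied.

```python
from typing import Tuple


def parse_priority_over_time(value: str) -> Tuple[int, int]:
    """Parse 'increment/interval', e.g. '3/20', into two positive integers."""
    increment_str, interval_str = value.split("/")
    increment, interval = int(increment_str), int(interval_str)
    if increment <= 0 or interval <= 0:
        raise ValueError("increment and interval must be positive integers")
    return increment, interval


def escalated_priority(initial: int, pending_minutes: int,
                       increment: int, interval: int) -> int:
    """Priority after pending_minutes, raised by increment per full interval."""
    return initial + (pending_minutes // interval) * increment
```

With 3/20, a job pending for 60 minutes has gained three increments (9 priority points); after 19 minutes it has gained none.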
JOB_SCHEDULING_INTERVAL
JOB_SCHEDULING_INTERVAL = seconds
Time interval at which mbatchd sends jobs for scheduling to the scheduling daemon mbschd along with any collected load information.
5 seconds
JOB_SPOOL_DIR
JOB_SPOOL_DIR=dir
Specifies the directory for buffering batch standard output and standard error for a job.
When JOB_SPOOL_DIR is defined, the standard output and standard error for the job are buffered in the specified directory.
Files are copied from the submission host to a temporary file in the directory specified by the JOB_SPOOL_DIR on the execution host. LSF removes these files when the job completes.
If JOB_SPOOL_DIR is not accessible or does not exist, files are spooled to the default job output directory $HOME/.lsbatch.
For bsub -is and bsub -Zs, JOB_SPOOL_DIR must be readable and writable by the job submission user, and it must be shared by the master host and the submission host. If the specified directory is not accessible or does not exist, and JOB_SPOOL_DIR is specified, bsub -is cannot write to the default directory LSB_SHAREDIR/cluster_name/lsf_indir, and bsub -Zs cannot write to the default directory LSB_SHAREDIR/cluster_name/lsf_cmddir, and the job will fail.
As LSF runs jobs, it creates temporary directories and files under JOB_SPOOL_DIR. By default, LSF removes these directories and files after the job is finished. See bsub(1) for information about job submission options that specify the disposition of these files.
On UNIX, specify an absolute path. For example:
JOB_SPOOL_DIR=/home/share/lsf_spool
On Windows, specify a UNC path or a path with a drive letter. For example:
JOB_SPOOL_DIR=\\HostA\share\spooldir
or
JOB_SPOOL_DIR=D:\share\spooldir
In a mixed UNIX/Windows cluster, specify one path for the UNIX platform and one for the Windows platform. Separate the two paths by a pipe character (|):
JOB_SPOOL_DIR=/usr/share/lsf_spool | \\HostA\share\spooldir
JOB_SPOOL_DIR can be any valid path up to a maximum length of 256 characters. This maximum path length includes the temporary directories and files that the LSF system creates as jobs run. The path you specify for JOB_SPOOL_DIR should be as short as possible to avoid exceeding this limit.
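A sketch of how a JOB_SPOOL_DIR value, including the mixed UNIX | Windows form, might be split and checked. parse_spool_dir is hypothetical. It only checks the configured path against 256 characters. LSF's limit also counts the temporary directories and files created under the path, so a real check would need to leave headroom.

```python
from typing import Optional, Tuple

MAX_PATH_LEN = 256


def parse_spool_dir(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a JOB_SPOOL_DIR value into (unix_path, windows_path)."""
    unix_path: Optional[str] = None
    windows_path: Optional[str] = None
    parts = [part.strip() for part in value.split("|")]
    if len(parts) > 2:
        raise ValueError("at most one UNIX path and one Windows path")
    for part in parts:
        if len(part) > MAX_PATH_LEN:
            raise ValueError("path longer than 256 characters: " + part)
        if part.startswith("/"):
            unix_path = part                      # absolute UNIX path
        elif part.startswith("\\\\") or part[1:2] == ":":
            windows_path = part                   # UNC or drive-letter path
        else:
            raise ValueError("not an absolute UNIX, UNC, or drive path: " + part)
    return unix_path, windows_path
```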
Undefined
Batch job output (standard output and standard error) is sent to the .lsbatch directory on the execution host:
- On UNIX: $HOME/.lsbatch
- On Windows: %windir%\lsbtmpuser_id\.lsbatch
If %HOME% is specified in the user environment, LSF uses that directory instead of %windir% for spooled output.
JOB_TERMINATE_INTERVAL
JOB_TERMINATE_INTERVAL=seconds
UNIX only.
Specifies the time interval in seconds between sending SIGINT, SIGTERM, and SIGKILL when terminating a job. When a job is terminated, the job is sent SIGINT, SIGTERM, and SIGKILL in sequence with a sleep time of JOB_TERMINATE_INTERVAL between sending the signals. This allows the job to clean up if necessary.
10
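The signal sequence can be sketched in Python for a single process ID. terminate_job is hypothetical. It signals one PID rather than every process belonging to an LSF job, and stopping early once the process has already exited is an assumption. The kill and sleep callables are injectable so the sequence can be checked without signaling a real process.

```python
import os
import signal
import time
from typing import Callable


def terminate_job(pid: int,
                  interval: float = 10,
                  kill: Callable[[int, int], None] = os.kill,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Send SIGINT, SIGTERM, then SIGKILL, sleeping `interval` between them.

    UNIX only. kill/sleep are injectable so the sequence can be tested.
    """
    for i, sig in enumerate((signal.SIGINT, signal.SIGTERM, signal.SIGKILL)):
        if i > 0:
            sleep(interval)           # JOB_TERMINATE_INTERVAL between signals
        try:
            kill(pid, sig)
        except ProcessLookupError:
            return                    # process already exited; stop early
```

With the default of 10 seconds, a job that ignores SIGINT and SIGTERM is killed about 20 seconds after termination starts.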
MAX_ACCT_ARCHIVE_FILE
MAX_ACCT_ARCHIVE_FILE = integer
Enables automatic deletion of archived LSF accounting log files and specifies the archive limit.
ACCT_ARCHIVE_SIZE or ACCT_ARCHIVE_AGE should also be defined.
For example, with MAX_ACCT_ARCHIVE_FILE=10, LSF maintains the current lsb.acct and up to 10 archives. Every time the old lsb.acct.9 becomes lsb.acct.10, the old lsb.acct.10 is deleted.
- ACCT_ARCHIVE_AGE also enables automatic archiving.
- ACCT_ARCHIVE_SIZE also enables automatic archiving.
- ACCT_ARCHIVE_TIME also enables automatic archiving.
- MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
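The archiving triggers and the MAX_ACCT_ARCHIVE_FILE rotation can be sketched as follows. The helper names are hypothetical. The sketch assumes that the current log becomes lsb.acct.1, that higher suffixes are older (as in the example above), and that a new, empty lsb.acct is started afterwards. ACCT_ARCHIVE_TIME is not modeled.

```python
from pathlib import Path
from typing import Optional


def should_archive(age_days: float, size_kb: float,
                   archive_age: Optional[float] = None,
                   archive_size: Optional[float] = None) -> bool:
    """True if ACCT_ARCHIVE_AGE or ACCT_ARCHIVE_SIZE is exceeded."""
    if archive_age is not None and age_days > archive_age:
        return True
    return archive_size is not None and size_kb > archive_size


def _archive_index(path: Path) -> Optional[int]:
    """Numeric suffix N of lsb.acct.N, or None if the suffix is not a number."""
    suffix = path.name[len("lsb.acct."):]
    return int(suffix) if suffix.isdigit() else None


def rotate_acct(log_dir: Path, max_archives: Optional[int]) -> None:
    """Rename lsb.acct.N to lsb.acct.N+1 (oldest first) and lsb.acct to .1.

    Archives whose index would exceed max_archives are deleted instead.
    """
    indices = sorted(
        (i for i in map(_archive_index, log_dir.glob("lsb.acct.*")) if i is not None),
        reverse=True,
    )
    for i in indices:
        archive = log_dir / f"lsb.acct.{i}"
        if max_archives is not None and i >= max_archives:
            archive.unlink()                      # would become N+1 > limit
        else:
            archive.rename(log_dir / f"lsb.acct.{i + 1}")
    (log_dir / "lsb.acct").rename(log_dir / "lsb.acct.1")
    (log_dir / "lsb.acct").touch()                # start a new, empty log
```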
Undefined (no deletion of lsb.acct.n files).
MAX_JOB_ARRAY_SIZE
MAX_JOB_ARRAY_SIZE = integer
Specifies the maximum number of jobs in a job array that can be created by a user for a single job submission. The maximum number of jobs in a job array cannot exceed this value.
A large job array allows a user to submit a large number of jobs to the system with a single job submission.
Specify an integer value from 1 to 65534.
1000
MAX_JOB_ATTA_SIZE
MAX_JOB_ATTA_SIZE=integer | 0
Specify any number less than 20000.
Maximum attached data size, in KB, that can be transferred to a job.
Maximum size for data attached to a job with the bpost(1) command. Useful if you use bpost(1) and bread(1) to transfer large data files between jobs and you want to limit the usage in the current working directory.
0 indicates that jobs cannot accept attached data files.
Undefined. LSF does not set a maximum size of job attachments.
MAX_JOBID
MAX_JOBID=integer
The job ID limit. The job ID limit is the highest job ID that LSF will ever assign, and also the maximum number of jobs in the system.
By default, LSF assigns job IDs up to 6 digits. This means that no more than 999999 jobs can be in the system at once.
Specify any integer from 999999 to 9999999 (for practical purposes, any seven-digit integer).
You cannot lower the job ID limit, but you can raise it to seven digits. This means you can have more jobs in the system, and the job ID numbers will roll over less often.
LSF assigns job IDs in sequence. When the job ID limit is reached, the count rolls over, so the next job submitted gets job ID "1". If the original job 1 remains in the system, LSF skips that number and assigns job ID "2", or the next available job ID. If you have so many jobs in the system that the low job IDs are still in use when the maximum job ID is assigned, jobs with sequential numbers could have totally different submission times.
By raising the job ID limit, you allow more time for old jobs to leave the system, and make it more likely that numbers can be assigned in sequence without conflicting with existing jobs.
For example, MAX_JOBID=1234567 raises the job ID limit to 1234567.
999999
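The rollover and skip-in-use rule can be sketched as follows. next_job_id is hypothetical, and representing the jobs still in the system as a set of IDs is an assumption made for illustration.

```python
from typing import Set


def next_job_id(last_id: int, in_use: Set[int], max_jobid: int = 999999) -> int:
    """Return the next free job ID after last_id, rolling over at max_jobid."""
    if not 999999 <= max_jobid <= 9999999:
        raise ValueError("MAX_JOBID must be between 999999 and 9999999")
    candidate = last_id
    for _ in range(max_jobid):
        # Advance one ID, wrapping from max_jobid back to 1.
        candidate = candidate + 1 if candidate < max_jobid else 1
        if candidate not in in_use:
            return candidate
    raise RuntimeError("every job ID up to MAX_JOBID is in use")
```

For example, after job 999999 with the default limit, the next job gets ID 1, or ID 2 if job 1 is still in the system.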
MAX_JOBINFO_QUERY_PERIOD
MAX_JOBINFO_QUERY_PERIOD = integer
Maximum time for job information query commands (e.g., bjobs) to wait.
When the time arrives, the query command processes exit, and all associated threads are terminated.
If the parameter is not defined, query command processes will wait for all threads to finish.
Specify a multiple of MBD_REFRESH_TIME.
Any positive integer greater than or equal to one (1)
Undefined
See lsf.conf under LSB_BLOCK_JOBINFO_TIMEOUT.
MAX_JOB_MSG_NUM
MAX_JOB_MSG_NUM=integer | 0
Maximum number of message slots for each job. Maximum number of messages that can be posted to a job with the bpost(1) command.
0 indicates that jobs cannot accept external messages.
128
MAX_JOB_NUM
MAX_JOB_NUM=integer
The maximum number of finished jobs whose events are to be stored in the lsb.events log file.
Once the limit is reached, mbatchd starts a new event log file. The old event log file is saved as lsb.events.n, with subsequent sequence number suffixes incremented by 1 each time a new log file is started. Event logging continues in the new lsb.events file.
1000
MAX_PREEXEC_RETRY
MAX_PREEXEC_RETRY=integer
MultiCluster job forwarding model only. The maximum number of times to attempt the pre-execution command of a job from a remote cluster.
If the job's pre-execution command fails all attempts, the job is returned to the submission cluster.
MAX_SBD_CONNS
MAX_SBD_CONNS=integer
The maximum number of file descriptors mbatchd can have open and connected concurrently to sbatchd.
Controls the maximum number of connections that can be maintained to sbatchds in the system. Many sites require more than 32 connections.
The value should not exceed the file descriptor limit of root (the usual limit is 1024). Setting it equal to or larger than this limit can cause mbatchd to constantly die because mbatchd allocates all file descriptors to sbatchd connections. This could cause mbatchd to run out of descriptors, which results in an mbatchd fatal error, such as failure to open lsb.events.
Reasonable settings are:
32
MAX_SBD_FAIL
MAX_SBD_FAIL=integer
The maximum number of retries for reaching a non-responding slave batch daemon, sbatchd.
The interval between retries is defined by MBD_SLEEP_TIME. If mbatchd fails to reach a host and has retried MAX_SBD_FAIL times, the host is considered unavailable. When a host becomes unavailable, mbatchd assumes that all jobs running on that host have exited and that all rerunnable jobs (jobs submitted with the bsub -r option) are scheduled to be rerun on another host.
3
MAX_SCHED_STAY
MAX_SCHED_STAY=integer
The time, in seconds, that mbatchd has for each scheduling pass.
3
MAX_USER_PRIORITY
MAX_USER_PRIORITY=integer
Enables user-assigned job priority and specifies the maximum job priority a user can assign to a job.
LSF administrators can assign a job priority higher than the specified value.
User-assigned job priority changes the behavior of btop and bbot.
For example, MAX_USER_PRIORITY=100 specifies that 100 is the maximum job priority that can be specified by a user.
Undefined
MBD_REFRESH_TIME
MBD_REFRESH_TIME=seconds
Time interval, in seconds, at which mbatchd will fork a new child mbatchd to service query requests to keep information sent back to clients updated. A child mbatchd processes query requests by creating threads.
MBD_REFRESH_TIME applies only to UNIX platforms that support thread programming.
MBD_REFRESH_TIME works in conjunction with LSB_QUERY_PORT in lsf.conf. The child mbatchd continues to listen to the port number specified by LSB_QUERY_PORT and creates threads to service requests until the job changes status, a new job is submitted, or MBD_REFRESH_TIME has expired.
- If MBD_REFRESH_TIME is < 10 seconds, the child mbatchd exits at MBD_REFRESH_TIME even if the job changes status or a new job is submitted before MBD_REFRESH_TIME expires.
- If MBD_REFRESH_TIME > 10 seconds, the child mbatchd exits at 10 seconds even if the job changes status or a new job is submitted before the 10 seconds.
- If MBD_REFRESH_TIME > 10 seconds and no job changes status and no new job is submitted, the child mbatchd exits at MBD_REFRESH_TIME.
The value of this parameter must be between 5 and 300. Any values specified out of this range are ignored, and the system default value is applied.
The bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires, because child mbatchd job information is not updated. If you use the bjobs command and do not get up-to-date information, you may need to decrease the value of this parameter. Note, however, that the lower the value of this parameter, the more you negatively affect performance.
The number of concurrent requests is limited by the number of concurrent threads that a process can have. This number varies by platform:
- Sun Solaris, 2500 threads per process
- AIX, 512 threads per process
- Digital, 256 threads per process
- HP-UX, 64 threads per process
5 seconds if not defined or if defined value is less than 5; 300 seconds if defined value is more than 300
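A sketch of the bounds and the child mbatchd exit rules above. Both helpers are hypothetical. The bounds follow the default line (5 if unset or below 5, 300 if above 300). This page states only the < 10 and > 10 cases, so treating MBD_REFRESH_TIME=10 like the short case, and ending the child at the moment of an event that occurs after the first 10 seconds, are assumptions.

```python
from typing import Optional


def effective_refresh_time(value: Optional[int]) -> int:
    """Apply the documented bounds: 5 if unset or < 5, 300 if > 300."""
    if value is None or value < 5:
        return 5
    return min(value, 300)


def child_exit_time(refresh: int, first_event: Optional[float]) -> float:
    """Seconds after it starts at which a child mbatchd exits.

    first_event: seconds until the first job status change or new job
    submission, or None if neither happens.
    """
    if refresh <= 10 or first_event is None:
        # Short refresh times (or no events): exit at MBD_REFRESH_TIME.
        return refresh
    # Longer refresh times: an event ends the child, but not before 10 s.
    return min(max(first_event, 10.0), refresh)
```

For example, with MBD_REFRESH_TIME=60 and a job status change after 3 seconds, the child exits at 10 seconds; with no events, it exits at 60 seconds.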
MBD_SLEEP_TIME
MBD_SLEEP_TIME=seconds
Used in conjunction with the parameters SLOT_RESERVE, MAX_SBD_FAIL, and JOB_ACCEPT_INTERVAL.
Amount of time, in seconds, used as the base time unit when calculating the values of these parameters.
60
MC_RECLAIM_DELAY
MC_RECLAIM_DELAY = minutes
MultiCluster resource leasing model only. The reclaim interval (how often to reconfigure shared leases) in minutes.
Shared leases are defined by Type=shared in the lsb.resources HostExport section.
10
MC_PENDING_REASON_PKG_SIZE
MC_PENDING_REASON_PKG_SIZE = kilobytes | 0
MultiCluster job forwarding model only. Pending reason update package size, in KB. Defines the maximum amount of pending reason data this cluster will send to submission clusters in one cycle.
Specify the keyword 0 (zero) to disable the limit and allow any amount of data in one package.
512
MC_PENDING_REASON_UPDATE_INTERVAL
MC_PENDING_REASON_UPDATE_INTERVAL = seconds | 0
MultiCluster job forwarding model only. Pending reason update interval, in seconds. Defines how often this cluster will update submission clusters about the status of pending MultiCluster jobs.
Specify the keyword 0 (zero) to disable pending reason updating between clusters.
300
MC_RUSAGE_UPDATE_INTERVAL
MC_RUSAGE_UPDATE_INTERVAL=seconds
MultiCluster only. Enables resource use updating for MultiCluster jobs running on hosts in the cluster and specifies how often to send updated information to the submission or consumer cluster.
300
NO_PREEMPT_RUN_TIME
NO_PREEMPT_RUN_TIME = run_time
If set, jobs that have been running for the specified number of minutes or longer are not preempted. Run time is wall-clock time, not normalized run time.
You must define a run limit for the job, either at job level with the bsub -W option or in the queue by configuring RUNLIMIT in lsb.queues.
NO_PREEMPT_FINISH_TIME
NO_PREEMPT_FINISH_TIME = finish_time
If set, jobs that will finish within the specified number of minutes are not preempted. Run time is wall-clock time, not normalized run time.
You must define a run limit for the job, either at job level with the bsub -W option or in the queue by configuring RUNLIMIT in lsb.queues.
NQS_QUEUES_FLAGS
NQS_QUEUES_FLAGS = integer
For Cray NQS compatibility only. Used by LSF to get the NQS queue information.
If the NQS version on a Cray is NQS 1.1, 80.42 or NQS 71.3, this parameter does not need to be defined.
For other versions of NQS on Cray, define both NQS_QUEUES_FLAGS and NQS_REQUESTS_FLAGS.
To determine the value of this parameter, run the NQS qstat command. The value of Npk_int[1] in the output is the value you need for this parameter. Refer to the NQS chapter in Administering Platform LSF for more details.
Undefined
NQS_REQUESTS_FLAGS
NQS_REQUESTS_FLAGS=integer
For Cray NQS compatibility only.
If the NQS version on a Cray is NQS 80.42 or NQS 71.3, this parameter does not need to be defined.
If the version is NQS 1.1 on a Cray, set this parameter to 251918848. This is the qstat flag that LSF uses to retrieve requests on Cray in long format.
For other versions of NQS on a Cray, run the NQS qstat command. The value of Npk_int[1] in the output is the value you need for this parameter. Refer to the NQS chapter in Administering Platform LSF for more details.
Undefined
PEND_REASON_UPDATE_INTERVAL
PEND_REASON_UPDATE_INTERVAL=seconds
Time interval that defines how often pending reasons are calculated by the scheduling daemon mbschd.
30 seconds
PEND_REASON_MAX_JOBS
PEND_REASON_MAX_JOBS=integer
Number of jobs for each user per queue for which pending reasons are calculated by the scheduling daemon mbschd. Pending reasons are calculated at a time period set by PEND_REASON_UPDATE_INTERVAL.
20 jobs
PG_SUSP_IT
PG_SUSP_IT=seconds
The time interval that a host should be interactively idle (it > 0) before jobs suspended because of a threshold on the pg load index can be resumed.
This parameter is used to prevent the case in which a batch job is suspended and resumed too often as it raises the paging rate while running and lowers it while suspended. If you are not concerned with the interference with interactive jobs caused by paging, the value of this parameter may be set to 0.
180 (seconds)
PREEMPTABLE_RESOURCES
PREEMPTABLE_RESOURCES=resource_name ...
LicenseMaximizer only. Enables license preemption when preemptive scheduling is enabled (has no effect if PREEMPTIVE is not also specified) and specifies the licenses that will be preemption resources. Specify shared numeric resources, static or decreasing, that LSF is configured to release (RELEASE=Y in lsf.shared, which is the default).
You must also configure LSF's preemption action to make the preempted application release its licenses. To kill preempted jobs instead of suspending them, set TERMINATE_WHEN=PREEMPT in lsb.queues, or set JOB_CONTROLS in lsb.queues and specify brequeue as the SUSPEND action.
Undefined (if preemptive scheduling is configured, LSF preempts on job slots only)
PREEMPT_FOR
PREEMPT_FOR=[HOST_JLU|USER_JLP|GROUP_MAX|GROUP_JLP|MINI_JOB|LEAST_RUN_TIME] ...
If preemptive scheduling is enabled, this parameter can change the behavior of job slot limits and can also enable the optimized preemption mechanism for parallel jobs.
Specify a space-separated list of the following keywords:
- GROUP_MAX--LSF does not count suspended jobs against the total job slot limit for user groups, specified at the user level (MAX_JOBS in lsb.users); if preemptive scheduling is enabled, suspended jobs never count against the limit for individual users
- HOST_JLU--LSF does not count suspended jobs against the total number of jobs for users and user groups, specified at the host level (JL/U in lsb.hosts)
- USER_JLP--LSF does not count suspended jobs against the user-processor job slot limit for individual users, specified at the user level (JL/P in lsb.users)
- GROUP_JLP--LSF does not count suspended jobs against the per-processor job slot limit for user groups, specified at the user level (JL/P in lsb.users)
- MINI_JOB--LSF uses the optimized preemption mechanism for preemption between parallel jobs
- LEAST_RUN_TIME--LSF preempts the job with the least run time. Run time is wall-clock time, not normalized run time.
Job slot limits specified at the queue level always count suspended jobs.
Undefined. If preemptive scheduling is configured, the default preemption mechanism is used to preempt parallel jobs, and suspended jobs are ignored for the following limits only:
- Total job slot limit for hosts, specified at the host level (MXJ in lsb.hosts)
- Total job slot limit for individual users, specified at the user level (MAX_JOBS in lsb.users); by default, suspended jobs still count against the limit for user groups
PREEMPTION_WAIT_TIME
PREEMPTION_WAIT_TIME=seconds
LicenseMaximizer only. You must also specify PREEMPTABLE_RESOURCES in lsb.params.
The amount of time LSF waits, after preempting jobs, for preemption resources to become available. Specify at least 300 seconds.
If LSF does not get the resources after this time, LSF might preempt more jobs.
300 (5 minutes)
RESOURCE_RESERVE_PER_SLOT
RESOURCE_RESERVE_PER_SLOT=y | Y
If Y, mbatchd reserves resources based on job slots instead of per host.
By default, mbatchd only reserves static resources for parallel jobs on a per-host basis. For example, by default, the command:
% bsub -n 4 -R "rusage[mem=500]" -q reservation my_job
requires the job to reserve 500 MB on each host where the job runs.
Some parallel jobs need to reserve resources based on job slots, rather than by host. In this example, if per-slot reservation is enabled by RESOURCE_RESERVE_PER_SLOT, the job my_job must reserve 500 MB of memory for each job slot (4 * 500 = 2 GB) on the host in order to run.
If RESOURCE_RESERVE_PER_SLOT is set, the following command reserves the resource static_resource on all 4 job slots instead of only 1 on the host where the job runs:
% bsub -n 4 -R "static_resource > 0 rusage[static_resource=1]" myjob
Undefined (reserve resources per-host)
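The per-host versus per-slot arithmetic from the example above can be sketched as follows. reservation_by_host and the host names are hypothetical, and the distribution of a parallel job's slots across hosts is taken as given.

```python
from typing import Dict


def reservation_by_host(slots_per_host: Dict[str, int], rusage_amount: float,
                        per_slot: bool) -> Dict[str, float]:
    """Amount of a resource reserved on each host for one parallel job.

    per_slot=False: one rusage amount per host (default behavior).
    per_slot=True:  rusage amount times the job slots on that host
                    (RESOURCE_RESERVE_PER_SLOT=Y).
    """
    return {host: rusage_amount * (slots if per_slot else 1)
            for host, slots in slots_per_host.items()}
```

For bsub -n 4 -R "rusage[mem=500]" with all four slots on one host, this gives 500 MB per host by default and 2000 MB with per-slot reservation.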
RUN_JOB_FACTOR
RUN_JOB_FACTOR=number
Used only with fairshare scheduling. Job slots weighting factor.
In the calculation of a user's dynamic share priority, this factor determines the relative importance of the number of job slots reserved and in use by a user.
3.0
RUN_TIME_FACTOR
RUN_TIME_FACTOR=number
Used only with fairshare scheduling. Run time weighting factor.
In the calculation of a user's dynamic share priority, this factor determines the relative importance of the total run time of a user's running jobs.
0.7
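This page lists the fairshare weighting factors (CPU_TIME_FACTOR, RUN_TIME_FACTOR, RUN_JOB_FACTOR, COMMITTED_RUN_TIME_FACTOR) but not the formula that combines them. The sketch below assumes the form described in the fairshare chapter of Administering Platform LSF, where a user's shares are divided by a weighted sum of usage. Check that chapter for your version before relying on it. Times are assumed to be in hours, historical run time (ENABLE_HIST_RUN_TIME) is not modeled, and dynamic_priority is a hypothetical name.

```python
from typing import Optional


def dynamic_priority(shares: float, cpu_time: float, run_time: float,
                     job_slots: int,
                     committed_run_time: Optional[float] = None,
                     cpu_time_factor: float = 0.7,
                     run_time_factor: float = 0.7,
                     run_job_factor: float = 3.0,
                     committed_run_time_factor: float = 0.0) -> float:
    """Assumed fairshare dynamic priority; higher means scheduled sooner.

    cpu_time: cumulative (HIST_HOURS-decayed) CPU time, in hours.
    run_time: run time of the user's running jobs, in hours.
    committed_run_time: None when no run limit applies (term is skipped).
    """
    denominator = (cpu_time * cpu_time_factor
                   + run_time * run_time_factor
                   + (1 + job_slots) * run_job_factor)
    if committed_run_time is not None:
        denominator += (committed_run_time - run_time) * committed_run_time_factor
    return shares / denominator
```

Under this assumed form, each additional job slot in use adds RUN_JOB_FACTOR to the denominator, so a user with more running jobs gets a lower dynamic priority.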
SBD_SLEEP_TIME
SBD_SLEEP_TIME=seconds
The interval at which LSF checks the load conditions of each host, to decide whether jobs on the host must be suspended or resumed.
The job-level resource usage information is updated at a maximum frequency of every SBD_SLEEP_TIME seconds.
The update is done only if the value for the CPU time, resident memory usage, or virtual memory usage has changed by more than 10 percent from the previous update or if a new process or process group has been created.
30
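A sketch of the 10 percent update rule. The field names and usage_changed are hypothetical. Because a percentage change from zero is undefined, treating any change from a zero value as significant is an assumption.

```python
from typing import Mapping

TRACKED = ("cpu_time", "resident_mem", "virtual_mem")   # hypothetical field names


def usage_changed(previous: Mapping[str, float], current: Mapping[str, float],
                  new_process: bool) -> bool:
    """True if job-level usage should be reported this SBD_SLEEP_TIME cycle."""
    if new_process:
        return True                       # new process or process group seen
    for key in TRACKED:
        old, new = previous[key], current[key]
        if old == 0:
            if new != 0:
                return True               # any growth from zero counts
        elif abs(new - old) / old > 0.10:
            return True                   # changed by more than 10 percent
    return False
```

A change of exactly 10 percent does not trigger an update in this sketch, matching "more than 10 percent" above.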
SYSTEM_MAPPING_ACCOUNT
SYSTEM_MAPPING_ACCOUNT=user_account
LSF Windows Workgroup installations only. User account to which all Windows workgroup user accounts are mapped.
Undefined
USER_ADVANCE_RESERVATION
USER_ADVANCE_RESERVATION in lsb.params is obsolete. Use the ResourceReservation section configuration in lsb.resources to configure advance reservation policies for your cluster.
SEE ALSO
lsf.conf(5), lsb.params(5), lsb.hosts(5), lsb.users(5), bsub(1)
Date Modified: February 24, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.