Enable fast job dispatch

  1. Log in to the LSF master host as the root user.
  2. Increase the system-wide file descriptor limit of your operating system if you have not already done so.
  3. In lsb.params, set MAX_SBD_CONNS equal to the number of hosts in the cluster plus a buffer.
  4. In lsf.conf, set the parameter LSB_MAX_JOB_DISPATCH_PER_SESSION to a value greater than 300 and less than or equal to one-half the value of MAX_SBD_CONNS.
    For example, for a cluster with 4000 hosts:
    LSB_MAX_JOB_DISPATCH_PER_SESSION=2050
    MAX_SBD_CONNS=4100
  5. In lsf.conf, define the parameter LSF_SERVER_HOSTS to decrease the load on the master LIM.
  6. In the shell you used to increase the file descriptor limit, shut down the LSF batch daemons on the master host:

    badmin hshutdown

  7. Run badmin mbdrestart to restart the LSF batch daemons on the master host.
  8. Run badmin hrestart all to restart every sbatchd in the cluster.
    Note:

    When you shut down the batch daemons on the master host, all LSF services are temporarily unavailable, but existing jobs are not affected. When mbatchd is later started by sbatchd, its previous status is restored and job scheduling continues.
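
The sizing rule in steps 3 and 4 can be sketched as a short shell calculation. This is only an illustration: HOSTS, BUFFER, and DISPATCH are hypothetical variable names chosen for the sketch, not LSF settings, and the buffer of 100 is an assumption taken from the 4000-host example above. The script prints candidate values; it does not edit lsb.params or lsf.conf.

```shell
#!/bin/sh
# Illustrative inputs (assumptions, not LSF parameters).
HOSTS=4000      # number of hosts in the cluster
BUFFER=100      # extra connections kept in reserve

# Step 3: MAX_SBD_CONNS (lsb.params) is the host count plus a buffer.
MAX_SBD_CONNS=$((HOSTS + BUFFER))

# Step 4: LSB_MAX_JOB_DISPATCH_PER_SESSION (lsf.conf) may be at most
# one-half of MAX_SBD_CONNS; this sketch picks that upper bound.
DISPATCH=$((MAX_SBD_CONNS / 2))

# Step 4 also requires a value greater than 300.
if [ "$DISPATCH" -le 300 ]; then
    echo "warning: $DISPATCH is not greater than 300" >&2
fi

echo "MAX_SBD_CONNS=$MAX_SBD_CONNS"
echo "LSB_MAX_JOB_DISPATCH_PER_SESSION=$DISPATCH"
```

With the inputs shown, the two printed values match the 4000-host example in step 4. Any value above 300 and up to the computed upper bound also satisfies the rule, so treat the printed dispatch value as a ceiling rather than a recommendation.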