The maximum number of retries for reaching a non-responding slave batch daemon, sbatchd.
The interval between retries is defined by MBD_SLEEP_TIME. If mbatchd fails to reach a host and has retried MAX_SBD_FAIL times, the host is considered unreachable.
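The retry rule above can be sketched as a small loop. This is an illustrative model only, not LSF code: the function name probe_host and the poll_sbatchd callback are hypothetical, and it assumes that one initial failed poll is followed by up to MAX_SBD_FAIL retries. The wait between retries (MBD_SLEEP_TIME) is noted in a comment rather than actually slept.

```python
from typing import Callable, Tuple


def probe_host(poll_sbatchd: Callable[[], bool], max_sbd_fail: int) -> Tuple[str, int]:
    """Model of the retry rule: return (status, retries_used).

    poll_sbatchd is a hypothetical callback that returns True when
    sbatchd on the host answers. max_sbd_fail plays the role of
    MAX_SBD_FAIL.
    """
    # Initial attempt; no retries are consumed if it succeeds.
    if poll_sbatchd():
        return ("ok", 0)

    for retry in range(1, max_sbd_fail + 1):
        # A real daemon would wait MBD_SLEEP_TIME seconds before each retry.
        if poll_sbatchd():
            return ("ok", retry)

    # Every retry failed: the host is considered unreachable.
    return ("unreach", max_sbd_fail)
```

Under this model, the time taken to declare a host unreachable grows roughly with MAX_SBD_FAIL multiplied by MBD_SLEEP_TIME, so lowering either value detects failures sooner at the cost of more false alarms on a slow network.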
If you define LSB_SYNC_HOST_STAT_LIM=Y, mbatchd obtains the host status from the master LIM before it polls sbatchd. After the master LIM has reported a host as unavailable (LIM is down) or unreachable (sbatchd is down) MAX_SBD_FAIL times, mbatchd reports the host status as unavailable or unreachable, respectively.
When a host becomes unavailable, mbatchd assumes that all jobs running on that host have exited, and it schedules all rerunnable jobs (jobs submitted with the bsub -r option) to be rerun on another host.
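The effect on jobs when a host becomes unavailable can be illustrated with a short sketch. Everything here is hypothetical and for illustration only: the Job class, the handle_unavailable_host function, and the use of "PEND" and "EXIT" as state labels are assumptions, not LSF internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: int
    rerunnable: bool      # True if the job was submitted with bsub -r
    state: str = "RUN"    # illustrative state label


def handle_unavailable_host(jobs: List[Job]) -> List[int]:
    """Treat every job on the host as exited; queue rerunnable ones again.

    Returns the IDs of the jobs that are scheduled to be rerun elsewhere.
    """
    rerun_ids = []
    for job in jobs:
        if job.rerunnable:
            # Rerunnable jobs go back to the queue for another host.
            job.state = "PEND"
            rerun_ids.append(job.job_id)
        else:
            # All other jobs are simply treated as having exited.
            job.state = "EXIT"
    return rerun_ids
```

For example, with one rerunnable and one non-rerunnable job on the failed host, only the rerunnable job is returned for rescheduling.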