If the ResourceMap section contains even one resource mapped as default, and if there are multiple elim executables in LSF_SERVERDIR, the MELIM starts all of the elim executables in LSF_SERVERDIR on all hosts in the cluster. Not all of the elim executables continue to run, however. Those that use a checking header could exit with ELIM_ABORT_VALUE if they are not programmed to report values for the resources listed in LSF_RESOURCES.
Restarts an elim if the elim exits. To prevent system-wide problems in case of a fatal error in the elim, the maximum restart frequency is once every 90 seconds. The MELIM does not restart any elim that exits with ELIM_ABORT_VALUE.
Collects the load information reported by the elim executables.
Checks the syntax of load update strings before sending the information to the LIM.
Merges the load reports from each elim and sends the merged load information to the LIM. If more than one value is reported for a single resource, the MELIM reports the latest value.
Logs its activities and data to the log file LSF_LOGDIR/melim.log.host_name.
Increases system reliability by buffering output from multiple elim executables; failure of one elim does not affect other elim executables running on the same host.
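The syntax check and merge steps described above can be illustrated with a small shell sketch. This is not the MELIM implementation. It assumes only the load update string format used by elim executables (number_indices index1_name index1_value...), and the function names check_report and merge_reports are hypothetical. The structural check shown here (the count must match the number of name/value pairs) is an assumption about what a syntax check might verify.

```shell
#!/bin/sh
# Illustrative sketch only; check_report and merge_reports are hypothetical names.

# Succeed if the load update string in $1 has the form
# "number_indices name1 value1 name2 value2 ...".
check_report() {
    printf '%s\n' "$1" | awk '
        # The first field must be a non-negative integer and must equal
        # the number of name/value pairs that follow it.
        { ok = ($1 ~ /^[0-9]+$/ && NF == 1 + 2 * $1) }
        END { exit !ok }'
}

# Read load update strings on stdin, one per line, in arrival order.
# Print a single merged string that keeps the latest value per resource.
merge_reports() {
    awk '
        # Skip lines that fail the structural check.
        !($1 ~ /^[0-9]+$/ && NF == 1 + 2 * $1) { next }
        {
            for (i = 2; i < NF; i += 2) {
                # Remember first-seen order so the output is stable.
                if (!($i in latest)) order[++n] = $i
                # A later report overwrites an earlier value.
                latest[$i] = $(i + 1)
            }
        }
        END {
            out = n + 0
            for (k = 1; k <= n; k++) out = out " " order[k] " " latest[order[k]]
            print out
        }'
}
```

For example, piping the three lines "1 myrsc 1", "2 myrsc 3 scratch 40", and "bad line" into merge_reports prints "2 myrsc 3 scratch 40": the malformed line is dropped and the later value for myrsc replaces the earlier one.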
If you use the default keyword for any external resource in lsf.cluster.cluster_name, all elim executables in LSF_SERVERDIR run on all hosts in the cluster. You can control the hosts on which your elim executables run by using the environment variables LSF_MASTER, LSF_RESOURCES, and ELIM_ABORT_VALUE. These environment variables provide a way to ensure that elim executables run only when they are programmed to report the values for resources expected on a host.
LSF_RESOURCES—When the LIM starts an MELIM on a host, the LIM checks the resource mapping defined in the ResourceMap section of lsf.cluster.cluster_name. Based on the mapping location (default, all, or a host list), the LIM sets LSF_RESOURCES to the list of resources expected on the host.
When the location of the resource is defined as default, the resource is listed in LSF_RESOURCES on the server hosts. When the location of the resource is defined as all, the resource is listed in LSF_RESOURCES only on the master host.
Use LSF_RESOURCES in a checking header to verify that an elim is programmed to collect values for at least one of the resources listed in LSF_RESOURCES.
ELIM_ABORT_VALUE—An elim should exit with ELIM_ABORT_VALUE if the elim is not programmed to collect values for at least one of the resources listed in LSF_RESOURCES. The MELIM does not restart an elim that exits with ELIM_ABORT_VALUE. The default value is 97.
#!/bin/sh
# list the resources that the elim can report to lim
my_resource="myrsc"

# do the check when $LSF_RESOURCES is defined by lim
if [ -n "$LSF_RESOURCES" ]; then
    # check if the resources elim can report are listed in $LSF_RESOURCES
    res_ok=`echo " $LSF_RESOURCES " | /bin/grep " $my_resource " `
    # exit with $ELIM_ABORT_VALUE if the elim cannot report on at least
    # one resource listed in $LSF_RESOURCES
    if [ "$res_ok" = "" ] ; then
        exit $ELIM_ABORT_VALUE
    fi
fi

while [ 1 ]; do
    # set the value for resource "myrsc"
    val="1"
    # create an output string in the format:
    # number_indices index1_name index1_value...
    reportStr="1 $my_resource $val"
    echo "$reportStr"
    # wait for 30 seconds before reporting again
    sleep 30
done
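The checking header above tests a single resource name. For an elim that reports several resources, the same rule (exit unless at least one reportable resource is listed in LSF_RESOURCES) might be written as in the following sketch. The resource names myrsc and scratch and the function name has_common_resource are hypothetical. The sketch assumes LSF_RESOURCES is a space-separated list, as the header above does, and falls back to the documented default of 97 if ELIM_ABORT_VALUE is not set.

```shell
#!/bin/sh
# Hypothetical list of resources that this elim can report to lim.
my_resources="myrsc scratch"

# Succeed if at least one name in $1 also appears in $2.
# Both arguments are space-separated lists.
has_common_resource() {
    for r in $1; do
        # Pad with spaces so that only whole names match.
        case " $2 " in
            *" $r "*) return 0 ;;
        esac
    done
    return 1
}

# Do the check only when LSF_RESOURCES is defined by lim.
if [ -n "$LSF_RESOURCES" ] && ! has_common_resource "$my_resources" "$LSF_RESOURCES"; then
    # 97 is the documented default; used here only if the variable is unset.
    exit "${ELIM_ABORT_VALUE:-97}"
fi
```

The reporting loop would follow this header unchanged, except that the first field of the output string would give the number of resources actually reported.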