Configuration to enable external load indices

To enable the use of external load indices, you must:
  • Define the dynamic external resources in lsf.shared. By default, these resources are host-based (local to each host) until the LSF administrator configures a resource-to-host-mapping in the ResourceMap section of lsf.cluster.cluster_name. The presence of the dynamic external resource in lsf.shared and lsf.cluster.cluster_name triggers LSF to start the elim executables.

  • Map the external resources to hosts in your cluster in lsf.cluster.cluster_name.
    Important:

    You must run the command lsadmin reconfig followed by badmin mbdrestart to apply changes.

  • Create one or more elim executables in the directory specified by the parameter LSF_SERVERDIR. LSF does not include a default elim; you should write your own executable to meet the requirements of your site. The section Create an elim executable provides guidelines for writing an elim.

Define a dynamic external resource

To define a dynamic external resource for which elim collects an external load index value, define the following parameters in the Resource section of lsf.shared:

Configuration file: lsf.shared

Each entry gives the parameter and syntax, followed by the description.

RESOURCENAME resource_name
  • Specifies the name of the external resource.

TYPE Numeric
  • Specifies the type of external resource: Numeric resources have numeric values.
  • Specify Numeric for all dynamic resources.

INTERVAL seconds
  • Specifies the interval for data collection by an elim.
  • For numeric resources, defining an interval identifies the resource as a dynamic resource with a corresponding external load index.
    Important:
    You must specify an interval: LSF treats a numeric resource with no interval as a static resource and, therefore, does not collect load index values for that resource.

INCREASING Y | N
  • Specifies whether a larger value indicates a greater load.
    • Y: a larger value indicates a greater load. For example, if you define an external load index for the number of shared software licenses in use, the larger the value, the heavier the load.
    • N: a larger value indicates a lighter load. For example, if you define an external load index for the number of shared software licenses currently available, the larger the value, the lighter the load, and the more licenses are available.

RELEASE Y | N
  • For shared resources only, specifies whether LSF releases the resource when a job that uses the resource is suspended.
    • Y: Releases the resource when a job is suspended.
    • N: Holds the resource when a job is suspended.

DESCRIPTION description
  • Brief description of the resource. Enter a description that enables you to easily identify the type and purpose of the resource.
  • The lsinfo command and the ls_info() API call return the contents of the DESCRIPTION parameter.
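
As a rough illustration, the parameters above might be combined into a Resource section like the following fragment. The resource name licenses_free, the 30-second interval, and the description text are placeholders chosen for this sketch, and the column layout is assumed to follow the usual lsf.shared convention; check the lsf.shared reference for your LSF version before copying it.

```
Begin Resource
RESOURCENAME   TYPE     INTERVAL  INCREASING  RELEASE  DESCRIPTION
licenses_free  Numeric  30        N           N        (Shared licenses currently available)
End Resource
```

INCREASING is set to N here because a larger number of available licenses indicates a lighter load.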


Map an external resource

Once external resources are defined in lsf.shared, they must be mapped to hosts in the ResourceMap section of lsf.cluster.cluster_name.


Configuration file: lsf.cluster.cluster_name

Each entry gives the parameter and syntax, followed by the default behavior.

RESOURCENAME resource_name
  • Specifies the name of the external resource as defined in the Resource section of lsf.shared.

LOCATION ([all]) | ([all ~host_name])
  • Maps the resource to the master host only; all hosts share a single instance of the dynamic external resource.
  • To prevent specific hosts from accessing the resource, use the not operator (~) and specify one or more host names. All other hosts can access the resource.

LOCATION [default]
  • Maps the resource to all hosts in the cluster; every host has an instance of the dynamic external resource.
  • If you use the default keyword for any external resource, all elim executables in LSF_SERVERDIR run on all hosts in the cluster. For information about how to control which elim executables run on each host, see the section How LSF determines which hosts should run an elim executable.

LOCATION ([host_name]) | ([host_name] [host_name])
  • Maps the resource to one or more specific hosts.
  • To specify sets of hosts that share a dynamic external resource, enclose each set in square brackets ([ ]) and use a space to separate each host name.
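
To make the three LOCATION forms concrete, a ResourceMap section might look roughly like the fragment below. The host names hostA, hostB, and hostC are hypothetical, and the resource names are borrowed from the load update example later in this topic; each would also need a matching entry in the Resource section of lsf.shared. Treat this as a sketch of the syntax described above rather than a tested configuration.

```
Begin ResourceMap
RESOURCENAME   LOCATION
licenses       ([all])
tmp2           [default]
nio            ([hostA hostB] [hostC])
End ResourceMap
```

In this sketch, licenses is a single shared instance, tmp2 has one instance per host, and nio has one instance shared by hostA and hostB and another for hostC. Because tmp2 uses the default keyword, all elim executables in LSF_SERVERDIR would run on all hosts.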


Create an elim executable

You can write one or more elim executables. The load index names defined in your elim executables must be the same as the external resource names defined in the lsf.shared configuration file.

All elim executables must:
  • Be located in LSF_SERVERDIR and follow these naming conventions:

    Operating system    Naming convention

    UNIX                LSF_SERVERDIR/elim.application

    Windows             LSF_SERVERDIR\elim.application.exe
                        or
                        LSF_SERVERDIR\elim.application.bat


    Restriction:

    The name elim.user is reserved for backward compatibility. Do not use the name elim.user for your application-specific elim.

    Note:

    LSF invokes any elim that follows this naming convention. Move backup copies out of LSF_SERVERDIR or choose a name that does not follow the convention. For example, use elim_backup instead of elim.backup.

  • Exit upon receipt of a SIGTERM signal from the load information manager (LIM).

  • Periodically output a load update string to stdout in the format number_indices index_name index_value [index_name index_value …] where

    Value             Defines

    number_indices    The number of external load indices collected by the elim.
    index_name        The name of the external load index.
    index_value       The external load index value returned by your elim.


For example, the string

3 tmp2 47.5 nio 344.0 licenses 5

reports three indices: tmp2, nio, and licenses, with values 47.5, 344.0, and 5, respectively.
    • The load update string must report values between -INFINIT_LOAD and INFINIT_LOAD as defined in the lsf.h header file.

    • The elim should ensure that the entire load update string is written successfully to stdout. Program the elim to exit if it fails to write the load update string to stdout.
      • If the elim executable is a C program, check the return value of printf(3s).

      • If the elim executable is a shell script, check the return code of /bin/echo(1).

    • If the elim executable is implemented as a C program, use setbuf(3) during initialization to send unbuffered output to stdout.

    • Each LIM sends updated load information to the master LIM every 15 seconds; the elim executable should write the load update string at most once every 15 seconds. If the external load index values rarely change, program the elim to report the new values only when a change is detected.
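
The requirements above can be sketched as a small shell elim. This is a minimal illustration, not the elim shipped in LSF_MISC/examples: the function names format_load_update, collect_licenses, and main_loop are hypothetical, and collect_licenses returns a fixed placeholder value where a real elim would take a measurement.

```sh
#!/bin/sh
# Sketch of an elim reporting loop. Names and values are placeholders.

# Build a load update string: "number_indices name value [name value ...]".
# Arguments are name/value pairs; fails on an odd number of arguments.
# The exit status is that of printf, so callers can detect a failed write.
format_load_update() {
    if [ $(( $# % 2 )) -ne 0 ]; then
        return 1
    fi
    update="$(( $# / 2 ))"
    while [ $# -gt 0 ]; do
        update="$update $1 $2"
        shift 2
    done
    printf '%s\n' "$update"
}

# Hypothetical collector: a real elim would measure something here.
collect_licenses() {
    echo 5
}

# Report once per interval (default 15 seconds) and exit if the write
# to stdout fails. No trap is installed, so the default SIGTERM action
# terminates the script when LIM sends the signal.
main_loop() {
    interval="${1:-15}"
    while :; do
        format_load_update licenses "$(collect_licenses)" || exit 1
        sleep "$interval"
    done
}

# A real elim would end with a call such as:  main_loop 15
```

For example, format_load_update tmp2 47.5 nio 344.0 licenses 5 produces the three-index string shown earlier. The sketch uses the shell's printf rather than /bin/echo; either way, the point is to check the exit status of the write.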

If you map any external resource as default in lsf.cluster.cluster_name, all elim executables in LSF_SERVERDIR run on all hosts in the cluster. If LSF_SERVERDIR contains more than one elim executable, you should include a header that checks whether the elim is programmed to report values for the resources expected on the host. For detailed information about using a checking header, see the section How environment variables determine elim hosts.
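
A checking header of the kind mentioned above might look like the following sketch. It assumes that LIM exports the environment variable LSF_RESOURCES as a space-separated list of the resource names expected on the host, and ELIM_ABORT_VALUE as the exit code that marks an elim as not needed on the host; both are assumptions here and should be confirmed against the section named above. The function names resource_expected and check_header are hypothetical.

```sh
#!/bin/sh
# Sketch of a checking header. LSF_RESOURCES and ELIM_ABORT_VALUE are
# assumed to be set by LIM; verify both for your LSF version.

# Succeeds if resource name $1 appears in the space-separated list $2.
resource_expected() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Exit early when this elim's resource is not expected on this host.
# Falls back to exit code 1 if ELIM_ABORT_VALUE is not set.
check_header() {
    if ! resource_expected "$1" "${LSF_RESOURCES:-}"; then
        exit "${ELIM_ABORT_VALUE:-1}"
    fi
}

# A real elim would call, before its reporting loop:  check_header licenses
```

Padding both the list and the name with spaces keeps a resource such as nio from matching a longer name that merely contains it.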

Overriding built-in load indices

An elim executable can be used to override the value of a built-in load index. For example, if your site stores temporary files in the /usr/tmp directory, you might want to monitor the amount of space available in that directory. An elim can report the space available in the /usr/tmp directory as the value for the tmp built-in load index. However, the value reported by an elim must be less than the maximum size of /usr/tmp.

To override a built-in load index value, you must:
  • Write an elim executable that periodically measures the value of the dynamic external resource and writes the numeric value to standard output. The external load index must correspond to a numeric, dynamic external resource as defined by TYPE and INTERVAL in lsf.shared.

  • Configure an external resource in lsf.shared and map the resource in lsf.cluster.cluster_name, even though you are overriding a built-in load index. Use a name other than the built-in load index, for example, mytmp rather than tmp.

  • Program your elim to output the formal name of the built-in index (for example, r1m, it, ls, or swp), not the resource name alias (cpu, idle, login, or swap). For example, an elim that collects the value of the external resource mytmp reports the value as tmp (the built-in load index) in the load update string: 1 tmp 20.
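
The steps above can be sketched for the /usr/tmp example as follows. This is an illustration under stated assumptions: the function names available_mb and tmp_update are hypothetical, the value is assumed to be expressed in whole megabytes, and the measurement relies on the POSIX df -Pk output format, in which the fourth field of the second line is the available space in kilobytes.

```sh
#!/bin/sh
# Sketch: report free space in a directory as the built-in tmp index.

# Print the space available in directory $1, in whole megabytes
# (assumed unit), using the POSIX "df -Pk" output format.
available_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Build the load update string for one index. The string uses the
# formal built-in name "tmp", not the external resource name (mytmp).
tmp_update() {
    printf '1 tmp %s\n' "$1"
}

# A real elim would loop, for example:
#   while :; do tmp_update "$(available_mb /usr/tmp)" || exit 1; sleep 15; done
```

With a measured value of 20, tmp_update 20 writes the string 1 tmp 20, matching the example in the list above.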

Setting up an ELIM to support JSDL

To support the use of Job Submission Description Language (JSDL) files at job submission, LSF collects the following load indices:

Attribute name               Attribute type   Resource name

OperatingSystemName          string           osname
OperatingSystemVersion       string           osver
CPUArchitectureName          string           cpuarch
IndividualCPUSpeed           int64            cpuspeed
IndividualNetworkBandwidth   int64            bandwidth (This is the maximum bandwidth.)


The file elim.jsdl is automatically configured to collect these resources. To enable the use of elim.jsdl, uncomment the lines for these resources in the ResourceMap section of the file lsf.cluster.cluster_name.

Example of an elim executable

See the section How environment variables determine elim hosts for an example of a simple elim script.

You can find additional elim examples in the LSF_MISC/examples directory. The elim.c file is an elim written in C. You can modify this example to collect the external load indices required at your site.