This chapter provides simple examples that demonstrate how to use LSLIB functions in an application. The function prototypes, as well as data structures that are used by the functions, are described. Many of the examples resemble the implementation of the existing LSF utilities.
- Getting Configuration Information
- Handling Default Resource Requirements
- Getting Dynamic Load Information
- Making a Placement Decision
- Getting Task Resource Requirements
- Using Remote Execution Services
Getting Configuration Information
One of the services that LSF provides to applications is cluster configuration information. This section describes how a C program can obtain this information using LSLIB.
Getting general cluster configuration information
In the previous chapter, a very simple application was introduced that prints the name of the LSF cluster. This section extends that example by printing the current master host name and the defined resource names in the cluster. It uses the following additional LSLIB function calls:
struct lsInfo *ls_info(void)
char *ls_getclustername(void)
char *ls_getmastername(void)
All of these functions return NULL on failure and set lserrno to indicate the error.
The function ls_info() returns a pointer to the lsInfo data structure (defined in <lsf/lsf.h>):
struct lsInfo {
    int    nRes;                                     /* Number of resources in the system */
    struct resItem *resTable;                        /* A resItem for each resource in the system */
    int    nTypes;                                   /* Number of host types */
    char   hostTypes[MAXTYPES][MAXLSFNAMELEN];       /* Host types */
    int    nModels;                                  /* Number of host models */
    char   hostModels[MAXMODELS][MAXLSFNAMELEN];     /* Host models */
    char   hostArchs[MAXMODELS][MAXLSFNAMELEN];      /* Architecture name */
    int    modelRefs[MAXMODELS];                     /* Number of hosts of this architecture */
    float  cpuFactor[MAXMODELS];                     /* CPU factors of each host model */
    int    numIndx;                                  /* Total number of load indices in resItem */
    int    numUsrIndx;                               /* Number of user-defined load indices */
};
Within struct lsInfo, the resItem data structure describes the valid resources defined in the LSF cluster:
struct resItem {
    char name[MAXLSFNAMELEN];     /* The name of the resource */
    char des[MAXRESDESLEN];       /* The description of the resource */
    enum valueType valueType;     /* Type of value: BOOLEAN, NUMERIC, STRING, EXTERNAL */
    enum orderType orderType;     /* Order: INCR, DECR, NA */
    int flags;                    /* Resource attribute flags */
#define RESF_BUILTIN  0x01        /* Built-in vs configured resource */
#define RESF_DYNAMIC  0x02        /* Dynamic vs static value */
#define RESF_GLOBAL   0x04        /* Resource defined in all clusters */
#define RESF_SHARED   0x08        /* Shared resource for some hosts */
#define RESF_LIC      0x10        /* License static value */
#define RESF_EXTERNAL 0x20        /* External resource defined */
#define RESF_RELEASE  0x40        /* Resource can be released when job is suspended */
    int interval;                 /* The update interval for a load index, in seconds */
};
The constants MAXTYPES, MAXMODELS, and MAXLSFNAMELEN are defined in <lsf/lsf.h>. MAXLSFNAMELEN is the maximum length of a name in LSF.
A host type in LSF refers to a class of hosts that are considered compatible from an application's point of view. Host types are entirely configurable, although hosts with the same architecture (binary-compatible hosts) should normally be configured with the same host type.
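Because the flags field of struct resItem is a bit mask, an application tests each attribute with a bitwise AND (the shared-resource example later in this chapter does this with RESF_DYNAMIC). The minimal sketch below decodes the mask into readable names. describe_res_flags() is a hypothetical helper, not an LSLIB function, and the RESF_* values are copied from the listing above only so that the block compiles without <lsf/lsf.h>.

```c
#include <string.h>

/* RESF_* values copied from the resItem listing so this sketch compiles
   on its own; omit these lines when <lsf/lsf.h> is included. */
#define RESF_BUILTIN  0x01
#define RESF_DYNAMIC  0x02
#define RESF_GLOBAL   0x04
#define RESF_SHARED   0x08
#define RESF_LIC      0x10
#define RESF_EXTERNAL 0x20
#define RESF_RELEASE  0x40

/* Hypothetical helper (not part of LSLIB): writes the names of the flags
   set in `flags` into buf as a space-separated list. n must be at least
   1; the output is truncated to fit n bytes. Returns buf. */
char *describe_res_flags(int flags, char *buf, size_t n)
{
    static const struct {
        int bit;
        const char *name;
    } names[] = {
        { RESF_BUILTIN,  "builtin"  },
        { RESF_DYNAMIC,  "dynamic"  },
        { RESF_GLOBAL,   "global"   },
        { RESF_SHARED,   "shared"   },
        { RESF_LIC,      "license"  },
        { RESF_EXTERNAL, "external" },
        { RESF_RELEASE,  "release"  },
    };
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        if ((flags & names[i].bit) == 0)
            continue;                       /* attribute not set */
        if (buf[0] != '\0')
            strncat(buf, " ", n - strlen(buf) - 1);
        strncat(buf, names[i].name, n - strlen(buf) - 1);
    }
    return buf;
}
```

In a real program the helper would be called as describe_res_flags(lsInfo->resTable[i].flags, buf, sizeof buf) inside a loop over the resource table.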
A host model in LSF refers to a class of hosts with the same CPU performance. The CPU factor of a host model should be configured to reflect the CPU speed of the model relative to other host models in the LSF cluster.
ls_getmastername() returns a string containing the name of the current master host.
ls_getclustername() returns a string containing the name of the local load sharing cluster defined in the configuration files.
The returned data structure of every LSLIB function is dynamically allocated inside LSLIB. This storage is automatically freed by LSLIB and re-allocated the next time the same LSLIB function is called. An application should never attempt to free the storage returned by LSLIB. If you need to keep this information across calls, make your own copy of the data structure. This applies to all LSLIB function calls.
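A minimal sketch of the "make your own copy" rule, assuming you want to keep the resource names returned by ls_info() after later LSLIB calls. copy_resource_names(), free_resource_names(), struct res_name_view and NAME_LEN are hypothetical stand-ins so that the block compiles on its own; with the real struct resItem you would change the parameter type, and the copying logic stays the same.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for the part of struct resItem used here. A real
   program would read lsInfo->resTable; NAME_LEN is an assumed size. */
#define NAME_LEN 40
struct res_name_view {
    char name[NAME_LEN];
};

/* Frees a NULL-terminated array returned by copy_resource_names(). */
void free_resource_names(char **names)
{
    size_t i;

    if (names == NULL)
        return;
    for (i = 0; names[i] != NULL; i++)
        free(names[i]);
    free(names);
}

/* Hypothetical helper: copies n resource names into storage owned by the
   caller, so they survive the next call that reuses LSLIB's internal
   buffer. Returns a NULL-terminated array, or NULL if memory runs out. */
char **copy_resource_names(const struct res_name_view *tbl, int n)
{
    char **copy;
    int i;

    /* calloc zero-fills, so the array stays NULL-terminated even if a
       later allocation fails part way through */
    copy = calloc((size_t) n + 1, sizeof *copy);
    if (copy == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        copy[i] = malloc(strlen(tbl[i].name) + 1);
        if (copy[i] == NULL) {
            free_resource_names(copy);
            return NULL;
        }
        strcpy(copy[i], tbl[i].name);
    }
    return copy;
}
```

Only the caller's copy is passed to free(); the LSLIB-owned table itself is never freed, as required above.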
The following program displays LSF cluster information using the above LSLIB function calls.
#include <stdio.h>
#include <stdlib.h>      /* exit() */
#include <lsf/lsf.h>

int main(void)
{
    struct lsInfo *lsInfo;
    char *cluster, *master;
    int i;

    /* get the name of the local load sharing cluster */
    cluster = ls_getclustername();
    if (cluster == NULL) {
        ls_perror("ls_getclustername");
        exit(-1);
    }
    printf("My cluster name is <%s>\n", cluster);

    /* get the name of the current master host */
    master = ls_getmastername();
    if (master == NULL) {
        ls_perror("ls_getmastername");
        exit(-1);
    }
    printf("Master host is <%s>\n", master);

    /* get the load sharing configuration information */
    lsInfo = ls_info();
    if (lsInfo == NULL) {
        ls_perror("ls_info");
        exit(-1);
    }

    /* print the name and description of every configured resource */
    printf("\n%-15.15s %s\n", "RESOURCE_NAME", "DESCRIPTION");
    for (i = 0; i < lsInfo->nRes; i++)
        printf("%-15.15s %s\n", lsInfo->resTable[i].name, lsInfo->resTable[i].des);

    exit(0);
}
The above program will produce output similar to the following:
% a.out
My cluster name is <test_cluster>
Master host is <hostA>

RESOURCE_NAME   DESCRIPTION
r15s            15-second CPU run queue length
r1m             1-minute CPU run queue length (alias: cpu)
r15m            15-minute CPU run queue length
ut              1-minute CPU utilization (0.0 to 1.0)
pg              Paging rate (pages/second)
io              Disk IO rate (Kbytes/second)
ls              Number of login sessions (alias: login)
it              Idle time (minutes) (alias: idle)
tmp             Disk space in /tmp (Mbytes)
swp             Available swap space (Mbytes) (alias: swap)
mem             Available memory (Mbytes)
ncpus           Number of CPUs
ndisks          Number of local disks
maxmem          Maximum memory (Mbytes)
maxswp          Maximum swap space (Mbytes)
maxtmp          Maximum /tmp space (Mbytes)
cpuf            CPU factor
rexpri          Remote execution priority
server          LSF server host
LSF_Base        Base product
lsf_base        Base product
LSF_Manager     Standard product
lsf_manager     Standard product
LSF_JobSchedule JobScheduler product
lsf_js          JobScheduler product
LSF_Make        Make product
lsf_make        Make product
LSF_Parallel    Parallel product
lsf_parallel    Parallel product
LSF_Analyzer    Analyzer product
lsf_analyzer    Analyzer product
mips            MIPS architecture
dec             DECStation system
sparc           SUN SPARC
bsd             BSD unix
sysv            System V UNIX
hpux            HP-UX UNIX
aix             AIX UNIX
irix            IRIX UNIX
ultrix          Ultrix UNIX
solaris         SUN SOLARIS
sun41           SunOS4.1
convex          ConvexOS
osf1            OSF/1
fs              File server
cs              Compute server
frame           Hosts with FrameMaker license
bigmem          Hosts with very big memory
diskless        Diskless hosts
alpha           DEC alpha
linux           LINUX UNIX
type            Host type
model           Host model
status          Host status
hname           Host name
Getting host configuration information
Host configuration information describes the static attributes of individual hosts in the LSF cluster. Examples of such attributes are host type, host model, number of CPUs, total physical memory, and the special resources associated with the host. These attributes are either read from the LSF configuration file, or determined by the host's LIM on start up.
Host configuration information can be obtained by calling ls_gethostinfo():
struct hostInfo *ls_gethostinfo(resreq, numhosts, hostlist, listsize, options)
ls_gethostinfo() has these parameters:
char *resreq;       /* Resource requirements that a host must satisfy */
int *numhosts;      /* The number of hosts */
char **hostlist;    /* An array of candidate hosts */
int listsize;       /* Number of candidate hosts */
int options;        /* Options, currently only DFT_FROMTYPE */
On success, ls_gethostinfo() returns an array containing a hostInfo structure for each host. On failure, it returns NULL and sets lserrno to indicate the error.
The hostInfo structure is defined in lsf.h as
struct hostInfo {
    char   hostName[MAXHOSTNAMELEN]; /* Host name */
    char  *hostType;                 /* Host type */
    char  *hostModel;                /* Host model */
    float  cpuFactor;                /* CPU factor of the host's CPUs */
    int    maxCpus;                  /* Number of CPUs on the host */
    int    maxMem;                   /* Size of physical memory on the host in MB */
    int    maxSwap;                  /* Amount of swap space on the host in MB */
    int    maxTmp;                   /* Size of the /tmp file system on the host in MB */
    int    nDisk;                    /* Number of disks on the host */
    int    nRes;                     /* Size of the resources array */
    char **resources;                /* An array of resources configured for the host */
    char  *windows;                  /* Run windows of the host */
    int    numIndx;                  /* Size of the busyThreshold array */
    float *busyThreshold;            /* Array of load thresholds for determining if the host is busy */
    char   isServer;                 /* TRUE if the host is a server, FALSE otherwise */
    char   licensed;                 /* TRUE if the host has an LSF license, FALSE if it does not */
    int    rexPriority;              /* Default priority for remote task execution on the host */
    int    licFeaturesNeeded;        /* Flag showing available licenses */
#define LSF_BASE_LIC            0    /* LSF_Base */
#define LSF_BATCH_LIC           1    /* LSF_Manager */
#define LSF_JS_SCHEDULER_LIC    2    /* LSF_JobScheduler */
#define LSF_JS_LIC              3    /* LSF_JobScheduler_Server */
#define LSF_CLIENT_LIC          4    /* LSF_Client */
#define LSF_MC_LIC              5    /* LSF_MultiCluster */
#define LSF_ANALYZER_SERVER_LIC 6    /* LSF_Analyzer */
#define LSF_MAKE_LIC            7    /* LSF_Make */
#define LSF_PARALLEL_LIC        8    /* LSF_Parallel */
#define LSF_FLOAT_CLIENT_LIC    9    /* LSF_Float_Client */
#define LSF_NUM_LIC_TYPE        10   /* Number of license types */
};
On Solaris, when referencing MAXHOSTNAMELEN, netdb.h must be included before lsf.h or lsbatch.h.
If NULL and 0 are supplied for the hostlist and listsize parameters of ls_gethostinfo(), as in the example below, all LSF hosts meeting resreq are returned. If a host list is supplied, the selection of hosts is limited to those belonging to the list.
If resreq is NULL, the default resource requirements are used. See Handling Default Resource Requirements for details.
The values of maxMem and maxCpus (along with maxSwap, maxTmp, and nDisks) are determined when LIM starts on a host. If the host is unavailable, the master LIM supplies a negative value.
The following example shows how to use ls_gethostinfo() in a C program. It displays the name, host type, total memory, number of CPUs, and special resources for each host that has more than 50MB of total memory.
#include <netdb.h>       /* Required for Solaris to reference MAXHOSTNAMELEN */
#include <lsf/lsf.h>
#include <stdio.h>
#include <stdlib.h>      /* exit() */

int main(void)
{
    struct hostInfo *hostinfo;
    char *resreq;
    int numhosts = 0;
    int options = 0;
    int i, j;

    /* only hosts with maximum memory larger than 50 Mbytes are of interest */
    resreq = "maxmem>50";

    /* get information on the hosts of interest (no candidate host list) */
    hostinfo = ls_gethostinfo(resreq, &numhosts, NULL, 0, options);
    if (hostinfo == NULL) {
        ls_perror("ls_gethostinfo");
        exit(-1);
    }

    /* print out the host names, host types, maximum memory,
       number of CPUs and resources */
    printf("There are %d hosts with more than 50MB total memory\n\n", numhosts);
    printf("%-11.11s %8.8s %6.6s %6.6s %9.9s\n",
           "HOST_NAME", "type", "maxMem", "ncpus", "RESOURCES");
    for (i = 0; i < numhosts; i++) {
        printf("%-11.11s %8.8s", hostinfo[i].hostName, hostinfo[i].hostType);
        if (hostinfo[i].maxMem > 0)
            printf("%6dM ", hostinfo[i].maxMem);
        else                /* maxMem info not available for this host */
            printf("%6.6s ", "-");
        if (hostinfo[i].maxCpus > 0)
            printf("%6d ", hostinfo[i].maxCpus);
        else                /* ncpus is not known for this host */
            printf("%6.6s ", "-");
        for (j = 0; j < hostinfo[i].nRes; j++)
            printf(" %s", hostinfo[i].resources[j]);
        printf("\n");
    }
    exit(0);
}
In the above example, resreq defines the resource requirements used to select the hosts. The variables you can use in resource requirements must be resource names returned by ls_info(). You can run the lsinfo command to obtain a list of valid resource names in your LSF cluster.
The above example program produces output similar to the following:
% a.out
There are 4 hosts with more than 50MB total memory

HOST_NAME       type maxMem  ncpus RESOURCES
hostA         HPPA10   128M      1 hppa hpux cs
hostB          ALPHA    58M      2 alpha cs
hostD          ALPHA    72M      4 alpha fddi
hostC         SUNSOL    54M      1 solaris fddi
To get specific host information, use:
- char *ls_gethosttype(hostname): returns the type of the specified host
- char *ls_gethostmodel(hostname): returns the model of the specified host
- float *ls_gethostfactor(hostname): returns the CPU factor of the specified host
Managing hosts
Using LSF Base APIs you can manage hosts in your cluster by:
- Removing hosts from a cluster
- Adding hosts to a cluster
- Locking a host in a cluster
- Unlocking a host in a cluster
To manage the hosts in your cluster you need to be root or the LSF administrator as defined in the file:
LSF_CONFDIR/lsf.cluster.<clustername>
By managing your hosts you can control the placement of jobs and manage your resources more effectively.
Before you remove a host from a cluster, you need to shut down the host's LIM. To shut down a host's LIM, use ls_limcontrol():
int ls_limcontrol(char *hostname, int opCode)
ls_limcontrol() has the following parameters:
char *hostname    The host's name
int opCode        Operation code
opCode describes the ls_limcontrol() operation. To shut down a host's LIM, use the following operation code:
LIM_CMD_SHUTDOWN
The following code example demonstrates how to shut down a host's LIM using ls_limcontrol():
/******************************************************
 * LSLIB -- Examples
 *
 * ls_limcontrol()
 * Shuts down a host's LIM.
 ******************************************************/
#include <lsf/lsf.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int result;     /* returned value from ls_limcontrol() */
    int opCode;     /* operation code */
    char *host;     /* host whose LIM is shut down */

    /* Checking for the correct format */
    if (argc != 2) {
        fprintf(stderr, "usage: sudo %s <host>\n", argv[0]);
        exit(-1);
    }
    host = argv[1];

    /* To shut down a host's LIM, assign LIM_CMD_SHUTDOWN to opCode */
    opCode = LIM_CMD_SHUTDOWN;
    printf("Shutting down LIM on host <%s>\n", host);
    result = ls_limcontrol(host, opCode);

    /* If there is an error in execution, the program exits */
    if (result == -1) {
        ls_perror("ls_limcontrol");
        exit(-1);
    }

    /* Otherwise, indicate successful program execution */
    printf("host <%s> shutdown successful.\n", host);
    exit(0);
}
To use the above example, at the command line type:
sudo ./a.out hostname
where hostname is the name of the host you want to remove from the cluster (for example, to move it to another cluster).
When you return a removed host to a cluster, you need to reboot the host's LIM. When you reboot the LIM, the configuration files are read again and the previous LIM status of the host is lost. To reboot a host's LIM, use ls_limcontrol():
int ls_limcontrol(char *hostname, int opCode)
To reboot a host's LIM, use the following operation code (opCode):
LIM_CMD_REBOOT
The following code example demonstrates how to reboot a host's LIM using ls_limcontrol():
/******************************************************
 * LSLIB -- Examples
 *
 * ls_limcontrol()
 * Reboots a host's LIM.
 ******************************************************/
#include <lsf/lsf.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int result;     /* returned value from ls_limcontrol() */
    int opCode;     /* operation code */
    char *host;     /* host whose LIM is rebooted */

    /* Checking for the correct format */
    if (argc != 2) {
        fprintf(stderr, "usage: sudo %s <host>\n", argv[0]);
        exit(-1);
    }
    host = argv[1];

    /* To reboot a host's LIM, assign LIM_CMD_REBOOT to opCode */
    opCode = LIM_CMD_REBOOT;
    printf("Restarting LIM on host <%s>\n", host);
    result = ls_limcontrol(host, opCode);

    /* If there is an error in execution, the program exits */
    if (result == -1) {
        ls_perror("ls_limcontrol");
        exit(-1);
    }

    /* Otherwise, the reboot was successful and the program exits */
    printf("LIM on host <%s> has been rebooted.\n", host);
    exit(0);
}
To use the above example, at the command line type:
sudo ./a.out hostname
where hostname is the name of the host you want to return to a cluster.
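The shutdown and reboot programs above differ only in the opCode they pass to ls_limcontrol(). A single program could choose the operation from a command-line word; the sketch below shows only that parsing step. enum lim_op, its values and parse_lim_op() are hypothetical names; a real program would map OP_SHUTDOWN to LIM_CMD_SHUTDOWN and OP_REBOOT to LIM_CMD_REBOOT from <lsf/lsf.h> before calling ls_limcontrol().

```c
#include <string.h>

/* Hypothetical operation codes for this sketch only; they are mapped to
   LIM_CMD_SHUTDOWN / LIM_CMD_REBOOT before calling ls_limcontrol(). */
enum lim_op { OP_INVALID = -1, OP_SHUTDOWN, OP_REBOOT };

/* Translates a command-line word such as "shutdown" or "reboot" into an
   operation. Unknown words and NULL yield OP_INVALID, so the caller can
   print a usage message instead of sending a wrong request to LIM. */
enum lim_op parse_lim_op(const char *word)
{
    if (word == NULL)
        return OP_INVALID;
    if (strcmp(word, "shutdown") == 0)
        return OP_SHUTDOWN;
    if (strcmp(word, "reboot") == 0)
        return OP_REBOOT;
    return OP_INVALID;
}
```

Such a combined program would then be invoked as, for example, sudo ./a.out reboot hostname.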
Locking a host prevents a host from being selected by the master LIM for task or job placement. Locking a host is useful for managing your resources. You can isolate machines in your cluster and apply their resources to particular work. If machine owners want private control over their machines, you can allow this indefinitely or for a period of time that you choose. Hosts can be unlocked automatically or unlocked manually.
To lock a host, use ls_lockhost():
int ls_lockhost(time_t duration)
ls_lockhost() has the following parameter:
time_t duration    The number of seconds the host is locked
To lock a host indefinitely, assign 0 to duration. To have the host unlock automatically, assign a value greater than 0 to duration; the host is unlocked when that time has expired.
If you try to lock a host that is already locked, ls_lockhost() sets lserrno to LSE_LIM_ALOCKED.
The following code example demonstrates how to use ls_lockhost() to lock a host:
/******************************************************
 * LSLIB -- Examples
 *
 * ls_lockhost()
 * Locks the local host for a specified time.
 ******************************************************/
#include <lsf/lsf.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    time_t duration;    /* lock duration in seconds; 0 means indefinitely */

    /* Checking for the correct format */
    if (argc != 2) {
        fprintf(stderr, "usage: sudo %s <duration>\n", argv[0]);
        exit(-1);
    }

    /* assigning the duration of the lock */
    duration = (time_t) atol(argv[1]);

    /* If an error occurs, exit with an error msg */
    if (ls_lockhost(duration) != 0) {
        ls_perror("ls_lockhost");
        exit(-1);
    }

    /* If ls_lockhost() is successful, indicate how long the host is locked */
    if (duration > 0)
        printf("Host is locked for %ld seconds\n", (long) duration);
    else    /* Indicate indefinite lock on host */
        printf("Host is locked\n");

    /* successful exit */
    exit(0);
}
Hosts that have been locked indefinitely by assigning 0 to the duration parameter of ls_lockhost() can only be unlocked manually. To manually unlock a host, use ls_unlockhost():
int ls_unlockhost(void)
Once a host is unlocked, the master LIM can again choose it for task or job placement.
The following code example demonstrates how to use ls_unlockhost() to manually unlock a host:
/******************************************************
 * LSLIB -- Examples
 *
 * ls_unlockhost()
 * Unlocks an indefinitely locked local host.
 ******************************************************/
#include <lsf/lsf.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* Checking for the correct format */
    if (argc != 1) {
        fprintf(stderr, "usage: sudo %s\n", argv[0]);
        exit(-1);
    }

    /* Call ls_unlockhost(). If an error occurs, print an error msg and exit. */
    if (ls_unlockhost() < 0) {
        ls_perror("ls_unlockhost");
        exit(-1);
    }

    /* Indicate a successful ls_unlockhost() call and exit. */
    printf("Host is unlocked\n");
    exit(0);
}
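The ls_lockhost() example converts its command-line duration without checking it, so a typo such as "10s" or a negative number is passed on silently. The sketch below shows stricter parsing with strtol(); parse_lock_duration() is a hypothetical helper, and rejecting negative values is an assumption based on the description of duration above (0 or a positive number of seconds).

```c
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: converts a command-line duration (seconds) for
   ls_lockhost(). Rejects empty input, trailing characters, negative
   numbers and out-of-range values. On success stores the value in *out
   and returns 0; otherwise returns -1 and leaves *out unchanged. */
int parse_lock_duration(const char *s, time_t *out)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || end == s || v < 0)
        return -1;
    *out = (time_t) v;      /* 0 means lock indefinitely */
    return 0;
}
```

In the lock example, the atol() call would be replaced by a call to this helper followed by a usage message when it returns -1.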
Handling Default Resource Requirements
Some LSLIB functions require a resource requirement parameter. This parameter is passed to the master LIM for host selection. It is important to understand how LSF handles default resource requirements. See Administering Platform LSF for further information about resource requirements.
It is desirable for LSF to automatically assume default values for some key requirements if they are not specified by the user.
The default resource requirements depend on the specific application context. For example, the lsload command assumes `type==any order[r15s:pg]' as the default resource requirements, while lsrun assumes `type==local order[r15s:pg]'. This is because users usually expect lsload to show the load on all hosts, whereas lsrun should run a task on a host of the same type as the local host, so that the task runs on the correct host type. LSLIB gives the application programmer the flexibility to set the default behavior.
LSF default resource requirements contain two parts: a type requirement and an order requirement. A type requirement ensures that the correct type of host is selected. An order requirement sorts the selected hosts according to some reasonable criteria.
LSF appends a type requirement to the resource requirement string supplied by an application in the following situations:
- resreq is NULL or an empty string.
- resreq does not contain a Boolean resource (for example, `hppa') and does not contain a type or model resource (for example, `type==solaris' or `model==HP715').
The default type requirement can be either `type==any' or `type==$fromtype', depending on whether the flag DFT_FROMTYPE is set in the options parameter of the function call. DFT_FROMTYPE is defined in lsf.h.
If DFT_FROMTYPE is set in the options parameter, the default type requirement is `type==$fromtype'. If DFT_FROMTYPE is not set, the default type requirement is `type==any'.
The value of fromtype depends on the function call. If the function has a fromhost parameter, fromtype is the host type of fromhost, the host that submits the task. Otherwise, fromtype is local.
LSF also appends an order requirement, order[r15s:pg], to the resource requirement string if an order requirement is not already specified.
The table below lists some examples of how LSF appends the default resource requirements.
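The rules above can be modeled in a few lines of C. This is a minimal sketch, not LSF's own implementation: apply_default_resreq() and has_token() are hypothetical, the caller must supply the cluster's Boolean resource names (in a real program, the resTable entries from ls_info() whose valueType is BOOLEAN), fromtype stands in for the resolved value of $fromtype, and the exact text LSF builds internally (for example, how it joins the type requirement to the selection part) may differ.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `word` occurs in `s` as a whole identifier, so that
   "type" matches "type==SUNSOL" but not "hosttype". */
static int has_token(const char *s, const char *word)
{
    size_t len = strlen(word);
    const char *p = s;

    if (len == 0)
        return 0;
    while ((p = strstr(p, word)) != NULL) {
        int starts = (p == s) ||
                     !(isalnum((unsigned char) p[-1]) || p[-1] == '_');
        int ends = !(isalnum((unsigned char) p[len]) || p[len] == '_');
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}

/* Hypothetical sketch of the default rules. booleans: NULL-terminated
   list of Boolean resource names, or NULL. fromtype: host type used when
   DFT_FROMTYPE is set, or NULL to model DFT_FROMTYPE unset (type==any). */
void apply_default_resreq(const char *resreq, const char *const *booleans,
                          const char *fromtype, char *out, size_t n)
{
    const char *req = (resreq != NULL) ? resreq : "";
    const char *order = strstr(req, "order[");
    size_t sel_len = (order != NULL) ? (size_t) (order - req) : strlen(req);
    int need_type = 1;
    size_t i, used;

    /* trim blanks between the selection part and the order part */
    while (sel_len > 0 && req[sel_len - 1] == ' ')
        sel_len--;

    /* a type, model or Boolean resource suppresses the default type */
    if (has_token(req, "type") || has_token(req, "model"))
        need_type = 0;
    for (i = 0; need_type && booleans != NULL && booleans[i] != NULL; i++)
        if (has_token(req, booleans[i]))
            need_type = 0;

    snprintf(out, n, "%.*s", (int) sel_len, req);
    if (need_type) {
        used = strlen(out);
        snprintf(out + used, n - used, "%stype==%s",
                 sel_len > 0 ? " && " : "",
                 fromtype != NULL ? fromtype : "any");
    }
    /* keep an existing order section, otherwise add the default one */
    used = strlen(out);
    snprintf(out + used, n - used, "%s%s", used > 0 ? " " : "",
             order != NULL ? order : "order[r15s:pg]");
}
```

With a NULL resreq and no fromtype the sketch yields `type==any order[r15s:pg]' (the lsload default); with fromtype set to "local" it yields `type==local order[r15s:pg]' (the lsrun default).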
Getting Dynamic Load Information
LSLIB provides several functions to obtain dynamic load information about hosts. Dynamic load information is updated periodically by the LIM. The lsInfo data structure returned by the ls_info(3) API call (see Getting general cluster configuration information for details) stores the definition of all resources. LSF resources are classified into two groups: host-based resources and shared resources. See Administering Platform LSF for more information on host-based and shared resources.
Getting dynamic host-based resource information
Dynamic host-based resources are frequently referred to as load indices, consisting of 12 built-in load indices and 256 external load indices which can be collected using an ELIM (see Administering Platform LSF for more information). The built-in load indices report load information about the CPU, memory, disk subsystem, interactive activities, etc. on each host. The external load indices are optionally defined by your LSF administrator to collect additional host-based dynamic load information for your site.
ls_load() reports information about load indices:
struct hostLoad *ls_load(resreq, numhosts, options, fromhost)
On success, ls_load() returns an array containing a hostLoad structure for each host. On failure, it returns NULL and sets lserrno to indicate the error.
ls_load() has the following parameters:
char *resreq;      /* Resource requirements that each host must satisfy */
int *numhosts;     /* Initially contains the number of hosts requested */
int options;       /* Option flags that affect the selection of hosts */
char *fromhost;    /* Used in conjunction with the DFT_FROMTYPE option */
*numhosts determines how many hosts should be returned. If *numhosts is 0, information is requested on all hosts satisfying resreq. If numhosts is NULL, load information is requested on one host. If numhosts is not NULL, then on return *numhosts contains the number of hostLoad structures returned.
The options parameter is constructed from the bitwise inclusive OR of zero or more of the option flags defined in <lsf/lsf.h>. The most commonly used flags are:
- EXACT: Exactly *numhosts hosts are desired. If EXACT is set, either exactly *numhosts hosts are returned, or the call returns an error. If EXACT is not set, up to *numhosts hosts are returned. If *numhosts is 0, the EXACT flag is ignored and all eligible hosts in the load sharing system (that is, those that satisfy the resource requirement) are returned.
- OK_ONLY: Return only those hosts that are currently in the ok state. If OK_ONLY is set, hosts that are busy, locked, unlicensed, or unavail are not returned. If OK_ONLY is not set, some or all of the hosts whose status is not ok may also be returned, depending on the value of *numhosts and whether the EXACT flag is set.
- NORMALIZE: Normalize CPU load indices. If NORMALIZE is set, the CPU run queue length load indices r15s, r1m, and r15m of each returned host are normalized. See Administering Platform LSF for the different types of run queue lengths. The default is to return the raw run queue length.
- EFFECTIVE: If EFFECTIVE is set, the CPU run queue length load indices of each returned host are the effective load. The default is to return the raw run queue length. The options EFFECTIVE and NORMALIZE are mutually exclusive.
- IGNORE_RES: Ignore the status of RES when determining which hosts are considered to be "ok". If IGNORE_RES is specified, hosts with RES not running are also considered to be "ok" during host selection.
- DFT_FROMTYPE: This flag determines the default resource requirements; see Handling Default Resource Requirements for details. With this flag, hosts of the same type as fromhost that satisfy the resource requirements are returned.
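Since options is a bit mask, an application builds it with | and can check it before calling ls_load(). The sketch below validates two rules stated in the list: NORMALIZE and EFFECTIVE must not be combined, and EXACT has no effect when *numhosts is 0. The SK_* constants are hypothetical stand-ins with arbitrary values so that the block compiles on its own; a real program uses EXACT, OK_ONLY, NORMALIZE, EFFECTIVE, IGNORE_RES and DFT_FROMTYPE from <lsf/lsf.h>, whose numeric values may differ.

```c
/* Stand-in bit values for this sketch only; use the <lsf/lsf.h> macros
   in real code. */
#define SK_EXACT        0x01
#define SK_OK_ONLY      0x02
#define SK_NORMALIZE    0x04
#define SK_EFFECTIVE    0x08
#define SK_IGNORE_RES   0x10
#define SK_DFT_FROMTYPE 0x20

/* Hypothetical pre-call check: returns 0 if the option mask is usable,
   -1 if it combines the mutually exclusive NORMALIZE and EFFECTIVE. */
int check_load_options(int options)
{
    if ((options & SK_NORMALIZE) && (options & SK_EFFECTIVE))
        return -1;
    return 0;
}

/* Returns 1 if EXACT will actually constrain the result; as described
   above, EXACT is ignored when *numhosts is 0. */
int exact_is_effective(int options, int numhosts)
{
    return (options & SK_EXACT) != 0 && numhosts != 0;
}
```

For example, SK_EXACT | SK_OK_ONLY | SK_NORMALIZE passes the check, while adding SK_EFFECTIVE to it is rejected.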
The fromhost parameter is used when DFT_FROMTYPE is set in options. If fromhost is NULL, the local host is assumed.
ls_load() returns an array of the following data structure as defined in <lsf/lsf.h>:
struct hostLoad {
    char   hostName[MAXHOSTNAMELEN];  /* Name of the host */
    int    status[2];                 /* The operational and load status of the host */
    float *li;                        /* Values for all load indices of this host */
};
The returned hostLoad array is ordered according to the order requirement in the resource requirements. For details about the ordering of hosts, see Administering Platform LSF.
The following example uses no options and displays the host name, host status, and 1-minute CPU run queue length (the raw value, because neither NORMALIZE nor EFFECTIVE is set) for each Solaris host (host type SUNSOL) in the LSF cluster.
/******************************************************
 * LSLIB -- Examples
 *
 * simload
 * Displays load information about all Solaris hosts in
 * the cluster.
 ******************************************************/
#include <stdio.h>
#include <lsf/lsf.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    int i;
    struct hostLoad *hosts;
    char *resreq = "type==SUNSOL";
    int numhosts = 0;
    int options = 0;
    char *fromhost = NULL;
    char field[20] = "*";       /* field[0] is the busy marker */

    /* get load information on specified hosts */
    hosts = ls_load(resreq, &numhosts, options, fromhost);
    if (hosts == NULL) {
        ls_perror("ls_load");
        exit(-1);
    }

    /* print out the host name, host status and the 1-minute CPU run queue length */
    printf("%-15.15s %6.6s%6.6s\n", "HOST_NAME", "status", "r1m");
    for (i = 0; i < numhosts; i++) {
        printf("%-15.15s ", hosts[i].hostName);
        if (LS_ISUNAVAIL(hosts[i].status))
            printf("%6s", "unavail");
        else if (LS_ISBUSY(hosts[i].status))
            printf("%6.6s", "busy");
        else if (LS_ISLOCKED(hosts[i].status))
            printf("%6.6s", "locked");
        else
            printf("%6.6s", "ok");

        if (hosts[i].li[R1M] >= INFINIT_LOAD)
            printf("%6.6s\n", "-");                 /* value unknown */
        else {
            sprintf(field + 1, "%5.1f", hosts[i].li[R1M]);
            if (LS_ISBUSYON(hosts[i].status, R1M))
                printf("%6.6s\n", field);           /* "*" plus value */
            else
                printf("%6.6s\n", field + 1);       /* value only */
        }
    }
    exit(0);
}
The output of the above program is similar to the following:
% a.out
HOST_NAME       status   r1m
hostB               ok   0.0
hostC               ok   1.2
hostA             busy   0.6
hostD             busy  *4.3
hostF          unavail
If the host status is busy because of r1m, an asterisk (*) is printed in front of the value of the r1m load index.
In the above example, the returned hostLoad data structure never needs to be freed by the program, even if ls_load() is called repeatedly.
Each element of the li array is a floating-point number between 0.0 and INFINIT_LOAD (defined in lsf.h). LSF sets an index value to INFINIT_LOAD to indicate an invalid or unknown value for that index.
The li array can be indexed in different ways. The constants defined in lsf.h (see the ls_load(3) man page) can be used to index any built-in load index, as shown in the above example. If external load indices are used, the load indices are returned in the same order as the resources returned by ls_info(). The variables numUsrIndx and numIndx in the lsInfo structure can be used to determine which resources are load indices. See Advanced Programming Topics for a discussion of more flexible ways to map load index names to values.
LSF defines a set of macros in lsf.h to test the status field. The most commonly used macros include:
LSF macros to test status field
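One simple way to find an external load index by name is to search the resource table from ls_info(). This minimal sketch assumes, as the paragraph about li ordering suggests, that the first numIndx entries of resTable are the load indices in li order; load_index_of(), struct res_entry and struct info_view are hypothetical stand-ins for the lsInfo fields, so that the block compiles without <lsf/lsf.h>.

```c
#include <string.h>

/* Minimal stand-ins for the lsInfo fields used here; a real program
   uses struct lsInfo from <lsf/lsf.h> as returned by ls_info(). */
struct res_entry {
    char name[40];
};
struct info_view {
    int nRes;                        /* number of resources */
    const struct res_entry *resTable;
    int numIndx;                     /* total number of load indices */
};

/* Hypothetical helper: returns the position of load index `name` in a
   hostLoad li array, or -1 if `name` is not a load index. Assumes the
   first numIndx entries of resTable are the load indices, in li order. */
int load_index_of(const struct info_view *info, const char *name)
{
    int i;

    for (i = 0; i < info->numIndx && i < info->nRes; i++)
        if (strcmp(info->resTable[i].name, name) == 0)
            return i;
    return -1;
}
```

A caller would then read hosts[h].li[k] for k = load_index_of(...), treating values at or above INFINIT_LOAD as unknown, as the example program does.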
Getting dynamic shared resource information
Unlike host-based resources, which are inherent properties of each individual host, shared resources are shared among a set of hosts. A shared resource can have multiple instances, and each instance is shared among its own set of hosts.
ls_sharedresourceinfo() can be used to access shared resource information:
struct lsSharedResourceInfo *ls_sharedresourceinfo(resources, numResources, hostName, options)
On success, ls_sharedresourceinfo() returns an array containing a shared resource information structure (struct lsSharedResourceInfo) for each shared resource. On failure, ls_sharedresourceinfo() returns NULL and sets lserrno to indicate the error.
ls_sharedresourceinfo() has the following parameters:
char **resources;    /* NULL-terminated array of resource names */
int *numResources;   /* Number of shared resources */
char *hostName;      /* Host name */
int options;         /* Options (currently set to 0) */
resources is a list (NULL-terminated array) of shared resource names whose resource information is to be returned. Specify NULL to return resource information for all shared resources defined in the cluster.
numResources points to an integer specifying the number of resource information structures (LS_SHARED_RESOURCE_INFO_T) to return. Specify 0 to return resource information for all shared resources in the cluster. On success, *numResources is assigned the number of LS_SHARED_RESOURCE_INFO_T structures returned.
hostName is the name of a host. Specifying hostName indicates that only the shared resource information for the named host is to be returned. Specify NULL to return resource information for all shared resources defined in the cluster.
options is reserved for future use. Currently, it should be set to 0.
ls_sharedresourceinfo() returns an array of the following data structure as defined in <lsf/lsf.h>:
typedef struct lsSharedResourceInfo {
    char *resourceName;                    /* Resource name */
    int nInstances;                        /* Number of instances */
    LS_SHARED_RESOURCE_INST_T *instances;  /* Array of nInstances instances */
} LS_SHARED_RESOURCE_INFO_T;
For each shared resource, LS_SHARED_RESOURCE_INFO_T encapsulates an array of instances in the instances field. Each instance is represented by the data type LS_SHARED_RESOURCE_INST_T defined in <lsf/lsf.h>:
typedef struct lsSharedResourceInstance {
    char *value;        /* Value associated with the instance */
    int nHosts;         /* Number of hosts sharing the instance */
    char **hostList;    /* Hosts associated with the instance */
} LS_SHARED_RESOURCE_INST_T;
The value field of the LS_SHARED_RESOURCE_INST_T structure contains the ASCII representation of the actual value of the resource. Interpreting the value requires knowing the type of the resource (Boolean, Numeric, or String), which can be obtained from the resItem structure accessible through the lsInfo structure returned by ls_info(). See Getting general cluster configuration information for details.
The following example shows how to use ls_sharedresourceinfo() to collect dynamic shared resource information in an LSF cluster. It displays information about all the dynamic shared resources in the cluster. For each resource, the resource name, instance number, value, and locations are displayed.
#include <stdio.h>
#include <stdlib.h>      /* exit() */
#include <string.h>      /* strcmp() */
#include <lsf/lsf.h>

static struct resItem *getResourceDef(char *);
static struct lsInfo *lsInfo;

int main(void)
{
    struct lsSharedResourceInfo *resLocInfo;
    int numRes = 0;
    int i, j, k;

    lsInfo = ls_info();
    if (lsInfo == NULL) {
        ls_perror("ls_info");
        exit(-1);
    }

    resLocInfo = ls_sharedresourceinfo(NULL, &numRes, NULL, 0);
    if (resLocInfo == NULL) {
        ls_perror("ls_sharedresourceinfo");
        exit(-1);
    }

    printf("%-11.11s %8.8s %6.6s %14.14s\n", "NAME", "INSTANCE", "VALUE", "LOCATIONS");
    for (k = 0; k < numRes; k++) {
        struct resItem *resDef;

        resDef = getResourceDef(resLocInfo[k].resourceName);
        if (!(resDef->flags & RESF_DYNAMIC))
            continue;       /* only dynamic shared resources are shown */

        printf("%-11.11s", resLocInfo[k].resourceName);
        for (i = 0; i < resLocInfo[k].nInstances; i++) {
            struct lsSharedResourceInstance *instance;

            if (i == 0)
                printf(" %8.1d", i + 1);
            else
                printf(" %19.1d", i + 1);
            instance = &resLocInfo[k].instances[i];
            printf(" %6.6s", instance->value);
            for (j = 0; j < instance->nHosts; j++)
                if (j == 0)
                    printf(" %14.14s\n", instance->hostList[j]);
                else
                    printf(" %41.41s\n", instance->hostList[j]);
        } /* for */
    } /* for */
    exit(0);
} /* main */

static struct resItem *getResourceDef(char *resourceName)
{
    int i;

    for (i = 0; i < lsInfo->nRes; i++) {
        if (strcmp(resourceName, lsInfo->resTable[i].name) == 0)
            return &lsInfo->resTable[i];
    }

    /* Fail to find the matching resource */
    fprintf(stderr, "Cannot find resource definition for <%s>\n", resourceName);
    exit(-1);
}
The output of the above program is similar to the following:
    % a.out
    NAME        INSTANCE  VALUE      LOCATIONS
    dynamic1           1      2          hostA
                                         hostC
                                         hostD
                       2      4          hostB
                                         hostE
    dynamic2           1      3          hostA
                                         hostE

Note that the resource dynamic1 has two instances: one contains two resource units shared by hostA, hostC, and hostD, and the other contains four resource units shared by hostB and hostE. The dynamic2 resource has only one instance, with three resource units shared by hostA and hostE.

For configuration of shared resources, see the ResourceMap section of the lsf.cluster.cluster_name file in the Platform LSF Reference.
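Because the value field holds only the ASCII form of a value, a program has to convert it according to the resource type before it can, for example, add up the units of a Numeric resource. The following is a minimal sketch under stated assumptions: `total_numeric_units()` is a hypothetical helper, not an LSLIB function, and `struct shared_inst` is a local stand-in that mirrors the three fields of LS_SHARED_RESOURCE_INST_T so that the sketch compiles without LSF. In a real program you would pass the instances array returned by ls_sharedresourceinfo(), after checking through resItem that the resource is Numeric.

```c
#include <stdlib.h>
#include <errno.h>

/* Hypothetical stand-in mirroring the fields of LS_SHARED_RESOURCE_INST_T.
 * With LSF installed, use the real type from <lsf/lsf.h> instead. */
struct shared_inst {
    char *value;      /* ASCII representation of the instance value */
    int nHosts;       /* number of hosts sharing the instance */
    char **hostList;  /* hosts associated with the instance */
};

/* Hypothetical helper: sum the values of all instances of a Numeric
 * shared resource. Returns 0 and stores the sum in *total on success;
 * returns -1 (leaving *total unchanged) if any value is not a number. */
int total_numeric_units(const struct shared_inst *inst, int nInstances,
                        double *total)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < nInstances; i++) {
        char *end;
        double v;

        errno = 0;
        v = strtod(inst[i].value, &end);
        /* reject empty strings, trailing garbage, and out-of-range values */
        if (errno != 0 || end == inst[i].value || *end != '\0')
            return -1;
        sum += v;
    }
    *total = sum;
    return 0;
}
```

With the values shown in the sample output above, the two instances of dynamic1 ("2" and "4") would total 6 units.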
Making a Placement Decision
If you are writing an application that needs to run tasks on the best available hosts, you need to decide which host should run each task.
Placement decisions take the resource requirements of the task into consideration. Every task has a set of resource requirements. These may be static, such as a particular hardware architecture or operating system, or dynamic, such as an amount of swap space for virtual memory.
LSLIB provides services for placement advice. All you have to do is to call the appropriate LSLIB function with appropriate resource requirements.
Placement advice can be obtained by calling either the ls_load() function or the ls_placereq() function. ls_load() returns placement advice together with load index values; ls_placereq() returns only the names of the qualified hosts. In both cases the returned list of hosts is ordered by preference, with the first host being the best.

ls_placereq() is useful when a simple placement decision suffices. ls_load() can be used if the placement advice from LSF must be adjusted by additional criteria of your own. The LSF utilities lsrun, lsmake, lslogin, and lstcsh all use ls_placereq() for placement decisions. lsbatch, on the other hand, uses ls_load() to get an ordered list of qualified hosts, and then makes placement decisions by considering lsbatch-specific policies.

To make optimal placement decisions, it is important that your resource requirements accurately describe the resource needs of the application. For example, if your task is memory intensive, your resource requirement string should have `mem' in the order segment, as in `fddi order[mem:r1m]'.
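When LSF's ordering is not enough, a program that uses ls_load() can re-rank the qualified hosts itself, in the spirit of lsbatch applying its own policies. The sketch below shows only that re-ranking step. It assumes a hypothetical `struct candidate` filled in from the ls_load() result with the host name, the value of its mem load index, and the host's position in LSF's list; the rule itself (most available memory first, LSF's order as the tie-breaker) is an illustrative choice, not LSF behavior, and `rerank_candidates()` is not an LSLIB function.

```c
#include <stdlib.h>

/* Hypothetical record built from one entry of the ls_load() result. */
struct candidate {
    const char *host;  /* host name */
    float mem;         /* value of the mem load index for this host */
    int lsfRank;       /* position in the list returned by LSF (0 = best) */
};

/* Descending available memory; ties keep LSF's original order. */
static int by_mem_then_lsf(const void *a, const void *b)
{
    const struct candidate *x = a;
    const struct candidate *y = b;

    if (x->mem > y->mem)
        return -1;
    if (x->mem < y->mem)
        return 1;
    return x->lsfRank - y->lsfRank;
}

/* Hypothetical helper: re-rank candidates in place by the rule above. */
void rerank_candidates(struct candidate *c, size_t n)
{
    qsort(c, n, sizeof *c, by_mem_then_lsf);
}
```

Keeping LSF's rank as the final key matters because qsort() is not stable; without it, hosts with equal memory could come out in any order.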
ls_placereq() takes the form of:

    char **ls_placereq(resreq, num, options, fromhost)

On success, ls_placereq() returns an array of host names that best meet the resource requirements. A host may appear more than once in the array if it has sufficient resources to accept multiple tasks (for example, a multiprocessor host).

On failure, ls_placereq() returns NULL and sets lserrno to indicate the error.

The parameters for ls_placereq() are very similar to those of the ls_load() function described in the previous section.

LSLIB appends the default resource requirement to resreq according to the rules described in Handling Default Resource Requirements.

Preference is given to fromhost over remote hosts that do not have a significantly lighter load or greater resources. This preference avoids unnecessary task transfer and reduces overhead. If fromhost is NULL, the local host is assumed.

The following example takes a resource requirement string as an argument and displays the host in the LSF cluster that best satisfies the resource requirement.
    #include <stdio.h>
    #include <stdlib.h>
    #include <lsf/lsf.h>

    int main(int argc, char *argv[])
    {
        char *resreq;
        char **best;
        int num = 1;             /* number of hosts requested */
        int options = 0;
        char *fromhost = NULL;   /* NULL means the local host */

        /* check the input format before touching argv[1] */
        if (argc != 2) {
            fprintf(stderr, "Usage: %s resreq\n", argv[0]);
            exit(-2);
        }
        resreq = argv[1];

        /* find the best host with the given condition (e.g. resource requirement) */
        best = ls_placereq(resreq, &num, options, fromhost);
        if (best == NULL) {
            ls_perror("ls_placereq()");
            exit(-1);
        }

        printf("The best host is <%s>\n", best[0]);
        exit(0);
    }

The above program will produce output similar to the following:
    % a.out "type==local order[r1m:ls]"
    The best host is <hostD>

LSLIB also provides a variant of ls_placereq(): ls_placeofhosts() lets you provide a list of candidate hosts. See the ls_policy(3) man page for details.
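Since a multiprocessor host can appear several times in the array returned by ls_placereq(), a program that starts several tasks may want to know how many tasks each host was offered. A minimal sketch follows. `count_slots()` and `struct host_slots` are hypothetical names, and the sketch assumes, as with ls_load(), that num holds the number of entries actually returned once ls_placereq() succeeds.

```c
#include <string.h>

/* Hypothetical result record: a host and how many times it was listed. */
struct host_slots {
    const char *host;
    int slots;
};

/* Collapse duplicate entries in hosts[0..num-1] into out[], preserving the
 * order in which hosts first appear (that is, LSF's preference order).
 * out must have room for num entries. Returns the number of distinct hosts. */
int count_slots(char **hosts, int num, struct host_slots *out)
{
    int nOut = 0;
    int i, j;

    for (i = 0; i < num; i++) {
        for (j = 0; j < nOut; j++) {
            if (strcmp(out[j].host, hosts[i]) == 0) {
                out[j].slots++;       /* host already seen: one more slot */
                break;
            }
        }
        if (j == nOut) {              /* first time this host is seen */
            out[nOut].host = hosts[i];
            out[nOut].slots = 1;
            nOut++;
        }
    }
    return nOut;
}
```

For example, after `best = ls_placereq(resreq, &num, 0, NULL)` succeeds, `count_slots(best, num, slots)` groups the entries, where `slots` is an array of at least num elements.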
Getting Task Resource Requirements
Host selection relies on resource requirements. To avoid the need to specify resource requirements each time you execute a task, LSF maintains a list of task names together with their default resource requirements for each user. This information is kept in three task list files: the system-wide defaults, the per-cluster defaults, and the per-user defaults.
A user can put a task name together with its resource requirements into their remote task list by running the lsrtasks command. The lsrtasks command can be used to add, delete, modify, or display a task entry in the task list. For more information on the remote task list and an explanation of resource requirement strings, see Administering Platform LSF.
ls_resreq() gets the resource requirements associated with a task name. With ls_resreq(), LSF applications or utilities can automatically retrieve the resource requirements of a given task if the user does not specify them explicitly. For example, the LSF utility lsrun tries to find the resource requirements of the user-typed command automatically if the `-R' option is not specified on the command line.

The syntax of ls_resreq() is:

    char *ls_resreq(taskname)

If taskname does not appear in the remote task list, ls_resreq() returns NULL.

Typically, the resource requirements of a task are then used for host selection. The following program takes a task name as its argument, gets the associated resource requirements from the remote task list, and then supplies them to an ls_placereq() call to get the best host for running this task.

    #include <stdio.h>
    #include <stdlib.h>
    #include <lsf/lsf.h>

    int main(int argc, char *argv[])
    {
        char *taskname;
        char *resreq;
        char **best;

        /* check the input format before touching argv[1] */
        if (argc != 2) {
            fprintf(stderr, "Usage: %s taskname\n", argv[0]);
            exit(-1);
        }
        taskname = argv[1];

        /* get the resource requirement for the given command */
        resreq = ls_resreq(taskname);
        if (resreq)
            printf("Resource requirement for %s is \"%s\".\n", taskname, resreq);
        else
            printf("Resource requirement for %s is NULL.\n", taskname);

        /* select the best host with the given resource requirement to run the job */
        best = ls_placereq(resreq, NULL, 0, NULL);
        if (best == NULL) {
            ls_perror("ls_placereq");
            exit(-1);
        }

        printf("Best host for %s is <%s>\n", taskname, best[0]);
        exit(0);
    }

The above program will produce output similar to the following:
    % a.out myjob
    Resource requirement for myjob is "swp>50 order[cpu:mem]".
    Best host for myjob is <hostD>
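The precedence that lsrun applies, an explicit -R string first and the remote task list otherwise, can be kept in one small function. In this sketch `pick_resreq()` is hypothetical: it receives the task-list result as a parameter (in a real program, the return value of ls_resreq()) so that the selection logic can be shown without LSF. Treating an empty -R string as "not given" is an assumption of this example. A NULL result can then be passed on to ls_placereq(), as the program above does.

```c
#include <stddef.h>

/* Hypothetical helper: choose the resource requirement for a command.
 *   cmdline_resreq  - string given with a -R style option, or NULL
 *   tasklist_resreq - result of the task-list lookup (e.g. ls_resreq()),
 *                     or NULL if the task is not in the list
 * An explicit, non-empty string always wins. Returns NULL when neither
 * source supplies a requirement. */
const char *pick_resreq(const char *cmdline_resreq,
                        const char *tasklist_resreq)
{
    if (cmdline_resreq != NULL && cmdline_resreq[0] != '\0')
        return cmdline_resreq;
    return tasklist_resreq;
}
```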
Using Remote Execution Services
Remote execution of interactive tasks in LSF is supported through the Remote Execution Server (RES). The RES listens on a well-known port for service requests. Applications initiate remote execution by making an LSLIB call. For more information, see Application and Platform LSF base interactions.
Initializing an application for remote execution
Before executing a task remotely, an application must call ls_initrex():

    int ls_initrex(numports, options)

On success, ls_initrex() initializes LSLIB for remote execution. If your application is installed as a setuid-to-root program, ls_initrex() returns the number of socket descriptors bound to privileged ports. If your program is not installed as a setuid-to-root program, ls_initrex() returns numports on success.

On failure, ls_initrex() returns -1 and sets the global variable lserrno to indicate the error.

ls_initrex() must be called before any other remote execution function (see ls_rex(3)) or remote file operation function (see ls_rfs(3)) in LSLIB.
ls_initrex() has the following parameters:

    int numports;   /* The number of privileged ports to create */
    int options;    /* Either KEEPUID or 0 */

If your program is installed as a setuid-to-root program, numports file descriptors, starting from FIRST_RES_SOCK (defined in <lsf/lsf.h>), are bound to privileged ports by ls_initrex(). These sockets are used only for remote connections to RES. If numports is 0, the system uses the default value LSF_DEFAULT_SOCKS defined in lsf.h.

By default, ls_initrex() restores the effective user ID to the real user ID if the program is installed as a setuid-to-root program. If options is set to KEEPUID (defined in lsf.h), ls_initrex() preserves the current effective user ID. This option is useful if the application needs to be a setuid-to-root program for some other purpose as well and does not want to return to the real user ID immediately after ls_initrex().
If the KEEPUID flag is set in options, you must make sure that your application restores the real user ID at an appropriate point in its execution.
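When KEEPUID is used, giving up root later is the program's job. A minimal sketch using standard POSIX calls is shown below. `restore_real_uid()` is a hypothetical helper, not part of LSLIB, and the point at which a program calls it (once all work that needs root is finished) is up to the application.

```c
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>

/* Hypothetical helper: permanently return to the real user ID, e.g. after
 * ls_initrex(numports, KEEPUID) once root is no longer needed.
 * Returns 0 on success, -1 on failure (errno is set). A -1 return after
 * the final check means root could be regained; the caller should abort. */
int restore_real_uid(void)
{
    uid_t ruid = getuid();

    if (geteuid() == ruid)
        return 0;                 /* already running as the real user */

    if (setuid(ruid) != 0)
        return -1;

    /* For a non-root real user, regaining root must now fail. */
    if (ruid != 0 && setuid(0) == 0) {
        errno = EPERM;
        return -1;
    }
    return 0;
}
```

When the effective user ID is root, setuid() with the real user ID also resets the saved set-user-ID, which is why the follow-up attempt to regain root is expected to fail.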
ls_initrex() selects the security option according to the following rule: if the application program invoking it has an effective UID of root, privileged ports are created. If no privileged ports are created, then at remote task start-up time RES uses the authentication protocol defined by LSF_AUTH in the lsf.conf file.

Running a task remotely
The example program below runs a command on one of the best available hosts. It makes use of:
- ls_resreq(), described in Getting Task Resource Requirements
- ls_placereq(), described in Making a Placement Decision
- ls_initrex(), described in Initializing an application for remote execution
- ls_rexecv(), described below

The form of ls_rexecv() is:

    int ls_rexecv(host, argv, options)
ls_rexecv() executes a program on the specified host. It does not return if successful; it returns -1 on failure.

ls_rexecv() is like a remote execvp(). If a connection with the RES on the host has not yet been established, ls_rexecv() sets one up. The remote execution environment is set up to be exactly the same as the local one and is cached by the remote RES server. ls_rexecv() has the following parameters:

    char *host;     /* The execution host */
    char *argv[];   /* The command and its arguments */
    int options;    /* See below */

The options argument is constructed from the bitwise inclusive OR of zero or more of the option flags defined in <lsf/lsf.h> with names starting with `REXF_'. The flags are as follows:
- REXF_USEPTY: Use a remote pseudo-terminal as the stdin, stdout, and stderr of the remote task. This option provides a higher degree of terminal I/O transparency and is needed only when executing interactive screen applications such as vi. Because a pseudo-terminal incurs more overhead, use it only when necessary. This is the most commonly used flag.
- REXF_CLNTDIR: Use the local client's current working directory as the current working directory for remote execution.
- REXF_TASKPORT: Request the remote RES to create a task port and return its number to LSLIB.
- REXF_SHMODE: Enable shell mode support if the REXF_USEPTY flag is also given. This flag is ignored if REXF_USEPTY is not given. Specify it when submitting interactive shells or applications that redefine the Ctrl-C and Ctrl-Z keys (for example, jove).

LSLIB also provides ls_rexecve() to specify the environment to be set up on the remote host.

The program follows:
    #include <stdio.h>
    #include <stdlib.h>
    #include <lsf/lsf.h>

    int main(int argc, char *argv[])
    {
        char *command;
        char *resreq;
        char **best;
        int num = 1;

        /* check the input format */
        if (argc < 2) {
            fprintf(stderr, "Usage: %s command [argument ...]\n", argv[0]);
            exit(-1);
        }
        command = argv[1];

        /* initialize the remote execution */
        if (ls_initrex(1, 0) < 0) {
            ls_perror("ls_initrex");
            exit(-1);
        }

        /* get resource requirement for the given command */
        resreq = ls_resreq(command);

        best = ls_placereq(resreq, &num, 0, NULL);
        if (best == NULL) {
            ls_perror("ls_placereq()");
            exit(-1);
        }

        /* start remote execution on the selected host for the job */
        printf("<<Execute %s on %s>>\n", command, best[0]);
        fflush(stdout);   /* ls_rexecv() does not return on success */
        ls_rexecv(best[0], argv + 1, 0);

        /* if the remote execution is successful, the following lines will not be executed */
        ls_perror("ls_rexecv()");
        exit(-1);
    }

The output of the above program would be something like:
    % a.out myjob
    <<Execute myjob on hostD>>
    (output from myjob goes here ....)

Any application that uses LSF's remote execution service must be installed for proper authentication. See Authentication.
The LSF command lsrun is implemented using the ls_rexecv() function. lsrun calls ls_rexecv(), which initiates the remote task and then executes NIOS; NIOS handles all input and output to and from the remote task and exits with the same status when the remote task exits.

See Advanced Programming Topics for an alternative way to start remote tasks.
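Because options is a bitwise OR of REXF_ flags, and the shell-mode flag only has an effect together with REXF_USEPTY, it can help to build the options word in one place. In this sketch the DEMO_REXF_ constants are placeholders with made-up values so that the example compiles on its own; a real program must use the REXF_ constants from <lsf/lsf.h> instead. `rex_options()` is likewise hypothetical.

```c
/* Placeholder flag values for illustration only. In a real program use
 * REXF_USEPTY, REXF_CLNTDIR and REXF_SHMODE from <lsf/lsf.h>. */
#define DEMO_REXF_USEPTY  0x01
#define DEMO_REXF_CLNTDIR 0x02
#define DEMO_REXF_SHMODE  0x08

/* Hypothetical helper: build the options word for ls_rexecv().
 *   interactive - the task needs a pseudo-terminal (e.g. a screen editor)
 *   shell_mode  - the task redefines Ctrl-C/Ctrl-Z (e.g. an interactive shell)
 *   same_cwd    - run in the client's current working directory
 * Shell mode is requested only together with a pseudo-terminal, because
 * the shell-mode flag is ignored without it. */
int rex_options(int interactive, int shell_mode, int same_cwd)
{
    int options = 0;

    if (interactive) {
        options |= DEMO_REXF_USEPTY;
        if (shell_mode)
            options |= DEMO_REXF_SHMODE;
    }
    if (same_cwd)
        options |= DEMO_REXF_CLNTDIR;
    return options;
}
```

For a screen editor such as vi you might call rex_options(1, 0, 1); for an interactive shell, rex_options(1, 1, 1).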
Date Modified: March 13, 2009
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.