This is the cluster configuration file. There is one for each cluster, called ego.cluster.cluster_name. The cluster_name suffix is the name of the cluster defined in the Cluster section of ego.shared. All Symphony hosts are listed in this file, along with the list of Symphony administrators and the installed Symphony features.
The ego.cluster.cluster_name file contains configuration information that affects all Symphony applications. It defines cluster administrators, hosts that make up the cluster, attributes of each individual host such as host type or host model, and resources using the names defined in ego.shared.
ELIM_POLL_INTERVAL specifies the time interval, in seconds, at which the LIM samples external load index information. If your elim executable is programmed to report values more frequently than every 5 seconds, set ELIM_POLL_INTERVAL so that the LIM samples information at a corresponding rate.
For host scavenging with FastRelease mode, configure ELIM_POLL_INTERVAL=1 for the fastest response time.
If EGO_HOST_ADDR_RANGE is defined, security for dynamically adding and removing hosts is enabled, and only hosts with IP addresses within the specified range can be added to or removed from the cluster dynamically.
If there is an error in the configuration of EGO_HOST_ADDR_RANGE (for example, an address range is not in the correct format), no host will be allowed to join the cluster dynamically and an error message will be logged in the LIM log. Address ranges are validated at startup, reconfiguration, or restart, so they must conform to the required format.
If a requesting host has an IP address that falls within the specified range, the host is accepted as a dynamic Symphony host.
IP address ranges are separated by spaces and are treated as "OR" alternatives.
The asterisk (*) character indicates any value is allowed.
The dash (-) character indicates an explicit range of values. For example, 1-4 indicates that 1, 2, 3, and 4 are allowed.
Open ranges such as *-30, or 10-*, are allowed.
For IPv6 addresses, the double colon (::) represents one or more consecutive 16-bit groups of zeros. You can also use :: to compress leading and trailing zeros in an address filter, as shown in the following example:
EGO_HOST_ADDR_RANGE=1080::8:800:20fc:*
This definition allows hosts with addresses 1080:0:0:0:8:800:20fc:* (the :: expands to three groups of zeros).
You cannot use the double colon (::) more than once within an IP address, and you cannot place a zero group immediately before or after ::. For example, 1080:0::8:800:20fc:* is not a valid address filter.
If a range is specified with fewer fields than an IP address, such as 10.161, it is treated as 10.161.*.*.
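The IPv4 matching rules above can be sketched in Python as follows. This is a hypothetical illustration, not how the LIM implements the check: the function names are invented, and open-ended ranges such as *-30 or 10-* are assumed to be bounded by 0 and 255.

```python
def field_matches(spec: str, value: int) -> bool:
    """Return True if one dotted field of a range accepts the octet value.

    Supported forms: "*", "34", "1-10", and open ranges "*-30" or "10-*".
    Open ends are assumed to mean 0 and 255 (an assumption of this sketch).
    """
    if spec == "*":
        return True
    if "-" in spec:
        low, high = spec.split("-", 1)
        lower = 0 if low == "*" else int(low)
        upper = 255 if high == "*" else int(high)
        return lower <= value <= upper
    return int(spec) == value


def ipv4_in_range(address: str, range_spec: str) -> bool:
    """Check an IPv4 address against one range such as "100-110.34.1-10.4-56"."""
    fields = range_spec.split(".")
    # Missing trailing fields are treated as "*" ("10.161" -> "10.161.*.*").
    fields += ["*"] * (4 - len(fields))
    octets = [int(part) for part in address.split(".")]
    return all(field_matches(f, o) for f, o in zip(fields, octets))


def ipv4_allowed(address: str, setting: str) -> bool:
    """Space-separated ranges in the setting are OR alternatives."""
    return any(ipv4_in_range(address, r) for r in setting.split())
```

For example, `ipv4_allowed("102.34.3.20", "100-110.34.1-10.4-56")` returns True, while 100.34.11.4 is rejected because 11 is outside 1-10.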
After you configure EGO_HOST_ADDR_RANGE, check the lim.log.host_name file to make sure this parameter is set correctly. If the parameter is not set or is set incorrectly, the log file indicates this.
EGO_HOST_ADDR_RANGE=100-110.34.1-10.4-56
All hosts with an address whose first number is between 100 and 110, second number is 34, third number is between 1 and 10, and fourth number is between 4 and 56 are allowed access. No IPv6 hosts are allowed. Examples: 100.34.9.45, 100.34.1.4, and 102.34.3.20.
EGO_HOST_ADDR_RANGE=100.172.1.13 100.*.30-54 124.24-*.1.*-34
The host with the address 100.172.1.13 is allowed access. All hosts whose addresses start with 100, then any number, then a number from 30 to 54 are allowed access (the missing fourth field is treated as *). All hosts whose addresses start with 124, then a number from 24 onward, then 1, then a number from 0 to 34 are allowed access. No IPv6 hosts are allowed.
EGO_HOST_ADDR_RANGE=12.23.45.*
All hosts whose addresses start with 12.23.45 are allowed. No IPv6 hosts are allowed.
The * character can only be used on its own to indicate any value; a field such as *43 is not in the correct format. If any range is incorrectly formatted, an error is written to the LIM log and no hosts can join the cluster dynamically.
EGO_HOST_ADDR_RANGE=100.*43 100.172.1.13
Although one correct address range is specified, *43 is not in the correct format, so the entire line is considered invalid. An error is written to the LIM log and no hosts can join the cluster dynamically. No IPv6 hosts are allowed.
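The all-or-nothing behavior in these examples, where one malformed range invalidates the whole line, can be modeled with a small validator. This is an illustrative sketch for IPv4 ranges only; the function names and the regular expression are assumptions based on the formats described above, not the LIM's actual parser.

```python
import re
from typing import List, Optional

# One IPv4 field: "*", a number, or a range whose ends are numbers or "*".
FIELD = re.compile(r"(\*|\d{1,3})(-(\*|\d{1,3}))?")


def range_is_valid(range_spec: str) -> bool:
    fields = range_spec.split(".")
    return 1 <= len(fields) <= 4 and all(FIELD.fullmatch(f) for f in fields)


def parse_setting(setting: str) -> Optional[List[str]]:
    """Return the ranges, or None when any range is malformed.

    None stands for the documented outcome: an error is logged and no host
    can join the cluster dynamically, even if other ranges are correct.
    """
    ranges = setting.split()
    if not all(range_is_valid(r) for r in ranges):
        return None
    return ranges
```

With this sketch, `parse_setting("100.*43 100.172.1.13")` returns None even though 100.172.1.13 on its own is valid.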
All client IPv6 hosts with an address starting with 3ffe are allowed access. No IPv4 hosts are allowed.
EGO_HOST_ADDR_RANGE = 3ffe:fffe::88bb:*
This expands to 3ffe:fffe:0:0:0:0:88bb:*. All IPv6 client hosts whose addresses start with 3ffe:fffe::88bb are allowed. No IPv4 hosts are allowed.
EGO_HOST_ADDR_RANGE = 3ffe-4fff:fffe::88bb:aa-ff 12.23.45.*
All IPv6 client hosts whose addresses start with a value from 3ffe up to 4fff, then fffe::88bb, and end with a value from aa up to ff are allowed. IPv4 client hosts whose addresses start with 12.23.45 are allowed.
EGO_HOST_ADDR_RANGE = 3ffe-*:fffe::88bb:*-ff
All IPv6 client hosts whose addresses start with a value from 3ffe up to ffff, then fffe::88bb, and end with a value from 0 up to ff are allowed. No IPv4 hosts are allowed.
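The :: expansion used in the IPv6 examples can be sketched as a plain string transformation. The function names are hypothetical; the sketch only expands :: and enforces the two restrictions stated earlier (at most one :: and no zero group next to it), and it does not validate the individual groups.

```python
def is_zero_group(group: str) -> bool:
    """True for an explicit all-zero group such as "0" or "0000"."""
    return group != "" and set(group) == {"0"}


def expand_ipv6_filter(spec: str) -> str:
    """Expand '::' so the filter has eight colon-separated groups."""
    if spec.count("::") > 1:
        raise ValueError("'::' may appear only once")
    if "::" not in spec:
        return spec
    head, tail = spec.split("::")
    head_groups = head.split(":") if head else []
    tail_groups = tail.split(":") if tail else []
    # A zero written directly before or after '::' is not allowed.
    if (head_groups and is_zero_group(head_groups[-1])) or (
        tail_groups and is_zero_group(tail_groups[0])
    ):
        raise ValueError("zero group adjacent to '::'")
    missing = 8 - len(head_groups) - len(tail_groups)
    if missing < 1:
        raise ValueError("'::' must stand for at least one zero group")
    return ":".join(head_groups + ["0"] * missing + tail_groups)
```

For example, `expand_ipv6_filter("3ffe:fffe::88bb:*")` returns "3ffe:fffe:0:0:0:0:88bb:*", and "1080:0::8:800:20fc:*" raises ValueError.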
EXINTERVAL specifies the time interval, in seconds, at which the LIM daemons exchange load information.
On extremely busy hosts or networks, or in clusters with a large number of hosts, load may interfere with the periodic communication between LIM daemons. Setting EXINTERVAL to a longer interval can reduce network load and slightly improve reliability, at the cost of slower reaction to dynamic load changes.
If you set the time interval to less than 5 seconds, EGO automatically resets it to 5 seconds.
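The minimum-interval rule is simple arithmetic. The sketch below applies it and adds a rough count of exchange cycles per hour, assuming one exchange per interval; both function names are hypothetical, and the per-hour figure only illustrates the tradeoff described above, not actual network traffic.

```python
MIN_EXINTERVAL = 5  # seconds; shorter configured values are reset to this


def effective_exinterval(configured_seconds: int) -> int:
    """Interval EGO uses after applying the 5-second minimum."""
    return max(configured_seconds, MIN_EXINTERVAL)


def exchange_cycles_per_hour(configured_seconds: int) -> int:
    """Rough count of exchange cycles per hour, assuming one per interval."""
    return 3600 // effective_exinterval(configured_seconds)
```

Under these assumptions, a configured value of 2 behaves like 5 (720 cycles per hour), while 15 gives 240 cycles per hour at the cost of slower reaction to load changes.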
PRODUCTS specifies the Symphony products that the cluster will run (you must also have a license for each product). Separate items in the list with spaces.
The PRODUCTS parameter is set automatically during Symphony installation to include LSF_Base, which is required to run Symphony. Specify additional product keywords if your cluster is fully licensed for the corresponding products.
For partially licensed products, do not include the product keyword in this parameter; instead, configure the RESOURCES parameter in the Host section of this file.
You can also specify UNIX user group names, Windows user names, and Windows user group names as administrators. To specify a Windows user account or user group, include the domain name in uppercase letters (DOMAIN_NAME\user_name or DOMAIN_NAME\user_group).
The first administrator of the expanded list is considered the primary Symphony administrator. The primary administrator is the owner of the EGO configuration files, as well as the working files under EGO_SHAREDIR/cluster_name. If the primary administrator changes, make sure the ownership of the configuration files and of the files under EGO_SHAREDIR/cluster_name changes as well.
Administrators other than the primary Symphony administrator have the same privileges as the primary Symphony administrator except that they do not have permission to change EGO configuration files. They can perform clusterwide operations on jobs, queues, or hosts in the system.
For flexibility, each cluster may have its own Symphony administrators, identified by a user name, although the same administrators can be responsible for several clusters.
In a Windows domain, if the specified user or user group is a domain administrator, a member of the Power Users group, or a group with domain administrative privileges, the specified user or user group must belong to the Symphony user domain.
If the specified user or user group has a lower degree of privileges than those outlined in the previous point, the user or user group must belong to the Symphony user domain and be part of the Global Admins group.
In a Windows workgroup, if the specified user or user group is not a workgroup administrator, a member of the Power Users group, or a group with administrative privileges on each host, the specified user or user group must belong to the Local Admins group on each host.
The Host section is the only required section in ego.cluster.cluster_name. It lists all the hosts in the cluster and gives configuration information for each host.
The order in which hosts are listed in this section is important, because the first host listed becomes the Symphony master host. Because the master LIM makes all placement decisions for the cluster, the master host should be a fast machine.
The LIM on the first host listed becomes the master LIM if this host is up; otherwise, the second host becomes the master if it is up, and so on. To avoid the delays involved in switching masters if the first machine goes down, the master should also be a reliable machine. Arrange the list so that the first few hosts are in the same subnet; this avoids a situation where the second host takes over as master when there are communication problems between subnets.
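The failover order reduces to "the first listed host whose LIM is up becomes master". The sketch below captures only that ordering rule; choose_master and the is_up callback are hypothetical stand-ins, and real master election also involves timeouts and inter-LIM communication that are not modeled here.

```python
from typing import Callable, Optional, Sequence


def choose_master(hosts_in_order: Sequence[str],
                  is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first listed host that is up, or None if none are up."""
    for host in hosts_in_order:
        if is_up(host):
            return host
    return None
```

For example, if hostA is down, `choose_master(["hostA", "hostD", "hostE"], lambda h: h != "hostA")` returns "hostD".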
Begin Host
HOSTNAME model type server r1m pg tmp RESOURCES RUNWINDOW
hostA SparcIPC Sparc 1 3.5 15 0 (sunos frame) ()
hostD Sparc10 Sparc 1 3.5 15 0 (sunos) (5:18:30-1:8:30)
hostD ! ! 1 2.0 10 0 () ()
hostE ! ! 1 2.0 10 0 (linux !bigmem) ()
End Host
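To make the column layout concrete, the following hypothetical parser splits a Host section into one dictionary per host, keyed by the header keywords. It assumes whitespace-separated columns, treats a parenthesized value such as (sunos frame) as a single column, and drops # comments. It is a reading aid, not the LIM's configuration parser, and it does not check for problems such as duplicate host names.

```python
import re
from typing import Dict, List

# A column value is either a parenthesized group (which may contain spaces)
# or a run of non-space characters.
TOKEN = re.compile(r"\([^)]*\)|\S+")


def parse_host_section(text: str) -> List[Dict[str, str]]:
    lines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments such as "#Keywords"
        if line:
            lines.append(line)
    if len(lines) < 2 or lines[0] != "Begin Host" or lines[-1] != "End Host":
        raise ValueError("expected a Begin Host ... End Host block")
    header = lines[1].split()
    hosts = []
    for line in lines[2:-1]:
        values = TOKEN.findall(line)
        if len(values) != len(header):
            raise ValueError(f"column count mismatch in line: {line!r}")
        hosts.append(dict(zip(header, values)))
    return hosts
```

In the parsed result, a model or type value of ! means the LIM on that host detects the value itself.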
The model name must be defined in the HostModel section of ego.shared. It determines the CPU speed scaling factor applied in load and placement calculations.
Optionally, specify the ! keyword in the model or type column to indicate that the host model or type is automatically detected by the LIM running on the host.
The RESOURCES column lists the static Boolean resources and the static or dynamic numeric and string resources available on this host.
(fs frame hpux)
Begin Host
HOSTNAME model type server r1m pg tmp RESOURCES RUNWINDOW
...
hostE ! ! 1 2.0 10 0 (linux !bigmem) ()
...
End Host
Square brackets are not valid and the resource name must be alphanumeric.
Begin Host
HOSTNAME model type server r1m mem swp RESOURCES #Keywords
hostA ! ! 1 3.5 () () (mg elimres patchrev=3 owner=user1)
hostB ! ! 1 3.5 () () (specman=5 switch=1 owner=test)
hostC ! ! 1 3.5 () () (switch=2 rack=rack2_2_3 owner=test)
hostD ! ! 1 3.5 () () (switch=1 rack=rack2_2_3 owner=test)
End Host
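The RESOURCES column mixes Boolean resource names with resource=value pairs. The hypothetical helper below separates the two and rejects square brackets. Two interpretations are assumptions of this sketch: a leading ! (as in !bigmem) is read as marking the Boolean resource unavailable on that host, and "alphanumeric" is taken literally, so names containing other characters (such as underscores) are rejected.

```python
from typing import Dict, Tuple


def parse_resources(field: str) -> Tuple[Dict[str, bool], Dict[str, str]]:
    """Split "(mg patchrev=3 !bigmem)" into Boolean and valued resources."""
    if not (field.startswith("(") and field.endswith(")")):
        raise ValueError("RESOURCES must be enclosed in parentheses")
    booleans: Dict[str, bool] = {}
    values: Dict[str, str] = {}
    for item in field[1:-1].split():
        if "[" in item or "]" in item:
            raise ValueError(f"square brackets are not valid: {item!r}")
        if "=" in item:
            name, value = item.split("=", 1)
            values[name] = value
        else:
            negated = item.startswith("!")
            name = item[1:] if negated else item
            booleans[name] = not negated  # assumption: "!" marks it unavailable
        if not name.isalnum():
            raise ValueError(f"resource name must be alphanumeric: {name!r}")
    return booleans, values
```

For example, the hostA entry above, (mg elimres patchrev=3 owner=user1), yields the Boolean resources mg and elimres plus the values patchrev=3 and owner=user1.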
The type column specifies the host type, as defined in the HostType section of ego.shared.
The strings used for host types are determined by the system administrator: for example, SUNSOL, DEC, or HPPA. The host type is used to identify binary-compatible hosts.
The host type is used as the default resource requirement. That is, if no resource requirement is specified in a placement request, the task is run on a host of the same type as the sending host.
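The default-requirement rule can be illustrated with a hypothetical host-selection helper: with no explicit requirement, only hosts of the sending host's type are candidates. The candidate_hosts function, the host-to-type mapping, and the LINUX86 type name in the usage note are made up for illustration; actual placement also considers load and other factors not modeled here.

```python
from typing import Callable, List, Mapping, Optional


def candidate_hosts(host_types: Mapping[str, str], sending_host: str,
                    requirement: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Return hosts that satisfy the requirement, in listing order."""
    if requirement is None:
        # Default requirement: same host type as the sending host.
        wanted_type = host_types[sending_host]
        return [host for host, host_type in host_types.items() if host_type == wanted_type]
    return [host for host in host_types if requirement(host)]
```

With `{"hostA": "Sparc", "hostD": "Sparc", "hostE": "LINUX86"}`, a request from hostA with no requirement yields ["hostA", "hostD"].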
Often one host type can be used for many machine models. For example, the host type name SUNSOL6 might be used for any computer with a SPARC processor running SunOS 6. This would include many Sun models and quite a few from other vendors as well.
The LIM uses the load threshold columns (such as r1m, pg, and tmp) to determine whether to place remote jobs on a host. If one or more Symphony load indices exceed the corresponding threshold (too many users, not enough swap space, and so on), the host is regarded as busy and the LIM does not recommend jobs to that host.
The CPU run queue length threshold values (r15s, r1m, and r15m) are taken as effective queue lengths as reported by egosh resource view.
All of these fields are optional; you only need to configure thresholds for load indices that you wish to use for determining whether hosts are busy. Fields that are not configured are not considered when determining host status. The keywords for the threshold fields are not case sensitive.
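The busy check can be sketched as "any configured threshold crossed means busy; unconfigured indices are ignored". The function name is hypothetical, and the direction of each comparison is an assumption of this sketch: indices where a lower value means a busier host (free space, idle time) are listed explicitly, and all others are treated as busy when they rise above the threshold.

```python
from typing import Mapping

# Assumption of this sketch: for these indices a LOWER value means a busier
# host (for example, not enough free swap space). All other indices, such as
# r1m or pg, count as busy when they rise ABOVE the threshold.
LOWER_IS_BUSIER = {"it", "tmp", "swp", "mem"}


def is_busy(load: Mapping[str, float], thresholds: Mapping[str, float]) -> bool:
    """Return True if any configured threshold is crossed.

    Indices without a threshold are ignored. Threshold keywords are matched
    case-insensitively; load index names are expected in lowercase.
    """
    for keyword, limit in thresholds.items():
        index = keyword.lower()
        value = load[index]
        if index in LOWER_IS_BUSIER:
            if value < limit:
                return True
        elif value > limit:
            return True
    return False
```

For example, a host with r1m at 4.0 is busy against an R1M threshold of 3.5, while a high pg value is ignored if no pg threshold is configured.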
The ResourceMap section defines shared resources in your cluster by specifying the mapping between shared resources and their sharing hosts. When you define resources in the Resources section of ego.shared, there is no distinction between shared and non-shared resources: by default, no resource is shared and every resource is local to each host. By defining the ResourceMap section, you can define resources that are shared by all hosts in the cluster or by only some of the hosts in the cluster.
This section must appear after the Host section of ego.cluster.cluster_name, because it has a dependency on host names defined in the Host section.
The first line consists of the keywords RESOURCENAME and LOCATION. Subsequent lines describe the hosts that are associated with each configured resource.
Begin ResourceMap
RESOURCENAME LOCATION
verilog (5@[all])
local ([host1 host2] [others])
End ResourceMap
The resource verilog must already be defined in the RESOURCE section of the ego.shared file. It is a static numeric resource shared by all hosts. The value for verilog is 5. The resource local is a numeric shared resource that contains two instances in the cluster. The first instance is shared by two machines, host1 and host2. The second instance is shared by all other hosts.
The LOCATION column defines the hosts that share the resource.
For a static resource, you must define an initial value here as well. Do not define a value for a dynamic resource.
instance is a list of host names that share an instance of the resource. The reserved words all, others, and default can be specified for the instance:
all — Indicates that there is only one instance of the resource in the whole cluster and that this resource is shared by all of the hosts
(2@[all ~host3 ~host4])
means that 2 units of the resource are shared by all server hosts in the cluster made up of host1 host2 ... hostn, except for host3 and host4. This is useful if you have a large cluster but only want to exclude a few hosts.
The parentheses are required in the specification. The not operator (~) can only be used with the all keyword; it is not valid with the others and default keywords.
others — Indicates that the rest of the server hosts not explicitly listed in the LOCATION field comprise one instance of the resource
For example, (2@[host1] 4@[others]) indicates that there are 2 units of the resource on host1 and 4 units of the resource shared by all other hosts.
default — Indicates an instance of a resource on each host in the cluster
This specifies a special case in which the resource is, in effect, not shared and is local to every host: default means "at each host". Normally, you do not need to use default, because all resources are local to each host by default. You might use ResourceMap for a non-shared static resource if you need to specify different values for the resource on different hosts.
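The LOCATION keywords can be summarized by a hypothetical resolver that turns a LOCATION value into (value, hosts) instances for a given list of server hosts. The function name and regular expression are assumptions for illustration; the sketch handles all (with ~ exclusions), others, default, and explicit host lists, and it does not check that the resource is defined in ego.shared.

```python
import re
from typing import List, Optional, Tuple

# One instance: an optional "value@" prefix followed by a bracketed host list.
INSTANCE = re.compile(r"(?:([^@\s\[\]]+)@)?\[([^\]]*)\]")
KEYWORDS = {"all", "others", "default"}

Instance = Tuple[Optional[str], List[str]]


def resolve_location(location: str, cluster_hosts: List[str]) -> List[Instance]:
    """Expand a LOCATION value into (value, hosts) instances."""
    text = location.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("the parentheses are required")
    parsed = [(m.group(1), m.group(2).split()) for m in INSTANCE.finditer(text[1:-1])]
    # Hosts named explicitly anywhere in the LOCATION; "others" excludes them.
    explicit = {name for _, names in parsed for name in names
                if name not in KEYWORDS and not name.startswith("~")}
    instances: List[Instance] = []
    for value, names in parsed:
        if names[:1] == ["all"]:
            excluded = {n[1:] for n in names[1:] if n.startswith("~")}
            instances.append((value, [h for h in cluster_hosts if h not in excluded]))
        elif any(n.startswith("~") for n in names):
            raise ValueError("the not operator (~) is only valid with all")
        elif names == ["others"]:
            instances.append((value, [h for h in cluster_hosts if h not in explicit]))
        elif names == ["default"]:
            # One separate instance on every host.
            instances.extend((value, [h]) for h in cluster_hosts)
        else:
            instances.append((value, names))
    return instances
```

For a five-host cluster, `resolve_location("(2@[all ~host3 ~host4])", hosts)` yields a single instance with value "2" shared by host1, host2, and host5.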
A resource name cannot contain any of the following characters:
A resource name cannot be any of the following reserved names:
To avoid conflict with the inf and nan keywords in third-party libraries, resource names should not begin with inf or nan (in uppercase or lowercase). Resource requirement strings such as -R "infra" or -R "nano" cause an error. Use -R "defined(infxx)" or -R "defined(nanxx)" to specify these resource names.
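A configuration check for this naming rule could look like the following hypothetical helpers, which flag names starting with inf or nan in any letter case and wrap them in defined(...) when building a requirement string; other names are returned unchanged.

```python
def needs_defined_wrapper(resource_name: str) -> bool:
    """True if the name starts with inf or nan, ignoring case."""
    return resource_name.lower().startswith(("inf", "nan"))


def requirement_for(resource_name: str) -> str:
    """Build a requirement string that avoids the inf/nan parsing conflict."""
    if needs_defined_wrapper(resource_name):
        return f"defined({resource_name})"
    return resource_name
```

For example, `requirement_for("infra")` returns "defined(infra)", matching the form recommended above.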