ego.shared

The ego.shared file contains common definitions that are shared by Symphony clusters defined by ego.cluster.cluster_name files. This includes lists of cluster names, host types, host models, the special resources available, and external load indices.

This file is installed by default in the directory defined by EGO_CONFDIR.

Changing ego.shared configuration

After making any changes to ego.shared, run the following command:

egosh ego restart

Cluster section

(Required) Lists the cluster names recognized by Symphony.

Cluster section structure

The first line must contain the mandatory keyword ClusterName. The other keyword is optional.

Each subsequent line defines one cluster.

Example Cluster section

Begin Cluster
ClusterName  # Keyword
cluster1     
End Cluster

ClusterName

Defines all cluster names recognized by Symphony.

Every cluster name that Symphony references must be defined here. The file name of each cluster-specific configuration file must end with the associated cluster name.
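
The Begin/End layout described above can be read with a few lines of code. The following is a minimal sketch, not the parser Symphony uses: the function name parse_sections is hypothetical, and it assumes that # starts a comment and that columns are separated by whitespace, as in the example Cluster section.

```python
def parse_sections(text):
    """Return {section_name: rows}; each row is a list of whitespace-separated tokens."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        words = line.split()
        if words[0] == "Begin" and len(words) == 2:
            current = words[1]
            sections[current] = []
        elif words[0] == "End" and len(words) == 2:
            current = None
        elif current is not None:
            sections[current].append(words)
    return sections


sample = """
Begin Cluster
ClusterName  # Keyword
cluster1
End Cluster
"""

rows = parse_sections(sample)["Cluster"]
header = rows[0]                       # the keyword line
names = [row[0] for row in rows[1:]]   # one cluster per subsequent line
```

Because the sketch splits on whitespace, it does not handle parenthesized values that contain spaces, such as the DESCRIPTION column of the Resource section.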

HostType section

(Required) Lists the valid host types in the cluster. All hosts that can run the same binary executable are in the same host type.

HostType section structure

The first line consists of the mandatory keyword TYPENAME.

Subsequent lines name valid host types.

Example HostType section

Begin HostType
TYPENAME
SOL64
SOLSPARC
LINUX86
LINUXPPC
LINUX64
NTX86
NTX64
NTIA64
End HostType

TYPENAME

Host type names are usually based on a combination of the hardware name and operating system. If your site already has a system for naming host types, you can use the same names for Symphony.

HostModel section

(Required) Lists models of machines and gives the relative CPU scaling factor for each model. All hosts of the same relative speed are assigned the same host model.

Symphony uses the relative CPU scaling factor to normalize the CPU load indices so that tasks are more likely to be sent to faster hosts. The CPU factor affects the calculation of task execution time limits and accounting. Using large or inaccurate values for the CPU factor can cause confusing results when CPU time limits or accounting are used.

HostModel section structure

The first line consists of the mandatory keywords MODELNAME, CPUFACTOR, and ARCHITECTURE.

Each subsequent line defines one model, its CPU factor, and its architecture strings.

Example HostModel section

Begin HostModel
MODELNAME  CPUFACTOR     ARCHITECTURE
PC400        13.0        (i86pc_400 i686_400)
PC450        13.2        (i86pc_450 i686_450)
Sparc5F       3.0        (SUNWSPARCstation5_170_sparc)
Sparc20       4.7        (SUNWSPARCstation20_151_sparc)
Ultra5S      10.3        (SUNWUltra5_270_sparcv9 SUNWUltra510_270_sparcv9)
End HostModel

ARCHITECTURE

(Reserved for system use only) Indicates automatically detected host models that correspond to the model names.

CPUFACTOR

Although it is not required, you typically assign a CPU factor of 1.0 to the slowest machine model in your system and higher numbers to the others. For example, assign a factor of 2.0 to a machine model that executes at twice the speed of your slowest model.
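
One way to arrive at such factors is to time the same workload on each model and divide the slowest run time by each model's run time. The sketch below only illustrates that arithmetic; the function name cpu_factors, the model names, and the timings are invented, and Symphony does not prescribe this procedure.

```python
def cpu_factors(runtimes_seconds):
    """Map each model to slowest_runtime / model_runtime, rounded to one decimal."""
    slowest = max(runtimes_seconds.values())
    return {model: round(slowest / t, 1) for model, t in runtimes_seconds.items()}


# Hypothetical timings for one benchmark job on three models.
factors = cpu_factors({"ModelA": 120.0, "ModelB": 60.0, "ModelC": 40.0})
```

With these timings the slowest model receives 1.0, and a model that finishes in half the time receives 2.0.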

MODELNAME

Generally, you first identify the distinct host types in your system, such as MIPS and SPARC, and then the machine models within each type, such as SparcIPC, Sparc1, Sparc2, and Sparc10.

About automatically detected host models and types

When you first install Symphony, you do not need to assign models and types to hosts in ego.cluster.cluster_name. If you leave them unassigned, LIM automatically detects the model and type of each host.

Automatic detection of host model and type is useful because you no longer need to make changes in the configuration files when you upgrade the operating system or hardware of a host and reconfigure the cluster. Symphony will automatically detect the change.

Mapping to CPU factors

Automatically detected models are mapped to the short model names in ego.shared through the ARCHITECTURE column. Model strings in the ARCHITECTURE column are used only for mapping to the short model names.

Example ego.shared file:
Begin HostModel
MODELNAME   CPUFACTOR     ARCHITECTURE
SparcU5     5.0           (SUNWUltra510_270_sparcv9)
PC486       2.0           (i486_33 i486_66)
PowerPC     3.0           (PowerPC12 PowerPC16 PowerPC31)
End HostModel

If an automatically detected host model cannot be matched with the short model name, it is matched to the best partial match and a warning message is generated.

If a host model cannot be detected or is not supported, it is assigned the DEFAULT model name and an error message is generated.
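
The three outcomes described above (exact match, best partial match with a warning, DEFAULT with an error) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the rule used for a partial match (same text before the first underscore) is an assumption, because the actual matching rule is not described here.

```python
# Model table taken from the example HostModel section above.
HOST_MODELS = {
    "SparcU5": (5.0, ["SUNWUltra510_270_sparcv9"]),
    "PC486": (2.0, ["i486_33", "i486_66"]),
    "PowerPC": (3.0, ["PowerPC12", "PowerPC16", "PowerPC31"]),
}


def platform(arch):
    """Text before the first underscore (assumed to be the hardware platform)."""
    return arch.split("_", 1)[0]


def resolve_model(detected):
    """Return (model_name, message_level); message_level is None, 'warning', or 'error'."""
    # Exact match against any ARCHITECTURE string.
    for model, (_factor, archs) in HOST_MODELS.items():
        if detected in archs:
            return model, None
    # Assumed partial-match rule: same hardware platform.
    for model, (_factor, archs) in HOST_MODELS.items():
        if any(platform(a) == platform(detected) for a in archs):
            return model, "warning"
    # Nothing matched: fall back to the DEFAULT model.
    return "DEFAULT", "error"
```

For example, resolve_model("i486_100") falls through the exact match and is assigned PC486 with a warning under the assumed rule.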

Naming convention

Models that are automatically detected are named according to the following convention:
hardware_platform [_processor_speed[_processor_type]]
where:
  • hardware_platform is the only mandatory component

  • processor_speed is the optional clock speed and is used to differentiate computers within a single platform

  • processor_type is the optional processor manufacturer used to differentiate processors with the same speed

  • Underscores (_) between hardware_platform, processor_speed, and processor_type are mandatory.
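
The convention above can be split into its components with a short helper. This is a sketch under one stated assumption: hardware_platform itself contains no underscore. The function name parse_model_string is hypothetical.

```python
def parse_model_string(name):
    """Split 'hardware_platform[_processor_speed[_processor_type]]' into three parts.

    Missing optional components are returned as None.
    Assumption: hardware_platform does not contain an underscore.
    """
    parts = name.split("_", 2)
    hardware_platform = parts[0]
    processor_speed = parts[1] if len(parts) > 1 else None
    processor_type = parts[2] if len(parts) > 2 else None
    return hardware_platform, processor_speed, processor_type
```

Applied to strings from the examples in this file, "SUNWUltra5_270_sparcv9" yields all three components, "i486_33" yields a platform and a speed, and "PowerPC12" yields a platform only.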

Resource section

(Optional) Defines resources. Resources must be defined by the cluster administrator.

Resource section structure

The first line consists of the keywords. RESOURCENAME and DESCRIPTION are mandatory. The other keywords are optional. Subsequent lines define resources.

Example Resource section

Begin Resource
RESOURCENAME  TYPE    INTERVAL INCREASING  DESCRIPTION        # Keywords
   fs         Boolean ()       ()          (File server)
   cs         Boolean ()       ()          (Compute server)
   frame      Boolean ()       ()          (Hosts with FrameMaker licence)
   bigmem     Boolean ()       ()          (Hosts with very big memory)
   diskless   Boolean ()       ()          (Diskless hosts)
   linux      Boolean ()       ()          (LINUX UNIX)
   nt         Boolean ()       ()          (Windows NT)
   mg         Boolean ()       ()          (Management hosts)
   scode      Numeric 5        Y           (Host scavenging code)
   scvg       Boolean ()       ()          (Resource tag identifying scavenge-capable 
                                           hosts)
   agent_control
              String  5        ()          (Host scavenging flag)
   cit        Numeric 5        N           (Amount of time in minutes that a CPU has 
                                            been idle)
   uit_t      Numeric 5        Y           (Idle time threshold, in minutes)
   cu_t       Numeric 5        Y           (Adjusted CPU utilization threshold, as a
                                           percentage)
   cit_t      Numeric 5        Y           (CPU idle time threshold, in minutes)
   define_ncpus_procs   
              Boolean ()       ()          (ncpus := procs)
   define_ncpus_cores
              Boolean ()       ()          (ncpus := cores)
   define_ncpus_threads 
              Boolean ()       ()          (ncpus := threads)
   svrscvg    Boolean ()       ()          (Resource tag identifying server scavenge
                                           capable hosts)
   vmscvg     Boolean ()       ()          (Resource tag identifying harvesting scavenge
                                           capable hosts)
   acu        Numeric 5        Y           (Adjusted CPU utilization which not include 
                                           CPU usage of symphony and exempt process 
                                           list, as a percentage)   
   exempt_process
              String  5        ()          (process list which will be excluded for
                                           calculating CPU usage)
   close_process
              String  5        ()          (process list which will trigger host close
                                           or not open)
End Resource

RESOURCENAME

The name you assign to the new resource. An arbitrary character string.
  • A resource name cannot begin with a number.

  • A resource name cannot contain any of the following characters:

    :  .  (  )  [  +  - *  /  !  &  | <  >  @  =
  • A resource name cannot be any of the following reserved names:

    cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it 
    mem ncpus define_ncpus_cores define_ncpus_procs 
    define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
  • To avoid conflict with the inf and nan keywords in third-party libraries, resource names should not begin with inf or nan (upper case or lower case). Resource requirement strings such as -R "infra" or -R "nano" cause an error. Use -R "defined(infxx)" or -R "defined(nanxx)" to specify these resource names.

  • Resource names are case sensitive.

  • Resource names can be up to 39 characters in length.

  • For Solaris machines, the keyword int is reserved and cannot be used.
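
The naming rules above can be checked before a name is added to the Resource section. The following is a minimal sketch, not a Symphony tool: the function name resource_name_problems and its solaris parameter are hypothetical, and the checks cover only the rules listed in this section.

```python
FORBIDDEN_CHARS = set(":.()[+-*/!&|<>@=")
RESERVED_NAMES = set(
    "cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it "
    "mem ncpus define_ncpus_cores define_ncpus_procs "
    "define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut".split()
)
MAX_LENGTH = 39


def resource_name_problems(name, solaris=False):
    """Return a list of rule violations; an empty list means the name passes."""
    if not name:
        return ["name is empty"]
    problems = []
    if name[0].isdigit():
        problems.append("begins with a number")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    if name in RESERVED_NAMES:  # names are case sensitive, so compare exactly
        problems.append("is a reserved name")
    if name.lower().startswith(("inf", "nan")):
        problems.append("begins with inf or nan")
    if len(name) > MAX_LENGTH:
        problems.append("is longer than 39 characters")
    if solaris and name == "int":
        problems.append("int is reserved on Solaris")
    return problems
```

A name such as bigmem passes every check, while infra is flagged because it begins with inf.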

TYPE

The type of resource:
  • Boolean—Resources that have a value of 1 on hosts that have the resource and 0 otherwise.

  • Numeric—Resources that take numerical values, such as all the load indices, number of processors on a host, or host CPU factor.

  • String— Resources that take string values, such as host type, host model, host status.

Default

If TYPE is not given, the default type is Boolean.

INTERVAL

Optional. Applies to dynamic resources only.

Defines the time interval (in seconds) at which the resource is sampled by the ELIM.

If INTERVAL is defined for a numeric resource, it becomes an external load index.

Default

If INTERVAL is not given, the resource is considered static.
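
The TYPE and INTERVAL defaults combine as sketched below. The function name classify_resource is hypothetical, and treating () as "not given" is an assumption based on the example Resource section, where columns without a value contain ().

```python
def classify_resource(type_field="()", interval_field="()"):
    """Return (resource_type, kind) for one Resource line.

    Assumption: '()' or an empty string means the column was not given.
    """
    empty = ("", "()")
    resource_type = "Boolean" if type_field in empty else type_field
    if interval_field in empty:
        return resource_type, "static"
    int(interval_field)  # INTERVAL is a number of seconds; raises ValueError otherwise
    if resource_type == "Numeric":
        return resource_type, "external load index"
    return resource_type, "dynamic"
```

For example, the scode line from the example section (Numeric, interval 5) is classified as an external load index, while bigmem (Boolean, no interval) is static.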

INCREASING

Applies to numeric resources only.

If a larger value means greater load, INCREASING should be defined as Y. If a smaller value means greater load, INCREASING should be defined as N.
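
The effect of INCREASING on how values are compared can be illustrated by ordering hosts from least loaded to most loaded. This is a sketch of the comparison only, not of Symphony scheduling; the function name order_by_load, the host names, and the values are invented.

```python
def order_by_load(values, increasing):
    """Return host names ordered from least loaded to most loaded.

    increasing == "Y": a larger value means greater load, so small values come first.
    increasing == "N": a smaller value means greater load, so large values come first.
    """
    return sorted(values, key=values.get, reverse=(increasing == "N"))


# Hypothetical readings of one numeric resource on two hosts.
readings = {"hostA": 12, "hostB": 3}
least_loaded_if_y = order_by_load(readings, "Y")[0]
least_loaded_if_n = order_by_load(readings, "N")[0]
```

With the same readings, the host considered least loaded switches from hostB to hostA when INCREASING changes from Y to N.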

DESCRIPTION

Brief description of the resource.