Tutorial 2: Request Host Allocation in a Cluster with Synchronous Notifications
This tutorial describes how to create a registered EGO client that requests host allocation in a cluster and starts a container on the host. The client also reads notifications synchronously from the cluster regarding resource changes.
In this tutorial, you will:
- Open a connection to Platform EGO
- Print out cluster information
- Check if there are any registered clients connected to Platform EGO
- Log on to Platform EGO
- Register the client with Platform EGO
- Print out allocation and container reply info from a previous connection
- Print out host group information
- Request resource allocation from Platform EGO and print out the allocation ID
- Check for an incoming resource allocation message from Platform EGO on the open connection and print the message
- Start a container on Platform EGO and print out the container ID
- Check for registered clients connected to Platform EGO and print out information
Step 1: Preprocessor directives
The first step is to include the system and API header files. The samples.h header file declares the methods that are implemented in the samples.
    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <string.h>
    #include <time.h>
    #include "vem.api.h"
    #include "samples.h"

Step 2: Implement the principal method
Lines 4-8: define and initialize a data structure that is used to request a connection with the EGO host cluster. The data structure contains a reference to a configuration file where the master host name and port numbers are stored.
Line 10: pass the data structure as an argument to the vem_open() method, which opens a connection to the master host. If the connection attempt is successful, a handle is returned; otherwise the method returns NULL. The handle acts as a communication channel to the master host, and all subsequent communication occurs through this handle.
Lines 18-19: the vem_name_t pointer (clusterName) is initialized to NULL. The structure it points to holds the cluster name, system name, and version. The vem_uname() method is passed the communication handle and, if successful, returns a valid vem_name_t structure, which is stored in clusterName; otherwise the method returns NULL.
Line 26: the cluster info is printed out to the screen.
Lines 29-46: define the client info structure and use vem_locate() to get all registered clients. Because NULL is passed as the client name, all registered clients are located, and the method returns the number of registered clients. Platform EGO ships with a number of default clients (services), such as the Service Controller, so at a minimum the information for these clients is printed out and the associated memory is released.
Lines 47-49: authenticate the user to Platform EGO.
      1  int
      2  sample2()
      3  {
      4      vem_openreq_t orequest;
      5      vem_handle_t *vhandle = NULL;
      6
      7      orequest.file = "ego.conf"; // default libvem.conf
      8      orequest.flags=0;
      9
     10      vhandle = vem_open(&orequest);
     11
     12      if (vhandle == NULL) {
     13          // error opening
     14          fprintf(stderr, "Error opening cluster: %s\n", vem_strerror(vemerrno));
     15          return -1;
     16      }
     17
     18      vem_name_t *clusterName = NULL;
     19      clusterName = vem_uname(vhandle);
     20      if (clusterName == NULL) {
     21          // error connecting
     22          fprintf(stderr, "Error connecting to cluster: %s\n", vem_strerror(vemerrno));
     23          return -2;
     24      }
     25
     26      fprintf(stdout, " Connected... %s %s %4.2f\n", clusterName->clustername,
     27              clusterName->sysname, clusterName->version);
     28
     29      vem_clientinfo_t *clients;
     30      int rc = vem_locate(vhandle, NULL, &clients);
     31      if (rc >=0) {
     32          if (rc == 0) {
     33              printf("No registered clients exist\n");
     34          } else {
     35              int i=0;
     36              for (i=0; i<rc; i++) {
     37                  printf("%s %s %s\n", clients[i].name, clients[i].description,
     38                         clients[i].location);
     39              }
     40              // free
     41              vem_clear_clientinfo(clients);
     42          }
     43      } else {
     44          // error connecting
     45          fprintf(stderr, "Error getting clients: %s\n", vem_strerror(vemerrno));
     46      }
     47      if (login(vhandle, username, password)<0) {
     48          fprintf(stderr, "Error logon: %s\n", vem_strerror(vemerrno));
     49      }

Lines 50-63: define the vem_allocation_info_reply_t and vem_container_info_reply_t structures. If a client gets disconnected and then re-registers, its existing allocations and containers are returned in these structures. If the client has never registered before, the structures are empty. Define and initialize a structure (rreq) that holds client info for registration purposes. Note that on line 58, the callback member (cb) is set to NULL. This means that it is the client's responsibility to periodically check the open connection via vem_select()/vem_read() to get incoming messages and take action accordingly. Register with Platform EGO via the open connection using vem_register().
Lines 64-67: print out information related to the allocation requests and containers. Once the info is printed out, the memory for the allocations is freed.
Lines 69-79: the vem_gethostgroupinfo() method collects information for the requested host group. In this case, the group list in the request is set to NULL, which means that information about all host groups is requested. If the method call is successful, host group information is printed out to the screen.
     50      vem_allocation_info_reply_t aireply;
     51      vem_container_info_reply_t cireply;
     52      vem_registerreq_t rreq;
     53
     54      rreq.name = "sample2_client";
     55      rreq.description = "Sample2";
     56      rreq.flags = VEM_REGISTER_TTL;
     57      rreq.ttl = 3;
     58      rreq.cb = NULL; // would need to read messages explicitly;
     59
     60      rc = vem_register(vhandle, &rreq, &aireply, &cireply);
     61      if (rc < 0) {
     62          fprintf(stderr, "Error registering: %s\n", vem_strerror(vemerrno));
     63      }
     64      print_vem_allocation_info_reply(&aireply);
     65      print_vem_container_info_reply(&cireply);
     66      // freeup any previous allocations
     67      release_vem_allocation(vhandle, &aireply);
     68
     69      vem_hostgroupreq_t hgroupreq;
     70      hgroupreq.grouplist = NULL;
     71      vem_hostgroup_t *hgroup;
     72
     73      rc = vem_gethostgroupinfo(vhandle, &hgroupreq, &hgroup);
     74      if (rc < 0) {
     75          fprintf(stderr, "Error getting hostgroup: %s\n", vem_strerror(vemerrno));
     76      } else {
     77          printf("%s %s %d %d\n", hgroup->groupName, hgroup->members, hgroup->free,
     78                 hgroup->allocated);
     79      }

Lines 80-101: initialize the data structure (vem_allocreq_t) that specifies the allocation request. The vem_alloc() method requests resource allocation, taking the allocation request (the vem_allocreq_t structure) as one of its input arguments. If the request is successful, the allocation ID is printed out to the screen.
     80      vem_allocreq_t areq;
     81      areq.name = "Sample2Alloc";
     82      areq.consumer = "/SampleApplications/EclipseSamples";
     83      areq.hgroup = "ComputeHosts";
     84  #ifndef WIN32_RESOURCE
     85      areq.resreq = "LINUX86";
     86  #else
     87      areq.resreq = "NTX86";
     88  #endif
     89      areq.minslots = 1;
     90      areq.maxslots = 1;
     91      areq.tile = 0;
     92      vem_allocation_id_t alocid;
     93      vem_allocfreereq_t afree;
     94      rc = vem_alloc(vhandle, &areq, &alocid);
     95      if (rc < 0) {
     96          fprintf(stderr, "Error allocating: %s\n", vem_strerror(vemerrno));
     97          goto bailout;
     98
     99      } else {
    100          printf("allocated: %s\n", alocid);
    101      }

Lines 102-123: define and initialize a container specification, including setting its resource limits to default values. The container specification essentially defines a job that the user wants to be executed. The conspec.command member specifies the actual binary that should be executed. In the sample, we want the program "sleep" to be executed. The UNIX sleep command takes the number of seconds to sleep as an input argument.
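The sample hard-codes the command string with a 240-second sleep. As a minimal sketch of how the same string could be built for a configurable duration, the helper below formats it with snprintf() and reports truncation. The name build_sleep_command() is hypothetical and not part of the EGO API; the helper only formats text.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "sleep <seconds>" into buf.
 * Returns 0 on success, or -1 if formatting fails or the
 * result does not fit in len bytes (including the terminator).
 */
int build_sleep_command(char *buf, size_t len, unsigned int seconds)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "sleep %u", seconds);
    if (n < 0 || (size_t)n >= len)
        return -1;   /* output error or truncated result */
    return 0;
}
```

A caller might format the command into a local buffer and assign that buffer to conspec.command, assuming the buffer stays valid until the container has been started.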
    102      vem_container_spec_t conspec;
    103      memset(&conspec, 0, sizeof(vem_container_spec_t));
    104  #ifndef WIN32_RESOURCE
    105      conspec.command = "sleep 240";
    106      conspec.execUser = "lsfadmin"; // "egoadmin";
    107      conspec.umask = 0777;
    108      conspec.execCwd = "/tmp";
    109      conspec.envC = 0;
    110  #else
    111      // sleep needs to be installed on the cluster NT hosts
    112      // or if ping is available, use something like ping -n xxx 127.0.0.1 > nul
    113      conspec.command = "sleep 240";
    114      conspec.execUser = "lsf\\lsfadmin"; //"egouser"; // "lsfadmin"; // "egoadmin";
    115      conspec.umask = 0777;
    116      conspec.execCwd = "c:\\";
    117      conspec.envC = 0;
    118  #endif
    119      int i;
    120      for (i=0; i<VEM_RLIM_NLIMITS; i++) {
    121          conspec.rlimits[i].rlim_cur = VEM_RLIM_DEFAULT;
    122          conspec.rlimits[i].rlim_max = VEM_RLIM_DEFAULT;
    123      }

Lines 124-130: declare the container request, time-out, message, and reply structures, initialize the container ID to NULL, and store the allocation ID in the container request.
Lines 132-163: wait for up to 60 seconds (the timeout is configurable) for incoming data on the open connection. If data arrives, the message is read from the open connection. A switch statement is used to interpret the message code enumeration, and the corresponding message is printed out to the screen. If the wait fails or times out, the message cannot be read, or the message is anything other than an allocation reply, control jumps to the cleanup label, where the allocation is freed.
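The wait, read, and dispatch sequence described above can be sketched with plain POSIX calls. This is an analogy only: it uses select() and read() on a file descriptor rather than vem_select() and vem_read(), and the demo_msg structure, the DEMO_ codes, and both function names are hypothetical stand-ins, not EGO types.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical message codes and layout, for illustration only. */
enum demo_code { DEMO_RESOURCE_ADD = 1, DEMO_RESOURCE_RECLAIM = 2 };

struct demo_msg {
    int code;   /* one of enum demo_code */
    int nhost;  /* number of hosts the message refers to */
};

/*
 * Wait up to timeout_sec seconds for fd to become readable, then read
 * one fixed-size message. Returns 1 if a message was read, 0 on
 * timeout, and -1 on a select/read error or a short read.
 */
int poll_message(int fd, int timeout_sec, struct demo_msg *out)
{
    fd_set readfds;
    struct timeval tv;
    ssize_t n;
    int rc;

    assert(out != NULL);
    memset(out, 0, sizeof(*out));

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;   /* set both fields of the timeout explicitly */

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;    /* the wait itself failed */
    if (rc == 0)
        return 0;     /* nothing arrived within the timeout */

    n = read(fd, out, sizeof(*out));
    return (n == (ssize_t)sizeof(*out)) ? 1 : -1;
}

/* Interpret the message code with a switch, as the sample does. */
const char *describe_message(const struct demo_msg *msg)
{
    switch (msg->code) {
    case DEMO_RESOURCE_ADD:     return "resources added";
    case DEMO_RESOURCE_RECLAIM: return "resources reclaimed";
    default:                    return "unknown message code";
    }
}
```

The three-way return value mirrors the three branches the sample takes on the result of its wait call: error, timeout, and data available.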
    124      vem_startcontainerreq_t conreq;
    125      vem_container_id_t conid = NULL;
    126      conreq.allocId = alocid;
    127      struct timeval tv;
    128      struct vem_message msg;
    129      struct vem_allocreply *rep = NULL;
    130      struct vem_allocreclaim *reclaim = NULL;
    131
    132      tv.tv_sec = 60; tv.tv_usec = 0; // 60 seconds timeout
    133      rc = vem_select(vhandle, &tv);
    134      if(rc < 0) {
    135          printf("vem_select error\n");
    136          goto cleanup;
    137      }
    138      if(rc == 0) {
    139          printf("vem_select may have problem, please set longer timeout \n");
    140          goto cleanup;
    141      }
    142      rc = vem_read(vhandle, &msg);
    143      if(rc < 0) {
    144          printf("Read message failed\n");
    145          goto cleanup;
    146      }
    147      switch(msg.code) {
    148      case RESOURCE_ADD:
    149          rep = (struct vem_allocreply *)msg.content;
    150          printf("Got alloc reply for %s %d hosts\n", rep->consumer, rep->nhost);
    151          break;
    152      case RESOURCE_RECLAIM:
    153          reclaim = (struct vem_allocreclaim*)msg.content;
    154          printf("vem wants its resources back for allocation %s\n",
    155                 reclaim->reclaim->consumer);
    156          rc = -1;
    157          goto cleanup;
    158          break;
    159      default:
    160          printf("unknown message code %d\n", msg.code);
    161          goto cleanup;
    162          break;
    163      } /* switch() */

Lines 164-168: get the hostname for the allocation and print it out to the screen. Initialize the workload container request structure (conreq) with the hostname, container name, and the container specification (conspec).
Lines 170-175: start the workload container on the specified host and, if successful, print out the container ID.
Lines 178-193: use vem_locate() again to get all registered clients. As before, passing NULL as the client name locates all registered clients, and the method returns the number of registered clients. If the call succeeds, the client info is printed out and the associated memory is released.
    164      char *host = rep->host[0].name;
    165      printf("Allocated host: %s\n", host);
    166      conreq.hostname = host;
    167      conreq.name = "Sample2Container";
    168      conreq.spec = &conspec;
    169
    170      rc = vem_startcontainer(vhandle, &conreq, &conid);
    171      if (rc < 0) {
    172          fprintf(stderr, "Error starting container: %s\n", vem_strerror(vemerrno));
    173          goto cleanup;
    174      }
    175      printf("Started container %s\n", conid);
    176      // Currently no way to get container from id.
    177      //print_vem_container(vem_container_t *container);
    178      rc = vem_locate(vhandle, NULL, &clients);
    179      if (rc >=0) {
    180          if (rc == 0) {
    181              printf("No registered clients exist\n");
    182          } else {
    183              int i=0;
    184              for (i=0; i<rc; i++) {
    185                  printf("%s %s %s\n", clients[i].name, clients[i].description,
    186                         clients[i].location);
    187              }
    188              vem_clear_clientinfo(clients);
    189          }
    190      } else {
    191          // error connecting
    192          fprintf(stderr, "Error getting clients: %s\n", vem_strerror(vemerrno));
    193      }
    194      // wait for job to finish
    195  #ifdef WIN32
    196      Sleep(60000);
    197  #else
    198      sleep(30);
    199  #endif
    200  cleanup:
    201      afree.allocId = alocid;
    202      rc = vem_allocfree(vhandle, &afree);
    203      if (rc < 0) {
    204          fprintf(stderr, "Error freeing allocation: %s\n", vem_strerror(vemerrno));
    205      }
    206  bailout:
    207      rc = vem_unregister(vhandle);
    208      if (rc < 0) {
    209          fprintf(stderr, "Error unregistering: %s\n", vem_strerror(vemerrno));
    210      }
    211      if (logout(vhandle)<0) {
    212          fprintf(stderr, "Error logoff: %s\n", vem_strerror(vemerrno));
    213      }
    214      // free memory (caution: conid and host are unset if a goto skipped lines 125 and 164)
    215      vem_free_containerId(conid);
    216      //vem_free_containerSpec(&conspec); // crashes
    217
    218  leave:
    219      vem_free_uname(clusterName);
    220      vem_close(vhandle);
    221      if(host != NULL)
    222          free(host);
    223
    224      return 0;
    225  }

Step 3: Free all resource allocations
This method iterates through each allocation, as identified by its allocation ID, and frees its memory. Freeing an allocation is the same as cancelling it, i.e., all resources associated with the allocation are released.
    void
    release_vem_allocation(vem_handle_t *vhandle, vem_allocation_info_reply_t *aireply)
    {
        int i;
        for(i=0; i<aireply->nallocation; i++){
            // free alocid memory
            vem_allocfreereq_t afree;
            afree.allocId = aireply->allocation[i].allocId;
            int rc = vem_allocfree(vhandle, &afree);
            if (rc < 0) {
                fprintf(stderr, "Error freeing allocation: %s\n", vem_strerror(vemerrno));
            }
        }
    }

Step 4: Print allocation info
These three methods iterate through each allocation, printing out the allocation ID, allocation request info, host name, host slots, and a list of host attributes.
    void
    print_vem_allocation_info_reply(vem_allocation_info_reply_t *aireply)
    {
        int i;
        for(i=0; i<aireply->nallocation; i++){
            print_vem_allocation(&aireply->allocation[i]);
        }
    }

    void
    print_vem_allocation(vem_allocation_t *alloc)
    {
        printf("AllocId=%s\n", alloc->allocId);
        print_vem_allocreq(alloc->allocReq);
        int i, j;
        for(i=0; i<alloc->nhost; i++){
            printf("Name=%s Slots=%d Attributes ", alloc->host[i].name,
                   alloc->host[i].slots);
            for(j=0; j<alloc->hostattr[i].attrC; j++){
                vem_attribute_t *attr = &alloc->hostattr[i].attrV[j];
                printf("%s=", attr->name);
                print_vem_value(&attr->value_t);
            }
            printf("\n");
        }
    }

    void
    print_vem_allocreq(vem_allocreq_t *allocreq)
    {
        printf("AllocReq %s %s %s %s %d %d %d\n", allocreq->name, allocreq->consumer,
               allocreq->hgroup, allocreq->resreq, allocreq->maxslots,
               allocreq->minslots, allocreq->flags );
    }

Step 5: Print container info
These four methods iterate through each container, printing out the container ID, state, and other container-related fields. The print_vem_container_state() and print_vem_container_exit_reason() methods use switch statements to interpret the meaning of the enumeration members.
    void
    print_vem_container_info_reply(vem_container_info_reply_t *cireply)
    {
        int i;
        for(i=0; i<cireply->ncontainer; i++){
            // index into the array so that each container is printed, not only the first
            print_vem_container(&cireply->container[i]);
        }
    }

    void
    print_vem_container(vem_container_t *container)
    {
        printf("Container\n");
        printf("Id=%s\nState=", container->id);
        print_vem_container_state(container->state);
        printf("\nName=%s\nAllocId=%s\nConsumer=%s Start=%ld, End=%ld\nHost=%s ExitStatus=%d ExitReason=",
               container->name, container->allocId, container->consumer,
               container->startTime, container->endTime, container->host,
               container->exitStatus);
        print_vem_container_exit_reason(container->exitReason);
        //TODO add the rest of the fields
        // print rest
    }

    void
    print_vem_container_state(vem_container_state_t state)
    {
        switch(state) {
        case CONTAINER_NULL: printf(" 0, internal state"); break;
        case CONTAINER_START: printf(" 1, start"); break;
        case CONTAINER_RUN: printf(" 2, running"); break;
        case CONTAINER_SUSPEND: printf(" 3, suspend"); break;
        case CONTAINER_FINISH: printf(" 4, finish"); break;
        case CONTAINER_UNKNOWN: printf(" 5, unknown, host unreachable "); break;
        case CONTAINER_ZOMBIE: printf(" 6, zombie, unknown container is terminated"); break;
        case CONTAINER_MAX_STATE: printf(" Number of container state"); break;
        }
    }
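As an alternative to the switch statement in print_vem_container_state(), the mapping from state to text could be kept in a lookup table. The sketch below is illustrative only: the demo_state enumeration is a hypothetical mirror of the container states, with numeric values taken from the numbers the sample prints, and demo_state_name() is not part of the EGO API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the container states printed by the sample. */
enum demo_state {
    DEMO_NULL,        /* 0 */
    DEMO_START,       /* 1 */
    DEMO_RUN,         /* 2 */
    DEMO_SUSPEND,     /* 3 */
    DEMO_FINISH,      /* 4 */
    DEMO_UNKNOWN,     /* 5 */
    DEMO_ZOMBIE,      /* 6 */
    DEMO_MAX_STATE    /* number of states, not a state itself */
};

/* One description per state, indexed by the enumeration value. */
static const char *const demo_state_names[DEMO_MAX_STATE] = {
    [DEMO_NULL]    = "internal state",
    [DEMO_START]   = "start",
    [DEMO_RUN]     = "running",
    [DEMO_SUSPEND] = "suspend",
    [DEMO_FINISH]  = "finish",
    [DEMO_UNKNOWN] = "unknown, host unreachable",
    [DEMO_ZOMBIE]  = "zombie, unknown container is terminated"
};

/* Return the description for a state, or "out of range" for any
 * value that is not a valid state. */
const char *demo_state_name(int state)
{
    if (state < 0 || state >= DEMO_MAX_STATE)
        return "out of range";
    /* A gap in the table would be a programming error. */
    assert(demo_state_names[state] != NULL);
    return demo_state_names[state];
}
```

The table keeps each description next to its enumerator and handles unexpected values in one place. The switch form used in the sample lets the compiler warn about unhandled enumerators instead.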
    void
    print_vem_container_exit_reason(vem_container_exit_reason_t rcode)
    {
        switch(rcode) {
        case ER_NULL: printf(" 0, no reason"); break;
        case ER_SETUP_NO_MEM: printf(" 1, exit because of setup fail"); break;
        case ER_SETUP_FORK: printf(" 2, fork fail"); break;
        case ER_SETUP_PGID: printf(" 3, fail to setpgid"); break;
        case ER_SETUP_ENV: printf(" 4, fail to set env variables"); break;
        case ER_SETUP_LIMIT: printf(" 5, fail to set process limits"); break;
        case ER_SETUP_NO_USER: printf(" 6, user account doesn't exist"); break;
        case ER_SETUP_PATH: printf(" 7, fail to change container cwd"); break;
        case ER_SIG_KILL: printf(" 8, terminated by sigkill"); break;
        case ER_UNKNOWN: printf(" 9, unknown reason "); break;
        case ER_PEM_UNREACH: printf(" 10, fail to reach pem host"); break;
        case ER_PEM_SYN: printf(" 11, vemkd and pem sync issue"); break;
        case ER_BAD_ALLOC_HOST: printf(" 14, host is not allocated"); break;
        case ER_NOSUCH_CLIENT: printf(" 15, client doesn't exist"); break;
        case ER_START: printf(" 16, container start fails"); break;
        case LAST_EXIT_REASON: printf(" last exit reason "); break;
        }
        printf("\n");
    }

Run the client application
- Select Run > Run.
The Run dialog appears.
- In the Configurations list, either select an EGO C Client Application or click New for a new configuration.
For a new configuration, enter the configuration name.
- Enter the project name and C/C++ Application name.
- Click Apply and then Run.
Sample Output
[Screenshot of the sample output; image not available]
Date Modified: July 12, 2006
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2006 Platform Computing Corporation. All rights reserved.