Learn more about Platform products at http://www.platform.com



Tutorial 3: Request Host Allocation in a Cluster with Asynchronous Callback Notifications

This tutorial describes how to create a registered EGO client that requests host allocation in a cluster and starts a container on the host. The sample uses callbacks for notifications from the cluster about resource change and container/host state change.

Using this tutorial, you will ...


Step 1: Preprocessor directives and method declarations

The first step is to include a reference to the system and API header files. The samples.h header file contains the method declarations that are common to all of the samples. In addition, we declare the methods that are specific to this sample.

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include "vem.api.h"
#include "samples.h"

static int addResourceCB(vem_allocreply_t *areply);
static int reclaimForceCB(vem_allocreclaim_t *areclaim);
static int containerStateChgCB(vem_containerstatechg_t *cschange);
static int hostStateChangeCB(vem_hoststatechange_t *hschange);
// holds allocation information
static vem_allocreply_t *allocReply = NULL;
static char *allocated_host_name = NULL;
static int barrier = 0;
static vem_container_id_t jobContainerId = NULL;
static int jobFinished = 0;


Step 2: Implement the principal method

Lines 4-7: define and initialize a data structure that is used to request a connection with the EGO host cluster. The data structure contains a reference to a configuration file where the master host name and port numbers are stored.

Line 8: pass the data structure as an argument to the vem_open() method, which opens a connection to the master host. If the connection attempt is successful, a handle is returned; otherwise the method returns NULL. The handle acts as a communication channel to the master host and all subsequent communication occurs through this handle.

Lines 14-15: the vem_name_t structure (defined as clusterName) is initialized with NULL. This structure holds the cluster name, system name, and version. The vem_uname() method is passed the communication handle and, if successful, returns a valid vem_name_t structure; otherwise the method returns NULL.

Line 21: the cluster info is printed out to the screen.

Lines 22-39: define the client info structure. Use vem_locate() to get all registered clients. Since NULL is provided as the client name, all registered clients are located and the method returns the number of registered clients. Note that Platform EGO is equipped with a number of default clients (services) such as the Service Controller, so at a minimum, the info relevant to these clients is printed out and the associated memory is released.

1	 int 
2	 sample3()
3	 {
4	 vem_openreq_t orequest;
5	 vem_handle_t *vhandle = NULL;
6	 orequest.file = "ego.conf"; // default libvem.conf
7	 orequest.flags=0;
8	 vhandle = vem_open(&orequest);
9	 if (vhandle == NULL) {
10	 // error opening
11	 fprintf(stderr, "Error opening cluster: %s\n",  vem_strerror(vemerrno));
12	 	 return -1;
13	 }
14	 vem_name_t *clusterName = NULL;
15	 clusterName = vem_uname(vhandle);
16	 if (clusterName == NULL) {
17	 // error connecting
18	 fprintf(stderr, "Error connecting to cluster: %s\n",  vem_strerror(vemerrno));
19	 return -2;
20	 }
21	  fprintf(stdout, " Connected... %s %s %4.2f\n", clusterName->clustername, 
clusterName->sysname, clusterName->version);
22	 vem_clientinfo_t *clients;
23	   int  rc = vem_locate(vhandle, NULL, &clients); 
24	   if (rc >=0) {
25	     if (rc == 0) {
26	    	   printf("No registered clients exist\n");
27	     } else {
28	   	   int i=0;
29	   	   for (i=0; i<rc; i++) {
30	    	     printf("%s %s %s\n", clients[i].name, clients[i].description,
31	    	     clients[i].location);
32	   	   }
33	   	   // free
34	   	   vem_clear_clientinfo(clients);  	   
35	     }
36	   } else {
37	   	 // error connecting
38	    	 fprintf(stderr, "Error getting clients: %s\n",  vem_strerror(vemerrno));
39	   }
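
The return convention described for vem_locate() (a negative value for an error, zero when nothing is registered, otherwise the number of records) can be illustrated without a cluster. The following is only a sketch: demo_client_t, demo_locate(), and the registry contents are hypothetical stand-ins invented for this example, not part of the EGO API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a client record; not the real vem_clientinfo_t. */
typedef struct {
    const char *name;
    const char *description;
} demo_client_t;

/*
 * Hypothetical lookup that mimics the return convention described above:
 *   < 0  an error occurred
 *   == 0 no client matched (*out is set to NULL)
 *   > 0  number of records stored in *out
 * A NULL filter matches every client. The caller frees *out with free().
 */
static int demo_locate(const demo_client_t *registry, int n,
                       const char *filter, demo_client_t **out)
{
    int i;
    int count = 0;

    if (registry == NULL || out == NULL || n < 0) {
        return -1;
    }
    *out = NULL;

    /* First pass: count the matching records. */
    for (i = 0; i < n; i++) {
        if (filter == NULL || strcmp(registry[i].name, filter) == 0) {
            count++;
        }
    }
    if (count == 0) {
        return 0;
    }

    *out = malloc((size_t)count * sizeof(**out));
    if (*out == NULL) {
        return -1;
    }

    /* Second pass: copy the matching records into the result array. */
    count = 0;
    for (i = 0; i < n; i++) {
        if (filter == NULL || strcmp(registry[i].name, filter) == 0) {
            (*out)[count++] = registry[i];
        }
    }
    return count;
}
```

A caller checks the negative case first, then the zero case, and frees the result only when the count is positive, which is the same shape as the rc handling in the listing.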

Lines 40-42: authenticate the user to Platform EGO.

Lines 43-47: define and initialize a structure for callback methods. These callback methods are invoked by Platform EGO when resources are added or reclaimed, or when a change occurs to host status or a container. When Platform EGO wants to communicate about these events, it invokes these methods thereby calling back to the client.

Lines 48-59: define the vem_allocation_info_reply_t and vem_container_info_reply_t structures. If a client is disconnected and then re-registers, its existing allocations and containers are returned in these structures. If the client has never registered before, the structures are empty. Define and initialize a structure (rreq) that holds client info for registration purposes. This includes assigning the client callback structure (cbf) to the cb member of the rreq structure; see Step 3: Client callback methods. Register with Platform EGO via the open connection using vem_register().

40	  if (login(vhandle, username, password)<0) {
41	    	 fprintf(stderr, "Error logon: %s\n",  vem_strerror(vemerrno));
42	   }
43	   vem_clientcallback_t cbf;
44	   cbf.addResource = addResourceCB;
45	   cbf.reclaimForce = reclaimForceCB;
46	   cbf.containerStateChg = containerStateChgCB;
47	   cbf.hostStateChange = hostStateChangeCB;
48	  vem_allocation_info_reply_t aireply;
49	   vem_container_info_reply_t  cireply;
50	   vem_registerreq_t rreq;
51	 rreq.name = "sample3_client";
52	   rreq.description = "Sample3";
53	   rreq.flags = VEM_REGISTER_TTL;
54	   rreq.ttl = 3;
55	   rreq.cb = &cbf; // NULL, would need to read messages explicitly;
56	  rc = vem_register(vhandle, &rreq, &aireply, &cireply);
57	   if (rc < 0) {
58	     	 fprintf(stderr, "Error registering: %s\n",  vem_strerror(vemerrno));
59	  }
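
The callback structure is a table of function pointers that the library invokes when an event arrives. The sketch below shows that mechanism in isolation. All demo_ types and functions are hypothetical and exist only for this example; the real vem_clientcallback_t may have more members than the four assigned in the listing, which is why the sketch zero-initializes the table before filling it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event payloads; not the real vem types. */
typedef struct { const char *host; int slots; } demo_alloc_event_t;
typedef struct { const char *container_id; int new_state; } demo_state_event_t;

/* A callback table in the spirit of vem_clientcallback_t. */
typedef struct {
    int (*addResource)(demo_alloc_event_t *ev);
    int (*containerStateChg)(demo_state_event_t *ev);
} demo_callbacks_t;

static int resources_seen = 0;   /* total slots reported so far */
static int last_state = -1;      /* most recent container state */

static int on_add_resource(demo_alloc_event_t *ev)
{
    resources_seen += ev->slots;
    return 0;
}

static int on_state_change(demo_state_event_t *ev)
{
    last_state = ev->new_state;
    return 0;
}

/* Build a table with every member zeroed before the handlers are set. */
static demo_callbacks_t demo_make_callbacks(void)
{
    demo_callbacks_t cb = {0};
    cb.addResource = on_add_resource;
    cb.containerStateChg = on_state_change;
    return cb;
}

/* Simulated dispatchers: return -1 when no handler is registered. */
static int demo_dispatch_alloc(const demo_callbacks_t *cb, demo_alloc_event_t *ev)
{
    assert(cb != NULL && ev != NULL);
    if (cb->addResource == NULL) {
        return -1;
    }
    return cb->addResource(ev);
}

static int demo_dispatch_state(const demo_callbacks_t *cb, demo_state_event_t *ev)
{
    assert(cb != NULL && ev != NULL);
    if (cb->containerStateChg == NULL) {
        return -1;
    }
    return cb->containerStateChg(ev);
}
```

Zeroing the table first means an unset member is a NULL pointer that a dispatcher can test, rather than an indeterminate value.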

Lines 60-63: print out information about the existing allocations and containers returned by the registration call. Once the info is printed out, any allocations left over from a previous registration are released.

Lines 65-75: the vem_gethostgroupinfo() method collects the information for the requested hostgroup. In this case, the requested hostgroup in the input argument is set to NULL, which means that information about all hostgroups is requested. If the method call is successful, hostgroup information is printed out to the screen.

Lines 76-96: initialize the data structure (vem_allocreq_t) that specifies the allocation request. vem_alloc() then requests a resource allocation, taking the vem_allocreq_t structure as one of its input arguments. If the request is successful, the allocation ID is printed out to the screen.

60	 print_vem_allocation_info_reply(&aireply);
61	   print_vem_container_info_reply(&cireply);
62	   // freeup any previous allocations
63	   release_vem_allocation(vhandle, &aireply); 
64	   
65	   vem_hostgroupreq_t hgroupreq;
66	   hgroupreq.grouplist = NULL;
67	   vem_hostgroup_t *hgroup;
68	   rc = vem_gethostgroupinfo(vhandle, &hgroupreq, &hgroup); 
69	   if (rc < 0) {
70	     fprintf(stderr, "Error getting hostgroup: %s\n", 
71	  vem_strerror(vemerrno));  	 
72	   } else {
73	   	 printf("%s %s %d %d\n", hgroup->groupName, hgroup->members, hgroup->free,
74	  hgroup->allocated);
75	   }
76	   vem_allocreq_t areq;
77	   areq.name = "Sample2Alloc";
78	   areq.consumer = "/SampleApplications/EclipseSamples"; 
79	   areq.hgroup = "ComputeHosts";
80	 #ifndef WIN32_RESOURCE
81	   areq.resreq = "LINUX86";
82	 #else
83	   areq.resreq = "NTX86";
84	 #endif
85	   areq.minslots = 1;
86	   areq.maxslots = 1;
87	   areq.flags = VEM_ALLOC_EXCLUSIVE;
88	   vem_allocation_id_t alocid;
89	   vem_allocfreereq_t afree;
90	   rc = vem_alloc(vhandle, &areq, &alocid);
91	   if (rc < 0) {
92	     	 fprintf(stderr, "Error allocating: %s\n",  vem_strerror(vemerrno));  	 
93	     	 goto bailout;
94	   } else {
95	     printf("allocated: %s\n",  alocid);  	 
96	   }

Lines 97-121: define and initialize a container specification, including setting its resource limits to default values. The container specification essentially defines a job that the user wants to be executed. The conspec.command member specifies the actual binary that should be executed. In the sample, we want the program "sleep" to be executed. The UNIX sleep command takes the number of seconds to sleep as an input argument.

Lines 122-124: define and initialize various structures and assign container and allocation IDs.

Lines 126-135: a while loop suspends program execution until a host has been allocated. The barrier variable is set by the addResourceCB callback when the notification from Platform EGO arrives, after which the program can proceed to run a container on the allocated resource. The hostname is printed out.

Lines 136-138: initialize the workload container request structure (conreq) with the hostname, container name, and the container specification (conspec).

97	 vem_container_spec_t conspec;
98	   memset(&conspec, 0, sizeof(vem_container_spec_t));
99	 
100	 #ifndef WIN32_RESOURCE
101	   conspec.command = "sleep 120";
102	   conspec.execUser = "lsfadmin"; // "egoadmin";
103	   conspec.umask = 0777;
104	   conspec.execCwd = "/tmp";
105	   conspec.envC = 0;
106	 #else
107	   // sleep needs to be installed on the cluster NT hosts 
108	   // or if ping is available, use something like ping -n xxx 127.0.0.1 > nul
109	   conspec.command = "sleep 120";
110	   conspec.execUser = "lsf\\lsfadmin"; // "egouser"; // "lsfadmin";
111	   // "egoadmin";
112	   conspec.umask = 0777;
113	   conspec.execCwd = "c:\\";
114	   conspec.envC = 0;
115	 #endif
116	 
117	   int i;
118	   for (i=0; i<VEM_RLIM_NLIMITS; i++) {
119	   	 conspec.rlimits[i].rlim_cur = VEM_RLIM_DEFAULT;
120	     conspec.rlimits[i].rlim_max = VEM_RLIM_DEFAULT;
121	   }
122	   vem_startcontainerreq_t conreq;
123	   vem_container_id_t      conid = NULL;
124	   conreq.allocId = alocid;
125	   // find the hostname for allocation from the CB fn
126	   while (barrier == 0) {
127	   	 // wait until we have a host allocated
128	    sleep(1);
129	   }
130	   if (allocReply == NULL || allocReply->nhost ==0) {
131	   	  fprintf(stderr, "Error allocating host: %s\n",  vem_strerror(vemerrno));  
132	 	 	 	     goto cleanup;
133	   }
134	   char *host = allocated_host_name;
135	   printf("Allocated host: %s\n",  host);  	 
136	   conreq.hostname = host; // allocReply->host[0].name;
137	   conreq.name = "Sample2Container";
138	   conreq.spec = &conspec;
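
The wait loop in the listing polls the barrier flag indefinitely. A bounded variant is sketched below under stated assumptions: demo_barrier and demo_wait_for_flag() are hypothetical names, and the flag is declared volatile sig_atomic_t on the assumption that it is written from an asynchronous context. The source does not say whether callbacks are delivered on a separate thread; if they are, a mutex or atomic type would be the safer choice, since volatile alone does not synchronize threads.

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical flag set by a callback, playing the role of `barrier`. */
static volatile sig_atomic_t demo_barrier = 0;

/*
 * Poll *flag up to max_polls times, sleeping interval_seconds between
 * polls. Returns 0 as soon as the flag is non-zero, or -1 if it is still
 * zero after the last poll (timeout).
 */
static int demo_wait_for_flag(volatile sig_atomic_t *flag,
                              int max_polls, unsigned int interval_seconds)
{
    int polls;

    assert(flag != NULL);
    for (polls = 0; polls < max_polls; polls++) {
        if (*flag != 0) {
            return 0;
        }
        sleep(interval_seconds);
    }
    return (*flag != 0) ? 0 : -1;
}
```

With a timeout, the client can report that no host was allocated and release its allocation instead of hanging when the notification never arrives.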

Lines 139-146: start the workload container on the specified host and, if successful, print out the container ID.

Lines 147-162: use vem_locate() to get all registered clients. Since NULL is provided as the client name, all registered clients are located and the method returns the number of registered clients. If successful, print out the client info and free the associated memory.

Lines 163-168: wait until the jobFinished flag is set by the container state change callback, then free the container ID.

139	 rc = vem_startcontainer(vhandle, &conreq, &conid);
140	   if (rc < 0) { 	 fprintf(stderr, "Error starting container: %s\n", 
141	  vem_strerror(vemerrno));
142	     	 jobContainerId = "INVALID";
143	     	 goto cleanup;
144	   }
145	   jobContainerId = conid;
146	   printf("Started container %s\n", conid);
147	 rc = vem_locate(vhandle, NULL, &clients); 
148	   if (rc >=0) {
149	     if (rc == 0) {
150	    	   printf("No registered clients exist\n");
151	     } else {
152	   	   int i=0;
153	   	   for (i=0; i<rc; i++) {
154	    	     printf("%s %s %s\n", clients[i].name, clients[i].description,
155	    	     clients[i].location);
156	   	   }
157	       vem_clear_clientinfo(clients);  
158	     }
159	   } else {
160	   	 // error connecting
161	    	 fprintf(stderr, "Error getting clients: %s\n",  vem_strerror(vemerrno));
162	   }
163	 // wait for job to be finished
164	   while (!jobFinished) {
165	   	 //wait
166	   	 sleep(10);
167	   }
168	   vem_free_containerId(conid);
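
The listing jumps to bailout and cleanup labels whose code is not shown in this excerpt. The general pattern behind such labels is a single exit path that releases whatever was acquired so far. The sketch below illustrates the pattern only; demo_run(), demo_release(), and the fail_at parameter are hypothetical, and malloc() stands in for the allocation and container requests.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_released = 0;   /* counts releases, for demonstration only */

/* Release a resource if it was acquired; NULL means "never acquired". */
static void demo_release(char *res)
{
    if (res != NULL) {
        free(res);
        demo_released++;
    }
}

/*
 * fail_at selects where a simulated failure occurs:
 *   0 = no failure, 1 = after the first acquire, 2 = after the second.
 * Returns 0 on success and -1 on failure. Every path leaves through the
 * cleanup label, so each acquired resource is released exactly once.
 */
static int demo_run(int fail_at)
{
    int rc = -1;
    char *allocation = NULL;
    char *container = NULL;

    allocation = malloc(16);
    if (allocation == NULL || fail_at == 1) {
        goto cleanup;
    }

    container = malloc(16);
    if (container == NULL || fail_at == 2) {
        goto cleanup;
    }

    rc = 0;

cleanup:
    /* Release in reverse order of acquisition. */
    demo_release(container);
    demo_release(allocation);
    return rc;
}
```

Initializing every resource pointer to NULL before the first jump is what makes the shared cleanup code safe to run from any failure point.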


Step 3: Client callback methods

These callback methods are invoked by Platform EGO when resources are added or reclaimed, or when a change occurs to host status or a container. When Platform EGO wants to communicate about these events, it invokes these methods thereby calling back to the client.

Lines 169-179: this method is called by Platform EGO when resources have been added to an allocation in order to tell the client which resources have been provided for its use. This method prints out the allocation and consumer IDs, the number of hosts allocated, host names and number of slots, and host attributes.

Lines 180-186: this method is called by Platform EGO when resources need to be reclaimed. Resources may be reclaimed either for policy reasons, or because a resource has been found to be down or unavailable. The method prints out the host info including host name and slots for each host being reclaimed.

Lines 187-200: this method is called by Platform EGO in order to communicate status changes in containers to the clients that started them. The method prints out the container ID and its associated state; the container state is enumerated in the vem.common.h file.

Lines 201-207: this method is called by Platform EGO when a host changes state. The method prints out the host name and its new host state.

169	 int 
170	 addResourceCB(vem_allocreply_t *areply)
171	 {
172	 	 printf("addResource Call Back\n");
173	 	 allocReply = areply;
174	 	 allocated_host_name = malloc(strlen(allocReply->host[0].name) + 1);
175	 	 strcpy(allocated_host_name, allocReply->host[0].name);
176	 	 barrier = 1;
177	     print_vem_allocreply(areply);
178	     return 0;
179	 }
180	 int 
181	 reclaimForceCB(vem_allocreclaim_t *areclaim)
182	 {
183	 	 printf("reclaimForce Call Back\n");
184	 	 print_vem_allocreclaim(areclaim);
185	 	 return 0;
186	 }
187	 int 
188	 containerStateChgCB(vem_containerstatechg_t *cschange)
189	 {
190	 	 printf("containerStateChg Call Back\n");
191	 	 printf("%s %d\n", cschange->containerId, cschange->newState);
192	  	 while(jobContainerId == NULL) {sleep(1);} // wait until container has been
193	  	 // created
194	     if(jobContainerId && !strcmp(cschange->containerId, jobContainerId)) {
195	     	 if(cschange->newState == CONTAINER_FINISH) {
196	     	 	 jobFinished = 1;
197	     	 }
198	     }
199	 	 return 0;
200	 }
201	 int 
202	 hostStateChangeCB(vem_hoststatechange_t *hschange)
203	 {
204	 	 printf("hostStateChange Call Back\n");
205	 	 printf("%s %d\n", hschange->name, hschange->newState);
206	     return 0;
207	 }
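
When a callback keeps a host name beyond the lifetime of its argument, the copy needs room for the terminating null byte, and it is worth checking that the reply actually contains a host. The sketch below shows one way to do this. demo_reply_t and demo_copy_first_host() are hypothetical stand-ins for this example and do not reflect the real layout of vem_allocreply_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an allocation reply; not vem_allocreply_t. */
typedef struct {
    int nhost;            /* number of hosts in the reply */
    const char **names;   /* host names, nhost entries */
} demo_reply_t;

/*
 * Return a heap-allocated copy of the first host name, or NULL if the
 * reply is missing, contains no hosts, or memory cannot be allocated.
 * The caller owns the result and releases it with free().
 */
static char *demo_copy_first_host(const demo_reply_t *reply)
{
    size_t len;
    char *copy;

    if (reply == NULL || reply->nhost <= 0 ||
        reply->names == NULL || reply->names[0] == NULL) {
        return NULL;
    }

    len = strlen(reply->names[0]);
    copy = malloc(len + 1);              /* +1 for the terminating '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, reply->names[0], len + 1);
    return copy;
}
```

Returning NULL for an empty reply lets the waiting code distinguish "notified, but no host" from a successful allocation.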


Run the client application

  1. Select Run > Run.

    The Run dialog appears.

  2. In the Configurations list, either select an EGO C Client Application or click New for a new configuration.

    For a new configuration, enter the configuration name.

  3. Enter the project name and C/C++ Application name.
  4. Click Apply and then Run.

Sample Output



      Date Modified: July 12, 2006
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2006 Platform Computing Corporation. All rights reserved.