This tutorial walks you through the sample application code and guides you through the process of building, packaging, deploying, and running the sample client and service.
You will learn the minimum amount of code that you need to create a Symphony application for the IBM Cell Broadband Engine (BE).
This section describes the high-level interaction between a client and a Symphony service. The following diagram shows the message flow between the client and the hosts in a Symphony cluster.
The client opens a session with the session director (not shown) on the management host. Once the client is authenticated, the client communicates directly with the session manager assigned to the application.
The Symphony session manager (SSM), which also runs on the management host, is the workload manager associated with a single application. The session manager routes messages from the client to the compute hosts and from the compute hosts to the client. The session manager obtains resources to service its sessions, and starts and manages service instance manager (SIM) processes on compute hosts.
The service instance manager starts, monitors, and manages a service instance (SI), passing inputs and outputs between the session manager and service instance.
The service code sample, which this tutorial is based on, is designed to be run on a minimum of one Cell blade with two BE processors. The following diagram shows the architecture of a single Cell BE processor with Symphony installed.
The Power Processor Element (PPE) is the main processor in the Cell BE. The PPE is responsible for overall control of the system and runs the operating system for all applications on the Cell BE. The Symphony service runs on the PPE, and individual computational tasks are off-loaded to the SPEs. The PPE then waits for and coordinates the results returning from the SPEs.
The Synergistic Processor Element (SPE) handles the compute-intensive tasks. Each SPE is an independent processor, and is optimized to run SPE threads spawned by the PPE.
In this sample, the client sends a configurable number of tasks to the Symphony service. The tasks contain the input data for the calculation program that performs a simple addition of two integers on the SPEs. The service, which runs on the PPE, spawns one thread on each SPE that passes the input data to the calculation program. The calculation program is executed on each SPE concurrently. When the programs have completed the work, the results are collected by the service and relayed back to the client.
All the Symphony Service classes and methods are applicable to the PPE. Because of the memory limitation of the SPE, none of the Symphony Service classes and methods are applicable to the SPE. For example, the Message object cannot be passed to the SPE or instantiated in the SPE and none of its methods can be executed in the SPE.
All basic data type objects can be passed between the PPE and SPE. For example, the integer value stored in the Message object can be accessed in the PPE and passed to the SPE. The value can also be passed from the SPE to PPE.
For more information about the scope of Symphony service APIs, refer to Appendix A: Symphony API Summary.
Since there is no standalone Developer Edition for the Cell BE that would allow development and testing on a single host, the Cell BE requires the following components to be installed, as a minimum:
The Symphony package for the management host is dependent on the management host platform. To obtain the Symphony package, go to my.platform.com and select . Download the appropriate package for your host.
For getting started with Symphony, go to my.platform.com and select .
Review the sample application code to learn how you can create a simple synchronous application for the IBM Cell BE.
The sample allows you to enter two integers via command line when running the client. One integer is stored in class MyMessage (MyMessage.h and MyMessage.cpp) and the other is stored in class MyCommonData (MyCommonData.h and MyCommonData.cpp). MyMessage and MyCommonData are passed to the service. On the service side, the two integers are fetched from MyMessage and MyCommonData and passed to the SPE where they are added together. The result is passed back to the PPE and then sent back to the client where it is displayed on the screen.
When you run the client, it opens a session and sends n input messages (tasks) to the service running on the PPE of the Cell BE. The service spawns m threads, one per SPE, which run concurrently and perform the simple addition. The client application is synchronous, so it sends the input and then blocks until all the results are returned.
Your client application needs to handle data that it sends as input, and output data that it receives from the service.
Once your message class is declared, implement handlers for serialization and deserialization.
In MyMessage.cpp, we implement methods to handle the data. For data types that are supported by the Symphony SDK, see the appropriate API reference.
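The serialization pattern described above can be sketched as follows. This is a minimal illustration, not the sample's code: ByteStream is a hypothetical stand-in for the SDK stream types, and the handler names onSerialize() and onDeserialize() are assumptions to be checked against the API reference. The real MyMessage inherits from the SDK Message class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the SDK stream types; not part of Symphony.
class ByteStream {
public:
    // Append a 32-bit integer in big-endian byte order.
    void writeInt(std::int32_t value) {
        const std::uint32_t u = static_cast<std::uint32_t>(value);
        for (int shift = 24; shift >= 0; shift -= 8) {
            bytes_.push_back(static_cast<unsigned char>((u >> shift) & 0xFFu));
        }
    }
    // Read back the next 32-bit integer written by writeInt().
    std::int32_t readInt() {
        assert(pos_ + 4 <= bytes_.size());  // enough bytes must remain
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i) {
            u = (u << 8) | bytes_[pos_++];
        }
        return static_cast<std::int32_t>(u);
    }
private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
};

// Sketch of a message that carries one integer, as MyMessage does.
class SketchMessage {
public:
    explicit SketchMessage(std::int32_t value = 0) : value_(value) {}
    std::int32_t getInt() const { return value_; }
    void setInt(std::int32_t value) { value_ = value; }
    // Sender side: write every field in a fixed order.
    void onSerialize(ByteStream& out) const { out.writeInt(value_); }
    // Receiver side: read the fields back in the same order.
    void onDeserialize(ByteStream& in) { value_ = in.readInt(); }
private:
    std::int32_t value_;
};
```

The point of the sketch is the symmetry: whatever the serialization handler writes, the deserialization handler must read back in the same order and with the same types.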
The MyCommonData class, which inherits from the Message class, handles the common data for the client and service. The class declaration and definition are contained in MyCommonData.h and MyCommonData.cpp.
Once your common data class is declared, implement handlers for serialization and deserialization.
In MyCommonData.cpp, we implement methods to handle the data. For data types that are supported by the Symphony SDK, see the appropriate API reference.
To add flexibility, the client program is designed to receive up to five arguments for setting parameters or for displaying help, as follows:
In this sample, we initialize the parameters with default values so that the program can run without passing arguments. In cases where arguments are used, a switch block parses the inputs and overwrites the default values.
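A minimal sketch of this default-then-override approach is shown below. Only the -i and -c options appear in this tutorial; the -n, -m, and -h options, the default values, and the names ClientParams and parseArgs are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Parameters start with defaults so the client can run with no arguments.
// All default values here are assumptions, not the sample's real defaults.
struct ClientParams {
    int taskInput = 1;        // -i: integer carried in the task message
    int commonDataInput = 2;  // -c: integer carried in the common data
    int numTasks = 10;        // -n: hypothetical option
    int numSpes = 1;          // -m: hypothetical option
    bool showHelp = false;    // -h: hypothetical option
};

// Parse "-x value" pairs and overwrite the defaults.
// Returns false for an unknown option or an option without a value.
bool parseArgs(int argc, const char* const argv[], ClientParams& p) {
    assert(argc >= 1 && argv != nullptr);  // argv[0] is the program name
    for (int k = 1; k < argc; ++k) {
        const std::string option = argv[k];
        if (option.size() != 2 || option[0] != '-') return false;
        if (option[1] == 'h') { p.showHelp = true; continue; }
        if (k + 1 >= argc) return false;  // the option is missing its value
        const int value = std::atoi(argv[++k]);
        switch (option[1]) {
            case 'i': p.taskInput = value; break;
            case 'c': p.commonDataInput = value; break;
            case 'n': p.numTasks = value; break;
            case 'm': p.numSpes = value; break;
            default:  return false;
        }
    }
    return true;
}
```

A caller would typically invoke parseArgs(argc, argv, params) at the top of main() and print usage text when it returns false or when showHelp is set.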
In SyncClient.cpp, the first step is to initialize the Symphony client infrastructure. You initialize once per client.
To send data to be calculated in the form of input messages, you connect to an application.
You specify an application name, a user name, and password. The application name must match that defined in the application profile. The default security callback encapsulates the callback for the user name and password.
A session is a way of logically grouping tasks that are sent to a service for execution. The tasks are sent and received synchronously.
When creating a session, you need to specify the session attributes by using the SessionCreationAttributes object. In this sample, we create a SessionCreationAttributes object called attributes and set four parameters in the object.
The first parameter is the session name. This is optional. The session name can be any descriptive name you want to assign to your session. It is for information purposes, such as in the command-line interface.
The second parameter is the session type. The session type is optional. You can leave this parameter blank and system default values are used for your session.
The third parameter is the session flag, which we specify as ReceiveSync. You must specify it as shown. This indicates to Symphony that this is a synchronous session.
The fourth parameter is the common data value that will be shared among tasks in the session.
We pass the attributes object to the createSession() method, which returns a pointer to the session.
The session type must be the same session type as defined in your application profile.
You define characteristics for the session with the session type in the application profile.
Pass the number of tasks to the fetchTaskOutput() method to retrieve the output messages that were produced by the service. This method blocks until the output for all tasks is retrieved. The return value is an enumeration that contains the completed task results. Iterate through the task results and extract the messages using the populateTaskOutput() method. Display the task ID and the results from the output message.
Any exceptions thrown take the form of SoamException. Catch all Symphony exceptions to know about exceptions that occurred in the client application, service, and middleware.
The calculation code is contained in service_spu.c. This is the program that transfers, via DMA, the data from the PPE to the SPEs for execution. Memory flow controller (MFC) commands are used to transfer the data between the PPE and SPEs. You can see from this code sample that each SPE simply calculates the sum of the task input value and the common data value from the client.
The Symphony service code provides inputs to calculation code that is executed on the SPEs. To take advantage of the Cell BE architecture, the service creates individual threads that run concurrently on individual SPEs. Each thread has its own context.
This sample uses the following basic algorithm to run multiple SPE contexts:
For a service to be managed by Symphony, it needs to be in a container object. This is the service container.
In SampleService.cpp, we inherit from the ServiceContainer class.
The service is implemented within an executable. At a minimum, we need to create within our main function an instance of the service container and run it.
Load the session common data into memory by implementing onSessionEnter() before the onInvoke() call. When common data is available, Symphony invokes onSessionEnter() once after the service is bound to your session.
Use the populateCommonData() method of the sessionContext object to load the common data.
Symphony calls onInvoke() on the service container once per task. Once you inherit from the ServiceContainer class, implement handlers so that the service can function properly.
To gain access to the data from the client, you must present an instance of the message object to the populateTaskInput() method on the task context. The task context contains all information and functionality that is available to the service during an onInvoke() call in relation to the task that is being processed.
You pass the message object, which comes from the client application, to populateTaskInput(). During this call, the data sent from the client is used to populate the message object. Task data, such as the number of SPEs to run per task and the task input value, is then loaded into local variables for use by the service code.
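The call order described above (onSessionEnter() once per session, then onInvoke() once per task) can be sketched with stand-in types. FakeSessionContext, FakeTaskContext, and SketchService are hypothetical and are not SDK types. The real handlers receive SDK context objects and use populateCommonData() and populateTaskInput(), and the real sample performs the addition on the SPEs rather than inline.

```cpp
#include <cassert>

// Hypothetical stand-ins for the SDK context objects; not Symphony types.
struct FakeSessionContext { int commonData; };
struct FakeTaskContext { int input; int output; };

// Sketch of a service container: common data is cached once per session
// and then reused by every task that belongs to that session.
class SketchService {
public:
    SketchService() : commonData_(0), sessionEntered_(false) {}

    // Called once, after the service is bound to a session.
    void onSessionEnter(const FakeSessionContext& session) {
        commonData_ = session.commonData;  // real code: populateCommonData()
        sessionEntered_ = true;
    }

    // Called once per task.
    void onInvoke(FakeTaskContext& task) {
        assert(sessionEntered_);  // common data must be loaded first
        // Real code: populateTaskInput(), run the SPE threads, setTaskOutput().
        task.output = task.input + commonData_;
    }

private:
    int commonData_;
    bool sessionEntered_;
};
```

Caching the common data in a member variable is what lets each onInvoke() call stay small: only the per-task input has to be read for every task.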
Since this service will use n SPEs concurrently, it is necessary for the service to create n threads. Each of these threads will run a single SPE context at a time. The thread runs on the SPE and is responsible for running the calculation program and retrieving the result.
Since we will be running tasks on n SPEs, we need to create an array to hold the parameters of each thread.
The mm_parms structure is instantiated as parms, which is used on the service side for passing messages between the PPE and SPE. The mm_parms structure is declared and defined in params.h.
Members of the mm_parms structure include:
taskInput stores the integer that is entered on the command line when running SyncClient with the -i parameter.
commonDataInput stores the integer entered on the command line when running SyncClient with the -c parameter.
taskOutput stores the addition result in the SPE, which is passed back to the PPE.
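As a rough sketch, the parameter block and the work the calculation program does with it might look like the following. The field types, the padding member, and the helper names dmaCopy and simulateSpeCalculation are assumptions; see params.h and service_spu.c for the real definitions. The sketch runs on an ordinary host and uses memcpy in place of MFC commands. The 16-byte alignment and size checks reflect the usual requirement for DMA transfers that are multiples of 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch of the parameter block exchanged between the PPE and an SPE.
// Field types and padding are assumptions, not the contents of params.h.
struct alignas(16) mm_parms {
    std::int32_t taskInput;        // integer given with -i
    std::int32_t commonDataInput;  // integer given with -c
    std::int32_t taskOutput;       // sum written by the SPE
    std::int32_t pad;              // explicit padding up to 16 bytes
};
static_assert(sizeof(mm_parms) == 16, "block size must be a multiple of 16 bytes");

// Host-side stand-in for an MFC DMA transfer: check alignment, then copy.
inline void dmaCopy(void* dst, const void* src, std::size_t size) {
    assert(size % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
    std::memcpy(dst, src, size);
}

// What the calculation program does, simulated on the host:
// fetch the block into a local copy, add the two inputs, write the block back.
inline void simulateSpeCalculation(mm_parms* inMainMemory) {
    mm_parms local;                                // stands in for SPE local store
    dmaCopy(&local, inMainMemory, sizeof(local));  // "get" from main memory
    local.taskOutput = local.taskInput + local.commonDataInput;
    dmaCopy(inMainMemory, &local, sizeof(local));  // "put" back to main memory
}
```

Keeping the structure at exactly 16 bytes means that each element of an array of mm_parms stays aligned, so every thread can hand its own element to its SPE.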
The PPE starts the calculation program by creating a thread on each SPE. The PPE uses the spe_context_create(), spe_program_load(), and spe_context_run() library calls provided in the SPE runtime management library.
The context for the SPE thread contains the persistent data about the SPE. Before being able to use an SPE, the SPE context data structure has to be created and initialized. This is done by calling spe_context_create(), which returns a pointer to the newly created SPE context when it is successfully created.
Before being able to run an SPE context, an SPE program has to be loaded into the context using the spe_program_load() call. You must pass a valid pointer to the SPE context and the address of the SPE program to spe_program_load().
The pthread_create() function creates a new thread of control that executes concurrently with the calling thread. The pthread_create() function requires you to pass a variable that will hold the ID of the newly created thread, the function (ppu_thread_function) that the thread will execute, and the SPE context pointer. The ppu_thread_function receives the SPE context pointer as its sole argument and calls spe_context_run(), which executes the SPE context on a physical SPE. This subroutine causes the current PPE thread to transition to an SPE thread by passing its execution control from the PPE to the SPE whose context it is scheduled to run on.
Wait for all the SPEs to complete the calculations and then return execution control to the PPE. The pthread_join() function suspends execution of the calling thread until the specified thread has terminated, so the service calls it once for each SPE thread.
As each SPE thread terminates, destroy the thread context to release the associated resources and free the memory used by the SPE context data structures.
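The create, run, join, and clean-up sequence described above can be sketched with plain POSIX threads. In this sketch, sketch_thread_function is a stand-in that performs the addition directly; in the real service, ppu_thread_function calls spe_context_run() on a context prepared with spe_context_create() and spe_program_load(). ThreadParms and runWorkers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Per-thread parameters; mirrors the fields named in this tutorial.
struct ThreadParms {
    int taskInput;
    int commonDataInput;
    int taskOutput;
};

// Stand-in for ppu_thread_function. The real one would call spe_context_run()
// here and block until the SPE program finishes.
extern "C" void* sketch_thread_function(void* arg) {
    ThreadParms* parms = static_cast<ThreadParms*>(arg);
    parms->taskOutput = parms->taskInput + parms->commonDataInput;
    return nullptr;
}

// Start one thread per parameter block, then wait for every started thread.
// Returns false if any thread could not be created or joined.
bool runWorkers(std::vector<ThreadParms>& parms) {
    assert(!parms.empty());
    std::vector<pthread_t> threads(parms.size());
    std::size_t started = 0;
    bool ok = true;
    for (; started < parms.size(); ++started) {
        // Real code: create the SPE context and load the program first.
        if (pthread_create(&threads[started], nullptr,
                           sketch_thread_function, &parms[started]) != 0) {
            ok = false;
            break;
        }
    }
    // pthread_join() waits for a single thread, so join each one in turn.
    for (std::size_t i = 0; i < started; ++i) {
        if (pthread_join(threads[i], nullptr) != 0) ok = false;
        // Real code: destroy the SPE context here, for example with
        // spe_context_destroy(), to release its resources.
    }
    return ok;
}
```

Each thread writes only to its own ThreadParms element, so no locking is needed, and the results are safe to read once the matching pthread_join() call has returned.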
Once the computations are complete, we collect and format the results. When the results are completely assembled, they are added to the output message object. This object is then passed to the setTaskOutput() method, which sends the results to the client.
Symphony uses the number of CPUs to derive the default number of slots. You must configure EGO_DEFINE_NCPUS in $EGO_CONFDIR/ego.conf on the master host to set the correct number of CPUs for the Cell BE host.
To make full use of the SPE, the following two modes are recommended.
Define the number of slots based on the number of CPU cores. Use this mode when you want to effectively share the SPEs among sessions and applications.
To define the number of slots based on CPU cores, set EGO_DEFINE_NCPUS=cores and create one SPE thread for each task.
Since one Cell BE host has two processors and each processor has eight cores, one Cell BE host will have 16 slots. Therefore, up to 16 tasks can run on one Cell BE host concurrently. If one task only creates one SPE thread, the 16 tasks can make full use of the SPEs.
Define the number of slots based on the number of processors. Configuring one slot per multiple SPEs is advantageous if your program is making use of advanced multi-core optimizations to speed up calculations.
To define the number of slots based on processors, set EGO_DEFINE_NCPUS=procs and create eight SPE threads for each task.
Since one Cell BE host has two processors, one Cell BE host will have two slots. Therefore, up to two tasks can run on one Cell BE host concurrently. If one task creates eight SPE threads, the two tasks can make full use of the SPEs.
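The arithmetic behind the two modes can be checked with a small helper. The names SlotPlan and planFor are hypothetical; the figures (two processors per host, eight cores per processor) are the ones stated above.

```cpp
#include <cassert>
#include <string>

// Hardware figures stated in this tutorial for one Cell BE host.
const int kProcessorsPerHost = 2;
const int kCoresPerProcessor = 8;

struct SlotPlan {
    int slots;              // tasks that can run concurrently on one host
    int speThreadsPerTask;  // SPE threads each task should create
};

// Derive the plan from the EGO_DEFINE_NCPUS value ("cores" or "procs").
SlotPlan planFor(const std::string& ncpusMode) {
    assert(ncpusMode == "cores" || ncpusMode == "procs");
    const int totalSpes = kProcessorsPerHost * kCoresPerProcessor;
    SlotPlan plan;
    plan.slots = (ncpusMode == "cores") ? totalSpes : kProcessorsPerHost;
    // Divide the SPEs evenly among the slots so all of them stay busy.
    plan.speThreadsPerTask = totalSpes / plan.slots;
    return plan;
}
```

With these figures, planFor("cores") gives 16 slots with one SPE thread per task, and planFor("procs") gives two slots with eight SPE threads per task. In both modes the product of slots and threads per task is 16, which is why either mode can keep all the SPEs in use.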