Runtime options affecting parallel processing can be specified with the XLSMPOPTS environment variable. This environment variable must be set before you run an application, and uses basic syntax of the form:
                         .-:-------------------------------------------.
                         V                                             |
>>-XLSMPOPTS-- = -+---+----runtime_option_name-- = ---option_setting---+--+---+-><
                  '-"-'                                                   '-"-'
You can specify option names and settings in uppercase or lowercase. You can add blanks before and after the colons and equal signs to improve readability. However, if the XLSMPOPTS option string contains embedded blanks, you must enclose the entire option string in double quotation marks (").
XLSMPOPTS=PARTHDS=4:SCHEDULE=DYNAMIC=5
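The syntax rules above (colon-separated options, case-insensitive names and settings, optional blanks, optional enclosing double quotation marks) can be illustrated with a small parser. This is only a sketch of how such a string breaks down, not the SMP runtime's own parsing code; the function name parse_xlsmpopts is hypothetical.

```python
def parse_xlsmpopts(value):
    """Split an XLSMPOPTS-style string into {option_name: option_setting}.

    Illustrative only: the real SMP runtime does its own parsing.
    """
    options = {}
    # Drop surrounding blanks and the optional enclosing double quotation marks.
    value = value.strip().strip('"')
    for item in value.split(":"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only, because a setting such as
        # DYNAMIC=5 carries its own '='.
        name, _, setting = item.partition("=")
        # Remove blanks around any '=' inside the setting.
        setting = "=".join(part.strip() for part in setting.split("="))
        # Names and settings are case-insensitive, so normalize to lowercase.
        options[name.strip().lower()] = setting.lower()
    return options
```

With this sketch, both `PARTHDS=4:SCHEDULE=DYNAMIC=5` and `"parthds = 4 : schedule = dynamic = 5"` yield the same two entries, parthds and schedule.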
The following are the available runtime option settings for the XLSMPOPTS environment variable:
When a thread becomes free, it takes the next chunk from its initially assigned partition. If there are no more chunks in that partition, then the thread takes the next available chunk from a partition initially assigned to another thread.
The work in a partition initially assigned to a sleeping thread will be completed by threads that are active.
The affinity scheduling type does not appear in the OpenMP API standard.
Active threads are assigned these chunks on a "first-come, first-do" basis. Chunks of the remaining work are assigned to available threads until all work has been assigned.
If a thread is asleep, its assigned work will be taken over by an active thread once that thread becomes available.
Active threads are assigned chunks on a "first-come, first-do" basis. The first chunk contains ceiling(number_of_iterations/number_of_threads) iterations. Subsequent chunks consist of ceiling(number_of_iterations_left / number_of_threads) iterations.
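The shrinking chunk sizes produced by this formula can be computed directly. The following is a minimal sketch that follows only the two formulas stated above; the function name guided_chunks is hypothetical, and the code models chunk sizes only, not which thread receives each chunk.

```python
def guided_chunks(number_of_iterations, number_of_threads):
    """Return the successive chunk sizes given by the ceiling formula above."""
    chunks = []
    iterations_left = number_of_iterations
    while iterations_left > 0:
        # Integer ceiling of iterations_left / number_of_threads.
        size = (iterations_left + number_of_threads - 1) // number_of_threads
        chunks.append(size)
        iterations_left -= size
    return chunks
```

For example, 100 iterations on 4 threads give chunks of 25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, and 1 iterations, which together cover all 100 iterations.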
If n is not specified, the chunks will contain ceiling(number_of_iterations/number_of_threads) iterations. Each thread is assigned one of these chunks. This is known as block scheduling.
If a thread is asleep and it has been assigned work, it will be awakened so that it may complete its work.
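Block scheduling as described above can be sketched as follows. The function name block_schedule is hypothetical, and the assumption that chunks are handed out in thread-ID order over contiguous iteration ranges is made for illustration only.

```python
def block_schedule(number_of_iterations, number_of_threads):
    """Return one contiguous iteration range per thread (block scheduling)."""
    # Integer ceiling of number_of_iterations / number_of_threads.
    chunk = (number_of_iterations + number_of_threads - 1) // number_of_threads
    ranges = []
    for thread_id in range(number_of_threads):
        # Assumption: thread k receives the k-th chunk; the last chunks
        # may be short or empty when the division is not exact.
        start = min(thread_id * chunk, number_of_iterations)
        stop = min(start + chunk, number_of_iterations)
        ranges.append(range(start, stop))
    return ranges
```

With 10 iterations and 4 threads, the chunk size is 3, so the threads receive 3, 3, 3, and 1 iterations.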
Specifying schedule with no suboption is equivalent to schedule=runtime.
Some applications cannot use more threads than the maximum number of processors available. Other applications can experience significant performance improvements if they use more threads than there are processors. This option gives you full control over the number of user threads used to run your program.
The default value for num is the number of processors available on the system.
Set num so it is within the acceptable upper limit. num can be up to 256 MB for 32-bit mode, or up to the limit imposed by system resources for 64-bit mode. An application that exceeds the upper limit may cause a segmentation fault.
The glibc library is compiled by default to allow a stack size of 2 MB. Setting num to a value greater than this will cause the default stack size to be used. If larger stack sizes are required, you should link the program to a glibc library compiled with the FLOATING_STACKS parameter turned on.
SDL stands for System Detail Level and can be MCM, L2CACHE, PROC_CORE, or PROC. If the SDL value is not specified, or an incorrect SDL value is specified, the SMP runtime issues an error message.
The list of three integers n1,n2,n3 determines how to divide threads among resources of the specified SDL type. n1 is the starting resource_id, n2 is the number of requested resources, and n3 is the stride, which specifies the increment used to determine the next resource_id to bind. n1,n2,n3 must all be specified; otherwise, the SMP runtime issues an error message and default binding rules apply.
When the number of resources specified in bind is greater than the number of threads, the extra resources are ignored.
When the number of threads t is greater than the number of resources x, t threads are divided among x resources according to the following formula:
Each of the first (t mod x) resources has ceil(t/x) threads bound to it. Each of the remaining resources has floor(t/x) threads bound to it.
XLSMPOPTS="bind=PROC=0,16,2"
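The start, count, and stride values can be expanded into the list of resource IDs they select. The following is an illustrative sketch, not the SMP runtime's implementation; the function name expand_bind is hypothetical, and raising ValueError stands in for the runtime's error message and fallback to default binding.

```python
SDL_VALUES = {"MCM", "L2CACHE", "PROC_CORE", "PROC"}

def expand_bind(setting):
    """Expand a bind setting such as 'PROC=0,16,2' into (SDL, resource_ids)."""
    sdl, _, numbers = setting.partition("=")
    sdl = sdl.strip().upper()
    if sdl not in SDL_VALUES:
        raise ValueError("missing or incorrect SDL value: %r" % sdl)
    parts = [p.strip() for p in numbers.split(",")]
    if len(parts) != 3:
        raise ValueError("n1,n2,n3 must all be specified")
    n1, n2, n3 = (int(p) for p in parts)
    # n1 = starting resource_id, n2 = number of resources, n3 = stride.
    return sdl, [n1 + i * n3 for i in range(n2)]
```

Under this reading, the setting PROC=0,16,2 selects the 16 resources PROC 0, 2, 4, and so on up to 30.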
chuser "capabilities=CAP_PROPAGATE,CAP_NUMA_ATTACH" username
SDL stands for System Detail Level and can be MCM, L2CACHE, PROC_CORE, or PROC. If the SDL value is not specified, or an incorrect SDL value is specified, the SMP runtime issues an error message.
The list of x integers i1,i2...ix enumerates the resources of the specified SDL type to be used during binding. When the number of integers in the list is greater than or equal to the number of threads, the position in the list determines the thread ID that will be bound to the resource.
When the number of resources specified in bindlist is greater than the number of threads, the extra resources are ignored.
When the number of threads t is greater than the number of resources x, t threads will be divided among x resources according to the following formula:
Each of the first (t mod x) resources has ceil(t/x) threads bound to it. Each of the remaining resources has floor(t/x) threads bound to it.
XLSMPOPTS="bindlist=MCM=0,1,2,3"
This example code binds threads to MCM 0,1,2,3. When the code runs with four threads, thread 0 is bound to MCM 0, thread 1 is bound to MCM 1, thread 2 is bound to MCM 2, and thread 3 is bound to MCM 3. When the code runs with six threads, threads 0 and 1 are bound to MCM 0, threads 2 and 3 are bound to MCM 1, thread 4 is bound to MCM 2, and thread 5 is bound to MCM 3.
XLSMPOPTS="bindlist=L2CACHE=0,1,0,1,0,1,0,1"
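The distribution rule for bindlist can be sketched in a few lines. This is an illustration under stated assumptions, not the SMP runtime's code: the function name bind_threads is hypothetical, and the grouping of consecutive thread IDs onto the same resource is inferred from the six-thread MCM example above.

```python
def bind_threads(resources, number_of_threads):
    """Map thread IDs to resources following the bindlist rules above."""
    x = len(resources)
    t = number_of_threads
    if t <= x:
        # List position determines the thread ID; extra resources are ignored.
        return {tid: resources[tid] for tid in range(t)}
    many = (t + x - 1) // x   # ceil(t/x)
    few = t // x              # floor(t/x)
    binding = {}
    tid = 0
    for position, resource in enumerate(resources):
        # The first (t mod x) resources each take ceil(t/x) threads,
        # the remaining resources each take floor(t/x) threads.
        count = many if position < t % x else few
        for _ in range(count):
            binding[tid] = resource
            tid += 1
    return binding
```

Calling bind_threads([0, 1, 2, 3], 6) reproduces the six-thread MCM example: threads 0 and 1 on MCM 0, threads 2 and 3 on MCM 1, thread 4 on MCM 2, and thread 5 on MCM 3. With the L2CACHE list and eight threads, each thread is bound to the list entry at its own position, so even-numbered threads use L2CACHE 0 and odd-numbered threads use L2CACHE 1.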
chuser "capabilities=CAP_PROPAGATE,CAP_NUMA_ATTACH" username
When a thread completes its work, the thread continues executing in a tight loop looking for new work. One complete scan of the work queue is done during each busy-wait state. An extended busy-wait state can make a particular application highly responsive, but can also harm the overall responsiveness of the system unless the thread is given instructions to periodically scan for and yield to requests from other applications.
A complete busy-wait state for benchmarking purposes can be forced by setting both spins and yields to 0.
The default value for num is 100.
When a thread sleeps, it completely suspends execution until another thread signals that there is work to do. This provides better system utilization, but also adds extra system overhead for the application.
The default value for num is 100.
The default value for num is 500.
The allowed values for this option are the numbers from 0 to 32. If num is 0, all profiling is turned off, and no profiling overhead is incurred. If num is greater than 0, the running time of the loop is monitored once every num times through the loop. The default for num is 16. Values of num exceeding 32 are changed to 32.
Dynamic profiling is not applicable to user-specified parallel loops.
Typically, num is set to be equal to the parallelization overhead. If the computation in a parallelized loop is very small and the time taken to execute the loop is spent primarily in setting up parallelization, the loop should be executed sequentially for better performance.
seqthreshold acts as the reverse of parthreshold.