Using Platform LSF Parallel Application Integrations




Using LSF with ANSYS

LSF supports various ANSYS solvers through a common integration console built into the ANSYS GUI. The only change the average ANSYS user sees is the addition of a Run using LSF? button on the standard ANSYS console.

Using ANSYS with LSF simplifies job distribution and improves throughput by removing the need for engineers to worry about when or where their jobs run. They simply request job execution and know that their jobs will complete as fast as the environment allows.

Requirements

Configuring LSF for ANSYS

During installation, lsfinstall adds the Boolean resource ansys to the Resource section of lsf.shared.

Host configuration (optional)

If only some of your hosts can accept ANSYS jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.

Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the ansys resource to the hosts that can run ANSYS jobs:

Begin Host
HOSTNAME    model   type  server  r1m   mem   swp  RESOURCES
...
hostA       !       !       1     3.5   ()    ()   ()
hostB       !       !       1     3.5   ()    ()   (ansys)
hostC       !       !       1     3.5   ()    ()   ()
...
End Host
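
After editing the file, reconfigure the cluster so that the new resource takes effect, then check which hosts advertise it. The following is a minimal sketch of the typical commands; your site may have its own reconfiguration procedure:

# reconfigure LIM and restart mbatchd so the new host resource is read
lsadmin reconfig
badmin mbdrestart

# verify which hosts now report the ansys resource
lshosts -R "ansys"

The same steps apply after adding any of the other application resources described in this document.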

Submitting jobs through ANSYS

To start a job, choose the Batch menu item. The batch job dialog contains the following fields:

Initial Jobname

The name given to the job for easier recognition at runtime.

Input filename

Specifies the file of ANSYS commands you are submitting for batch execution. You can either type the desired file name or click the ... button to display a file selection dialog box.

Output filename

Specifies the file to which ANSYS directs its text output. If a file with that name already exists in the working directory, it is overwritten when the batch job starts.

Memory requested

The memory requirements for the job.

Run using LSF?

Launches ANSYS LSF, a separately licensed product.

Run in background?

Runs the ANSYS job in background or in foreground mode.

Include input listing in output?

Includes or excludes the input file listing at the beginning of the output file.

Parameters to be defined

Additional ANSYS parameters

Time[Date] to execute

Specifies the time and date at which to start the job. This option is active only after Run in background? is set to Yes. To use this option, you must have permission to run the at command on UNIX systems.

Additional LSF configuration

You can also configure additional options to specify LSF job requirements such as queue, host, or desired host architecture:

Available Hosts

Allows users to specify the host on which to run the job.

Queue

Allows users to specify a queue other than the default.

Host Types

Allows users to specify a specific architecture for their job.

Submitting jobs through the ANSYS command-line

Submitting a command-line job requires extra parameters for it to run correctly through LSF.

Syntax

bsub -R ansys [bsub_options] ansys_command -b -p productvar <input_name >&output_name

-R

Run the job on hosts with the Boolean resource ansys configured

bsub_options

Regular options to bsub that specify the job parameters

ansys_command

The ANSYS executable to be executed on the host (for example, ansys57)

-b

Run the job in ANSYS batch mode

-p productvar

ANSYS product to use with the job

<input_name

ANSYS input file. (You can also use the bsub -i option.)

>&output_name

ANSYS output file. (You can also use the bsub -o option.)
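
For example, the following submission runs an ANSYS 5.7 solver in batch mode under LSF. The queue, product variable, and file names are placeholders; substitute the values used at your site:

bsub -R ansys -q normal ansys57 -b -p productvar <cantilever.in >&cantilever.out

Equivalently, you can let bsub handle the input and output files with its own options:

bsub -R ansys -q normal -i cantilever.in -o cantilever.out ansys57 -b -p productvar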



Using LSF with NCBI BLAST

LSF accepts jobs running NCBI BLAST (Basic Local Alignment Search Tool).

Requirements

Configuring LSF for BLAST jobs

During installation, lsfinstall adds the Boolean resource blast to the Resource section of lsf.shared.

Host configuration (optional)

If only some of your hosts can accept BLAST jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.

Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the blast resource to the hosts that can run BLAST jobs:

Begin Host
HOSTNAME    model   type  server  r1m   mem   swp  RESOURCES
...
hostA       !       !       1     3.5   ()    ()   ()
hostB       !       !       1     3.5   ()    ()   (blast)
hostC       !       !       1     3.5   ()    ()   ()
...
End Host

Submitting BLAST jobs

Use BLAST parallel provided with LSF to submit BLAST jobs.

BLAST parallel is a Perl program that distributes BLAST searches across a cluster by splitting both the query file and the reference database, and then merging the result files after all BLAST jobs finish.

See the README in the LSF_MISC/examples/blastparallel/ directory for information about installing, configuring, and using BLAST parallel.
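
For a single search that does not need to be split, you can also submit the NCBI blastall program directly with bsub. The following is only a sketch; the program, database, and query file names are assumptions about your local BLAST installation:

bsub -R blast -o blast_job.out blastall -p blastn -d nt -i query.fa -o result.txt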



Using LSF with FLUENT

LSF is integrated with FLUENT products from ANSYS Inc., allowing FLUENT jobs to take advantage of the checkpointing and migration features provided by LSF. This makes more efficient use of the software and processes data faster.

FLUENT 5 offers versions based on system vendors' parallel environments (usually MPI, used by the VMPI version of FLUENT 5). FLUENT also provides a parallel version of FLUENT 5 based on its own socket-based message-passing library (the NET version).

This chapter assumes you are already familiar with using FLUENT software and checkpointing jobs in LSF.

See Administering Platform LSF for more information about checkpointing in LSF.

Requirements

Optional requirements

Configuring LSF for FLUENT jobs

During installation, lsfinstall adds the Boolean resource fluent to the Resource section of lsf.shared.

LSF also installs the echkpnt.fluent and erestart.fluent files in LSF_SERVERDIR.

Host configuration (optional)

If only some of your hosts can accept FLUENT jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.

Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the fluent resource to the hosts that can run FLUENT jobs:

Begin Host
HOSTNAME    model   type  server  r1m   mem   swp  RESOURCES
...
hostA       !       !       1     3.5   ()    ()   ()
hostB       !       !       1     3.5   ()    ()   (fluent)
hostC       !       !       1     3.5   ()    ()   ()
...
End Host

Checkpointing in FLUENT

FLUENT 5 is integrated with LSF to use the LSF checkpointing capability. At the end of each iteration, FLUENT looks for the existence of a checkpoint file (check) or a checkpoint exit file (exit). If it detects the checkpoint file, it writes a case and data file, removes the checkpoint file, and continues iterating. If it detects a checkpoint exit file, it writes a case and data file, then exits.

Use the bchkpnt command to create the checkpoint and checkpoint exit files, which forces FLUENT to checkpoint, or checkpoint and exit itself. FLUENT also creates a journal file with instructions to read the checkpointed case and data files, and continue iterating. FLUENT uses this file when it is restarted with the brestart command.

echkpnt and erestart

LSF installs echkpnt.fluent and erestart.fluent, which are special versions of echkpnt and erestart to allow checkpointing with FLUENT. Use bsub -a fluent to make sure your job uses these files.

Checkpoint directories

When you submit a checkpointing job, you specify a checkpoint directory.

Before the job starts running, LSF sets the environment variable LSB_CHKPNT_DIR. The value of LSB_CHKPNT_DIR is a subdirectory of the checkpoint directory specified in the command line. This subdirectory is identified by the job ID and only contains files related to the submitted job.

Checkpoint trigger files

When you checkpoint a FLUENT job, LSF creates a checkpoint trigger file (check) in the job subdirectory, which causes FLUENT to checkpoint and continue running. The bchkpnt -k option creates a different trigger file (exit), which causes FLUENT to checkpoint and then exit the job.

FLUENT uses the LSB_CHKPNT_DIR environment variable to determine the location of checkpoint trigger files. It checks the job subdirectory periodically while running the job. FLUENT does not perform any checkpointing unless it finds the LSF trigger file in the job subdirectory. FLUENT removes the trigger file after checkpointing the job.

Restarting jobs

If a job is restarted, LSF attempts to restart the job with the -restart option appended to the original FLUENT command. FLUENT uses the checkpointed data and case files to restart the process from that checkpoint, rather than repeating the entire process.

Each time a job is restarted, it is assigned a new job ID, and a new job subdirectory is created in the checkpoint directory. Files in the checkpoint directory are never deleted by LSF, but you may choose to remove old files once the FLUENT job is finished and the job history is no longer required.

Submitting FLUENT jobs

Use bsub to submit the job, including parameters required for checkpointing.

Syntax

The syntax for the bsub command to submit a FLUENT job is:

bsub [-R fluent] -a fluent [-k checkpoint_dir | -k "checkpoint_dir [checkpoint_period]"] [bsub_options] FLUENT_command [FLUENT_options] -lsf

-R fluent

Optional. Specify the fluent shared resource if the FLUENT application is only installed on certain hosts in the cluster.

-a fluent

Use the esub for FLUENT jobs, which automatically sets the checkpoint method to fluent to use the checkpoint and restart programs for FLUENT jobs, echkpnt.fluent and erestart.fluent.

The checkpointing feature for FLUENT jobs requires all of the following parameters:

-k checkpoint_dir

Regular option to bsub that specifies the name of the checkpoint directory.

checkpoint_period

Regular option to bsub that specifies the time interval, in minutes, at which LSF automatically checkpoints the job.

FLUENT command

Regular command used with FLUENT software.

-lsf

Special option to the FLUENT command. Specifies that FLUENT is running under LSF, and causes FLUENT to check for trigger files in the checkpoint directory if the environment variable LSB_CHKPNT_DIR is set.

Examples
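
The following submission is a sketch only; the checkpoint directory, journal file, and FLUENT command-line options other than -lsf are placeholders for your own setup:

bsub -a fluent -R fluent -k "/share/ckpt 30" -o fluent_job.out fluent 3d -g -i run.jou -lsf

With -k "/share/ckpt 30", LSF also checkpoints the job automatically every 30 minutes; the checkpoint files are written to a subdirectory of /share/ckpt named after the job ID.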

Note


When using the net version of FLUENT 5, pam is not used to launch FLUENT, so the JOB_STARTER argument of the queue should not be set. Instead, LSF sets an environment variable to contain a list of hosts and FLUENT uses this list to launch itself.

Checkpointing, restarting, and migrating FLUENT jobs

Checkpointing

bchkpnt [bchkpnt_options] [-k] [job_ID]

Restarting

brestart [brestart_options] checkpoint_directory [job_ID]

Migrating

bmig [bsub_options] [job_ID]

Examples
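
For example, assuming the job above was assigned job ID 1234 and uses the checkpoint directory /share/ckpt (both values are illustrative):

bchkpnt 1234
bchkpnt -k 1234
brestart /share/ckpt 1234
bmig 1234

The first command checkpoints the job and lets FLUENT continue iterating; bchkpnt -k checkpoints the job and exits it; brestart restarts it from the checkpoint directory; bmig checkpoints the job and migrates it to another host.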



Using LSF with Gaussian

Platform LSF accepts jobs running the Gaussian electronic structure modeling program.

Requirements

Configuring LSF for Gaussian jobs

During installation, lsfinstall adds the Boolean resource gaussian to the Resource section of lsf.shared.

Host configuration (optional)

If only some of your hosts can accept Gaussian jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.

Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the gaussian resource to the hosts that can run Gaussian jobs:

Begin Host
HOSTNAME    model   type  server  r1m   mem   swp  RESOURCES
...
hostA       !       !       1     3.5   ()    ()   ()
hostB       !       !       1     3.5   ()    ()   (gaussian)
hostC       !       !       1     3.5   ()    ()   ()
...
End Host

Submitting Gaussian jobs

Use bsub to submit the job, including parameters required for Gaussian.
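
For example, the following sketch submits a Gaussian input file, letting bsub supply the program's standard input and capture its standard output. The executable name (g03) and file names are assumptions; use the executable and input deck from your own Gaussian installation:

bsub -R gaussian -i h2o.com -o h2o.log g03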



Using LSF with Lion Bioscience SRS

SRS is Lion Bioscience's data integration platform, from which other Lion Bioscience applications and third-party products extract data. LSF works with the batch queue feature of SRS to provide load sharing and allow users to manage their running and completed jobs.

Requirements

Configuring LSF for SRS jobs

During installation, lsfinstall adds the Boolean resource lion to the Resource section of lsf.shared.

Host configuration (optional)

If only some of your hosts can accept SRS jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.

Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the lion resource to the hosts that can run SRS jobs:

Begin Host
HOSTNAME    model   type  server  r1m   mem   swp  RESOURCES
...
hostA       !       !       1     3.5   ()    ()   ()
hostB       !       !       1     3.5   ()    ()   (lion)
hostC       !       !       1     3.5   ()    ()   ()
...
End Host

SRS batch queues

You must also configure SRS for batch queues. When SRS batch queueing is enabled, users select from the available batch queues displayed next to the application Launch button in the Application Launch page.

See the SRS administration manual for information about setting up a batch queue system. No additional configuration is required in LSF.

Submitting and monitoring SRS jobs

Submitting jobs

Use bsub to submit the job, including parameters required for SRS.
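
SRS jobs are normally launched from the SRS batch queue interface, so an explicit bsub command is rarely needed. If you do submit SRS work directly, the only LSF-specific addition is the lion resource. The command below is purely illustrative; the getz query tool and the query string are assumptions about your SRS installation:

bsub -R lion -o srs_job.out getz "[swissprot-ID:P12345]"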

Monitoring jobs

As soon as the application is submitted, you can monitor the progress of the job. When applications are launched and batch queues are in use, an icon appears. The icon looks like a "new mail" icon in an email program when jobs are running, and looks like a "read mail" icon when all launched jobs are complete. You can click this icon at any time to:

You can also view the application results or launch another application against those results, using the results of the initial job as input for the next job.

See the SRS Administrator's Manual for more information.



Using LSF with LSTC LS-DYNA

LSF is integrated with products from Livermore Software Technology Corporation (LSTC). LS-DYNA jobs can use the checkpoint and restart features of LSF and take advantage of both SMP and distributed MPP parallel computation.

To submit LS-DYNA jobs through LSF, you only need to make sure that your jobs are checkpointable.

See Administering Platform LSF for more information about checkpointing in LSF.

Requirements

Optional requirements

Configuring LSF for LS-DYNA jobs

During installation, lsfinstall adds the Boolean resource ls_dyna to the Resource section of lsf.shared.

LSF also installs the echkpnt.ls_dyna and erestart.ls_dyna files in LSF_SERVERDIR.

Host configuration (optional)

If only some of your hosts can accept LS-DYNA jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.

Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the ls_dyna resource to the hosts that can run LS-DYNA jobs:

Begin Host
HOSTNAME    model   type  server  r1m   mem   swp  RESOURCES
...
hostA       !       !       1     3.5   ()    ()   ()
hostB       !       !       1     3.5   ()    ()   (ls_dyna)
hostC       !       !       1     3.5   ()    ()   ()
...
End Host

LS-DYNA integration with LSF checkpointing

LS-DYNA is integrated with LSF to use the LSF checkpointing capability. It uses application-level checkpointing, working with the functionality implemented by LS-DYNA. At the end of each time step, LS-DYNA looks for the existence of a checkpoint trigger file, named D3KIL.


LS-DYNA jobs always exit with exit code 0, even when checkpointing. LSF reports that the job has finished when it has checkpointed.

Use the bchkpnt command to create the checkpoint trigger file, D3KIL, which LS-DYNA reads. The file forces LS-DYNA to checkpoint, or to checkpoint and exit. The existence of a D3KIL file and the checkpoint information that LSF writes to the checkpoint directory specified for the job are all that LSF needs to restart the job.

Checkpointing and resource tracking of SMP jobs are supported.


With pam and Task Starter, you can track resources of MPP jobs, but cannot checkpoint. If you do not use pam and Task Starter, checkpointing of MPP jobs is supported, but tracking is not.

echkpnt and erestart

LSF installs echkpnt.ls_dyna and erestart.ls_dyna, which are special versions of echkpnt and erestart to allow checkpointing with LS-DYNA. Use bsub -a ls_dyna to make sure your job uses these files.

The method name ls_dyna uses the esub for LS-DYNA jobs, which sets the checkpointing method (LSB_ECHKPNT_METHOD="ls_dyna") so that echkpnt.ls_dyna and erestart.ls_dyna are used.

Checkpoint directories

When you submit a checkpointing job, you specify a checkpoint directory.

Before the job starts running, LSF sets the environment variable LSB_CHKPNT_DIR to a subdirectory of the checkpoint directory specified in the command line, or the CHKPNT parameter in lsb.queues. This subdirectory is identified by the job ID and only contains files related to the submitted job.

For checkpointing to work when running an LS-DYNA job from LSF, you must cd to the directory that LSF sets in $LSB_CHKPNT_DIR after submitting LS-DYNA jobs. You must change to this directory whether you submit a single job or multiple jobs. LS-DYNA puts all of its output files in this directory.

Checkpoint trigger files

When you checkpoint a job, LSF creates a checkpoint trigger file named D3KIL in the working directory of the job.

The D3KIL file contains an entry depending on the desired checkpoint outcome:


The other possible LS-DYNA switch parameters are not relevant to LSF checkpointing.

LS-DYNA does not remove the D3KIL trigger file after checkpointing the job.

Restarting jobs

If a job is restarted, LSF attempts to restart the job with the -r restart_file option, which replaces any existing -i or -r options in the original LS-DYNA command. LS-DYNA uses the checkpointed data to restart the process from that checkpoint, rather than starting the entire job from the beginning.

Each time a job is restarted, it is assigned a new job ID, and a new job subdirectory is created in the checkpoint directory. Files in the checkpoint directory are never deleted by LSF, but you may choose to remove old files once the LS-DYNA job is finished and the job history is no longer required.

Submitting LS-DYNA jobs

To submit LS-DYNA jobs, redirect a job script to the standard input of bsub, including the parameters required for checkpointing. With job scripts, you can manage two limitations of LS-DYNA job submissions:

To submit LS-DYNA jobs with job submission scripts, embed the LS-DYNA job in the job script. Use the following format to run the script:

% bsub < jobscript

bsub syntax

Inside your job scripts, the syntax for the bsub command to submit an LS-DYNA job is either of the following:

bsub [-R ls_dyna] -k "checkpoint_dir method=ls_dyna" | -k "checkpoint_dir [checkpoint_period] method=ls_dyna" [bsub_options] LS_DYNA_command [LS_DYNA_options]

OR:

bsub [-R ls_dyna] -a ls_dyna -k "checkpoint_dir" | -k "checkpoint_dir [checkpoint_period]" [bsub_options] LS_DYNA_command [LS_DYNA_options]

-R ls_dyna

Optional. Specify the ls_dyna shared resource if the LS-DYNA application is only installed on certain hosts in the cluster.

method=ls_dyna

Mandatory. Use the esub for LS-DYNA jobs, which automatically sets the checkpoint method to ls_dyna to use the checkpoint and restart programs echkpnt.ls_dyna and erestart.ls_dyna. Alternatively, use bsub -a to specify the ls_dyna esub.

The checkpointing feature for LS-DYNA jobs requires all of the following parameters:

-k checkpoint_dir

Mandatory. Regular option to bsub that specifies the name of the checkpoint directory. Specify the ls_dyna method here if you do not use the bsub -a option.

checkpoint_period

Regular option to bsub that specifies the time interval, in minutes, at which LSF automatically checkpoints the job.

LS_DYNA_command

Regular LS-DYNA software command and options.

Preparing your job scripts

Environment variables

Specify any environment variables required for your LS-DYNA jobs. For example:

LS_DYNA_ENV=VAL;export LS_DYNA_ENV

If you do not set your environment variables in the job script, then you must add some lines to the script to restore environment variables. For example:

if [ -f $LSB_CHKPNT_DIR/.envdump ]; then
  . $LSB_CHKPNT_DIR/.envdump
fi

Change directory

Ensure that your jobs run in the checkpoint directory set by LSF by adding the following line after your bsub commands:

cd $LSB_CHKPNT_DIR

LS-DYNA command

Write the LS-DYNA command you want to run. For example:

/usr/share/ls_dyna_path/ls960 endtime=2 \
i=/usr/share/ls_dyna_path/airbag.deploy.k ncpu=1

Example job scripts

All scripts must contain the ls_dyna method and the cd command to the checkpoint directory set by LSF.
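
A minimal example is sketched below. It assumes the bsub options are embedded as #BSUB directives (which bsub reads when the script is redirected to its standard input); the checkpoint directory, output file, and LS-DYNA paths are placeholders based on the example command above:

#!/bin/sh
#BSUB -a ls_dyna
#BSUB -k "/home/user1/ckpt 15"
#BSUB -o dyna.%J.out

# run in the job subdirectory that LSF creates under the checkpoint directory
cd $LSB_CHKPNT_DIR

# environment required by the LS-DYNA job (placeholder value)
LS_DYNA_ENV=VAL; export LS_DYNA_ENV

/usr/share/ls_dyna_path/ls960 endtime=2 \
i=/usr/share/ls_dyna_path/airbag.deploy.k ncpu=1

Submit the script with bsub < jobscript, as shown above.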

See Administering Platform LSF for information about submitting jobs with job scripts.

Checkpointing, restarting, and migrating LS-DYNA jobs

Checkpointing

bchkpnt [bchkpnt_options] [-k] [job_ID]

See Platform LSF Command Reference for more information about bchkpnt.

Restarting

brestart [brestart_options] checkpoint_directory [job_ID]

See Platform LSF Command Reference for more information about brestart.

Migrating

bmig [bsub_options] [job_ID]

See Platform LSF Command Reference for more information about bmig.



Using LSF with MSC Nastran

MSC Nastran Version 70.7.2 ("Nastran") runs in Distributed Parallel mode, automatically detects a job launched by LSF, and transparently accepts the execution host information from LSF.

The Nastran application checks if the LSB_HOSTS or LSB_MCPU_HOSTS environment variable is set in the execution environment. If either is set, Nastran uses the value of the environment variable to produce a list of execution nodes for the solver command line. Users can override the hosts chosen by LSF to specify their own host list.

Requirements

Configuring LSF for Nastran jobs

During installation, lsfinstall adds the Boolean resource nastran to the Resource section of lsf.shared.

No additional executable files are needed.

Host configuration (optional)

If only some of your hosts can accept Nastran jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.

Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the nastran resource to the hosts that can run Nastran jobs:

Begin Host
HOSTNAME    model   type  server  r1m   mem   swp  RESOURCES
...
hostA       !       !       1     3.5   ()    ()   ()
hostB       !       !       1     3.5   ()    ()   (nastran)
hostC       !       !       1     3.5   ()    ()   ()
...
End Host

Submitting Nastran jobs

Use bsub to submit the job, including parameters required for the Nastran command line.

Syntax

bsub -n num_processors [-R nastran] bsub_options nastran_command

Nastran dmp variable

You must set the Nastran dmp variable to the same value as the number of processors the job requests (the -n option of bsub).
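
For example, the following sketch requests 4 processors and runs Nastran with a matching dmp setting; the executable name is taken from the sample script below, and the input file is a placeholder:

bsub -n 4 -R nastran -o nast.out nast707t2 example.dat dmp=4 scr=yes bat=no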

Examples

Nastran on Linux using LAM/MPI

You must write a script that will pick up the LSB_HOSTS variable and provide the chosen hosts to the Nastran program. You can then submit the script using bsub:

bsub -a "nastran lammpi" -q hpc_linux -n 2 -o out -e err -R "span[ptile=1]" lsf_nast

This submits a 2-way job that puts its standard output in a file named out and its standard error in a file named err. The span[ptile=1] resource requirement tells LSF to choose at most 1 CPU per node for the job.

Sample lsf_nast script

The following sample lsf_nast script is only a starting point, but it handles the host specification for LAM/MPI. Modify this script for your site before use.

#! /bin/sh
#
# lsf script to use with Nastran and LAM/MPI.
#
#
#Set information for Head node:
DAT=/home/user1/lsf/bc2.dat
#
#Set information for Cluster node:
TMPDIR=/home/user1/temp
#
LOG=${TMPDIR}/log
LSB_HOST_FILE=${TMPDIR}/lsb_hosts
:> ${LOG}
# The local host MUST be in the host file.
echo ${LSB_SUB_HOST} > ${LSB_HOST_FILE}
#
#
# Create the lam hosts file:
for HOST in $LSB_HOSTS
do
echo $HOST >> ${LSB_HOST_FILE}
done
#
cd ${TMPDIR}
rcp ${LSB_SUB_HOST}:${DAT} .
id
# recon -v ${LSB_HOST_FILE}
# cat ${LSB_HOST_FILE}
# pwd
recon -v ${LSB_HOST_FILE} >> ${LOG} 2>&1
lamboot -v ${LSB_HOST_FILE} >> ${LOG} 2>&1
NDMP=`sed -n -e '$=' ${LSB_HOST_FILE}`
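# Build the LAM node list string "n0:n1:...": LAM numbers the booted nodes
# sequentially, and the list is passed to Nastran in the hosts= option below.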
HOST="n0"
i=1
while [ "$i" -lt "$NDMP" ]; do
HOST="$HOST:n$i"
i=$((i + 1))
done
echo DAT=${DAT##*/}
pwd
nast707t2 ${DAT##*/} dmp=${NDMP} scr=yes bat=no hosts=$HOST >> ${LOG} 2>&1
wipe -v ${LSB_HOST_FILE} >> ${LOG} 2>&1
#
# Bring back files:
DATL=${DAT##*/}
rcp ${DATL%.dat}.log ${LSB_SUB_HOST}:${DAT%/*}
rcp ${DATL%.dat}.f04 ${LSB_SUB_HOST}:${DAT%/*}
rcp ${DATL%.dat}.f06 ${LSB_SUB_HOST}:${DAT%/*}
#
# End






Copyright © 1994-2011 Platform Computing Corporation. All rights reserved.