Fixed Bugs for Platform LSF™ Version 7 Update 6

Release Date: September 2009

 

The following bugs have been fixed in the September 2009 update (LSF 7 Update 6) since the March 2009 update (LSF 7 Update 5):

 

132890

Date

2009-08-26

 

Description

profile.lsf and hostsetup point to linux2.6-glibc2.3-ia64 instead of linux2.6-glibc2.4-sn-ipf

 

Component

 

 

Platform

All

 

Impact

Cluster cannot be set up properly

 

 

133086

Date

2009-08-19

 

Description

HP-MPI error "Vendor MPI API process xxxx is terminated by signal 11" under LSF HPC EP5

 

Component

pam

 

Platform

All

 

Impact

HP

 

129878

Date

2009-08-12

 

Description

The job is not dispatched to the specified host.

 

Component

 

 

Platform

All

 

Impact

bsub -m does not work

 

129623

Date

2009-08-11

 

Description

Long-running jobs trigger idle exceptions. bjobs -l shows the jobs as running, but Process Explorer shows them as suspended. Jobs must be resumed manually.

 

Component

sbatchd

 

Platform

Windows

 

Impact

Time spent managing idle jobs.

 

 

131740

Date

2009-08-10

 

Description

blimits shows value of 1 for slots per processor despite a limit of 0.5.

 

Component

blimits

 

Platform

output of blimits -c

 

Impact

Output is confusing

 

132433

Date

2009-08-07

 

Description

Using "," in the select resource requirement string causes LSF to ignore the queue level

 

Component

bsub

 

Platform

All

 

Impact

Jobs get dispatched to the wrong host

 

 

131293

Date

2009-08-06

 

Description

bpeek fails if defined spool directory is not accessible.

 

Component

bpeek

 

Platform

All

 

Impact

If $JOB_SPOOL_DIR and $HOME/.lsbatch are both unwritable and res is down, bpeek cannot get job output.

 

132028

Date

2009-07-29

 

Description

IMPT_JOBBKLG does not work

 

Component

mbatchd

 

Platform

All

 

Impact

IMPT_JOBBKLG does not work

 

131664

Date

2009-07-27

 

Description

Runlimit is not observed for interactive jobs with job-level pre-exec

 

Component

sbatchd

 

Platform

All

 

Impact

Runlimit is not observed for interactive jobs with job-level pre-exec

 

131708

Date

2009-07-27

 

Description

The supplementary group ID was changed in the POST-EXEC script.

 

Component

sbatchd

 

Platform

UNIX

 

Impact

Security concerns

 

128001

Date

2009-07-26

 

Description

User share account mistakenly charged, affecting job dispatching order

 

Component

mbatchd

 

Platform

All

 

Impact

User share account mistakenly charged, affecting job dispatching order

 

130991

Date

2009-07-24

 

Description

When lsf.conf on the master host is larger than 8k, Windows hosts cannot join the cluster and fail to get the configuration from the master.

 

Component

lim

 

Platform

All

 

Impact

Windows side of the cluster is down
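As a rough aid for bug 130991, the sketch below checks whether a configuration file exceeds the 8k size mentioned in the description. It assumes that "8k" means 8192 bytes, which the description does not state exactly. The function name is hypothetical and is not part of LSF.

```python
import os

# Assumption: "8k" in the bug description means 8192 bytes.
SIZE_THRESHOLD_BYTES = 8 * 1024


def exceeds_threshold(path, threshold=SIZE_THRESHOLD_BYTES):
    """Return True if the file at `path` is larger than `threshold` bytes."""
    return os.path.getsize(path) > threshold
```

Calling exceeds_threshold() with the location of lsf.conf on the master host would indicate whether a cluster running an earlier update might be exposed to this problem.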

 

 

129836

Date

2009-07-21

 

Description

pim loads the wrong library for get_weighted_memory_size(), so the function is not found and memory usage is overcounted when memory is shared by multiple processes.

 

Component

pim

 

Platform

linux2.6-glibc2.3-sn_ipf

 

Impact

Memory usage is wrong.

 

131607

Date

2009-07-16

 

Description

xlsbatch filters do not work for multiple users.

 

Component

xlsbatch.exe

 

Platform

Windows

 

Impact

xlsbatch filters do not work for multiple users.

 

114194

Date

2009-07-14

 

Description

pim dumps core when there are many dead processes

 

Component

pim

 

Platform

All

 

Impact

pim dumps core when there are many dead processes

 

130374

Date

2009-07-10

 

Description

lim on a non-shared Solaris slave cannot start up without LSF_DYNAMIC_HOST_WAIT_TIME

 

Component

lim

 

Platform

Solaris

 

Impact

lim on a non-shared Solaris slave cannot start up without LSF_DYNAMIC_HOST_WAIT_TIME

 

130925

Date

2009-07-10

 

Description

Window opened by tspeek cannot be closed.

 

Component

lstsc.dll

 

Platform

Windows

 

Impact

Window opened by tspeek cannot be closed.

 

131071

Date

2009-07-07

 

Description

With queue administrators set to all, a normal user cannot manage the queue.

 

Component

mbatchd

 

Platform

All

 

Impact

With queue administrators set to all, a normal user cannot manage the queue.

 

119759

Date

2009-07-03

 

Description

Pre and Post-exec environment user group IDs may not be consistent with that of job execution environment.

 

Component

libbat.a sbatchd libbat.so bsub

 

Platform

UNIX

 

Impact

Pre and Post-exec environment user group IDs may not be consistent with that of job execution environment.

 

127937

Date

2009-07-02

 

Description

Job limits are not honored, with more running jobs than the number of available CPUs, if bpost or bmod is used

 

Component

 

 

Platform

All

 

Impact

job limits feature (queue-level or general limits) is not working

 

127135

Date

2009-07-01

 

Description

A user’s earlier advance reservations are overridden by later ones after reaching 100 advance reservations.

 

Component

mbatchd

 

Platform

All

 

Impact

A user’s earlier advance reservations are overridden by later ones after reaching 100 advance reservations.

 

119531

Date

2009-06-25

 

Description

General limits are not followed for array jobs when using bmod or bpost on an array with some elements already running.

 

Component

 

 

Platform

All

 

Impact

scheduler becomes unavailable frequently

 

130335

Date

2009-06-24

 

Description

lsrun command slow in WAN network

 

Component

res

 

Platform

Windows

 

Impact

lsrun command slow in WAN network

 

127825

Date

2009-06-23

 

Description

If the license project for a job has distribution for another feature, the job will pend. However, if the license project is defined in the Projects section but does not have distribution for any feature, the job runs.

 

Component

mbatchd schmod_default.so

 

Platform

All

 

Impact

Pending jobs.

 

128706

Date

2009-06-22

 

Description

With LSF_FLEXLM_ENABLE_THREAD=Y in lsf.conf, a lim restart results in the host slot limit not being obeyed.

 

Component

mbschd

 

Platform

All

 

Impact

jobs fail due to too many jobs running on one host

 

130227

Date

2009-06-22

 

Description

batch command fails if option is used without any space

 

Component

liblsf.so bresize bgmod liblsf.a libbat.a brsvadd brsvmod libbat.so bmod blaunch bsub

 

Platform

All

 

Impact

batch command fails if option is used without any space

 

127579

Date

2009-06-19

 

Description

LSF failed to send email.

 

Component

n/a

 

Platform

Windows

 

Impact

LSF failed to send email.

 

129513

Date

2009-06-18

 

Description

When there are running jobs in LSF6.2 and the cluster upgrades to LSF7.0.3, "duration" will not take effect for the running jobs.

 

Component

mbatchd

 

Platform

All

 

Impact

When there are running jobs in LSF6.2 and the cluster upgrades to LSF7.0.3, "duration" will not take effect for the running jobs.

 

124249

Date

2009-06-12

 

Description

When the customer maximizes or resizes the window opened by tspeek on Windows XP SP3 client hosts, the window closes with an error window popup.

 

Component

lstsc.dll

 

Platform

Windows

 

Impact

Cannot maximize or resize the windows opened by tspeek.

 

129169

Date

2009-06-12

 

Description

mbschd coredumps

 

Component

mbschd

 

Platform

All

 

Impact

jobs cannot be scheduled

 

129801

Date

2009-06-12

 

Description

bmig is not working

 

Component

sbatchd

 

Platform

linux2.6-glibc2.3-x86_64-xt3

 

Impact

bmig is not working


129254

Date

2009-06-10

 

Description

New parser interpreter maintains a cache that failed to parse the right value.

 

Component

mbschd

 

Platform

All

 

Impact

New parser functionality broken.

 

128749

Date

2009-06-10

 

Description

With LSB_MIXED_PATH_ENABLE enabled, JOB_STARTER will not execute on a Windows host.

 

Component

sbatchd mbatchd

 

Platform

All

 

Impact

With LSB_MIXED_PATH_ENABLE enabled, JOB_STARTER will not execute on a Windows host.

 

127172

Date

2009-06-05

 

Description

Free tokens are not being reallocated by License Scheduler when there is demand from other license projects.

 

Component

mbd

 

Platform

All

 

Impact

Licenses will be underutilized.

 

128815

Date

2009-06-04

 

Description

Failure of LSF 7.0 API C++ program compiled and run by Visual Studio 2008

 

Component

sbatchd mbatchd libbat.lib liblsf.lib

 

Platform

Windows

 

Impact

Failure of LSF 7.0 API C++ program compiled and run by Visual Studio 2008

 

127762

Date

2009-06-01

 

Description

Application res processes are taking up more system resources (mostly cputime) than the actual application processes.

 

Component

res

 

Platform

Solaris

 

Impact

Light interactive jobs are running slower, job submission is becoming slower.

 

128174

Date

2009-05-30

 

Description

Jobs pend with resreq not satisfied, but exact reason is not clear

 

Component

mbschd schmod_default.so

 

Platform

All

 

Impact

Difficult to find reason for pending jobs

 

127245

Date

2009-05-27

 

Description

Assume we have defined license ownership for Project_A and that a job is submitted to this project. If non-LS resource requirements of the job are not satisfied, LS still suspends the job.

 

Component

mbschd schmod_default.so

 

Platform

All

 

Impact

Low utilization of licenses.

 

128483

Date

2009-05-22

 

Description

btop and bbottom don't work in X-GUI

 

Component

xlsbatch.exe

 

Platform

Windows

 

Impact

Cannot adjust the position of the job in X-GUI

 

128318

Date

2009-05-22

 

Description

Dynamic debugging with several log classes does not work.

 

Component

No

 

Platform

All

 

Impact

LC_MEMORY, LC_RESREQ, LC_XDRVERSION and LC_FLEX do not work in dynamic debugging.

 

128030

Date

2009-05-21

 

Description

lsbeventsloader cannot parse most of the event records in the lsb.stream file

 

Component

liblsbstream.so

 

Platform

aix5-64

 

Impact

data loss

 

127688

Date

2009-05-21

 

Description

The job output format between interactive and non-interactive jobs is inconsistent.

 

Component

sbatchd

 

Platform

All

 

Impact

The job output format between interactive and non-interactive jobs is inconsistent

 

126035

Date

2009-05-19

 

Description

Excessive suspension due to resource preemption

 

Component

mbschd schmod_preemption.so

 

Platform

All

 

Impact

Excessive suspension due to resource preemption

 

127676

Date

2009-05-14

 

Description

broken backward compatibility - batch library depends on 'dl' library

 

Component

liblsf.so lsf.h libbat.so

 

Platform

non-windows

 

Impact

Lacking Open MPI backward compatibility

 

127238

Date

2009-05-14

 

Description

Inconsistent resource pool in multi-queues for queue-based fairshare is not detected by badmin ckconfig

 

Component

mbatchd

 

Platform

All

 

Impact

User will not be aware if the fairshare is working as configured or not

 

127607

Date

2009-05-12

 

Description

mpichp4 job fails when submitted by "pam -g 1 mpichp4_wrapper mpi_program ..."

 

Component

mpichp4_wrapper

 

Platform

All

 

Impact

mpichp4 job cannot run.

 

125548

Date

2009-05-11

 

Description

Setting LD_ASSUME_KERNEL could cause LSF batch job to exit.

 

Component

sbatchd

 

Platform

Linux

 

Impact

Setting LD_ASSUME_KERNEL could cause LSF batch job to exit.

 

126639

Date

2009-05-11

 

Description

Every 15 minutes, mbatchd sends all share acct data to mbschd, causing mbschd to be busy.

 

Component

mbschd mbatchd

 

Platform

All

 

Impact

mbschd is busy

 

122448

Date

2009-05-07

 

Description

Job submission fails occasionally.

 

Component

liblsf.so b* libbat.a liblsf.a ls* libbat.so

 

Platform

All

 

Impact

Job submission fails occasionally, requiring workaround.

 

126804

Date

2009-05-07

 

Description

Modifying rusage for a running job may not take effect.

 

Component

mbatchd

 

Platform

All

 

Impact

Running job is not updated with new rusage.

 

115430

Date

2009-05-06

 

Description

When slot limits are configured, LSF can fail to dispatch a parallel job even though the job should start. For parallel jobs in reservation queues, LSF can reserve resources for these jobs in violation of limits.

 

Component

mbschd schmod_reserve.so

 

Platform

Windows

 

Impact

LSF does not schedule jobs according to the configured policies.

 

127013

Date

2009-04-28

 

Description

7.0.5 mbatchd error "An enforced user group"

 

Component

mbatchd

 

Platform

All

 

Impact

Fairshare is ignored after playing events file. For newly submitted jobs, mbatchd reports the same error.

 

115643

Date

2009-04-26

 

Description

If two slave queues have the same priority, different from the master queue’s priority, the job scheduling and dispatch sequence do not follow fairshare.

 

Component

schmod_fairshare.so schmod_reserve.so mbschd schmod_preemption.so schmod_mc.so mbatchd schmod_default.so

 

Platform

All

 

Impact

Job scheduling is not as anticipated.

 

120343

Date

2009-04-20

 

Description

LSF 7.0.3 reports job pending reasons on irrelevant hosts, unlike LSF 5.1.

 

Component

mbd

 

Platform

All

 

Impact

LSF 7.0.3 pending reasons are mostly irrelevant. mbschd is publishing the pending reasons to mbatchd with potential performance impact.

 

126303

Date

2009-04-17

 

Description

mbschd exits when handling bmod events after a resume decision is made in the same scheduling cycle

 

Component

mbschd

 

Platform

All

 

Impact

Scheduler exits and restarts

 

126071

Date

2009-04-15

 

Description

After badmin mbdrestart, the ADJUST value changes to an unreasonable value for jobs with an rusage resreq.

 

Component

mbatchd

 

Platform

All

 

Impact

X

 

125838

Date

2009-04-13

 

Description

If an array element depends on other elements within the job array, the whole job array can't be cleaned.

 

Component

mbatchd

 

Platform

All

 

Impact

If an array element depends on other elements within the job array, the whole job array can't be cleaned.

 

126027

Date

2009-04-13

 

Description

ENFORCE_ONE_UG_LIMITS feature does not work properly when host group contains "all"

 

Component

 

 

Platform

All

 

Impact

ENFORCE_ONE_UG_LIMITS feature does not work properly when host group contains "all"

 

101754

Date

2009-04-09

 

Description

Fairshare queue rejects job submissions from some users. Problem occurs when:

1.      A hierarchical user group is defined with all users as members.

2.      One or more user groups are defined with one or more specific users. The user group(s) are defined after the hierarchical user group(s) in lsb.users.

3.      "badmin mbdrestart" is used to reconfigure the cluster.

 

Component

mbatchd

 

Platform

All

 

Impact

Some users cannot submit jobs to a fairshare queue.

 

125547

Date

2009-04-07

 

Description

bhosts -R behaves differently on a static client than on a server. On a server, it lists the hosts whose type is the same as that of the server that executed the command. On a static client, it lists all hosts.

 

Component

mbatchd

 

Platform

All

 

Impact

bhosts -R behaves differently on a static client than on a server.

 

125361

Date

2009-04-06

 

Description

Unexpected aprun job placement with sbatchd from build 118796

 

Component

sbatchd

 

Platform

linux2.6-glibc2.3-x86_64-xt3

 

Impact

Jobs dispatched to the same crayxt4 front nodes fail to run or run on the wrong reservation

 

114627

Date

2009-03-30

 

Description

The documentation states that three parameters are required to enable dynamically added hosts: LSF_MASTER_LIST, LSF_HOST_ADDR_RANGE, and LSF_DYNAMIC_HOST_WAIT_TIME.

In practice, only the first two parameters are required.

 

Component

lim

 

Platform

All

 

Impact

Hosts are unexpectedly added to the cluster dynamically.
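The parameter list in bug 114627 can be checked mechanically. The sketch below is a hypothetical helper, not an LSF tool: it takes settings that have already been read into a dictionary and reports which of the three documented parameter names are unset. Reading the settings from the LSF configuration files is assumed to happen elsewhere, and the example values are made up.

```python
# The three parameters named in the bug description.
DOCUMENTED_PARAMS = (
    "LSF_MASTER_LIST",
    "LSF_HOST_ADDR_RANGE",
    "LSF_DYNAMIC_HOST_WAIT_TIME",
)


def missing_dynamic_host_params(settings):
    """Return the documented parameters that are absent or empty in `settings`."""
    return [name for name in DOCUMENTED_PARAMS if not settings.get(name)]


# Illustrative values only; the host names and address range are made up.
example = {
    "LSF_MASTER_LIST": "hosta hostb",
    "LSF_HOST_ADDR_RANGE": "*.*.*.*",
}
print(missing_dynamic_host_params(example))  # ['LSF_DYNAMIC_HOST_WAIT_TIME']
```

A configuration like the example, with only the first two parameters set, is the case in which hosts were added dynamically even though the documentation lists a third required parameter.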

 

125136

Date

2009-03-20

 

Description

bjobs -u behavior changed

 

Component

mbatchd

 

Platform

All

 

Impact

bjobs -u behavior changed

 

124088

Date

2009-03-19

 

Description

Corrupted lsb.events file, same jobid assigned to different jobs

 

Component

mbatchd

 

Platform

All

 

Impact

Two mbatchds write to the same lsb.events file causing its data to become corrupted.

 

124842

Date

2009-03-18

 

Description

If stdout is redirected, bpeek output is redundant.

 

Component

 

 

Platform

All

 

Impact

Redundant output using bpeek.

 

124753

Date

2009-03-18

 

Description

The bparams command does not list LSB_STOP_ASKING_LICENSES_TO_LS.

 

Component

libbatch.so libbat.a mbatchd libbat.so

 

Platform

All

 

Impact

Cannot use bparams to check mbatchd parameter setting.

 

124442

Date

2009-03-18

 

Description

mbatchd restart/reconfig is slow due to too many user groups

 

Component

mbatchd

 

Platform

All

 

Impact

Slow restart/startup.

 

124316

Date

2009-03-11

 

Description

No debug log created under LSB_CMD_LOGDIR if application launched with <pathname>/application-name.

 

Component

LSF libraries all binaries

 

Platform

All

 

Impact

No debug log created under LSB_CMD_LOGDIR if application launched with <pathname>/application-name.

 

123307

Date

2009-03-09

 

Description

When normal users change settings in the "Options" menu (e.g. select "Show queues") and try to "Options -> Save settings", this does not work. The next time xlsbatch.exe is started the old settings are used again.

 

Component

xlsbatch.exe

 

Platform

Windows

 

Impact

Users can’t save settings.

 

123501

Date

2009-03-06

 

Description

sbatchd dumps core upon startup

 

Component

sbatchd

 

Platform

UNIX

 

Impact

sbatchd dumps core upon startup

 

124289

Date

2009-03-05

 

Description

On HP-UX, LSF_CMD_LOGDIR and LSB_CMD_LOGDIR do not create command debug log file.

 

Component

binaries libraries and all

 

Platform

HP-UX

 

Impact

On HP-UX, LSF_CMD_LOGDIR and LSB_CMD_LOGDIR do not create command debug log file.

 

123601

Date

2009-03-05

 

Description

blimits displays incorrect limit information when PARALLEL_SCHED_BY_SLOT is enabled

 

Component

blimits

 

Platform

All

 

Impact

blimits displays incorrect limit information when PARALLEL_SCHED_BY_SLOT is enabled

 

123471

Date

2009-03-03

 

Description

When using lsload -l, the customer may see the wrong value for external resources.

 

Component

lsload

 

Platform

All

 

Impact

When using lsload -l, the customer may see the wrong value for external resources.

 

122775

Date

2009-02-24

 

Description

IMPT_JOBBKLG does not work as described

 

Component

mbatchd

 

Platform

All

 

Impact

IMPT_JOBBKLG does not work as described

 

123342

Date

2009-02-24

 

Description

INFO-level messages logged repeatedly

 

Component

mbatchd

 

Platform

All

 

Impact

mbatchd log grows quickly, filling up disk space.

 

123958

Date

2009-02-24

 

Description

bsub aborted with large .lsfhosts file.

 

Component

libbat.a libbat.so bsub

 

Platform

All

 

Impact

Unable to submit jobs

 

122521

Date

2009-02-24

 

Description

On the submission cluster after mbdrestart, bpeek cannot get the correct output file.

 

Component

libbat.a mbatchd lsbatch.h libbat.so bpeek

 

Platform

All

 

Impact

On the submission cluster after mbdrestart, bpeek cannot get the correct output file.

 

122877

Date

2009-02-18

 

Description

lsload reports incorrect r15s if a multithreaded process is running

 

Component

lim

 

Platform

linux2.6

 

Impact

lsload reports incorrect r15s if a multithreaded process is running

 

123942

Date

2009-02-27

 

Description

Old exited jobs from the remote cluster are not removed from the cluster.

 

Component

mbatchd

 

Platform

All

 

Impact

Old exited jobs from the remote cluster are not removed from the cluster.

 

123309

Date

2009-02-26

 

Description

C program using LSF 7.0 API cannot run when compiled by VS2008

 

Component

liblsf.lib

 

Platform

Windows

 

Impact

C program using LSF 7.0 API cannot run when compiled by VS2008

 

122924

Date

2009-02-25

 

Description

After bmod to run on a different execution host, job is dispatched but exits with exit code 255.

 

Component

mbatchd

 

Platform

All

 

Impact

bmod not working

 

123180

Date

2009-02-16

 

Description

When a new job preempts a starting job, the preempted job sometimes runs indefinitely.

 

Component

sbatchd

 

Platform

All

 

Impact

A job can run indefinitely.

 

123098

Date

2009-02-12

 

Description

lshosts -l terminated with a Windows dialog "An unhandled win32 exception occurred" when there is a host whose status is "unavail".

 

Component

lshosts

 

Platform

win2003 x64

 

Impact

No information from lshosts -l

 

104416

Date

2009-01-22

 

Description

pam dies inside mpirun(), job fails

 

Component

pam

 

Platform

linux2.6-glibc2.3-x86_64

 

Impact

Cannot run MPI job on SGI x86_64 platform

 

Technical Support

support@platform.com

www.platform.com

 

North America: +1 905 948 4297

Europe: +44 1256 370 530

Asia: +86 10 6238 1125

Toll-free: 1-877-444-4573

 

Platform Support

Platform Computing Corporation

3760 14th Avenue

Markham, Ontario

Canada L3R 3T7

Copyright

© 1994 - 2009 Platform Computing Corporation

All Rights Reserved.

Although the information in this document has been carefully reviewed, Platform Computing Corporation (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.

UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.

 

Document redistribution policy: This document is protected by copyright and you may not redistribute or translate it into another language, in part or in whole. You may only redistribute this document internally within your organization (for example, on an intranet).

Trademarks

LSF is a registered trademark of Platform Computing Corporation in the United States and in other jurisdictions.

 

ACCELERATING INTELLIGENCE, THE BOTTOM LINE IN DISTRIBUTED COMPUTING, PLATFORM COMPUTING, CLUSTERWARE, PLATFORM ACTIVECLUSTER, IT INTELLIGENCE, SITEASSURE, PLATFORM SYMPHONY, PLATFORM JOBSCHEDULER, PLATFORM INTELLIGENCE, PLATFORM INFRASTRUCTURE INSIGHT, PLATFORM WORKLOAD INSIGHT, and the PLATFORM and LSF logos are trademarks of Platform Computing Corporation in the United States and in other jurisdictions.

 

UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions.

 

Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.

Windows is a registered trademark of Microsoft Corporation in the United States and other countries.

 

Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners.