Fixed Bugs for Platform LSF™ Version 7 Update 4

Release Date: October 2008

 

The following bugs have been fixed in the October 2008 update (LSF 7 Update 4) since the May 2008 update (LSF 7 Update 3):

 

112864

Date

2008-09-24

 

Description

LIM cannot detect the correct number of CPUs and cores on a non-virtual machine if /proc/xen exists

 

Component

lim

 

Platform

linux

 

Impact

LIM reports the wrong number of CPUs and cores.

 

109205

Date

2008-09-23

 

Description

eexec is not run after interactive job finishes

 

Component

sbatchd

 

Platform

All

 

Impact

eexec does not run

 

107031

Date

2008-09-23

 

Description

SLA does not work with fairshare

 

Component

schmod_fairshare.so mbschd

 

Platform

All

 

Impact

SLA goal is not reached

 

109779

Date

2008-09-16

 

Description

lsload / lsplace -R "status==ok" does not show the correct RES status when RES is down.

 

Component

lim

 

Platform

All

 

Impact

lsload / lsplace does not show RES status properly. Users cannot pick hosts with RES daemons running.

 

109128

Date

2008-09-16

 

Description

No email sent for job idle exceptions

 

Component

mbatchd

 

Platform

Windows

 

Impact

LSF administrator is not notified about idle jobs

 


 

112643

Date

2008-09-11

 

Description

Jobs are pending with reason "New job is waiting for scheduling"

 

Component

mbatchd

 

Platform

All

 

Impact

Jobs pend forever until the system clock is reset. In a system where ntpd is enabled to synchronize time, or where another mechanism rolls back the system clock, some jobs stay in pending status forever with reason "New job is waiting for scheduling;". The problem lies in the synchronization between mbatchd and mbschd. Run badmin reconfig as a workaround.

 

109183

Date

2008-09-01

 

Description

PAM and TS have exited, but LSF still reports the job as RUN

 

Component

pam

 

Platform

linux unix

 

Impact

Resources are occupied by the unfinished jobs

 

108900

Date

2008-08-29

 

Description

After an upgrade and mbatchd restart, batch commands do not respond for a long time

 

Component

mbatchd

 

Platform

All

 

Impact

Admin is forced to remove the lsb.events file and restart mbatchd again to get the batch system working

 

106777

Date

2008-08-28

 

Description

When submitting a job with a project name which contains spaces, bacct -P cannot recognize the job

 

Component

bacct

 

Platform

All

 

Impact

Cannot use bacct to query a project name that contains spaces.
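
As a minimal sketch, assuming a site script calls bacct from Python: passing the project name as a single element of an argument list keeps embedded spaces intact, which is useful when checking this fix. The helper names are hypothetical; only the -P option named in the description is used.

```python
import shlex

def build_bacct_command(project_name):
    """Return an argument list that keeps a spaced project name as one token."""
    if not project_name.strip():
        raise ValueError("project name must not be empty")
    # Hypothetical helper: only the -P option from the bug description is used.
    return ["bacct", "-P", project_name]

def as_shell_string(argv):
    """Render the argument list as a safely quoted shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The list form can be handed to subprocess.run directly; the quoted string form is only for display or for pasting into a shell.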

 

105490

Date

2008-08-28

 

Description

When a customer defines a "default" limit, the value applies to both users and user groups. The customer wants a method to apply the limit only to users, not to user groups.

 

Component

mbschd mbatchd

 

Platform

All

 

Impact

Administrators have to configure user group limits one by one when they want to apply the "default" limit only to users, not to user groups.

 


 

113378

Date

2008-08-27

 

Description

When RESOURCE_RESERVE_PER_SLOT is defined in lsb.params, a host-specific resource reported by elim does not report the correct value for the resource once a parallel job (that has reserved this resource) has started.

 

Component

mbatchd

 

Platform

All

 

Impact

Resource reservation value is wrong

 

112514

Date

2008-08-27

 

Description

bhist prints out same job event status two times

 

Component

bhist

 

Platform

All

 

Impact

Misleading bhist output

 

112256

Date

2008-08-27

 

Description

High priority job does not preempt jobs with fewer slots

 

Component

schmod_preemption.so

 

Platform

All

 

Impact

OPTIMAL_MINI_JOB preemption policy does not work

 

113844

Date

2008-08-24

 

Description

log_jobdata(): Job failed in getHostFactor() message appears in execution cluster mbatchd log file, even though jobs can run successfully.

 

Component

mbatchd

 

Platform

All

 

Impact

Misleading mbatchd log message

 

113299

Date

2008-08-21

 

Description

To modify the user-specified number of requested processors through an esub, both LSB_SUB_NUM_PROCESSORS and LSB_SUB_MAX_NUM_PROCESSORS need to be specified, and they need to be written to LSB_SUB_MODIFY_FILE in a certain order.

 

Component

bsub

 

Platform

All

 

Impact

This behavior can prevent an esub from making a modification that it should.
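
The following is a minimal sketch, assuming an esub written in Python that receives the path of the modification file in the LSB_SUB_MODIFY_FILE environment variable. It writes both processor options together so that neither is omitted. The helper names are hypothetical, and writing the minimum before the maximum is an assumption: the description above does not say which order the affected versions required.

```python
import os

def format_processor_request(min_procs, max_procs):
    """Build the two option lines an esub would write for a processor request.

    Assumption: minimum first, then maximum. The required order is not
    stated in the release notes.
    """
    if min_procs < 1 or max_procs < min_procs:
        raise ValueError("need 1 <= min_procs <= max_procs")
    return (
        f"LSB_SUB_NUM_PROCESSORS={min_procs}\n"
        f"LSB_SUB_MAX_NUM_PROCESSORS={max_procs}\n"
    )

def append_processor_request(min_procs, max_procs, environ=None):
    """Append both options to the file named by LSB_SUB_MODIFY_FILE, if set."""
    env = os.environ if environ is None else environ
    path = env.get("LSB_SUB_MODIFY_FILE")
    if not path:
        return False  # not running as an esub; nothing to modify
    # Append so that options already written by this esub are preserved.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(format_processor_request(min_procs, max_procs))
    return True
```

Writing both options in one call keeps the pair together regardless of what else the esub modifies.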

 


 

111507

Date

2008-08-21

 

Description

NFS (root=nobody) MC-Lease mode purging lsb.lease.state file fails, lsb.lease.state.tmp left behind

 

Component

mbatchd

 

Platform

All

 

Impact

NFS (root=nobody) MC-Lease purging lsb.lease.state file fails, lsb.lease.state.tmp left behind

 

112296

Date

2008-08-20

 

Description

When mbatchd restarts with bad events, lsb.events.0 is missing

 

Component

mbatchd

 

Platform

All

 

Impact

History information for some jobs is lost.

 

112285

Date

2008-08-19

 

Description

If duplicate event logging is configured, when LSF_SHAREDIR goes down and comes back up, the child mbatchd, which writes to LSF_SHAREDIR, does not respond for 15 minutes and then dies.

 

Component

mbatchd

 

Platform

All

 

Impact

Data duplication is delayed for 15 minutes. The parent mbatchd dies, and a new one starts which adds further to the delay.

 

113707

Date

2008-08-15

 

Description

MPI job produces different results when run under LSF and when run outside LSF.

 

Component

intelmpi_wrapper mpich2_wrapper

 

Platform

intelmpi_wrapper (all Linux except Cray), mpich2_wrapper (all except SLURM)

 

Impact

MPI local option is set as a global option. The MPI program may run with the wrong arguments.

 

113264

Date

2008-08-14

 

Description

Mandatory first execution host does not work at queue level with "RES_REQ=span

 

Component

schmod_parallel.so

 

Platform

All

 

Impact

Mandatory first execution host does not work.

 

108727

Date

2008-08-14

 

Description

bjobs -G does not work as documented – behaves like bjobs -u

 

Component

mbatchd

 

Platform

All

 

Impact

bjobs -G does not work as documented

 

110814

Date

2008-08-13

 

Description

SGI-MPI (vendor-MPI) mpirun options are not recognized by pam

 

Component

pam

 

Platform

All

 

Impact

Cannot use SGI-MPI mpirun options on pam command line when using pam SGI-MPI integration

 

111728

Date

2008-08-07

 

Description

First execution node is not the same after migrating a parallel job in MultiCluster environment.

 

Component

mbatchd

 

Platform

All

 

Impact

All jobs fail after migration

 

112053

Date

2008-08-06

 

Description

LDAP authentication for PMC not supported. Users cannot log in to PMC with local Linux account.

 

Component

PMC

 

Platform

All

 

Impact

Customers who use LDAP instead of NIS cannot log in to PMC with local Linux account.

 

97887

Date

2008-08-04

 

Description

Jobs from the higher LSF version are lost if the master and master candidate do not run the same version

 

Component

mbatchd

 

Platform

All

 

Impact

Jobs get lost.

 

111673

Date

2008-08-04

 

Description

bladmin reconfig fails on x86_64 platform

 

Component

bladmin blhosts

 

Platform

Unix

 

Impact

Cannot use bladmin reconfig to restart License Scheduler

 

110694

Date

2008-07-29

 

Description

Cross-queue fairshare scheduling does not work when two slave queues belonging to two cross-queue sets have the same priority

 

Component

mbschd

 

Platform

All

 

Impact

Cross-queue fairshare scheduling does not work

 

106508

Date

2008-07-29

 

Description

Users get incorrect emails about license overuse and number of available license counts

 

Component

lim

 

Platform

All

 

Impact

Potential overuse of license resources

 

111150

Date

2008-07-27

 

Description

Job pends with reason "Job's resource requirements not satisfied"

 

Component

schmod_mc.so

 

Platform

All

 

Impact

Cannot submit a job with a resource requirement specified

 

111717

Date

2008-07-25

 

Description

badmin reconfig dispatches more jobs than available licenses

 

Component

mbatchd

 

Platform

All

 

Impact

badmin reconfig dispatches more jobs than available licenses

 

111994

Date

2008-07-24

 

Description

Multiple clusters exist on a group of hosts, so users can change environment from one cluster to another and then back again. profile.lsf and/or cshrc.lsf do not set the environment correctly. The PATH environment variable is set to the LSF_BINDIR of the wrong cluster.

 

Component

install

 

Platform

All

 

Impact

profile.lsf and cshrc.lsf set the wrong LSF environment.
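
To illustrate the intended behavior, here is a minimal sketch, not the logic of profile.lsf or cshrc.lsf themselves: when switching clusters, the previous cluster's binary directory is removed from PATH before the new one is placed first. The function name and directory arguments are hypothetical.

```python
import os

def switch_lsf_bindir(path_value, old_bindir, new_bindir):
    """Return PATH with old_bindir removed and new_bindir placed first."""
    kept = [
        entry
        for entry in path_value.split(os.pathsep)
        # Drop empty entries and any stale copy of either cluster's bin dir.
        if entry and entry not in (old_bindir, new_bindir)
    ]
    return os.pathsep.join([new_bindir] + kept)
```

Removing both directories before prepending means that switching from one cluster to another and back again yields the same PATH as the starting point.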

 

111672

Date

2008-07-24

 

Description

Host does not have a software license

 

Component

lim

 

Platform

All

 

Impact

Cluster cannot work without restarting the LIM

 

111838

Date

2008-07-23

 

Description

Deadline constraint policy violated under specific run-window configurations

 

Component

mbatchd

 

Platform

All

 

Impact

Deadline constraint policy does not work. Jobs run when they should not.

 

110981

Date

2008-07-23

 

Description

Add submission and execution cluster name in email notification

 

Component

sbatchd

 

Platform

All

 

Impact

Email notification incomplete

 

109587

Date

2008-07-20

 

Description

Job pends forever after mbatchd restart with 'Dependency not satisfied'

 

Component

mbatchd

 

Platform

All

 

Impact

Job pends forever after mbatchd restart with 'Dependency not satisfied'

 

111569

Date

2008-07-17

 

Description

blimits -w cannot show full length of SLOT MEM TMP SWAP

 

Component

blimits

 

Platform

All

 

Impact

blimits -w cannot show full length of SLOT MEM TMP SWAP

 

106541

Date

2008-07-17

 

Description

Cannot query jobs using bhist by specifying user group

 

Component

bhist

 

Platform

All

 

Impact

bhist output incomplete

 

110799

Date

2008-07-15

 

Description

bsub ignores resource requirement string in the SchedulerParams section of jsdl - xml specification

 

Component

bsub

 

Platform

All

 

Impact

bsub ignores resource requirement string in the SchedulerParams section of jsdl - xml specification

 

110964

Date

2008-07-13

 

Description

Need improved LSF environment setup

 

Component

sbatchd

 

Platform

All

 

Impact

LSB_EXEC_CLUSTER and LSB_SUB_CLUSTER environment variables should be available for job preparation. However, these variables are not available if the job executes on the submission cluster. LSB_EXEC_CLUSTER variable should be available for all jobs.
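
A job preparation script might read these variables as in the minimal sketch below. It assumes a Python script and returns missing values as None instead of guessing, since affected versions do not set the variables for jobs that run on the submission cluster. The function name and the "forwarded" flag are hypothetical.

```python
import os

def cluster_context(environ=None):
    """Report submission and execution cluster names for a job script."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values alike: the cluster name is unknown.
    sub_cluster = env.get("LSB_SUB_CLUSTER") or None
    exec_cluster = env.get("LSB_EXEC_CLUSTER") or None
    forwarded = (
        sub_cluster is not None
        and exec_cluster is not None
        and sub_cluster != exec_cluster
    )
    return {
        "submission": sub_cluster,
        "execution": exec_cluster,
        "forwarded": forwarded,
    }
```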

 

89242

Date

2008-07-10

 

Description

Duplicate emails are received for same host exceptions

 

Component

mbatchd

 

Platform

All

 

Impact

Duplicate emails are received for same host exceptions

 

111051

Date

2008-07-09

 

Description

lsb.stream file permission is set to 600 after running bmod

 

Component

utopia/lsbatch/lib/liblsbstream.so

 

Platform

linux2.6-glibc2.3-x86_64

 

Impact

Users cannot read lsb.stream file
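
One way to check for this condition is sketched below, assuming Python is available on the host. It reports the permission bits of a file and whether anyone other than the owner can read it. The function names are hypothetical, and the path to lsb.stream is site-specific and must be supplied by the caller.

```python
import os
import stat

def readable_by_non_owner(mode):
    """True if the group or other read bit is set in a file mode."""
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))

def permission_string(mode):
    """Return the permission bits as a three-digit octal string."""
    return format(stat.S_IMODE(mode), "03o")

def check_stream_file(path):
    """Return (octal permissions, readable by non-owner) for an existing file."""
    mode = os.stat(path).st_mode
    return permission_string(mode), readable_by_non_owner(mode)
```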

 

110938

Date

2008-07-08

 

Description

PERF stops writing to the LSB_EVENTS table if charged SAAP field is more than 64 chars

 

Component

PERF

 

Platform

All

 

Impact

Event data loading stopped

 

110227

Date

2008-07-06

 

Description

lsmake gets error message if LSF daemon ports are defined in /etc/services instead of lsf.conf

 

Component

res

 

Platform

All

 

Impact

Cannot use the right res port.
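
The lookup involved can be sketched as follows, under stated assumptions: the port is taken from an LSF_RES_PORT line in lsf.conf if present, and otherwise from the system services database under the service name "res". The precedence, the service name, and the function names are assumptions for illustration, not a description of how res or lsmake resolve the port internally.

```python
import socket

def res_port_from_conf(conf_text):
    """Return the LSF_RES_PORT value from lsf.conf text, or None if absent."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("LSF_RES_PORT="):
            return int(line.split("=", 1)[1].strip().strip('"'))
    return None

def resolve_res_port(conf_text, service_name="res"):
    """Prefer lsf.conf; fall back to the services database (assumed order)."""
    port = res_port_from_conf(conf_text)
    if port is not None:
        return port
    try:
        return socket.getservbyname(service_name, "tcp")
    except OSError:
        return None  # service not defined on this host
```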

 

109659

Date

2008-07-06

 

Description

Error messages about long project names are inconsistent

 

Component

bacct blimits bmod bhist bjobs mbatchd bsub

 

Platform

All

 

Impact

Confusing error messages

 

110576

Date

2008-07-04

 

Description

LD_LIBRARY_PATH gets reappended if user submits a job with job command environment

 

Component

sbatchd

 

Platform

All

 

Impact

User scripts fail

 


 

110322

Date

2008-06-30

 

Description

Incorrect error messages are logged in mbatchd log

 

Component

bld

 

Platform

unix

 

Impact

Incorrect messages are visible in logs

 

109930

Date

2008-06-30

 

Description

Job gets dispatched to wrong host

 

Component

mbschd

 

Platform

All

 

Impact

Jobs fail because of incorrect execution host

 

108459

Date

2008-06-27

 

Description

CONTROL ACTION is invoked even though SLA has been met

 

Component

mbatchd

 

Platform

All

 

Impact

Incorrect job control action is invoked

 

107678

Date

2008-06-26

 

Description

eexec runs as an unexpected user group if LSF_EEXEC_USER is set

 

Component

sbatchd

 

Platform

All

 

Impact

eexec cannot run

 

106413

Date

2008-06-24

 

Description

Customer is using Solutions#89820 (Enhance bjobs/LSF batch API to just fetch summary information of jobs). When using this fix, bjobs fails with "xdr encode/decode error" if 100s of job IDs are specified at the same time for bjobs.

 

Component

bjobs mbatchd

 

Platform

All

 

Impact

User scripts that use a lot of job IDs in their bjobs query fail.
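
On a system without this fix, a script could limit how many job IDs it passes to each bjobs call. The sketch below is one possible approach, not a documented workaround; the batch size of 50 is arbitrary, since the description gives no exact threshold, and the function name is hypothetical.

```python
def bjobs_batches(job_ids, batch_size=50):
    """Split job IDs into several bjobs argument lists of limited length."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = [str(job_id) for job_id in job_ids]
    return [
        ["bjobs"] + ids[start:start + batch_size]
        for start in range(0, len(ids), batch_size)
    ]
```

Each returned list can be run as a separate command and the outputs concatenated by the calling script.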

 

110461

Date

2008-06-23

 

Description

badmin hclose/hopen and lsadmin reslogon/reslogoff fail on slave host

 

Component

lsadmin badmin

 

Platform

All

 

Impact

badmin hclose/hopen and lsadmin reslogon/reslogoff fail on slave host

 


 

110026

Date

2008-06-19

 

Description

MBD never switches out LOG_SWITCH event

 

Component

mbatchd

 

Platform

All

 

Impact

MBD never switches out LOG_SWITCH event

 

87626

Date

2008-06-13

 

Description

Totalview integration requires sleep() to debug application

 

Component

 

 

Platform

All

 

Impact

Totalview integration should work around this problem.

 

107903

Date

2008-06-10

 

Description

If cpuset destroy API fails after rla restart, an error of status file corruption is reported in rla log

 

Component

rla

 

Platform

linux2.4-glibc2.2-sn-ipf linux2.4-glibc2.3-sn-ipf linux2.6-glibc2.3-sn-ipf linux2.6-glibc2.4-sn-ipf

 

Impact

rla cannot clean up left over cpusets

 

108199

Date

2008-05-27

 

Description

Job fails even though its tasks have finished successfully. pam waits for nonexistent tasks, then kills the job.

 

Component

pam

 

Platform

All

 

Impact

Job fails

 

106832

Date

2008-05-16

 

Description

New jobs are not queued at the bottom

 

Component

mbatchd

 

Platform

All

 

Impact

Dispatch order is not correct

 

105599

Date

2008-05-15

 

Description

When LSF_EAUTH_KEY is configured in /etc/lsf.sudoers, jobs submitted through Windows clients fail with an error: “C:\Documents and Settings\user1 >bsub -R "type==any" dir User permission denied. Job not submitted.”

 

Component

 

 

Platform

All

 

Impact

Cannot submit job through Windows client with LSF_EAUTH_KEY set in /etc/lsf.sudoers.


Technical Support

support@platform.com

www.platform.com

 

North America: +1 905 948 4297

Europe: +44 1256 370 530

Asia: +86 10 6238 1125

Toll-free: 1-877-444-4573

 

Platform Support

Platform Computing Corporation

3760 14th Avenue

Markham, Ontario

Canada L3R 3T7

Copyright

© 1994 - 2008 Platform Computing Corporation

All Rights Reserved.

Although the information in this document has been carefully reviewed, Platform Computing Corporation (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.

UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.

 

Document redistribution policy: This document is protected by copyright and you may not redistribute or translate it into another language, in part or in whole. You may only redistribute this document internally within your organization (for example, on an intranet).

Trademarks

LSF is a registered trademark of Platform Computing Corporation in the United States and in other jurisdictions.

 

ACCELERATING INTELLIGENCE, THE BOTTOM LINE IN DISTRIBUTED COMPUTING, PLATFORM COMPUTING, CLUSTERWARE, PLATFORM ACTIVECLUSTER, IT INTELLIGENCE, SITEASSURE, PLATFORM SYMPHONY, PLATFORM JOBSCHEDULER, PLATFORM INTELLIGENCE, PLATFORM INFRASTRUCTURE INSIGHT, PLATFORM WORKLOAD INSIGHT, and the PLATFORM and LSF logos are trademarks of Platform Computing Corporation in the United States and in other jurisdictions.

 

UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions.

 

Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.

Windows is a registered trademark of Microsoft Corporation in the United States and other countries.

 

Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners.