Release Date: May 2008
The following bugs have been fixed in the May 2008 update (LSF 7 Update 3) since the November 2007 update (LSF 7 Update 2):
95571 |
Date |
2008-04-29 |
|
Description |
brsvdel returns exit code 0 when users try to delete an invalid reservation ID |
|
Component |
brsvdel |
|
Platform |
All |
|
Impact |
Affects customer scripts that expect a nonzero return value. |
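The kind of script check this fix restores can be sketched as follows. This is a minimal illustration, not part of LSF: `run_checked` is a hypothetical helper, and the reservation ID in the usage comment is made up. It only assumes that the command signals failure through a nonzero exit code, as described above.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd (a list of arguments) and report success based on its exit code.

    Returns True when the exit code is 0, False when it is nonzero or the
    command cannot be found. run_checked is a hypothetical helper name.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        print(f"command not found: {cmd[0]}", file=sys.stderr)
        return False
    if result.returncode != 0:
        print(f"{cmd[0]} failed with exit code {result.returncode}",
              file=sys.stderr)
        return False
    return True


# Example usage on an LSF host (the reservation ID is hypothetical):
#   if not run_checked(["brsvdel", "user1#0"]):
#       ...handle the failed deletion...
```

Before the fix, a check like this would treat the deletion of an invalid reservation ID as a success, because the exit code was 0.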
105479 |
Date |
2008-04-29 |
|
Description |
bjobs shows "exit" status, bjobs -l shows "zombi" status |
|
Component |
bjobs mbatchd |
|
Platform |
All |
|
Impact |
Cannot tell which status is correct, which may affect other running jobs |
104503 |
Date |
2008-04-29 |
|
Description |
The report of Active Job States Statistics by Queue is inaccurate. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Inaccurate data in LSF reports |
91838 |
Date |
2008-04-28 |
|
Description |
User job failed to query registry key of HKEY_CURRENT_USER |
|
Component |
sbatchd.exe |
|
Platform |
Windows |
|
Impact |
User job may fail |
72032 |
Date |
2008-04-28 |
|
Description |
On AIX, lsrcp does not work properly when the source and target files are identical. |
|
Component |
libbase.so liblsf.a res |
|
Platform |
AIX |
|
Impact |
lsrcp cannot correctly detect that the source and target are the same file on AIX |
103932 |
Date |
2008-04-28 |
|
Description |
Advance reservation files are not created under LSB_LOCALDIR when the duplicate event logging feature is turned on |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Users may lose advance reservation definitions |
102929 |
Date |
2008-04-28 |
|
Description |
brequeue executes slowly because mbatchd always handles pending signals on other jobs first and keeps retrying them |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Jobs cannot be requeued with brequeue because they are blocked by other jobs whose pending signals are being retried. |
99763 |
Date |
2008-04-25 |
|
Description |
Both LS and LSF have to do scheduling to launch a job which has a requirement for an LS managed token. Often, there is contention for more than one type of resource (licenses, slots, memory are the usual ones). In a possible scenario, the job can satisfy all resource requirements but not the required LS token. In this case, mbatchd asks bld to reallocate for the demand. When bld does allocate sufficient tokens for the job, the scheduler may have consumed all slots already and the pending reason now changes to "not enough slots", and mbatchd removes the token demand. bld then deallocates based on lack of demand and the process can start all over when slots free up. |
|
Component |
mbatchd |
|
Platform |
UNIX |
|
Impact |
Jobs cannot be dispatched |
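The cycle in the description above can be pictured with a toy state machine. This is only a sketch under simplifying assumptions: one job that needs one slot and one LS token, and timing in which other jobs always take the free slots before the token arrives. `State`, `step`, and `simulate` are hypothetical names, not LSF or License Scheduler interfaces.

```python
from dataclasses import dataclass


@dataclass
class State:
    slots_free: bool = True        # a slot is free for the job
    token_allocated: bool = False  # bld has allocated the token
    demand: bool = False           # mbatchd has registered token demand


def step(s):
    """Advance one scheduling pass and return the job's status for that pass."""
    if s.slots_free and s.token_allocated:
        return "dispatched"
    if s.slots_free:
        # Everything is satisfied except the token: mbatchd raises demand.
        s.demand = True
        # Assumed timing: by the time bld allocates the token,
        # the scheduler has given the free slots to other jobs.
        s.token_allocated = True
        s.slots_free = False
        return "no token"
    # The pending reason is now about slots, so mbatchd drops the token
    # demand and bld deallocates the token for lack of demand.
    s.demand = False
    s.token_allocated = False
    # Slots free up later, and the cycle starts again.
    s.slots_free = True
    return "not enough slots"


def simulate(passes):
    """Return the status of the job after each of the given passes."""
    s = State()
    return [step(s) for _ in range(passes)]


print(simulate(4))
```

In this model the two pending reasons alternate indefinitely and the "dispatched" branch is never reached, which matches the impact stated for this bug.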
105542 |
Date |
2008-04-25 |
|
Description |
mbatchd silently shuts down sbatchd connection |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Reduced cluster usage and LSF mbatchd performance |
104394 |
Date |
2008-04-24 |
|
Description |
Project name is not assigned to user group name for job array when ENFORCE_FAIRSHARE_PROJ is enabled |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
User jobs may be charged with unexpected SAAP |
102799 |
Date |
2008-04-24 |
|
Description |
LSF integration with Linux-PAM to set resource limit for individual user/usergroups |
|
Component |
bqueues bjobs sbatchd mbatchd |
|
Platform |
All |
|
Impact |
None |
100453 |
Date |
2008-04-24 |
|
Description |
lim under Xen always reports hardware information through Xen command |
|
Component |
lim |
|
Platform |
All |
|
Impact |
Information reported by lim may be wrong |
104141 |
Date |
2008-04-23 |
|
Description |
"lsadmin limstartup" cannot start Symphony multi-head cluster |
|
Component |
badmin lsadmin |
|
Platform |
All |
|
Impact |
Cannot start multi-head cluster |
104026 |
Date |
2008-04-23 |
|
Description |
Misleading application profile error message from bsub |
|
Component |
n/a |
|
Platform |
All |
|
Impact |
Unclear error message |
106144 |
Date |
2008-04-22 |
|
Description |
Interactive job is terminated after sbatchd restart in LSF HPC/SLURM integration. |
|
Component |
sbatchd |
|
Platform |
linux2.4-glibc2.3-ia64-slurm linux2.4-glibc2.3-x86-slurm linux2.4-glibc2.3-x86_64-slurm linux2.6-glibc2.3-ia64-slurm linux2.6-glibc2.3-x86_64-slurm |
|
Impact |
Interactive jobs are terminated prematurely after sbatchd restarts |
105498 |
Date |
2008-04-22 |
|
Description |
Code in xdr_parameterInfo() breaks backward compatibility |
|
Component |
bparams mbatchd |
|
Platform |
All |
|
Impact |
bparams may fail in a partially upgraded cluster |
104510 |
Date |
2008-04-21 |
|
Description |
mbatchd is killed by SIGKILL but no relevant messages are logged. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The lack of relevant mbatchd log messages makes troubleshooting difficult and time-consuming |
106243 |
Date |
2008-04-20 |
|
Description |
With duplicate event logging enabled, mbatchd data replication child is slow. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
Batch system not available |
105385 |
Date |
2008-04-18 |
|
Description |
Queue level and application level order[] string should not be ignored by any resource requirement |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Incorrect resource requirements processing |
91148 |
Date |
2008-04-17 |
|
Description |
LSF dispatches a job submitted with a RUNLIMIT even though the job run time overlaps with an existing advance reservation. |
|
Component |
bsub |
|
Platform |
All |
|
Impact |
Advance reservation cannot be used reliably. |
96936 |
Date |
2008-04-16 |
|
Description |
ntblstatus command exists on a system not equipped with Federation switches. |
|
Component |
poejob |
|
Platform |
AIX 5-32 AIX 5-64 |
|
Impact |
Job fails because LSF detects wrong integration type |
105445 |
Date |
2008-04-15 |
|
Description |
mbatchd hangs when duplicate event switching is turned on |
|
Component |
mbatchd |
|
Platform |
Linux 2.6 |
|
Impact |
mbatchd cannot start |
92592 |
Date |
2008-04-10 |
|
Description |
mbschd schedules jobs slowly with more than 300 resource requirements |
|
Component |
schmod_default.so mbschd |
|
Platform |
All |
|
Impact |
Job dispatch is slow |
105549 |
Date |
2008-04-10 |
|
Description |
lim is unlicensed on HP-XC on linux2.6-glibc2.3-x86_64 |
|
Component |
lim |
|
Platform |
linux2.6-glibc2.3-x86_64 |
|
Impact |
lim is unlicensed |
103073 |
Date |
2008-04-10 |
|
Description |
After ssh is replaced with blaunch, the job fails immediately. |
|
Component |
blaunch |
|
Platform |
All |
|
Impact |
After replacing ssh with blaunch, some jobs fail |
102687 |
Date |
2008-04-08 |
|
Description |
Redirect the stdout/stderr of the command "cleartool setview -exec "sbatchd -d /sw/platform/lsf/conf -s 8:6 -2" view" into /tmp/daemons.wrap.log |
|
Component |
daemons.wrap daemon.wrap |
|
Platform |
UNIX and Linux |
|
Impact |
None |
99043 |
Date |
2008-04-06 |
|
Description |
res periodically posts requests to master lim, causing master lim to slow down |
|
Component |
res |
|
Platform |
All |
|
Impact |
Master lim slows down |
96946 |
Date |
2008-04-01 |
|
Description |
Cannot limit the job range for fairshare queue dynamic user |
|
Component |
sbatchd mbatchd |
|
Platform |
All |
|
Impact |
Customer enhancement |
103436 |
Date |
2008-03-31 |
|
Description |
lsbevents loader core dumps parsing events with parallel jobs. lsb.stream is read by the loader. |
|
Component |
LSF reporting |
|
Platform |
All |
|
Impact |
lsb.stream is not read by the lsbevents loader. All job-related reports have no data. |
105555 |
Date |
2008-03-30 |
|
Description |
lim logs incorrect license requirement when more than 10,000 licenses needed |
|
Component |
lim |
|
Platform |
All |
|
Impact |
Message in LIM log with incorrect license count |
99731 |
Date |
2008-03-27 |
|
Description |
A single parameter value consisting of multiple space-separated strings is split into multiple parameter values in the mpichp4 integration |
|
Component |
mpirun.lsf mpichp4_wrapper |
|
Platform |
All |
|
Impact |
User application failure due to wrong parameters passed in |
95661 |
Date |
2008-03-27 |
|
Description |
When SLOTS_PER_PROCESSOR is set in lsb.resources, a job using an advance reservation cannot be dispatched after suspending another normal job for its slot during the active period of the reservation. |
|
Component |
schmod_advrsv.so |
|
Platform |
All |
|
Impact |
Idle slots unusable, the job pends. |
68729 |
Date |
2008-03-27 |
|
Description |
When an advance reservation is active and non-advance-reservation jobs are suspended, the queue-level slots defined by "QJOB_LIMIT" are not released, so users still cannot use the CPUs reserved by the advance reservation. |
|
Component |
|
|
Platform |
All |
|
Impact |
Users cannot use slots reserved by the advance reservation. |
87837 |
Date |
2008-03-25 |
|
Description |
mbatchd log is filled up with "rusageJob: Job fails in getJobData()" if a job is forced to be killed |
|
Component |
bkill mbatchd |
|
Platform |
All |
|
Impact |
mbatchd logs many error messages and job remains running in sbatchd |
103632 |
Date |
2008-03-24 |
|
Description |
Support memory limit on both process level and job level in HPC cluster |
|
Component |
bapp libbatch.so libbat.a sbatchd mbatchd |
|
Platform |
All |
|
Impact |
Customer enhancement |
96563 |
Date |
2008-03-20 |
|
Description |
NTBL_JOB_KEY in poejob is not generated correctly -- could be out of bounds |
|
Component |
poejob |
|
Platform |
AIX 5-32, AIX 5-64 |
|
Impact |
Job fails |
96209 |
Date |
2008-03-20 |
|
Description |
InfiniBand integration on AIX needs to handle port unavailability when the IBM API returns zero ports |
|
Component |
lsnrt_windows |
|
Platform |
AIX 5-32, AIX 5-64 |
|
Impact |
Job fails if no ports are available for POE jobs over InfiniBand |
95121 |
Date |
2008-03-19 |
|
Description |
Network ID configured to be a large number with IPV6 enabled on AIX POE over InfiniBand becomes 0 when loading nrt windows |
|
Component |
lsnrt_windows |
|
Platform |
AIX 5.3-32, AIX 5.3-64 |
|
Impact |
nrt windows cannot be loaded correctly |
105152 |
Date |
2008-03-17 |
|
Description |
rla may die if it cannot delete a cpuset. |
|
Component |
rla |
|
Platform |
cpuset integration |
|
Impact |
Minimal impact observed by the user since a new rla is started by the sbatchd. |
101658 |
Date |
2008-03-12 |
|
Description |
In a MultiCluster environment, after a job submitted to a remote cluster is done there, the user can still see it from the local cluster. Job usage information is not sent back to the submission cluster. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Job is not returned from the remote execution cluster. The job is gone from the remote cluster but remains in running status in the submission cluster. bkill and bkill -r do not remove the job.
"badmin mbdrestart" on the remote cluster is needed to remove the job. The job then returns to pending status in the submission cluster (and is dispatched again) and can be killed. |
97668 |
Date |
2008-03-11 |
|
Description |
LSF 7.0 LIM appends lsf.conf parameters in ego.conf without any warning or notification |
|
Component |
LIM |
|
Platform |
All |
|
Impact |
Customer is not aware of changes made to ego.conf |
102521 |
Date |
2008-03-11 |
|
Description |
Job information is not cleaned out of fairshare queue if EXPIRED_HOURS is greater than or equal to CLEAN_PERIOD |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Incorrect job information |
103166 |
Date |
2008-03-10 |
|
Description |
Output file can be written to root directory violating permissions |
|
Component |
sbatchd |
|
Platform |
AIX |
|
Impact |
Security risk. |
78745 |
Date |
2008-03-10 |
|
Description |
Job name with wildcard characters does not match all matching jobs if JOB_DEP_LAST_SUB is set. |
|
Component |
bparams mbatchd |
|
Platform |
All |
|
Impact |
Scripts that depend on the correct behavior do not work. |
100261 |
Date |
2008-03-10 |
|
Description |
Code bug in mbatchd causing confusing debugging data |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Hard for support to do troubleshooting |
102671 |
Date |
2008-03-04 |
|
Description |
bsub may reject job submission even with a valid resource requirement string |
|
Component |
bsub |
|
Platform |
All |
|
Impact |
Job fails to submit even with a valid resource requirement string |
101470 |
Date |
2008-03-03 |
|
Description |
LSF HPC pam cannot start without setting LSF_LIBDIR. |
|
Component |
pam |
|
Platform |
All |
|
Impact |
LSF HPC jobs fail |
102499 |
Date |
2008-02-29 |
|
Description |
Non-admin user with no permission to a specific queue can bmod -q jobs to it if the jobs are forwarded from another cluster in MultiCluster environment |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Non-admin user can use bmod -q to modify the jobs forwarded from submission cluster to a queue to which the user has no permission |
102218 |
Date |
2008-02-29 |
|
Description |
bjobs -l and bacct -l show jobs consuming more memory and/or swap space than the max value the given host has. |
|
Component |
res |
|
Platform |
Windows |
|
Impact |
Host is not available, affecting scheduling of other jobs. |
102690 |
Date |
2008-02-28 |
|
Description |
Submitting a JSDL job with non-TTY mode fails |
|
Component |
bsub |
|
Platform |
All |
|
Impact |
Job fails |
96667 |
Date |
2008-02-25 |
|
Description |
Output of bpeek on another machine is slower than the output of bpeek on the job execution host |
|
Component |
bpeek |
|
Platform |
All |
|
Impact |
Performance issue |
101311 |
Date |
2008-02-21 |
|
Description |
No RMS distribution tar file available on Platform FTP site |
|
Component |
all |
|
Platform |
rms2.82-linux2.6-glibc2.3-ia64 |
|
Impact |
Customers cannot upgrade to LSF7.0 EP2 |
99696 |
Date |
2008-02-19 |
|
Description |
PAM HP MPI integration does not work |
|
Component |
pam |
|
Platform |
HP-UX, Linux2.6-glibc2.3-x86_64 |
|
Impact |
If host IP addresses contain 0, pam loses track of remote processes and job is shut down prematurely. |
102059 |
Date |
2008-02-02 |
|
Description |
LIM failed to start with "Run time error R6034" on Windows XP-x64 and Windows 2003-x64 |
|
Component |
lsf6.2_win.exe |
|
Platform |
win2003-x64 win2003-ia64 |
|
Impact |
LIM cannot start on win2003-x64 and win2003-ia64 |
100456 |
Date |
2008-02-01 |
|
Description |
First character of output is missing for jobs submitted with bsub -Ip/Is option |
|
Component |
res |
|
Platform |
Linux 2.4 |
|
Impact |
Cannot get exact output of the job. |
94763 |
Date |
2008-01-31 |
|
Description |
Cannot use lsrcp on files larger than 2 GB across AIX and HP-UX machines |
|
Component |
lsrcp res |
|
Platform |
AIX 5-32, AIX 5-64, HP/UX 11-32, HP/UX 11-64 |
|
Impact |
lsrcp does not work |
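The entry above does not say why transfers above 2 GB failed. As a purely illustrative sketch, assuming the limit comes from a signed 32-bit file offset (a common cause of 2 GB limits in file tools, not confirmed for lsrcp), the following shows where such an offset stops being able to represent a file size. `fits_in_int32_offset` is a hypothetical helper.

```python
# Largest value a signed 32-bit offset can hold (assumption: the limit
# in question comes from such an offset; the source does not confirm this).
INT32_MAX = 2**31 - 1


def fits_in_int32_offset(size_bytes):
    """Return True if a file of size_bytes is addressable with a signed 32-bit offset."""
    return 0 <= size_bytes <= INT32_MAX


two_gib = 2 * 1024**3  # 2 GiB in bytes, equal to 2**31

print(fits_in_int32_offset(two_gib - 1))  # True
print(fits_in_int32_offset(two_gib))      # False
```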
99364 |
Date |
2008-01-28 |
|
Description |
lstcsh missing in x86_64 or ia64 packages |
|
Component |
lstcsh |
|
Platform |
linux2.6-glibc2.3-ia64 linux2.6-glibc2.3-x86_64 |
|
Impact |
lstcsh not available |
100443 |
Date |
2008-01-28 |
|
Description |
With short jobs, EGO-SLA cannot get enough slots even when there are free slots |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
SLA performance is not good. |
99662 |
Date |
2008-01-27 |
|
Description |
Jobs submitted to HP-UX IA64 exit with code 9 |
|
Component |
sbatchd |
|
Platform |
HP-UX 11.31 IA64 |
|
Impact |
Jobs fail |
96900 |
Date |
2008-01-25 |
|
Description |
bsub on floating clients does not retry during lim restart, and returns a wrong error message |
|
Component |
lim |
|
Platform |
All |
|
Impact |
bsub on floating clients does not retry during lim restart |
96047 |
Date |
2008-01-25 |
|
Description |
Customer gets a confusing email when job is killed because "bsub -t" time constraint is earlier than "submission time +RUNLIMIT" |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Inaccurate email message |
93768 |
Date |
2008-01-17 |
|
Description |
Adapter windows are not cleaned up because of a signal sent to the cleanup program by LSF and conflicts in strtok() library calls |
|
Component |
ntbl_api lsntbl_api poe_w poejob lsnbl_api |
|
Platform |
AIX 5-32, AIX 5-64 |
|
Impact |
Jobs fail or are not dispatched because of lack of adapter windows |
101146 |
Date |
2008-01-16 |
|
Description |
brsvs fails for system advance reservations on a host with MXJ=0 |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
brsvs does not work. |
99700 |
Date |
2008-01-15 |
|
Description |
Job does not dispatch after job is requeued several times |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
Jobs are not dispatched |
100544 |
Date |
2008-01-09 |
|
Description |
After a job is DONE in a fairshare queue and EXPIRED_HOURS is set, mbatchd goes down and does not restart. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Batch system not available |
100642 |
Date |
2008-01-08 |
|
Description |
Chunk jobs submitted using job arrays stay in WAIT status. When the job ID is equal to the mbatchd PID on the remote execution host, sbatchd loses track of the job. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
Chunk job feature not available |
97704 |
Date |
2008-01-02 |
|
Description |
Run limit that bhist displays on execution host is changed |
|
Component |
bhist |
|
Platform |
All |
|
Impact |
Incorrect run time is displayed by bhist |
99424 |
Date |
2007-12-26 |
|
Description |
LSF does not schedule jobs according to host preference when resource_reserve and host partition fairshare are defined but not used |
|
Component |
schmod_reserve.so schmod_parallel.so |
|
Platform |
All |
|
Impact |
Jobs may be scheduled incorrectly to a low-preference host |
98150 |
Date |
2007-12-13 |
|
Description |
pem/vemkd memory usage jumps to GB |
|
Component |
pem, vemkd |
|
Platform |
All |
|
Impact |
Machines running out of memory |
97588 |
Date |
2007-11-20 |
|
Description |
Incorrect dependency syntax may cause mbatchd to hang. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
mbatchd system is down |
95120 |
Date |
2007-11-08 |
|
Description |
poejob should export MP_MSG_API |
|
Component |
poejob |
|
Platform |
AIX 5.3-32, AIX 5.3-64 |
|
Impact |
Job runs over IP instead of InfiniBand |
90702 |
Date |
2007-10-19 |
|
Description |
Job starter cannot register task start event to pam due to host name resolution issue |
|
Component |
taskstarter pam |
|
Platform |
All |
|
Impact |
When task registration fails, pam kills the job |
support@platform.com
www.platform.com
North America: +1 905 948 4297
Europe: +44 1256 370 530
Asia: +86 10 6238 1125
Toll-free: 1-877-444-4573
Platform Support
Platform Computing Corporation
3760 14th Avenue
Markham, Ontario
Canada L3R 3T7
© 1994 - 2008 Platform Computing Corporation
All Rights Reserved.