Release Date: June 2011
The following bugs have been fixed in LSF Version 8.0.1 since the January 2011 release of LSF 8.0:
163071 |
Date |
2011-1-10 |
|
Description |
Master LIM core dumps when handling incorrect resource data from a dubious host. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
System is not stable. |
163232 |
Date |
2010-12-27 |
|
Description |
sbatchd hangs when a child process exits, due to a problem in the SIGCHLD signal handler. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
New jobs cannot run, and running jobs cannot be killed or stopped. |
164408 |
Date |
2011-1-4 |
|
Description |
mbatchd core dumps when switching events files if files are corrupted. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
System is not stable. |
164360 |
Date |
2011-1-4 |
|
Description |
SGI changed its product name, so LSF cannot identify the ProPack version and rla cannot start. |
|
Component |
rla sbatchd schmod_cpuset.so brlainfo |
|
Platform |
linux2.6-glibc2.3-x86_64 linux2.6-glibc2.4-sn-ipf |
|
Impact |
SGI cpuset feature does not work. |
158587 |
Date |
2011-1-12 |
|
Description |
Some jobs pend for a short time with the pending reason "CPUSET Scheduler Plugin internal error". |
|
Component |
mbschd schmod_cpuset.so schmod_parallel.so |
|
Platform |
Linux |
|
Impact |
Pending reason is not clear. |
165276 |
Date |
2011-1-17 |
|
Description |
EACCES errors appear in the NFS log. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
When root access is restricted on NFS and LSB_QUERY_PORT is enabled in LSF, if the query child mbatchd hangs, a new mbatchd cannot start automatically. |
165387 |
Date |
2011-1-24 |
|
Description |
When MIN_SWITCH_PERIOD is set to a value greater than 600, EACCES errors appear in the NFS server log. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Files cannot be removed and corresponding error messages appear in the log file. |
165746 |
Date |
2011-1-25 |
|
Description |
LIM reports the wrong number of CPUs and cores on a Xeon 7542 host. |
|
Component |
lim |
|
Platform |
Linux |
|
Impact |
Product licensing is affected. |
127716 |
Date |
2010-12-29 |
|
Description |
bmod cannot modify some parameters for running jobs with rusage containing "||". |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
bmod does not work for running jobs with rusage including "||". |
163118 |
Date |
2010-12-24 |
|
Description |
bjobs/bhist -l does not show the correct checkpoint directory name for array jobs. |
|
Component |
bjobs bhist |
|
Platform |
All |
|
Impact |
The checkpoint directory name for array jobs shown by bjobs -l and bhist -l is incorrect. |
163709 |
Date |
2010-12-29 |
|
Description |
LSB_MC_INITFAIL_RETRY does not work when the execution host does not recognize the submission host. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Job retries indefinitely. |
159518 |
Date |
2010-11-12 |
|
Description |
Job dependency behaves unexpectedly when job is requeued. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Job dependency does not work consistently. |
162602 |
Date |
2010-12-16 |
|
Description |
LSF maximum password length is documented as 31 characters, but only 23 characters are supported. |
|
Component |
lim lspasswd |
|
Platform |
All |
|
Impact |
lspasswd behavior is not consistent with the documentation. |
161280 |
Date |
2010-12-10 |
|
Description |
The mbatchd log message is unclear when user shares are defined for a user group that has the group member "all". |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Unexpected error messages such as “uDataSharesConfigure(): File /home/mike/LSF7/LSF706/conf/lsbatch/LSF706/configdir/lsb.users Section usergroup : The group <mike> cannot be assigned a share within <g> because it is not a member of <g>” confuse customers. |
164403 |
Date |
2011-1-12 |
|
Description |
Job status alternates continuously between SSUSP and RUN if the stop/resume condition is "select [status == busy]" or "select [status != busy]". |
|
Component |
schmod_default.so mbatchd |
|
Platform |
All |
|
Impact |
The running job does not stop as desired. |
166148 |
Date |
2011-1-27 |
|
Description |
The load of available hosts does not increase after running lsplace. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
lsplace may not indicate the best host for the task. |
162994 |
Date |
2011-1-27 |
|
Description |
A sequential cpuset job with memory requirement pends even when slots are free. |
|
Component |
mbschd |
|
Platform |
Linux |
|
Impact |
Job pends unnecessarily. |
166342 |
Date |
2011-1-31 |
|
Description |
Writing the persistence file fails when brsvmod commands are issued to modify an advance reservation (AR). |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
AR modifications are lost after “mbatchd restart”. |
166359 |
Date |
2011-2-11 |
|
Description |
Many "SCH_FM_insertJobTrigger: Try to insert an existing job" messages appear in mbschd.log after you run “badmin reconfig”. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
Confusing messages. |
166205 |
Date |
2011-1-30 |
|
Description |
The message "getJRusageForPid_: Failed to open job object for pid <*>" appears in sbatchd log file despite successful job execution. |
|
Component |
sbatchd.exe |
|
Platform |
Windows |
|
Impact |
Log file grows unnecessarily. |
154145 |
Date |
2010-8-12 |
|
Description |
An mbschd log message may cause a buffer overflow. |
|
Component |
schmod_default.so |
|
Platform |
All |
|
Impact |
The scheduler does not work. |
166912 |
Date |
2011-2-14 |
|
Description |
Simultaneous "bjobs" command calls increase the load of the master LIM. |
|
Component |
bjobs |
|
Platform |
All |
|
Impact |
Unnecessary ls_info calls impact performance. |
166573 |
Date |
2011-2-15 |
|
Description |
RES hangs when a child process exits, due to a problem in the SIGCHLD signal handler. |
|
Component |
res |
|
Platform |
All |
|
Impact |
Job is stuck in the RUN state. |
167504 |
Date |
2011-2-25 |
|
Description |
The output of a checkpoint job cannot be logged to the job's output file on a Windows host. |
|
Component |
sbatchd.exe |
|
Platform |
Windows |
|
Impact |
Checkpoint job output is not available. |
167267 |
Date |
2011-3-11 |
|
Description |
badmin ckconfig takes a long time if a host group is defined in lsb.hosts. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Configuration checking performance is poor. |
166012 |
Date |
2011-3-11 |
|
Description |
Some temporary files under $LSF_TMPDIR cannot be deleted. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
The temporary file directory grows large. |
168306 |
Date |
2011-3-16 |
|
Description |
If a user's home directory is on a remote shared directory, job files are not cleaned up after the job finishes, and post-execution is marked as failed. |
|
Component |
sbatchd.exe |
|
Platform |
Windows |
|
Impact |
Job files are not cleaned up after the job finishes, and post-execution is marked as failed even though the post-execution process actually succeeds. |
169166 |
Date |
2011-3-17 |
|
Description |
The LSF_VPLUGIN value is not accepted from the environment variable. |
|
Component |
PAM |
|
Platform |
All |
|
Impact |
HP MPI jobs fail because of missing libraries. |
169324 |
Date |
2011-3-24 |
|
Description |
For interactive jobs, the job RES sleeps 5 seconds before it exits. |
|
Component |
res |
|
Platform |
All |
|
Impact |
Job throughput is low for interactive jobs. |
169100 |
Date |
2011-4-1 |
|
Description |
The I/O value reported by "lsload -l" is not accurate. |
|
Component |
lim |
|
Platform |
Linux Solaris AIX |
|
Impact |
Users get incorrect information about I/O load. |
170112 |
Date |
2011-4-6 |
|
Description |
LSF daemon names appear corrupted in the system log for messages logged by LSF daemons if LSF_LOG_DIR is not defined in lsf.conf. |
|
Component |
lim pim res sbatchd mbatchd mbschd |
|
Platform |
All |
|
Impact |
Users cannot identify the daemon name in the system log file. |
168458 |
Date |
2011-3-21 |
|
Description |
bsub -W reports the confusing message "Bad CPU limit specification. Job not submitted." when the specified run time limit is too large. |
|
Component |
bsub |
|
Platform |
All |
|
Impact |
Warning information is confusing. |
170955 |
Date |
2011-4-19 |
|
Description |
bsub core dumps when esub sets an empty env value in LSB_SUB_MODIFY_FILE. |
|
Component |
bsub |
|
Platform |
All |
|
Impact |
Cannot submit jobs. |
170819 |
Date |
2011-4-27 |
|
Description |
Customers require the smaps file to be used to calculate memory usage on Red Hat Linux 2.6.9-55. |
|
Component |
pim |
|
Platform |
Linux |
|
Impact |
Enables more precise calculation of memory usage. |
170129 |
Date |
2011-4-30 |
|
Description |
LSF should set OMP_NUM_THREADS automatically for OpenMP jobs. |
|
Component |
res mpirun.lsf openmpi_wrapper |
|
Platform |
UNIX |
|
Impact |
OMP_NUM_THREADS must be set up manually. |
171041 |
Date |
2011-5-12 |
|
Description |
Interactive job keeps running after job submission node is rebooted. |
|
Component |
res |
|
Platform |
All |
|
Impact |
Waste of resources. |
172915 |
Date |
2011-5-17 |
|
Description |
The master LIM reports client hosts as unavailable in the LIM log. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
Confusing messages in the LIM log. |
172238 |
Date |
2011-5-17 |
|
Description |
After mbatchd restarts, the status of finished jobs becomes PEND. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Confusing job info. |
167400 |
Date |
2011-2-21 |
|
Description |
If the job file path contains a space, the tssub job fails. |
|
Component |
sbatchd |
|
Platform |
Windows |
|
Impact |
tssub.exe does not work. |
169758 |
Date |
2011-4-6 |
|
Description |
blaunch shows an XDR encode/decode error. |
|
Component |
res |
|
Platform |
All |
|
Impact |
Parallel jobs with blaunch may fail. |
170796 |
Date |
2011-4-28 |
|
Description |
If the command following blaunch contains quotation marks, it fails, although it worked in a previous version. |
|
Component |
res |
|
Platform |
UNIX/Linux |
|
Impact |
Compatibility is broken. |
Release Date: January 2011
The following bugs have been fixed in LSF Version 8.0 since the September 2009 update (LSF 7 Update 6):
138040 |
Date |
2009-11-18 |
|
Description |
You remove EGO_DAEMONS_CPUS and run "lsadmin limrestart", but lim is still bound to the CPU. |
|
Component |
lim |
|
Platform |
Linux 2.6 |
|
Impact |
The configuration change does not take effect. |
145947 |
Date |
2010-03-17 |
|
Description |
The number of vemkd socket connections keeps growing until the file descriptor limit is exhausted. |
|
Component |
vemkd |
|
Platform |
All |
|
Impact |
The file descriptors on the LSF master host are exhausted. |
152787 |
Date |
2010-09-21 |
|
Description |
The JOB_SPOOL_DIR sometimes fails to be detected when the environment is unstable. |
|
Component |
sbatchd |
|
Platform |
Linux/UNIX |
|
Impact |
Some of the jobs do not use the spool directory and job files are left over. |
140403 |
Date |
2010-01-12 |
|
Description |
The mbatchd daemon does not give a warning message for fairshare configuration errors. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
A job may not be charged to the correct fairshare account. |
156734 |
Date |
2010-09-30 |
|
Description |
The bqueues -l command shows job runTime overflow. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The counter for fairshare charges is incorrect. |
154126 |
Date |
2010-08-20 |
|
Description |
The bjobs -W command sometimes displays negative CPU time. |
|
Component |
bjobs |
|
Platform |
All |
|
Impact |
The bjobs -W command output is incorrect. |
138888 |
Date |
2009-11-12 |
|
Description |
The bjobs -A command shows jobs from other users when NEWJOB_REFRESH=Y is set in lsb.params. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Confusion and possible network congestion when other users’ jobs are shown. |
145426 |
Date |
2010-06-01 |
|
Description |
Improve mbschd performance for job arrays. |
|
Component |
mbatchd schmod_limit.so mbschd |
|
Platform |
All |
|
Impact |
Improve mbschd performance for job arrays. |
144960 |
Date |
2010-04-07 |
|
Description |
Hosts become unlicensed for a period of time after a daemon is down. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
Hosts become unlicensed for a period of time. |
140317 |
Date |
2009-12-25 |
|
Description |
The lspasswd command does not validate the passwd.lsfuser file. |
|
Component |
lspasswd.exe |
|
Platform |
Windows |
|
Impact |
Jobs pass the lspasswd check, then pend because the password is wrong. |
143648 |
Date |
2010-03-17 |
|
Description |
Jobs sometimes pend incorrectly when ENFORCE_ONE_UG_LIMITS=Y is set. |
|
Component |
schmod_limit.so |
|
Platform |
All |
|
Impact |
Jobs are in the pending state when they should not be. |
135419 |
Date |
2009-09-29 |
|
Description |
The bmod command cannot modify the resource of a job submitted from a floating client host for several minutes after cluster reconfiguration. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
This affects usability and performance. |
151841 |
Date |
2010-07-06 |
|
Description |
The pwm.sys sometimes causes Windows to hang. |
|
Component |
pwm.sys |
|
Platform |
Windows |
|
Impact |
The host does not respond. |
144432 |
Date |
2010-03-15 |
|
Description |
Hosts become unlicensed for a short period of time. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
Client requests are rejected while the host is unlicensed. |
137028 |
Date |
2009-10-27 |
|
Description |
Suspended jobs with a memory requirement are not resumed. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
A job stays suspended even when resources are available. |
143328 |
Date |
2010-01-28 |
|
Description |
The CPU load is not reduced when a job is pending with PEND_NO_MAPPING. |
|
Component |
sbatchd |
|
Platform |
Linux 2.6 |
|
Impact |
Job binding is unbalanced. |
142540 |
Date |
2010-01-19 |
|
Description |
The lim daemon reports the wrong maximum memory when physical memory is bigger than 4 GB. |
|
Component |
lim |
|
Platform |
MacOS |
|
Impact |
Memory size is incorrect. |
135524 |
Date |
2009-09-25 |
|
Description |
Jobs are killed when you use a port scanner. |
|
Component |
res |
|
Platform |
All |
|
Impact |
LSF jobs cannot run under a port scanner. |
139350 |
Date |
2009-11-27 |
|
Description |
The child process of lim inherits the CPU binding when you run lsload. |
|
Component |
lim |
|
Platform |
Linux 2.6 |
|
Impact |
The child process of lim runs on the same CPU as lim, so master lim performance is affected. |
133552 |
Date |
2009-09-01 |
|
Description |
The badmin perfmon view displays Total Queries, but it is not a total for user queries, and it is not the total of the given columns or rows. It is the total number of requests processed by mbatchd, which includes user queries (b* command queries) as well as internal transactions. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The term "total queries" is confusing. |
139318 |
Date |
2010-04-28 |
|
Description |
A License Scheduler job cannot be resumed after the duration expires. |
|
Component |
sbatchd mbatchd |
|
Platform |
All |
|
Impact |
The job cannot be resumed. |
143145 |
Date |
2010-01-29 |
|
Description |
You cannot submit a job from Linux to Windows using the special characters “>”, “<”, or “|”. |
|
Component |
bsub |
|
Platform |
UNIX |
|
Impact |
Your application cannot use the special characters “>”, “<”, or “|”. |
140575 |
Date |
2009-12-11 |
|
Description |
The bhist command displays a format error for the EVENT_JOB_ATTA_DATA event. |
|
Component |
bhist |
|
Platform |
All |
|
Impact |
This affects the viewing and collecting of job information. |
148839 |
Date |
2010-05-18 |
|
Description |
You define rusage[resourceA||resourceB], but if the limit for resourceA is reached, resourceB is not used. |
|
Component |
schmod_default.so |
|
Platform |
All |
|
Impact |
The "OR" relationship does not take effect for a sequential job. |
139810 |
Date |
2009-12-01 |
|
Description |
CLEARCASE_ROOT is not set to the correct path, and the job fails. Information such as the job ID and time stamp is not logged by daemons.wrap. |
|
Component |
daemons.wrap |
|
Platform |
UNIX |
|
Impact |
If CLEARCASE_ROOT is not set correctly, jobs fail, and the log messages are not clear. |
154813 |
Date |
2010-08-27 |
|
Description |
The bjobs command does not work as expected. |
|
Component |
bjobs mbatchd |
|
Platform |
All |
|
Impact |
The bjobs command does not work as expected. |
136780 |
Date |
2009-10-14 |
|
Description |
You cannot close an unavailable host. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
You cannot close an unavailable host. |
152172 |
Date |
2010-12-08 |
|
Description |
Some error messages appear in mbschd.log, and mbschd consumes a lot of memory. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
The mbschd daemon cannot dispatch jobs normally. |
158777 |
Date |
2010-11-17 |
|
Description |
The master lim hangs on Solaris 8. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The master lim hangs. |
153499 |
Date |
2010-07-30 |
|
Description |
The lspasswd command times out when the secondary master host is unavailable. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
You cannot update your LSF password. |
151103 |
Date |
2010-07-02 |
|
Description |
If you delete the C$ share and run the installer, the installer reports the error "Cannot start LSF Workflow Monitor service on host wxp64." |
|
Component |
Installer |
|
Platform |
All |
|
Impact |
You cannot install LSF. |
137919 |
Date |
2009-11-19 |
|
Description |
With two EGO SLA consumers, one consumer has to reclaim resources to run jobs. The command “ego consumer alloc” shows resources have been allocated, but jobs are very slow to start. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Priority jobs do not start on time; job submission takes 25-30 minutes with reclaim instead of 1-2 minutes without it. |
138036 |
Date |
2009-11-01 |
|
Description |
The lim and license management software logs show false error messages indicating that features are not available or that the client and license server license files are not synchronized. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
You see error messages but there are no errors. |
140674 |
Date |
2009-12-15 |
|
Description |
To address performance and scalability issues with job scheduling, add timing debug messages in mbatchd and mbschd. |
|
Component |
mbatchd mbschd schmod_default.so schmod_fairshare.so schmod_limit.so |
|
Platform |
All |
|
Impact |
This makes performance problems easier to debug. |
138798 |
Date |
2009-11-10 |
|
Description |
Loading libptmalloc3.so in mbatchd may cause a problem with egroup. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The egroup could fail. |
150106 |
Date |
2010-06-08 |
|
Description |
A job's reservation of a numeric resource is lost after mbatchd reconfiguration, restart, or if another resource reservation expired. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
A job fails because the resource reservation is lost. |
146803 |
Date |
2010-04-07 |
|
Description |
Memory leak. |
|
Component |
libbat.a |
|
Platform |
All |
|
Impact |
Memory leak. |
147515 |
Date |
2010-04-14 |
|
Description |
ENFORCE_ONE_UG_LIMITS does not affect the pending job limit set for the user group. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The limit applies to all user groups. |
146311 |
Date |
2010-04-16 |
|
Description |
Over-preemption happens when there are high priority pending jobs with unsatisfied resource requirements. |
|
Component |
schmod_preemption.so |
|
Platform |
All |
|
Impact |
The time to finish all jobs is much longer than usual. |
139186 |
Date |
2009-12-16 |
|
Description |
When you run badmin reconfig, bld restarts automatically. |
|
Component |
mbatchd |
|
Platform |
UNIX |
|
Impact |
If there are multiple clusters sharing bld, reconfiguring one cluster impacts the other clusters. |
148500 |
Date |
2010-05-07 |
|
Description |
If a process name contains a space, pim gets an incorrect process snapshot. |
|
Component |
pim |
|
Platform |
Linux |
|
Impact |
Cannot get the correct process information. |
138899 |
Date |
2009-11-18 |
|
Description |
A job submitted from a floating client without a type could pend forever if any of the server hosts have an exclusive resource defined. |
|
Component |
mbschd schmod_default.so |
|
Platform |
All |
|
Impact |
A job submitted from a floating client could pend forever. |
142921 |
Date |
2010-01-27 |
|
Description |
The command lshosts displays the wrong number of CPUs for a dual-core Itanium 2 host. |
|
Component |
lim |
|
Platform |
ia64 linux2.4 |
|
Impact |
Over-licensing. |
148608 |
Date |
2010-07-21 |
|
Description |
Some events are overwritten. |
|
Component |
mbatchd.exe |
|
Platform |
Windows |
|
Impact |
JFD may fail to read the lsb.events file, so some flows may not run smoothly. It can also cause job loss and bhist command errors. |
137465 |
Date |
2009-10-27 |
|
Description |
If HOME is set in lstcsh, $home is set to $cwd. |
|
Component |
lstcsh |
|
Platform |
All |
|
Impact |
The HOME environment variable is set incorrectly, and the ~cd command does not work as expected. |
70861 |
Date |
2010-05-18 |
|
Description |
Preemption does not work normally due to over-reservation caused by cancelling dispatch decisions. |
|
Component |
mbschd schmod_preemption.so |
|
Platform |
All |
|
Impact |
Preemption does not work as expected. |
148923 |
Date |
2010-05-23 |
|
Description |
Checkpointable jobs sometimes fail to restart because the job file cannot be found. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Restarted checkpointable jobs fail to start. |
157160 |
Date |
2010-10-18 |
|
Description |
After you modify the lsf.cluster file, hostsetup recognizes the machine as a client. |
|
Component |
Installer |
|
Platform |
All |
|
Impact |
Cannot run hostsetup. |
144936 |
Date |
2010-02-28 |
|
Description |
The command ./daemons.wrap -V does not show the version of the binary for several clusters on different platforms. |
|
Component |
daemons.wrap |
|
Platform |
All |
|
Impact |
Difficult to know the version of daemons.wrap. |
157770 |
Date |
2010-10-26 |
|
Description |
The lim fails when a child exits because of a signal from the function in child_handler. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The system is not stable. |
159933 |
Date |
2010-11-25 |
|
Description |
The mbatchd daemon sets the wrong pending reason for a job if there are over 299 resources in the LSF resource table. |
|
Component |
lim mbatchd |
|
Platform |
All |
|
Impact |
The command bjobs shows the wrong pending reason. |
144641 |
Date |
2010-02-28 |
|
Description |
Recursive job submission causes the last job to fail. |
|
Component |
sbatchd res |
|
Platform |
All |
|
Impact |
Jobs fail. |
134276 |
Date |
2009-09-02 |
|
Description |
The ls load index is incorrect. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
This can affect the availability of a host. |
141326 |
Date |
2009-12-31 |
|
Description |
The command bpeek fails if $HOME/.lsbatch/ is not accessible with the local spool directory configured. |
|
Component |
bpeek |
|
Platform |
All |
|
Impact |
The bpeek command fails on non-execution hosts. |
143160 |
Date |
2010-01-26 |
|
Description |
The mbatchd log file contains some characters that cannot be displayed. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The log file cannot be read, and commands such as "wc" have problems with it. |
67806 |
Date |
2010-06-30 |
|
Description |
The bhosts command shows the host status as “closed_Busy” when it should be “ok”. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The bhosts command output is incorrect. |
136446 |
Date |
2009-10-19 |
|
Description |
The mbatchd daemon sends out more data than necessary when "bjobs -w" has more than one job ID. |
|
Component |
bjobs |
|
Platform |
All |
|
Impact |
The bjobs -w queries in job scripts cause very high network load. |
136498 |
Date |
2009-10-14 |
|
Description |
The command blaunch fails to run a command that has a space in its path. |
|
Component |
blaunch |
|
Platform |
All |
|
Impact |
This problem affects PMPI 7.1 integration, because the default PMPI installation path contains spaces. |
139193 |
Date |
2009-11-20 |
|
Description |
The pim daemon slows down on Linux hosts. |
|
Component |
pim |
|
Platform |
Linux |
|
Impact |
LSF does not report a job's rusage. |
|
Parameter |
LSF_PIM_LINUX_ENHANCE in lsf.conf |
135812 |
Date |
2009-09-27 |
|
Description |
Jobs submitted via bsub have a trailing colon added to the LD_LIBRARY_PATH. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
This may cause problems with the job application. |
148956 |
Date |
2010-05-20 |
|
Description |
Sometimes mbatchd dies when lim is being restarted. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The system becomes unavailable for a short time. |
154291 |
Date |
2010-09-02 |
|
Description |
The mbatchd daemon is killed by sbatchd when egroup does not return on time. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
The system cannot start properly. |
148528 |
Date |
2010-05-06 |
|
Description |
Using bsub with esub.password is slow when there are a lot of unavailable hosts in the cluster. |
|
Component |
esub.password.exe lspassword.exe |
|
Platform |
Windows |
|
Impact |
Performance slows down. |
155646 |
Date |
2010-09-13 |
|
Description |
The mbatchd daemon fails because of the file system error "Interrupted system call". |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The system is not stable. |
117503 |
Date |
2009-08-13 |
|
Description |
The command “bhist -l all” is case sensitive. |
|
Component |
bmod.exe bstop.exe bhist.exe libbat.lib liblsf.lib bresume.exe bjobs.exe eauth.exe bkill.exe bsub.exe |
|
Platform |
Windows |
|
Impact |
The command “bhist -l all” is case sensitive. |
140131 |
Date |
2009-12-09 |
|
Description |
In a MultiCluster environment, if schmod_mc is not set in lsb.modules, there is no error in the log file. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
The log file does not indicate improper configuration. |
145350 |
Date |
2010-03-18 |
|
Description |
Dynamic user priority becomes 100 after you change HIST_HOURS to a large value and reconfigure mbatchd. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Some users get an unexpectedly high priority. |
158241 |
Date |
2010-10-27 |
|
Description |
LSB_SUB_QUEUE is not set in $LSB_SUB_PARM_FILE for an esub when the environment variable LSB_DEFAULTQUEUE is defined. |
|
Component |
liblsf.a liblsf.so libbat.a brestart libbat.so bsub |
|
Platform |
All |
|
Impact |
Unable to check queues in $LSB_SUB_PARM_FILE when the environment variable LSB_DEFAULTQUEUE is defined. |
122721 |
Date |
2010-06-22 |
|
Description |
A Perl script running in LSF on an x64 system crashes the machine. |
|
Component |
sbatchd.exe |
|
Platform |
win2003-x64 |
|
Impact |
Such jobs cannot run on an x64 system. |
147709 |
Date |
2010-04-21 |
|
Description |
The PA administrator cannot delete the lsb.stream file. |
|
Component |
mbatchd |
|
Platform |
Linux/UNIX/Solaris |
|
Impact |
The Analytics data loader has to run under the primary LSF administrator account, but this is not realistic since the PA administrator and the LSF administrator are different IT roles. |
137023 |
Date |
2009-11-03 |
|
Description |
Even with available hosts, jobs take a long time (3 minutes) to start running if LSB_SLOT_RESERVE_ENHANCE=Y. |
|
Component |
schmod_reserve.so mbschd schmod_default.so |
|
Platform |
All |
|
Impact |
Performance is poor and job dispatching is slow. |
140133 |
Date |
2009-12-10 |
|
Description |
The lspasswd.exe program contacts LSF Windows client hosts in the cluster to check the password. |
|
Component |
lspasswd.exe |
|
Platform |
Windows |
|
Impact |
It takes a long time for lspasswd.exe to finish if it runs on a client host. |
154718 |
Date |
2010-09-10 |
|
Description |
The mbatchd daemon cannot start if an operation generates the wrong event file. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The mbatchd daemon cannot start if an operation generates the wrong event file. |
149947 |
Date |
2010-06-10 |
|
Description |
The lspasswd -C option does not require the password as input. |
|
Component |
lspasswd.exe |
|
Platform |
Windows |
|
Impact |
The lspasswd -C option does not require the password as input. |
138969 |
Date |
2009-12-03 |
|
Description |
The lsrcp program fails to copy a file when the file name contains the '@' character. |
|
Component |
lsrcp |
|
Platform |
All |
|
Impact |
The lsrcp program cannot copy a file if the name is in the format “my@dirlist user@host:/xxx”. |
134611 |
Date |
2009-09-25 |
|
Description |
The MELIM kills all processes in the system. |
|
Component |
melim |
|
Platform |
All |
|
Impact |
The MELIM brings the host down. |
147851 |
Date |
2010-05-05 |
|
Description |
Jobs are not dispatched by mbschd and you see the error "SCH_MOD_rememberSibling() execution rusage is NULL". |
|
Component |
mbschd schmod_default.so |
|
Platform |
All |
|
Impact |
Jobs pend unless you run badmin reconfig. |
155627 |
Date |
2010-09-10 |
|
Description |
The lsload command reports 0 MB memory and 0 MB swap. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The lsload command provides the wrong information. |
159001 |
Date |
2010-11-11 |
|
Description |
The return value of lspasswd -c is 0 although the command fails. |
|
Component |
lspasswd.exe |
|
Platform |
Windows |
|
Impact |
You cannot use the environment variable %errorlevel% to get the lspasswd execution result. |
136366 |
Date |
2009-10-12 |
|
Description |
The bjobs command calls the master lim. |
|
Component |
bjobs |
|
Platform |
All |
|
Impact |
Cluster performance is affected. |
139036 |
Date |
2009-11-16 |
|
Description |
The command badmin perfmon view cannot show information related to the file descriptor on x86-64-sol10. |
|
Component |
mbatchd |
|
Platform |
Solaris10 x86-64 |
|
Impact |
You cannot get the information related to the file descriptor. |
147435 |
Date |
2010-07-08 |
|
Description |
The command badmin ckconfig does not report an error message if the USER_SHARE parameter is defined incorrectly. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Cannot troubleshoot the problem. |
157173 |
Date |
2010-10-14 |
|
Description |
The lsb.stream file may be lost in the stream directory. |
|
Component |
liblsbstream.so |
|
Platform |
UNIX/Linux |
|
Impact |
The PA cannot get the data from LSF. |
149326 |
Date |
2010-06-02 |
|
Description |
LSF jobs finish twice because of EGO reclaim. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The job result is affected when the post-done job runs again. |
132433 |
Date |
2009-11-20 |
|
Description |
Using "," in the job-level select resource requirement string causes LSF to ignore queue level resource requirements. |
|
Component |
bsub libbat.a libbat.so |
|
Platform |
All |
|
Impact |
Jobs are dispatched to the wrong host. |
157635 |
Date |
2010-10-20 |
|
Description |
The mbschd daemon has a memory leak when removing a host from a host group. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
The mbschd daemon consumes more and more memory. |
160950 |
Date |
2010-12-02 |
|
Description |
When fork mode is used to switch events, errors related to root appear in the NFS server log. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Problems writing to lsb.events file. |
136656 |
Date |
2009-10-21 |
|
Description |
In LSF 7.0.5, processor binding does not always work; sometimes res is not bound. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
The processor binding facility does not work. |
139963 |
Date |
2009-12-23 |
|
Description |
The LD_LIBRARY_PATH parameter is not set. |
|
Component |
profile.js |
|
Platform |
HPUX IA64 |
|
Impact |
Unable to source the JS environment. |
144633 |
Date |
2010-02-26 |
|
Description |
When USE_SERVER is used in a license file, lim fails to validate the lsf_manager feature and turns on core-based licensing. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
You cannot use the USE_SERVER keyword in a license file. |
159518 |
Date |
2010-11-12 |
|
Description |
Job dependency does not work as expected when a job is requeued. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Job dependency does not work as expected. |
139919 |
Date |
2009-12-07 |
|
Description |
A client API call cannot drive the announcement of the master host. |
|
Component |
liblsf.a liblsf.so libbat.a libbat.so |
|
Platform |
All |
|
Impact |
The master lim takes a long time to recognize the new host. |
135974 |
Date |
2009-09-29 |
|
Description |
The command lsadmin limdebug does not respect the LC_PERFM debug class. |
|
Component |
lsadmin |
|
Platform |
All |
|
Impact |
You cannot retrieve information related to network packets dynamically; you must restart the master lim twice for data collection. |
143950 |
Date |
2010-02-05 |
|
Description |
CPU load is not reduced when a job pends with PEND_NO_MAPPING. |
|
Component |
sbatchd |
|
Platform |
Linux |
|
Impact |
A job may run on a highly loaded core, and take a longer time to complete. |
133672 |
Date |
2009-08-24 |
|
Description |
When the command bmig fails for a rerunnable job, the job mail is confusing. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
Job mail is confusing. |
149030 |
Date |
2010-05-19 |
|
Description |
The lsrcp and lsrun programs, or parallel jobs, fail with the error "Request from non-LSF host rejected". |
|
Component |
lim |
|
Platform |
All |
|
Impact |
Cannot run any jobs. |
144584 |
Date |
2010-02-15 |
|
Description |
The slave lim reports an external resource value even though the resource is defined as [all]. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The UDP buffer overflows and the cluster does not work properly. |
132514 |
Date |
2009-08-13 |
|
Description |
LSF reports the total number of cores on a PPC64 machine even though LSF is running on an IBM partition on that machine and only some of the cores are allocated to that partition. |
|
Component |
lim |
|
Platform |
linux2.6_glibc2.3_ppc64 |
|
Impact |
LSF overcharges for licenses and job scheduling is affected. |
147378 |
Date |
2010-04-13 |
|
Description |
The overwriting operation fails when you use the lsrcp command to transfer files to a 64-bit Windows host. |
|
Component |
lsrcp |
|
Platform |
All |
|
Impact |
The lsrcp command does not work. |
142530 |
Date |
2010-01-19 |
|
Description |
The lim log shows an error on Mac OS X 10.6: idletime(): open(/var/run/utmp) failed, No such file or directory. |
|
Component |
lim |
|
Platform |
MacOS |
|
Impact |
The lim cannot get the right idle time resource. |
136686 |
Date |
2009-12-07 |
|
Description |
MAX in bhosts output keeps increasing with SLOTS_PER_PROCESSOR configured. |
|
Component |
mbatchd blimits |
|
Platform |
All |
|
Impact |
The host is overused. |
135681 |
Date |
2009-10-20 |
|
Description |
The command bsub -L CShell fails if an error exists in the .login file. |
|
Component |
sbatchd |
|
Platform |
Linux/UNIX |
|
Impact |
A job fails without a meaningful error message. |
150608 |
Date |
2010-06-18 |
|
Description |
The command bhosts -w does not display all remote hosts. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
You cannot get the host partition from bhosts -w. |
133545 |
Date |
2009-08-31 |
|
Description |
When you use lsrcp and lsid on AIX, a misleading message displays: “callLimUdp_(): got reply LIME_WRONG_MASTER”. |
|
Component |
lsrcp |
|
Platform |
All |
|
Impact |
The file transfer is successful but the message is not clear. |
149990 |
Date |
2010-12-17 |
|
Description |
The "lim -t" and "lshosts" commands report the wrong CPU or core number. |
|
Component |
lim |
|
Platform |
Linux |
|
Impact |
LSF licensing and host slots are affected. |
134007 |
Date |
2010-02-02 |
|
Description |
A job's processes are not bound to particular cores. |
|
Component |
sbatchd |
|
Platform |
linux2.6 Nehalem |
|
Impact |
You cannot use the LSF job binding feature on a Nehalem processor. |
138460 |
Date |
2009-11-16 |
|
Description |
A job is forwarded to the wrong cluster. |
|
Component |
mbschd schmod_mc.so mbatchd |
|
Platform |
All |
|
Impact |
Job forwarding does not work as expected. |
145645 |
Date |
2010-04-19 |
|
Description |
There is an xlsbatch memory leak. |
|
Component |
batch_lib(libbat.a libbat.so) xlsbatch |
|
Platform |
Linux |
|
Impact |
The xlsbatch uses more and more memory. |
140595 |
Date |
2009-12-16 |
|
Description |
The command lsadmin ckconfig -v does not report a configuration error if the lsf.cluster.clustername file resource map section is missing "End ResourceMap". |
|
Component |
lsadmin lim |
|
Platform |
All |
|
Impact |
Configuration problems are not reported on reconfiguration. |
124628 |
Date |
2010-07-29 |
|
Description |
Root mbatchd cannot start while a child mbatchd holds the batch port. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Root mbatchd cannot start. |
141173 |
Date |
2010-02-03 |
|
Description |
The number of CPUs and cores reported by lim is wrong. |
|
Component |
lim |
|
Platform |
MacOS |
|
Impact |
Licensing is incorrect. |
144640 |
Date |
2010-03-04 |
|
Description |
A requeued block mode job fails in 7.0.6. |
|
Component |
res nios sbatchd |
|
Platform |
All |
|
Impact |
A job fails. |
134887 |
Date |
2009-09-09 |
|
Description |
A job cannot be submitted if the dependency condition is larger than 2050 characters. |
|
Component |
liblsf.so libbat.a liblsf.a libbat.so bsub |
|
Platform |
All |
|
Impact |
A job cannot be submitted. |
150743 |
Date |
2010-06-23 |
|
Description |
After you run bmod -q, the modified job is not dispatched in the correct sequence. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
The job scheduling sequence is not consistent with the bjobs output. |
135264 |
Date |
2009-11-26 |
|
Description |
You must manually delete MultiCluster information from lsb.lease.state after MultiCluster is disabled. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Expired MultiCluster information is visible. |
135982 |
Date |
2009-10-11 |
|
Description |
The lim runs, but the lim log shows an error message if LC_TRACE and LOG_DEBUG are defined. |
|
Component |
lim |
|
Platform |
Windows |
|
Impact |
A false error is shown. |
127919 |
Date |
2009-09-14 |
|
Description |
A running job associated with an open AR keeps running when the AR expires, but an open AR job is suspended by other pending jobs once the AR window closes. |
|
Component |
mbschd mbatchd bparams |
|
Platform |
All |
|
Impact |
Large parallel jobs fail. |
141620 |
Date |
2010-01-15 |
|
Description |
When using compute units, host groups and resource leasing in a MultiCluster environment, jobs submitted with a host specified are dispatched to the wrong host. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
Jobs run on the wrong hosts even though the -m option is specified during job submission. |
152682 |
Date |
2010-07-21 |
|
Description |
Array elements are moved to the wrong queue if you run bswitch, bmod and badmin mbdrestart. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
An array job is sent to the wrong queue. |
144504 |
Date |
2010-02-11 |
|
Description |
After you change ENFORCE_ONE_UG_LIMITS from Y to N and run "badmin reconfig", the parameter still takes effect. |
|
Component |
mbschd mbatchd |
|
Platform |
All |
|
Impact |
You cannot use badmin reconfig to change the parameter ENFORCE_ONE_UG_LIMITS. |
138171 |
Date |
2009-11-12 |
|
Description |
You cannot run LSF commands beginning with “b” from a floating client during master failover if the host order is not consistent in LSF_MASTER_LIST and LSF_SERVER_HOSTS. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The floating client will not work for a while. |
134286 |
Date |
2009-10-23 |
|
Description |
In a MultiCluster environment, rerunnable jobs in "UNKWN" state fail to update status in the submission cluster. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Job status does not match between the submission and execution clusters. |
149713 |
Date |
2010-07-27 |
|
Description |
The Windows installer grants special rights to the LSF Admin account even if you run the LSF daemon under the Local System Account. |
|
Component |
Installer |
|
Platform |
All |
|
Impact |
The Windows installer grants special rights to the LSF Admin account. |
152925 |
Date |
2010-07-24 |
|
Description |
The lsload output cannot show more than 9999 hosts. |
|
Component |
lsload lsplace mbatchd |
|
Platform |
All |
|
Impact |
Scalability is limited. |
156242 |
Date |
2010-10-11 |
|
Description |
The defined resource limit does not take effect on newly added dynamic hosts. |
|
Component |
mbatchd |
|
Platform |
Linux |
|
Impact |
You must run badmin reconfig to work around the issue. |
137457 |
Date |
2009-10-28 |
|
Description |
The esubs for MPICH2 and Intel MPI do not handle the “same” keyword in the resource requirement. |
|
Component |
esub.intelmpi esub.mpich2 |
|
Platform |
All |
|
Impact |
The "same" resource requirement string cannot be used for Intel MPI and MPICH2 jobs. |
135254 |
Date |
2009-12-17 |
|
Description |
The mbschd daemon reserves incorrect slots when preemption is configured with fairshare. |
|
Component |
mbschd schmod_preemption.so |
|
Platform |
All |
|
Impact |
The bqueues and bhosts commands show incorrect information and jobs are affected. |
149997 |
Date |
2010-07-05 |
|
Description |
The program eauth_userpass.exe cannot update passwd.lsfuser because the file contains the character '\r'. |
|
Component |
Installer |
|
Platform |
All |
|
Impact |
The program eauth_userpass.exe cannot update passwd.lsfuser. |
140675 |
Date |
2009-12-16 |
|
Description |
The host slot limit is not working. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The host slot limit is not working. |
157978 |
Date |
2010-10-29 |
|
Description |
The daemon sbatchd logs error messages and rla cannot start if both LSF_CPUSETLIB and LSF_ASPLUGIN are set in lsf.conf. |
|
Component |
rla liblsf.so libbat.a libbat.so liblsf.a sbatchd pam |
|
Platform |
linux2.6-glibc2.3-x86_64, linux2.6-glibc2.4-sn-ipf |
|
Impact |
Features related to SGI ProPack cannot work when LSF_CPUSETLIB and LSF_ASPLUGIN are set in lsf.conf. |
147705 |
Date |
2010-04-26 |
|
Description |
If LSF_DEBUG_MBD="LC_TRACE" is defined in the lsf.conf file (for example, because you forgot to remove it), mbatchd responds slowly. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The mbatchd daemon does not respond in a high load cluster. |
148579 |
Date |
2010-05-12 |
|
Description |
After enabling daemons.wrap, some sbatchd debug messages are not logged. |
|
Component |
sbatchd mbatchd |
|
Platform |
All |
|
Impact |
Troubleshooting is difficult. |
149270 |
Date |
2010-05-26 |
|
Description |
The command bacct always reports memory and swap for a job as zero. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
PA reports cannot be generated because the data in lsb.acct is incorrect. |
136762 |
Date |
2009-11-04 |
|
Description |
With Asian language settings, if the registry key HKEY_CURRENT_USER\Console\LoadConIme is 1, the tssub job does not finish because the conime.exe process does not finish automatically. |
|
Component |
lstsmgr.exe |
|
Platform |
Windows |
|
Impact |
The tssub job does not finish. |
103932 |
Date |
2009-09-27 |
|
Description |
Advance reservation files are not created under LSB_LOCALDIR when the duplicate event log feature is enabled. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
You may lose the advance reservation definition. |
132935 |
Date |
2010-01-20 |
|
Description |
LSF does not handle single and double quotes in the job command line on Windows hosts. |
|
Component |
preservestarter sbatchd bsub |
|
Platform |
Linux / Windows |
|
Impact |
The user job flow is broken because the job command line is not interpreted correctly. |
|
Parameter |
JOB_STARTER_EXTEND in lsf.conf |
131751 |
Date |
2009-07-28 |
|
Description |
After you change a Windows server to a static client and restart the daemons on the client, sbatchd does not exit automatically. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
The sbatchd daemon keeps running on the static client host. |
143961 |
Date |
2010-04-20 |
|
Description |
When bld exits abnormally, mlim does not log anything. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
Troubleshooting is difficult. |
143248 |
Date |
2010-02-23 |
|
Description |
A job was not dispatched because the job file was owned by root. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The job remains pending; you must force it to run manually. |
140213 |
Date |
2009-12-15 |
|
Description |
A zombie pending job with a dependency exists in mbschd after the job gets killed in mbatchd. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
There is a potential performance impact because mbschd still tries to schedule the job. |
134777 |
Date |
2009-09-15 |
|
Description |
The query child mbatchd is bound to the same CPU as the parent mbatchd if LSF_DAEMONS_CPUS is set. |
|
Component |
mbatchd |
|
Platform |
linux2.6 |
|
Impact |
This affects performance. |
136484 |
Date |
2009-10-14 |
|
Description |
When you call the lsb_submit API, it produces windows titled mesub and eauth. |
|
Component |
liblsf.lib |
|
Platform |
All |
|
Impact |
Unneeded windows are displayed. |
147782 |
Date |
2010-04-22 |
|
Description |
You set "MAX_JOB_ARRAY_SIZE = 2147483646" in lsb.params, but a value of 1000 is used. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
You cannot submit such a large job array. |
140673 |
Date |
2009-12-17 |
|
Description |
Even though DAEMON_WRAP_DEBUG is not defined, every job logs a message to /tmp/daemons.wrap.log. |
|
Component |
daemons.wrap |
|
Platform |
All |
|
Impact |
The /tmp file system eventually runs out of disk space and memory, which can lead to other system problems. |
146411 |
Date |
2010-04-07 |
|
Description |
The esub.password.exe program cannot validate passwords in a mixed cluster with a UNIX master host. |
|
Component |
esub.password.exe |
|
Platform |
Windows |
|
Impact |
The esub.password.exe program does not work. |
140621 |
Date |
2010-01-15 |
|
Description |
The hostsetup program shows the error: “Cannot determine BINARY_TYPE”. |
|
Component |
hostsetup |
|
Platform |
All |
|
Impact |
You cannot run hostsetup or lsfinstall on Ubuntu 9.10. |
154994 |
Date |
2010-09-07 |
|
Description |
Some jobs in a large job array pend on a Windows host because they failed to get the password. |
|
Component |
sbatchd |
|
Platform |
Windows |
|
Impact |
The failed jobs pend. |
138410 |
Date |
2009-11-03 |
|
Description |
The JOB_FINISH record contains redundant blank space to separate values. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
User-defined scripts cannot handle the JOB_FINISH record. |
153404 |
Date |
2010-08-04 |
|
Description |
There is a performance problem with mbatchd reconfiguration when many SLAs are configured. |
|
Component |
mbatchd liblsf.a libbat.so liblsf.so |
|
Platform |
All |
|
Impact |
There is a performance problem with mbatchd reconfiguration. |
153992 |
Date |
2010-08-10 |
|
Description |
The mbschd daemon may crash when bmod and bkill are executed at the same time. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
The mbschd daemon may crash. |
147028 |
Date |
2010-04-02 |
|
Description |
The master lim becomes out of service after overwhelming requests from the slave lim when LSF_REJECT_NONLSFHOST=Y. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The system is down. |
144512 |
Date |
2010-02-19 |
|
Description |
In a mixed cluster, you must specify the password file location using a parameter. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
It is difficult to manage a mixed cluster. |
141438 |
Date |
2010-01-03 |
|
Description |
The sbatchd daemon logs messages such as “A socket operation has failed” and sbatchd CPU usage is 100%. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
The host cannot be used; you must restart sbatchd manually. |
150063 |
Date |
2010-07-08 |
|
Description |
CPU binding does not work on some non-Nehalem hosts. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
The CPU binding feature does not work. |
147626 |
Date |
2010-04-19 |
|
Description |
ENFORCE_ONE_UG_LIMITS is defined, but busers does not show the correct result when the keyword "all" is used to define the user group. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
ENFORCE_ONE_UG_LIMITS and the busers command do not work together as expected. |
146925 |
Date |
2010-04-13 |
|
Description |
A mistake in a job script might lead to errors in lsb.events. |
|
Component |
libbat.a mbatchd libbat.so |
|
Platform |
All |
|
Impact |
After you restart mbatchd, the job is lost and the bjobs command no longer shows it. |
151584 |
Date |
2010-07-02 |
|
Description |
A job is mistakenly forwarded to the wrong cluster. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The job forwarding feature does not work. |
158439 |
Date |
2010-10-29 |
|
Description |
The mbatchd daemon uses up to 2 GB of memory after 2-3 days. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
You must run badmin mbdrestart. |
141644 |
Date |
2010-01-08 |
|
Description |
A program linked with the LSF API library cannot run as a service, but it can run as a console application. |
|
Component |
liblsf.lib libbat.lib libbat.dll liblsf.dll libbatw2k.dll libbatw2k.lib liblsfw2k.dll liblsfw2k.lib |
|
Platform |
Windows |
|
Impact |
A program cannot run as a service. |
133940 |
Date |
2009-09-02 |
|
Description |
The mbatchd daemon logs duplicate event records for the same job ID in lsb.acct, with incorrect timestamps. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
This affects Platform Analytics. You see negative pending reasons for jobs. |
130564 |
Date |
2009-10-14 |
|
Description |
You cannot install all LSF features if the license file does not include the keyword FEATURE. |
|
Component |
Installer |
|
Platform |
All |
|
Impact |
LSF features are not enabled. |
149090 |
Date |
2010-05-19 |
|
Description |
The lim daemon does not detect both cores on a dual-core CPU. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The number of job slots is incorrect. |
140089 |
Date |
2009-12-28 |
|
Description |
The command bjobs shows the incorrect pending order after you run bmig. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
You cannot forecast the dispatch time based on bjobs. |
139281 |
Date |
2009-11-25 |
|
Description |
It takes over 30 seconds to finish a Session Scheduler job. |
|
Component |
libvem.so |
|
Platform |
Linux |
|
Impact |
It takes over 30 seconds to finish a Session Scheduler job. |
151614 |
Date |
2010-07-09 |
|
Description |
After you run "badmin reconfig", sbatchd dies on some slave hosts. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
You must start sbatchd manually on these hosts. |
157247 |
Date |
2010-10-09 |
|
Description |
The sbatchd daemon spawns a child res with the -PTY_FIX option for an interactive job, and the job sometimes hangs. |
|
Component |
sbatchd |
|
Platform |
Linux/UNIX |
|
Impact |
An interactive job hangs. |
139538 |
Date |
2009-11-24 |
|
Description |
Jobs are dispatched to hosts not in the compute unit when ptile='!' is specified. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The compute unit does not work properly. |
147379 |
Date |
2010-04-20 |
|
Description |
When LSB_MIXED_PATH_ENABLE=y, job submission using a script fails. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
Job submission by script fails. |
147827 |
Date |
2010-05-05 |
|
Description |
The job does not requeue to the top if LSB_REQUEUE_TO_BOTTOM is 0. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
The job execution order is not as expected. |
138051 |
Date |
2009-11-18 |
|
Description |
An LSF job does not release its slot if it is suspended by License Scheduler. LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE does not work. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Underutilization of slots. |
147837 |
Date |
2010-04-22 |
|
Description |
There is no dedicated log class for mkExecSibling log messages. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The mbatchd log file size grows. |
149174 |
Date |
2010-05-25 |
|
Description |
The dual core license cannot be checked out, even though FEATURE lsf_dualcore_x86 is specified in the license file. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
Dual-core machines are unlicensed. |
148825 |
Date |
2010-05-20 |
|
Description |
After configuring LSB_LOCALDIR, "badmin showconf mbd" displays an incorrect value for LSB_SHAREDIR. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The badmin command shows incorrect information. |
137221 |
Date |
2009-10-28 |
|
Description |
The bjobs command shows unexpected results because of a threading issue in the child mbatchd. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Job queries are interrupted. |
150860 |
Date |
2010-06-23 |
|
Description |
After you submit a simple Perl job to a 64-bit Windows 2008 or Windows 2003 host, the execution host hangs and is unreachable until the daemons restart. |
|
Component |
sbatchd |
|
Platform |
Windows |
|
Impact |
The execution host is down. |
145617 |
Date |
2010-03-28 |
|
Description |
Job submission fails if LSB_SUB_MODIFY_FILE is set as an environment variable before calling lsb_submit(). |
|
Component |
liblsf.a liblsf.so bsub libbat.so |
|
Platform |
All |
|
Impact |
Job submission fails. |
133276 |
Date |
2009-08-28 |
|
Description |
LSF_BIND_JOB does not work; jobs are bound to the same CPU. |
|
Component |
sbatchd |
|
Platform |
Linux 2.6 |
|
Impact |
CPU utilization is not balanced. |
157215 |
Date |
2010-10-10 |
|
Description |
A job submitted from a floating client pends with "Not the same type as the submission host" after restarting LSF. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Job execution fails. |
146690 |
Date |
2010-03-31 |
|
Description |
Some dynamic hosts cannot be recognized by mbatchd, but can be recognized by lim. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Some dynamic hosts cannot join the cluster. |
153987 |
Date |
2010-08-11 |
|
Description |
The mbatchd daemon does not log which host dropped the connection when logging this error message: do_queueInfoReq(): b_write_fix(204941860) failed, Connection reset by peer. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Troubleshooting is difficult. |
142065 |
Date |
2010-01-19 |
|
Description |
If no order string is defined explicitly, jobs are not dispatched to lightly loaded hosts according to the default r15s:pg order. |
|
Component |
mbschd |
|
Platform |
All |
|
Impact |
Jobs cannot be dispatched as expected, and the cluster's load is not balanced. |
137739 |
Date |
2009-11-17 |
|
Description |
The PATH environment variable is not expanded correctly while using preservestarter.exe. |
|
Component |
preservestarter.exe |
|
Platform |
Windows |
|
Impact |
The correct path cannot be found. |
135490 |
Date |
2009-10-14 |
|
Description |
The lsb.stream parsing library has a memory leak, which causes the PA loader to use a lot of memory. |
|
Component |
liblsbstream.so |
|
Platform |
All |
|
Impact |
The PA loader uses a lot of memory. |
143229 |
Date |
2010-01-27 |
|
Description |
The bsub command crashes at runBatchEsub() if the requested host list is longer than 1024 characters. |
|
Component |
bsub |
|
Platform |
All |
|
Impact |
A job cannot be submitted. |
139352 |
Date |
2009-12-03 |
|
Description |
The bclusters command shows “disc” when LSB_MAX_JOB_DISPATCH_PER_SESSION is defined in lsf.conf. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Clusters are disconnected. Jobs cannot run on a remote cluster. |
147957 |
Date |
2010-04-29 |
|
Description |
The sbatchd daemon fills /tmp with millions of temp files. |
|
Component |
clearcase/daemons.wrap sbatchd |
|
Platform |
All |
|
Impact |
The directory /tmp is filled with temp files. |
145612 |
Date |
2010-03-23 |
|
Description |
Memory usage for a multi-thread job is not reported correctly. |
|
Component |
sbatchd res |
|
Platform |
Linux |
|
Impact |
Job accounting information is incorrect; this may affect job scheduling based on memory. |
156717 |
Date |
2010-10-29 |
|
Description |
The mbatchd daemon waits for 5 minutes to restart mbschd. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
It takes a long time for the cluster to work normally. |
158033 |
Date |
2010-10-25 |
|
Description |
The lshosts command cannot update the swap value after swap changes. |
|
Component |
lim |
|
Platform |
Linux |
|
Impact |
You must restart lim. |
145641 |
Date |
2010-03-18 |
|
Description |
The variable LSB_DJOB_HOSTFILE contains a mix of \ and / characters; the / characters are not recognized by Windows. |
|
Component |
sbatchd.exe |
|
Platform |
All |
|
Impact |
The LSB_DJOB_HOSTFILE cannot be recognized by Windows. |
157291 |
Date |
2010-10-26 |
|
Description |
On hosts with more than 256 virtual CPUs, the core number per processor may be incorrect. |
|
Component |
lim |
|
Platform |
Linux |
|
Impact |
The lim daemon reports incorrect core values, which affects licensing. |
151193 |
Date |
2010-06-24 |
|
Description |
In a MultiCluster environment, the rescheduling feature did not work on AIX. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
Jobs are not rescheduled. |
124220 |
Date |
2009-09-25 |
|
Description |
The command lsmake fails while building the openWRT project. |
|
Component |
lsmakerm lsmake |
|
Platform |
Linux |
|
Impact |
Unable to build the openWRT source tree. |
143068 |
Date |
2010-01-27 |
|
Description |
Improper log classes make collection of mbatchd debug data difficult. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The cluster is not responding. |
140592 |
Date |
2009-12-13 |
|
Description |
The allocations in vemkd and mbatchd do not match. |
|
Component |
mbatchd lib2vemkd.so |
|
Platform |
All |
|
Impact |
LSF jobs pend even with enough idle CPUs in the SLA. |
147725 |
Date |
2010-04-19 |
|
Description |
There is a memory leak in libbat.a. |
|
Component |
libbat.a |
|
Platform |
All |
|
Impact |
Memory leak. |
154479 |
Date |
2010-08-23 |
|
Description |
LSB_SUB_JOB_DESCRIPTION in esub is empty. |
|
Component |
bsub |
|
Platform |
All |
|
Impact |
Cannot get the value in esub. |
115080 |
Date |
2010-01-26 |
|
Description |
The bjobs -u all command output does not show the condensed host list. |
|
Component |
bjobs mbatchd |
|
Platform |
All |
|
Impact |
The bjobs -u all command output does not show the condensed host list. |
134040 |
Date |
2009-08-28 |
|
Description |
The handle and thread counts keep increasing when loading or unloading the LSF library. |
|
Component |
liblsf.lib |
|
Platform |
Windows |
|
Impact |
Your program will not work as expected. |
135702 |
Date |
2009-09-25 |
|
Description |
The lspasswd command does not work. |
|
Component |
lsf.shared |
|
Platform |
UNIX/Linux |
|
Impact |
The lspasswd command does not work. |
149157 |
Date |
2010-05-25 |
|
Description |
The memory usage of a process is calculated incorrectly. |
|
Component |
pim |
|
Platform |
Linux |
|
Impact |
The memory usage of a process is calculated incorrectly. |
145892 |
Date |
2010-05-06 |
|
Description |
ABR and ARW issues were found in lim when the lsf.shared file is different in the local and remote clusters. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The cluster does not work. |
152097 |
Date |
2010-07-23 |
|
Description |
Running lspasswd from an LSF 7.0 host automatically alters the permissions of the passwd.lsfuser file to 600, and causes lsrcp to fail on an LSF 6.2 Windows host. |
|
Component |
lim lspasswd.exe |
|
Platform |
Windows |
|
Impact |
The lsrcp program fails in a mixed cluster if the master runs LSF 7.0 EP 6 and the slave runs LSF 6.2. |
133477 |
Date |
2009-09-02 |
|
Description |
The LIBPATH environment variable on AIX must be set by profile.lsf. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
None. |
151637 |
Date |
2010-07-11 |
|
Description |
The mbatchd daemon core dumps when a dynamic host is removed. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The mbatchd daemon core dumps. |
143059 |
Date |
2010-02-08 |
|
Description |
The mbatchd daemon logs "updUserData1: numPEND is negative". |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The message is confusing. |
133621 |
Date |
2009-09-02 |
|
Description |
After you add a new host as the master host and reconfigure, the new master lim core dumps. |
|
Component |
lim |
|
Platform |
All |
|
Impact |
The cluster is not stable; failover does not work. |
140808 |
Date |
2009-12-15 |
|
Description |
Using bsub with the XOR resource requirement causes mbatchd to core dump. |
|
Component |
mbatchd |
|
Platform |
All |
|
Impact |
The mbatchd daemon core dumps and you cannot submit jobs. |
153609 |
Date |
2010-08-01 |
|
Description |
When jobs use CPU binding on a Xen virtual machine, sbatchd core dumps. |
|
Component |
sbatchd |
|
Platform |
All |
|
Impact |
The sbatchd daemon core dumps. |
support@platform.com
www.platform.com
North America: +1 905 948 4297
Europe: +44 1256 370 530
Asia: +86 10 6238 1125
Toll-free: 1-877-444-4573
Platform Support
Platform Computing Corporation
3760 14th Avenue
Markham, Ontario
Canada L3R 3T7
© 1994-2011 Platform Computing Corporation
All Rights Reserved.
Although the information in this document has been carefully reviewed, Platform Computing Corporation (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.
UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.
Document redistribution policy : This document is protected by copyright and you may not redistribute or translate it into another language, in part or in whole. You may only redistribute this document internally within your organization (for example, on an intranet).
LSF is a registered trademark of Platform Computing Corporation in the United States and in other jurisdictions.
ACCELERATING INTELLIGENCE, THE BOTTOM LINE IN DISTRIBUTED COMPUTING, PLATFORM COMPUTING, CLUSTERWARE, PLATFORM ACTIVECLUSTER, IT INTELLIGENCE, SITEASSURE, PLATFORM SYMPHONY, PLATFORM JOBSCHEDULER, PLATFORM INTELLIGENCE, PLATFORM INFRASTRUCTURE INSIGHT, PLATFORM WORKLOAD INSIGHT, and the PLATFORM and LSF logos are trademarks of Platform Computing Corporation in the United States and in other jurisdictions.
UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions.
Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.
Windows is a registered trademark of Microsoft Corporation in the United States and other countries.
Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners.