SV860
For Impact, Severity, and other firmware definitions, refer to the
'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
|
SV860_243_165 / FW860.B1
06/02/22 |
Impact: Data
Severity: HIPER
System firmware changes that affect certain systems
- HIPER/Pervasive:
For systems with PowerVM firmware and an IBM i partition with native
SR-IOV at firmware levels FW810.00 through FW860.B0, a problem was
fixed for data incorrectly written to PowerVM/LPAR memory during a
DLPAR remove of a native SR-IOV Virtual Function (VF) or Concurrent
Maintenance (CM) of the SR-IOV adapter. This may cause undetected data
corruption in a partition or a PowerVM crash.
|
SV860_240_165 / FW860.B0
01/21/22 |
Impact: Availability
Severity: SPE
System firmware changes that affect all systems
- On systems with PowerVM firmware, a problem was fixed for an
incorrect SRC logged for a
#EMX0 PCIe expansion drawer power fault found on the low CXP
cable. An SRC B7006A85 (AOCABLE, PCICARD) is logged instead of
the correct SRC of B7006A86 (PCICARD, AOCABLE). This happens
every time there is a power fault on the low CXP cable.
- On systems with PowerVM firmware, a problem was fixed for a
Live Partition Mobility (LPM) hang during LPM validation on the target
system. This is a rare system problem triggered during an LPM
migration that causes LPM attempts to fail, along with other
functionality such as configuration changes and partition shutdowns. To
recover from this problem and restore these operations, the system
must be re-IPLed.
- On systems with PowerVM firmware, a problem was fixed for
the HMC Repair and Verify (R&V) procedure failing with "Unable to
isolate the resource" during concurrent maintenance of the #EMX0 Cable
Card. This could lead one to take disruptive action in order to
do the repair. This should occur infrequently and only with cases where
a physical hardware failure has occurred which prevents access to the
PCIe reset line (PERST) but allows access to the slot power
controls. As a workaround, pulling both cables from the Cable
Card to the #EMX0 expansion drawer will result in a completely failed
state that can be handled by bringing up the "PCIe Hardware Topology"
screen from either ASMI or the HMC. Then retry the R&V operation to
recover the Cable Card.
- On systems with PowerVM firmware, a problem was fixed for a
partition with an SR-IOV logical port (VF) having a delay in the start
of the partition. If the partition boot device is an SR-IOV logical
port network device, this issue may result in the partition failing to
boot with SRCs BA180010 and BA155102 logged and then stuck on progress
code SRC 2E49 for an AIX partition. This problem is infrequent
because it requires multiple error conditions at the same time on the
SR-IOV adapter. To trigger this problem, multiple SR-IOV logical
ports for the same adapter must encounter EEH conditions at roughly the
same time such that a new logical port EEH condition is occurring while
a previous EEH condition's handling is almost complete but has not yet
been reported to the hypervisor. To recover from this problem, reboot the
partition.
- On systems with PowerVM firmware, a problem was fixed for a
system hypervisor hang and an Incomplete state on the HMC after a
logical partition (LPAR) is deleted that has an active virtual session
from another LPAR. This problem happens every time an LPAR is
deleted with an active virtual session. This is a rare problem
because virtual sessions from an HMC (a more typical case) prevent an
LPAR deletion until the virtual session is closed, but virtual sessions
originating from another LPAR do not have the same check.
- On systems with PowerVM firmware, the following problems
were fixed for certain SR-IOV adapters:
1) An error was fixed that occurs during a VNIC failover where the VNIC
backing device has a physical port down due to an adapter internal
error with an SRC B400FF02 logged. This is an improved version of
the fix delivered in earlier service pack FW860.A0 for adapter firmware
11.4.415.37 and it significantly reduces the frequency of the error
being fixed.
2) An adapter in SR-IOV shared mode may cause a network interruption
and SRCs B400FF02 and B400FF04 logged. The problem occurs
infrequently during normal network traffic.
These fixes update the adapter firmware to 11.4.415.41 for the
following Feature Codes and CCINs: #EN15/#EN16 with CCIN 2CE3,
#EN17/#EN18 with CCIN 2CE4, #EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N
with CCIN 2CC0, #EN0K/#EN0L with CCIN 2CC1, #EL56/#EL38 with CCIN 2B93,
and #EL57/#EL3C with CCIN 2CC1.
Update instructions: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
- On systems with PowerVM firmware and an AIX or Linux
partition, a problem was fixed for Platform Error Logs (PELs) that
are truncated to only eight bytes for error logs created by the
firmware and reported to the AIX or Linux OS. These PELs may
appear to be blank or missing on the OS. This rare problem is
triggered by multiple error log events in the firmware occurring close
together in time and each needing to be reported to the OS, causing a
truncation in the reporting of the PEL. As a workaround,
the full versions of the truncated error logs can be viewed on the HMC
or with ASMI on the service processor.
- On systems with PowerVM firmware, a problem was fixed for
Platform Error Logs (PELs) not being logged and shown by the OS if they
have an Error Severity code of "critical error". The trigger is
the reporting by a system firmware subsystem of an error log that has
set an Event/Error Severity in the 'UH' section of the log to a value
in the range 0x50 to 0x5F. The following error logs are affected:
B200308C ==> PHYP ==> A problem occurred during the IPL of
a partition. The adapter type cannot be determined. Ensure that a valid
I/O Load Source is tagged.
B700F104 ==> PHYP ==> Operating System error. Platform
Licensed Internal Code terminated a partition.
B7006990 ==> PHYP ==> Service processor failure
B2005149 ==> PHYP ==> A problem occurred during the IPL of
a partition.
B700F10B ==> PHYP ==> A resource has been disabled due to
hardware problems
A7001150 ==> PHYP ==> System log entry only, no service action
required. No action needed unless a serviceable event was logged.
B7005442 ==> PHYP ==> A parity error was detected in the hardware
Segment Lookaside Buffer (SLB).
B200541A ==> PHYP ==> A problem occurred during a partition
Firmware Assisted Dump
B7001160 ==> PHYP ==> Service processor failure.
B7005121 ==> PHYP ==> Platform LIC failure
BC8A0604 ==> Hostboot ==> A problem occurred during the IPL
of the system.
BC8A1E07 ==> Hostboot ==> Secure Boot firmware
validation failed.
Note that these error logs are still reported to the service processor
and HMC properly. This issue does not affect the Call Home action for
the error logs.
- On systems with PowerVM firmware, a problem was fixed for
the Device Description in a System Plan for Crypto Coprocessors
and NVMe cards, which showed only the PCI vendor and device ID of
the cards. This is not enough information to verify which card is
installed without looking up the PCI IDs first. With the fix,
more specific/useful information is displayed and this additional
information does not have any adverse impact on sysplan
operations. The problem is seen every time a System Plan is
created for an installed Crypto Coprocessor or NVMe card.
- A problem was fixed for correct ASMI passwords being
rejected when accessing ASMI using an ASCII terminal with a serial
connection to the server. This problem always occurs for systems
at firmware level FW860.A0 and later.
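The trigger condition given above for the PEL reporting fix (an Event/Error Severity value in the range 0x50 to 0x5F in the 'UH' section) can be expressed as a simple range check. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes the one-byte severity value has already been extracted from the log by other means.

```python
def is_critical_severity(severity: int) -> bool:
    """Return True if a one-byte Event/Error Severity value is in 0x50..0x5F.

    Assumption: `severity` is the raw severity byte taken from the 'UH'
    section of a PEL; parsing the PEL itself is out of scope here.
    """
    if not 0 <= severity <= 0xFF:
        raise ValueError("severity must be a single byte (0x00..0xFF)")
    return 0x50 <= severity <= 0x5F
```

A log with severity 0x51 would fall in the affected range, while 0x40 or 0x60 would not.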
System firmware changes that affect certain systems
- On systems with PowerVM firmware and an IBM i partition, a
problem was fixed for a Live Partition Mobility (LPM) hang while
performing the migration of an IBM i partition. In some
situations, there is a timing issue when the hypervisor is managing IBM
i software licenses. When a subsequent LPM operation is
performed, the LPM operation hangs. To recover from this problem and
restore LPM capability, the system must be re-IPLed.
- On systems with PowerVM firmware and an IBM i partition, a
problem was fixed for an IBM i partition running in P7 or P8
processor compatibility mode failing to boot with SRCs BA330002 and
B200A101 logged. This problem can be triggered as larger
configurations for processors and memory are added to the
partition. To circumvent this problem, reduce
the number of processors and the amount of memory in the partition, or
boot the partition in P9 or later compatibility mode.
|
SV860_236_165 / FW860.A2
12/07/21 |
Impact: Security
Severity: HIPER
System firmware changes that affect all systems
- HIPER/Non-Pervasive:
On systems with PowerVM firmware, a security problem was fixed to
prevent an attacker that gains service access to the FSP service
processor from reading and writing PowerVM system memory using a series
of carefully crafted service procedures. This problem is Common
Vulnerability and Exposure number CVE-2021-38917.
- HIPER/Non-Pervasive:
On systems with PowerVM firmware, a problem was fixed for the IBM
PowerVM Hypervisor where a specific sequence of VM management
operations could lead to a violation of the isolation between peer
VMs. This problem is Common Vulnerability and Exposure number
CVE-2021-38918.
|
SV860_234_165 / FW860.A1
09/16/21 |
Impact: Data
Severity: HIPER
System firmware changes that affect all systems
- HIPER: On
systems with PowerVM firmware, a problem was fixed which may occur on a
target system following a Live Partition Mobility (LPM) migration of an
AIX partition utilizing Active Memory Expansion (AME) with 64 KB page
size enabled using the vmo tunable: "vmo -ro
ame_mpsize_support=1". The problem may result in AIX termination,
file system corruption, application segmentation faults, or undetected
data corruption.
Note: If you are doing an LPM migration of an AIX partition
utilizing AME and 64 KB page size enabled involving a POWER8 or POWER9
system, ensure you have a Service Pack including this change for the
appropriate firmware level on both the source and target systems.
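For systems on the FW860 stream, the Note above amounts to checking that both the source and the target are at FW860.A1 or later. The following is a minimal Python sketch of that check, assuming that the two-character service pack suffix orders as a hexadecimal value (which matches the sequence 80, 81, 90, A0, A1, A2, B0, B1 in this history). The function names are hypothetical, and systems on other firmware streams need the fix level for their own stream.

```python
FIX_LEVEL = "FW860.A1"  # first FW860 service pack in this history with the fix


def service_pack(level: str) -> int:
    """Parse a level such as 'FW860.A1' into an integer (0xA1).

    Assumption: the suffix after the dot is two hexadecimal digits.
    """
    release, _, suffix = level.partition(".")
    if release != "FW860" or len(suffix) != 2:
        raise ValueError(f"not an FW860 level: {level!r}")
    return int(suffix, 16)


def lpm_ame_fix_present(source: str, target: str) -> bool:
    """Return True only if both systems are at or above FIX_LEVEL."""
    minimum = service_pack(FIX_LEVEL)
    return all(service_pack(lv) >= minimum for lv in (source, target))
```

With this sketch, a migration between FW860.A1 and FW860.B0 passes the check, while one involving FW860.90 on either side does not.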
|
SV860_231_165 / FW860.A0
07/08/21 |
Impact: Availability
Severity: SPE
New features and functions
- Support added to Redfish to provide a command to set the
ASMI user passwords using a new AccountService schema.
Using this service, the ASMI admin, HMC, and general user passwords can
be changed.
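As an illustration of the password change described above, the following Python sketch builds (but does not send) a Redfish PATCH request. The resource path follows the standard DMTF Redfish AccountService layout and is an assumption here, because these notes do not give the endpoint for this service processor. The host name, account ID, and function name are likewise hypothetical, and authentication is omitted.

```python
import json
import urllib.request


def build_password_change(host: str, account_id: str,
                          new_password: str) -> urllib.request.Request:
    """Build a PATCH request that sets a new password on one account.

    Assumption: accounts live under /redfish/v1/AccountService/Accounts/,
    as in the DMTF Redfish standard; verify the path on the target system.
    """
    url = f"https://{host}/redfish/v1/AccountService/Accounts/{account_id}"
    body = json.dumps({"Password": new_password}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

The returned request object can be inspected before use, for example with `req.get_method()` and `req.full_url`.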
System firmware changes that affect all systems
- A problem was fixed
for Time of Day (TOD) being lost for the real-time clock (RTC) with an
SRC B15A3303 logged when the service processor boots or resets.
This is a very rare problem that involves a timing problem in the
service processor kernel. If the server is running when the error
occurs, there will be an SRC B15A3303 logged, and the time of day on
the service processor will be incorrect for up to six hours until the
hypervisor synchronizes its (valid) time with the service
processor. If the server is not running when the error occurs,
there will be an SRC B15A3303 logged, and if the server is subsequently
IPLed without setting the date and time in ASMI to fix it, the IPL will
abort with an SRC B7881201 which indicates to the system operator that
the date and time are invalid.
- A problem was fixed in ASMI to allow setting static routes
with two default gateway IP addresses. Without the fix,
ASMI always fails with "Invalid entry. Gateway address" for this
configuration. As a workaround, the static routes could be
created using the ASMI command line and the "route add" command.
- On systems with PowerVM firmware, a problem was fixed for
intermittent failures for a reset of a Virtual Function (VF) for SR-IOV
adapters during Enhanced Error Handling (EEH) error recovery.
This is triggered by EEH events at a VF level only, not at the adapter
level. The error recovery fails if a data packet is received by
the VF while the EEH recovery is in progress. A VF that has
failed can be recovered by a partition reboot or a DLPAR remove and add
of the VF.
- On systems with PowerVM firmware, a problem was fixed where
the Floating Point Unit Computational Test, which should be set to
"staggered" by default, has been changed in some circumstances to be
disabled. If you wish to re-enable this option, this fix is
required. After applying this service pack, do the
following steps:
1) Sign into the Advanced System Management Interface (ASMI).
2) Select Floating Point Computational Unit under the System
Configuration heading and change it from disabled to what is needed:
staggered (run once per core each day) or periodic (a specified time).
3) Click "Save Settings".
- On systems with PowerVM firmware, the following problems
were fixed for certain SR-IOV adapters:
1) An error was fixed that occurs during a VNIC failover where the VNIC
backing device has a physical port down or read port errors with an SRC
B400FF02 logged.
2) A problem was fixed for adding a new logical port with an assigned
PVID, which causes traffic on that VLAN to be dropped by other
interfaces on the same physical port that use OS VLAN tagging for
that same VLAN ID. This problem occurs each time a logical port
with a non-zero PVID that is the same as an existing VLAN is
dynamically added to a partition or is activated as part of a partition
activation. When this happens, the traffic flow stops for other
partitions with OS-configured VLAN devices with the same VLAN ID. This problem can
be recovered by configuring an IP address on the logical port with the
non-zero PVID and initiating traffic flow on this logical port.
This problem can be avoided by not configuring logical ports with a
PVID if other logical ports on the same physical port are configured
with OS VLAN devices.
This fix updates the adapter firmware to 11.4.415.37 for the following
Feature Codes and CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with
CCIN 2CE4, #EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0,
#EN0K/#EN0L with CCIN 2CC1, #EL56/#EL38 with CCIN 2B93, and #EL57/#EL3C
with CCIN 2CC1.
Update instructions: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
- On systems with PowerVM firmware, a problem was fixed for
some serviceable events specific to the reporting of EEH errors not
being displayed on the HMC. The sending of an associated call
home event, however, was not affected. This problem is
intermittent and infrequent.
- A problem was fixed for newer hardware record names
(hardware delivered after the original POWER8 GA) not being displayed
correctly in the ASMI deconfiguration records. For example, Capp
is displayed as "Unknown".
- A problem was fixed for Over Temperature (OT) errors being
reported for the processor with SRC B1112A10. In certain workload
environments, additional cooling is needed for the processors and this
can be provided by a user option to increase the floor speed for the
fans. This fix is activated using the ASMI command line to
install an alternate power management definition file to increase the
fan speeds. This change will persist until a factory reset of the
system. Please contact IBM Support for information on the command
to use to increase the fan speeds.
This problem only pertains to the S822 (8284-22A), S822L (8247-22L), and
S822L (5148-22L) models.
- On systems with PowerVM firmware, a problem was fixed for a
system termination with SRC B700F107 following a time facility
processor failure with SRC B700F10B. With the fix, the
transparent replacement of the failed processor will occur for the
B700F10B if there is a free core, with no impact to the system.
- On systems with PowerVM firmware, a problem was fixed for
possible partition errors following a concurrent firmware update from
FW810 or later. A precondition for this problem is that DLPAR
operations of either physical or virtual I/O devices must have occurred
prior to the firmware update. The error can take the form of a
partition crash at some point following the update. The frequency of
this problem is low. If the problem occurs, the OS will likely
report a DSI (Data Storage Interrupt) error. For example, AIX
produces a DSI_PROC log entry. If the partition does not crash,
it is also possible that some subsequent I/O DLPAR operations will fail.
- A problem was fixed for spurious out-of-range (greater than
127 C) temperatures being reported for the processor with SRC
B1112A10. With the fix, only valid temperature sensor readings
are used when reporting processors that have exceeded the Over
Temperature (OT) value.
- A problem was fixed in ASMI for setting a static route with
a network address for the IP such as "xxx.xxx.xxx.0". Without the
fix, ASMI always fails with "Invalid entry. IP address" for this
network address format. As a workaround, the static route could
be created with the individual IP endpoint entered instead of the
network address, or created using the ASMI command line and the "route
add" command.
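The avoidance rule given above for the SR-IOV PVID problem (do not configure a logical port with a PVID if other logical ports on the same physical port use OS VLAN devices for that VLAN ID) can be checked against a planned configuration. The following Python sketch is illustrative only: the `LogicalPort` data model and all names in it are hypothetical and do not correspond to any HMC or firmware interface.

```python
from dataclasses import dataclass


@dataclass
class LogicalPort:
    name: str
    physical_port: str
    pvid: int = 0                           # 0 means no PVID assigned
    os_vlan_ids: frozenset = frozenset()    # VLAN IDs tagged by the OS


def pvid_conflicts(ports):
    """Return (pvid_port, os_vlan_port, vlan_id) tuples for ports that
    share a physical port where one port's non-zero PVID matches a VLAN
    ID that another port tags in the OS."""
    conflicts = []
    for a in ports:
        if a.pvid == 0:
            continue
        for b in ports:
            if b is a or b.physical_port != a.physical_port:
                continue
            if a.pvid in b.os_vlan_ids:
                conflicts.append((a.name, b.name, a.pvid))
    return conflicts
```

An empty result means that no logical port in the plan matches the trigger condition described in the fix.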
System firmware changes that affect certain systems
- On systems with an IBM i partition, a problem was fixed for
physical I/O property data not being able to be collected for an
inactive partition booted in "IOR" mode with SRC B200A101
logged. This can happen when making a system plan (sysplan)
for an IBM i partition using the HMC and the IBM i partition is
inactive. The sysplan data collection for the active IBM i
partitions is successful.
- On systems with only Integrated Facility for Linux (IFL)
processors and AIX or IBM i partitions, a problem was fixed for
performance issues for IFL VMs (Linux and VIOS). This problem
occurs if AIX or IBM i partitions are active on a system with IFL-only
cores. As a workaround, AIX or IBM i partitions should not be
activated on an IFL-only system. With the fix, the activation of
AIX and IBM i partitions is blocked on an IFL-only system. If
this fix is installed concurrently with AIX or IBM i partitions
running, these partitions will be allowed to continue to run until they
are powered off. Once powered off, the AIX and IBM i partitions
will not be allowed to be activated again on the IFL-only system.
This problem pertains to only the E850 (8408-E8E) and E850C (8408-44E)
models.
|
SV860_226_165 / FW860.90
12/09/20 |
Impact: Data
Severity: HIPER
New features and functions
- On systems with
PowerVM firmware, enable periodic logging
of internal component operational data for the PCIe3 expansion drawer
paths. The logging of this data does not impact the normal use of
the system.
System firmware changes that affect all systems
- HIPER/Pervasive:
On systems with PowerVM firmware, a problem was fixed for certain
SR-IOV adapters for a condition that may result from frequent resets of
adapter Virtual Functions (VFs), or transmission stalls and could lead
to potential undetected data corruption.
The following additional fixes are also included:
1) The VNIC backing device goes to a powered off state during a VNIC
failover or Live Partition Mobility (LPM) migration. This failure
is intermittent and very infrequent.
2) Adapter time-outs with SRC B400FF01 or B400FF02 logged.
3) Adapter time-outs related to adapter commands becoming blocked with
SRC B400FF01 or B400FF02 logged.
4) VF function resets occasionally not completing quickly enough
resulting in SRC B400FF02 logged.
This fix updates the adapter firmware to 11.4.415.33 for the following
Feature Codes and CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with
CCIN 2CE4, #EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0,
#EN0K/#EN0L with CCIN 2CC1, #EL56/#EL38 with CCIN 2B93, and #EL57/#EL3C
with CCIN 2CC1.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- A problem was fixed for the service processor ASMI "Factory
Reset" option to disable the IPMI service as part of the factory
reset. Without the fix, the IPMI operation state will be
unchanged by the factory reset.
- A rare problem was fixed for a checkstop during an IPL that
fails to isolate and guard the problem core. An SRC is logged
with B1xxE5xx and an extended hex word 8 xxxxDD90. With the fix,
the suspected failing hardware is guarded.
- A problem was fixed for the REST/Redfish interface to
change the success return code for object creation from "200" to
"201". The "200" status code means that the request was received
and understood and is being processed. A "201" status code
indicates that a request was successful and, as a result, a resource
has been created. The Redfish Ruby Client, "redfish_client", may
fail a transaction if a "200" status code is returned when "201" is
expected.
- On systems with PowerVM firmware, a problem was fixed to
allow quicker recovery of PCIe links for the #EMX0 PCIe expansion
drawer for a run-time fault with B7006A22 logged. The time for
recovery attempts can exceed six minutes on rare occasions which may
cause I/O adapter failures and failed nodes. With the fix, the
PCIe links will recover or fail faster (in the order of seconds) so
that redundancy in a cluster configuration can be used with failure
detection and failover processing by other hosts, if available, in the
case where the PCIe links fail to recover.
- On systems with PowerVM firmware, a problem was fixed for a
concurrent maintenance "Repair and Verify" (R&V) operation for a
#EMX0 fanout module that fails with an "Unable to isolate the resource"
error message. This should occur only infrequently for cases
where a physical hardware failure has occurred which prevents access to
slot power controls. This problem can be worked around by
bringing up the "PCIe Hardware Topology" screen from either ASMI or the
HMC after the hardware failure but before the concurrent repair is
attempted. This will avoid the problem with the PCIe slot
isolation. These steps can also be used to recover from the
error to allow the R&V repair to be attempted again.
- On systems with PowerVM firmware, a problem was fixed for a
B7006A96 fanout module FPGA corruption error that can occur in
unsupported PCIe3 expansion drawer (#EMX0) configurations that mix an
enhanced PCIe3 fanout module (#EMXH) in the same drawer with legacy
PCIe3 fanout modules (#EMXF, #EMXG, #ELMF, or #ELMG). This causes
the FPGA on the enhanced #EMXH to be updated with the legacy firmware
and it becomes a non-working and unusable fanout module. With the
fix, the unsupported #EMX0 configurations are detected and handled
gracefully without harm to the FPGA on the enhanced fanout modules.
- On systems with PowerVM firmware, a problem was fixed for
possible dispatching delays for partitions running in POWER8 processor
compatibility mode.
- On systems with PowerVM firmware, a problem was fixed for
system memory not returned after create and delete of partitions,
resulting in slightly less memory available after configuration changes
in the systems. With the fix, an IPL of the system will recover
any of the memory that was orphaned by the issue.
- On systems with PowerVM firmware, a problem was fixed for
utilization statistics for commands such as HMC lslparutil and
third-party lpar2rrd that do not accurately represent CPU
utilization. The values are incorrect every time for a partition
that is migrated with Live Partition Mobility (LPM). Power Enterprise
Pools 2.0 is not affected by this problem. If this problem has
occurred, use one of the following three recovery options:
1) Re-IPL the target system of the migration.
2) Delete and recreate the partition on the target system.
3) Perform an inactive migration of the partition. The cycle
values get zeroed in this case.
- On systems with PowerVM firmware, a problem was fixed for a
PCIe3 expansion drawer cable that has hidden error logs for a single
lane failure. This happens whenever a single lane error
occurs. Subsequent lane failures are not hidden and have visible
error logs. Without the fix, the hidden or informational logs
would need to be examined to gather more information for the failing
hardware.
- On systems with PowerVM firmware, a problem was fixed for a
DLPAR remove of memory from a partition that fails if the partition
contains 65535 or more LMBs. With 16 MB LMBs, this error threshold
is 1 TB of memory. With 256 MB LMBs, it is 16 TB of memory.
A reboot of the partition after the DLPAR will remove the memory from
the partition.
- On systems with PowerVM firmware, a problem was fixed for
extraneous B400FF01 and B400FF02 SRCs logged when moving cables on
SR-IOV adapters. This is an infrequent error that can occur if
the HMC performance monitor is running at the same time the cables are
moved. These SRCs can be ignored when accompanied by cable
movement.
- On systems with PowerVM firmware, a problem was fixed for
B400FF02 errors for certain SR-IOV adapters during adapter
initialization or error recovery. This is a rare error that can
occur because of a race condition in the firmware.
This fix pertains to adapters with the following Feature Codes and
CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with CCIN 2CE4,
#EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0, #EN0K/#EN0L
with CCIN 2CC1, #EL56/#EL38 with CCIN 2B93, and #EL57/#EL3C with CCIN
2CC1.
- On systems with OPAL firmware, a problem was fixed for a
reset/reload of the service processor initiated by ipmitool inband
usage on the host (such as "mc reset cold") causing all subsequent
inband IPMI messages to be blocked.
- On systems with OPAL firmware, a problem was fixed for host
hangs that can occur when doing error recovery.
- On systems with OPAL firmware, a problem was fixed for I2C
transactions to the On-Chip Controller (OCC) causing a host hang.
- On systems with PowerVM firmware, a problem was fixed for
not logging SRCs for certain cable pulls from the #EMX0 PCIe expansion
drawer. With the fix, the previously undetected cable pulls are
now detected and logged with SRC B7006A8B and B7006A88 errors.
- On systems with PowerVM firmware, a problem was fixed for a
rare system hang that can occur when a page of memory is being
migrated. Page migration (memory relocation) can occur for a
variety of reasons, including predictive memory failure, DLPAR of
memory, and normal operations related to managing the page pool
resources.
- On systems with PowerVM firmware, a problem was fixed for
running PCM on a system with SR-IOV adapters in shared mode that
results in an "Incomplete" system state with certain hypervisor tasks
deadlocked. This problem is rare and is triggered when using
SR-IOV adapters in shared mode and gathering performance statistics
with PCM (Performance Collection and Monitoring) and also having a low
level error on an adapter. The only way to recover from this
condition is to re-IPL the system.
- On systems with PowerVM firmware, a problem was fixed for
an SRC B7006A99 informational log, which is now posted as a Predictive
error with a call out of the CXP cable FRU. This fix improves FRU isolation
for cases where a CXP cable alert causes a B7006A99 that occurs prior
to a B7006A22 or B7006A8B. Without the fix, the SRC B7006A99 is
informational and the latter SRCs cause a larger hardware replacement
even though the earlier event identified a probable cause for the cable
FRU.
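The memory thresholds quoted above for the DLPAR memory remove fix (about 1 TB with 16 MB LMBs and about 16 TB with 256 MB LMBs) follow from multiplying the 65535-LMB limit by the LMB size. The following short Python sketch reproduces that arithmetic. The names are illustrative, and binary units (1 TB taken as 1024^4 bytes) are assumed.

```python
LMB_LIMIT = 65535          # LMB count at which the DLPAR remove fails
MIB = 1024 ** 2            # bytes per MB (binary units assumed)
TIB = 1024 ** 4            # bytes per TB (binary units assumed)


def threshold_tb(lmb_size_mb: int) -> float:
    """Partition memory, in TB, at which the LMB limit is reached."""
    return LMB_LIMIT * lmb_size_mb * MIB / TIB


for size in (16, 256):
    print(f"{size} MB LMBs: about {threshold_tb(size):.2f} TB")
```

Both results come out one LMB short of the round figure, which is why the notes quote them as 1 TB and 16 TB.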
|
SV860_215_165 / FW860.81
03/04/20 |
Impact: Security
Severity: HIPER
Power System S812L (8247-21L), Power System S822L (8247-22L), Power
System S824L (8247-42L), Power System S812 (8284-21A), Power System
S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A), Power System E850 (8408-E8E), Power System E850C
(8408-44E), Power System S812L (5148-21L), and Power System S822L
(5148-22L) servers only.
System firmware changes that affect all systems
- HIPER/Pervasive:
A problem was fixed for an HMC "Incomplete" state for a system after
the HMC user password is changed with ASMI on the service
processor. This problem can occur if the HMC password is changed
on the service processor but not also on the HMC, and a reset of the
service processor happens. With the fix, the HMC will get the
needed "failed authentication" error so that the user knows to update
the old password on the HMC.
|
SV860_212_165 / FW860.80
12/17/19 |
Impact: Security
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L), Power
System S824L (8247-42L), Power System S812 (8284-21A), Power System
S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A), Power System E850 (8408-E8E), Power System E850C
(8408-44E), Power System S812L (5148-21L), and Power System S822L
(5148-22L) servers only.
New features and functions
- Support was added for improved security for the service processor
password policy. For the service processor, the "admin", "hmc", and
"general" passwords must be set on first use for newly manufactured
systems and after a factory reset of the system. The IPMI interface
has been changed to be disabled by default in these scenarios. The
REST/Redfish interface will return an error saying the user account is
expired. This policy change helps to ensure that the service
processor is not left in a state with a well-known password. The
user can change from an expired default password to a new password
using the Advanced System Management Interface (ASMI).
- Support was added for real-time
data capture for PCIe3 expansion drawer (#EMX0) cable card connection
data via resource dump selector on the HMC or in ASMI on the service
processor. Using the resource selector string of "xmfr
-dumpccdata" will non-disruptively generate an RSCDUMP type of dump
file that has the current cable card data, including data from cables
and the retimers.
System firmware changes that affect all systems
- A problem was fixed for an intermittent IPMI core
dump on the service processor. This occurs only rarely when
multiple IPMI sessions are starting and cleaning up at the same
time. A new IPMI session can fail initialization when one of its
session objects is cleaned up. The circumvention is to retry the
IPMI command that failed.
- On systems using PowerVM firmware, a
problem was fixed for SR-IOV adapters to provide a consistent
Informational message level for cable plugging issues. For
transceivers not plugged on certain SR-IOV adapters, an unrecoverable
error (UE) SRC B400FF03 was changed to an Informational message
logged. This affects the SR-IOV adapters with the following
feature codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U
with CCIN 58FB; and #EC3L/EC3M with CCIN 2CEC.
For copper cables unplugged on certain SR-IOV adapters, a missing
message was replaced with an Informational message logged. This
affects the SR-IOV adapters with the following feature codes and
CCINs: #EN17/EN18 with CCIN 2CE4; #EN0K/EN0L with CCIN 2CC1; and
#EL57/EL3C with CCIN 2CC1.
- On systems with PowerVM firmware, the
following problem related to SR-IOV was fixed: If the SR-IOV
logical port's VLAN ID (PVID) is modified while the logical port is
configured, the adapter will use an incorrect PVID for the Virtual
Function (VF). This problem is rare because most users do not
change the PVID once the logical port is configured, so they will not
have the problem.
This fix updates adapter firmware to 10.2.252.1940 for the
following Feature Codes and CCINs: #EN15/EN16 with CCIN 2CE3;
#EN17/EN18 with CCIN 2CE4; #EN0H/EN0J with CCIN 2B93; #EN0M/EN0N with
CCIN 2CC0; #EN0K/EN0L with CCIN 2CC1; #EL56/EL38 with CCIN 2B93; and
#EL57/EL3C with CCIN 2CC1.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- A problem was fixed for unknowingly
running at lower (the default) frequencies when changing into Fixed Max
Frequency (FMF) mode. This problem is unlikely to happen
because it requires that the system is already in FMF mode and that
the user then requests a change into FMF mode. This request is not
handled correctly as the tunable parameters get reset to default which
allows the processor frequency to be reduced to the minimum
value. The recovery for this problem is to change the power mode
to "Nominal" and then change it to FMF.
- A problem was fixed for
Novalink failing to activate partitions that have names with character
lengths near the maximum allowed character length. This problem
can be circumvented by changing the partition name to have 32
characters or less.
- A problem was fixed where a Linux or AIX partition type was
incorrectly reported as unknown. Symptoms include: IBM Cloud
Management Console (CMC) not being able to determine the RPA partition
type (Linux/AIX) for partitions that are not active; and HMC attempts
to dynamically add CPU to Linux partitions may fail with a HSCL1528
error message stating that there are not enough Integrated Facility for
Linux (IFL) cores for the operation.
- A problem was fixed for
a possible system crash with SRC B7000103 if the HMC session is closed
while the performance monitor is active. As a circumvention for
this problem, make sure the performance monitor is turned off before
closing the HMC sessions.
- A problem was fixed for a Live
Partition Mobility (LPM) migration of a large memory partition to a
target system that causes the target system to crash and for the HMC to
go to the "Incomplete" state. For servers with the default LMB
size (256MB), if a partition is >=16TB and if desired memory is
different than the maximum memory, LPM may fail on the target
system. Servers with LMB sizes less than the default could hit
this problem with smaller memory partition sizes. A circumvention
to the problem is to set the desired and maximum memory to the same
value for the large memory partition that is to be migrated.
- A problem was fixed for
system hangs or incomplete states displayed by HMC(s) caused by a loop
in the handling of Segment Lookaside Buffer (SLB) cache memory parity
errors where SRC B7005442 may be logged. This problem has a low
frequency of occurrence as it requires severe errors in the SLB cache
that are not cleared by an error flush of the entries. A re-IPL
of the system can be used to recover from this error.
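For the Live Partition Mobility fix for large memory partitions in the list above, the exposure condition can be expressed as a small pre-migration check. This is a minimal sketch, not an HMC or PowerVM interface: the function and constant names are hypothetical, and the linear scaling of the 16TB threshold for LMB sizes below 256MB is an assumption (the note only says that smaller LMB sizes can hit the problem with smaller partitions).

```python
# Illustrative check only; none of these names come from the HMC or PowerVM.
MB_PER_TB = 1024 * 1024
DEFAULT_LMB_MB = 256            # default LMB size quoted in the note
THRESHOLD_TB_AT_DEFAULT = 16    # partition size threshold quoted in the note


def lpm_at_risk(partition_mb, desired_mb, maximum_mb, lmb_mb=DEFAULT_LMB_MB):
    """Return True if a partition matches the failure conditions in the note.

    ASSUMPTION: the size threshold scales linearly with the LMB size, so
    a 128MB LMB halves the threshold to 8TB. The note does not state this.
    """
    threshold_mb = THRESHOLD_TB_AT_DEFAULT * MB_PER_TB * lmb_mb // DEFAULT_LMB_MB
    is_large = partition_mb >= threshold_mb
    # The circumvention in the note is to make desired and maximum memory equal.
    return is_large and desired_mb != maximum_mb
```

A partition flagged by this check would be a candidate for the circumvention in the note: set the desired and maximum memory to the same value before migrating.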
System firmware changes that affect certain systems
- On systems with an
IBM i partition, a problem was fixed
for a D-mode IPL failure when using a USB DVD drive in an IBM 7226
multimedia storage enclosure. Error logs with SRC BA16010E,
B2003110, and/or B200308C can occur. As a circumvention, an
external DVD drive can be used for the D-mode IPL.
- On systems with IBM i partitions, a
rare problem was fixed for an intermittent failure of a DLPAR remove of
an adapter. In most cases, a retry of the operation will be
successful.
|
SV860_205_165 / FW860.70
06/18/19 |
Impact: Availability
Severity: HIPER
Power
System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S812
(8284-21A), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power
System E850 (8408-E8E); Power
System E850C (8408-44E); Power System S812L (5148-21L) and Power
System S822L (5148-22L) servers only.
System firmware changes that affect all systems
- HIPER/Pervasive: On
systems with PowerVM firmware, the following problems related to
SR-IOV were fixed:
1) A problem was fixed for new or replacement SR-IOV adapters with
feature codes EN15 and EN17 being rendered non-functional when moved to
SR-IOV mode. This includes cards moved from dedicated device mode,
newly installed adapters, and FRU replacements. This problem occurs
when the adapter firmware is updated to the 10.2.252.x levels from 11.x
adapter firmware levels.
2) A problem was fixed for certain SR-IOV adapters where SRC B400FF01
errors are seen during vNIC failovers and Live Partition Mobility (LPM)
migration of vNIC clients. This may also result in errors seen in
partitions (for example, some partitions may show LNC2ENT_TX_ERR).
3) A problem was fixed where network multicast traffic is not received
by a SR-IOV logical port (VF) network interface for a Linux partition.
The failure can occur when the partition transitions the network
interface out of promiscuous or multicast promiscuous mode.
These fixes update adapter firmware to 10.2.252.1939 for the
following Feature Codes: EN15, EN17, EN0H, EN0J, EN0M,
EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- DEFERRED:
PARTITION_DEFERRED: On systems with PowerVM firmware, a
problem was fixed for repeated CPU
DLPAR remove operations by Linux (Ubuntu, SUSE, or RHEL) OSes possibly
resulting in a partition crash. No specific SRCs or error
logs are reported. The problem can occur on any DLPAR CPU
remove operation if running on Linux. The occurrence is
intermittent and rare. The partition crash may result in one or
more of the following console messages (in no particular order):
1) Bad kernel stack pointer addr1 at addr2
2) Oops: Bad kernel stack pointer
3) ******* RTAS CALL BUFFER CORRUPTION *******
4) ERROR: Token not supported
This fix does not activate until there is a reboot of the partition.
- A problem was fixed for a PCIe Hub checkstop with SRC
B138E504 logged that fails to guard the errant processor chip.
With the fix, the problem hardware FRU is guarded so there is not a
recurrence of the error on the next IPL.
- A problem was fixed for an incorrect SRC of B1810000 being
logged when a firmware update fails because of Entitlement Key
expiration. The error displayed on the HMC and in the OS is
correct and meaningful. With the fix, for this firmware update
failure the correct SRC of B181309D is now logged.
- A problem was fixed for informational logs flooding
the error log if a "Get Sensor Reading" is not working.
- A problem was fixed for a Redfish (REST) Patch
request for PowerSaveMode with an unsupported mode value returning an
error code "500" instead of the correct error code of "400".
- On systems with PowerVM firmware, a problem was
fixed for a rare Live Partition Mobility migration hang with the
partition left in VPM (Virtual Page Mode) which causes performance
concerns. This error is triggered by a migration failover
operation occurring during the migration state of "Suspended" and there
has to be insufficient VASI buffers available to clear all partition
state data waiting to be sent to the migration target. Migration
failovers are rare and the migration state of "Suspended" is a
migration state lasting only a few seconds for most partitions, so this
problem should not be frequent. On the HMC, there will be an
inability to complete either a migration stop or a recovery
operation. The HMC will show the partition as migrating and any
attempt to change that will fail. The system must be re-IPLed to
recover from the problem.
- A problem was fixed for an IPMI core dump and SRC B1818601
logged intermittently when an IPMI session is closed. A flood of
B1818A03 SRCs may be logged after the error occurs. The IPMI
server is not impacted and a call home is reported for the
problem. There is no service outage for the IPMI users because of
this.
- A problem was fixed for IPMI sessions in the service
processor causing a flood of B181A803 informational error logs on
registry read fails for IPv6 and IPv4 keywords. These error logs
do not represent a real problem and may be ignored.
- On systems with the PowerVM firmware, a problem was
fixed for shared processor partitions going unresponsive after changing
the processor sharing mode of a dedicated processor partition
from "allow when partition is active" to either "allow when partition
is inactive" or "never". This problem can be circumvented by
avoiding disabling processor sharing when active on a dedicated
processor partition. To recover if the issue has been
encountered, enable "processor sharing when active" on the dedicated
partition.
- On systems with PowerVM firmware, a problem was fixed for
an error in deleting a partition with the virtualized Trusted Platform
Module (vTPM) enabled and SRC B7000602 logged. When this error
occurs, the encryption process in the hypervisor may become
unusable. The problem can be recovered from with a re-IPL of the
system.
- On systems with PowerVM firmware, a problem was fixed in
Live Partition Mobility (LPM) of a partition to a shared processor
pool, which results in the partition being unable to consume uncapped
cycles on the target system. To prevent the issue from occurring,
partitions can be migrated to the default shared processor pool and
then dynamically moved to the desired shared processor pool. To
recover from the issue, do one of the following four steps:
1) Either use DLPAR to add or remove a virtual processor to/from the
affected partition;
2) or dynamically move the partition between shared processor pools;
3) or reboot the partition;
4) or re-IPL the system.
- On systems with PowerVM firmware, a problem was fixed
for a boot failure using a N_PORT ID Virtualization (NPIV) LUN for an
operating system that is installed on a disk of 2 TB or greater, and
having a device driver for the disk that adheres to a non-zero
allocation length requirement for the "READ CAPACITY 16". The IBM
partition firmware had always used an invalid zero allocation length
for the return of data and that had been accepted by previous device
drivers. Now some of the newer device drivers are adhering to the
specification and needing an allocation length of non-zero to allow the
boot to proceed.
- On systems with PowerVM firmware, a problem was fixed for
failing to boot from an AIX mksysb backup on a USB RDX drive with SRCs
logged of BA210012, AA06000D, and BA090010. The problem trigger
is a boot attempt from the RDX device. The boot error does not occur if
a serial console is used to navigate the SMS menus.
- On systems with PowerVM firmware, a problem was fixed
for a system IPLing with an invalid time set on the service processor
that causes partitions to be reset to the Epoch date of
01/01/1970. With the fix, on the IPL, the hypervisor logs a
B700120x when the service processor real time clock is found to be
invalid and halts the IPL to allow the time and date to be corrected by
the user. The Advanced System Management Interface (ASMI) can be
used to correct the time and date on the service processor. On
the next IPL, if the time and date have not been corrected, the
hypervisor will log a SRC B7001224 (indicating the user was warned on
the last IPL) but allow the partitions to start, but the time and date
will be set to the Epoch value.
- A security problem was fixed in the service processor
Network Security Services (NSS) services which, with a
man-in-the-middle attack, could provide false completion or errant
network transactions or exposure of sensitive data from intercepted SSL
connections to ASMI, Redfish, or the service processor message
server. The Common Vulnerabilities and Exposures issue number is
CVE-2018-12384.
- On systems with PowerVM firmware, a problem was fixed for
a hypervisor task getting deadlocked if partitions are powered on at the
same time that SR-IOV is being configured for an adapter. With
this problem, workloads will continue to run but it will not be
possible to change the virtualization configuration or power partitions
on and off. This error can be recovered by doing a re-IPL of the
system.
- On systems with PowerVM firmware, a problem was fixed
for hypervisor tasks getting deadlocked that cause the hypervisor to be
unresponsive to the HMC (this shows as an incomplete state on the HMC)
with SRC B200F011 logged. This is a rare timing error. With
this problem, OS workloads will continue to run but it will not
be possible for the HMC to interact with the partitions. This
error can be recovered by doing a re-IPL of the system with a scheduled
outage.
- A problem was fixed for false indication of a real time
clock (RTC) battery failure with SRC B15A3305 logged. This error
happens infrequently. If the error occurs, and another battery
failure SRC is not logged within 24 hours, ignore the error as it was
caused by a timing issue in the battery test.
- A problem was fixed for an IPMI core dump and SRC B181720D
logged, causing the service processor to reset due to a low memory
condition. The memory loss is triggered by frequently using the
ipmitool to read the network configuration. The service processor
recovers from this error but if three of these errors occur within a 15
minute time span, the service processor will go to a failed hung state
with SRC B1817212 logged. Should a service processor hang occur,
OS workloads will continue to run but it will not be possible for the
HMC to interact with the partitions. This service processor hung
state can be recovered by doing a re-IPL of the system with a scheduled
outage.
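The escalation rule in the last fix above (three IPMI core dump errors within a 15 minute span put the service processor into a failed hung state) is a sliding-window count. The following is a minimal sketch for scanning error-log timestamps, not service processor code: the function name is hypothetical, and treating a span of exactly 15 minutes as "within" is an assumption.

```python
from datetime import datetime, timedelta


def reaches_failed_state(error_times, limit=3, window=timedelta(minutes=15)):
    """Return True if `limit` errors fall inside any span of length `window`.

    error_times: datetimes at which SRC B181720D was logged.
    ASSUMPTION: a span of exactly `window` counts as "within" the window.
    """
    times = sorted(error_times)
    # Compare each error with the one (limit - 1) positions after it.
    for i in range(len(times) - limit + 1):
        if times[i + limit - 1] - times[i] <= window:
            return True
    return False
```

For example, errors logged at 0, 5, and 14 minutes match the rule, while errors at 0, 10, and 20 minutes do not.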
System firmware changes that affect certain systems
- DEFERRED: On
systems with a PCIe3 I/O expansion drawer (#EMX0), a problem was fixed
for the PCIe3 I/O expansion drawer links to improve
stability. Intermittent training failures on the links
occurred during the IPL with SRC B7006A8B logged. With the fix,
the link settings were changed to lower the peak link signal
amplification to bring the signal level into the middle of the
operating range, thus improving the high margin to reduce link training
failures. The system must be re-IPLed for the fix to activate.
- On a system with an IBM i partition, a problem was fixed
for a DLPAR force-remove of a physical IO adapter from an IBM i
partition and a simultaneous power off of the partition causing the
partition to hang during the power off. To recover the partition
from the error, the system must be re-IPLed. This problem is rare
because there is only a 2-second timing window for the DLPAR and power
off to interfere with each other.
- On a system with an active IBM i partition, a problem was
fixed for a SPCN firmware download to the PCIe3 I/O expansion drawer
(feature #EMX0) Chassis Management Card (CMC) that could possibly get
stuck in a pending state. This failure is very unlikely as it
would require a concurrent replacement of the CMC card that is loaded
with a SPCN level that is older than 2015 (01MEX151012a). The
failure with the SPCN download can be corrected by a re-IPL of the
system.
- On a system with an AMS (Active Memory Sharing) partition,
a problem was fixed for a Live Partition Mobility (LPM) migration
failure when migrating from P9 to a pre-FW860 P8 or P7 system.
This failure can occur if the P9 partition is in dedicated memory mode,
and the Physical Page Table (PPT) ratio is explicitly set on the HMC
(rather than keeping the default value) and the partition is then
transitioned to AMS mode prior to the migration to the older
system. This problem can be avoided by using dedicated memory in
the partition being migrated back to the older system.
- On systems with PowerVM firmware and a vNIC configuration
with multiple backing Virtual Functions (VFs), a problem was fixed for
a backing VF failure after a sequence of repeated failovers where one
of the VF backing devices goes to a powered off state. This
problem is infrequent and only occurs after many vNIC failovers.
A reboot of the partition with the affected VF will recover it.
- On systems with PCIe3 expansion drawers (feature code
#EMX0), a problem was fixed for a UE B700BA01 logged after a FRU
was replaced in the PCIe Expansion drawer. The log should have
been informational instead of unrecoverable because it is normal to
have this log for a replaced part in the expansion drawer that has a
different serial number from the old part. If a part in the
expansion drawer has been replaced, the UE error log can be ignored.
- On systems with IBM i partitions, a problem was fixed
for Live Partition Mobility (LPM) migrations that could have incorrect
hardware resource information (related to VPD) in the target partition
if a failover had occurred for the source partition during the
migration. This failover would have to occur during the Suspended
state of the migration, which only lasts about a second, so this should
be rare. With the fix, at a minimum the migration error will be
detected to abort the migration so it can be restarted. And at a
later IBM i OS level, the fix will allow the migration to complete even
though the failover has occurred during the Suspended state of the
migration.
- On systems with PCIe3 expansion drawers (feature #EMX0), a
problem was fixed for PCI link recovery failure during a PCI Host
Bridge (PHB) reset with SRCs of B7006A80, B7006A22, B7006A8B, and
B7006970 logged. This causes the cable card to fail, losing all
slots in the expansion drawer. This is a rare problem. If
this error occurs, a concurrent maintenance operation could reboot the
expansion drawer or a re-IPL of the system could be done to recover the
drawer.
- On systems with an IBM i partition with greater than 9999
GB installed, a problem was fixed for On/Off COD memory-related
amounts not being displayed correctly. This only happens when
retrieving the On/Off COD numbers via a particular IBM i MATMATR MI
command option value.
- On systems with PCIe3 expansion drawers (feature code
#EMX0), a problem was fixed for a concurrent exchange of a PCIe
expansion drawer cable card that, although successful, leaves the fault LED
turned on.
- On systems using PowerVM firmware, a problem was fixed for
shared processor pools where
uncapped shared processor partitions placed in a pool may not be able
to consume all available processor cycles. The problem may occur
when the sum of the allocated processing units for the pool member
partitions equals the maximum processing units of the pool.
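The trigger condition for the shared processor pool fix above can be checked from configuration data. This is a minimal sketch with hypothetical names, not an HMC interface. It uses Decimal so that fractional processing units such as 0.1 compare exactly.

```python
from decimal import Decimal


def pool_fully_allocated(member_units, pool_max_units):
    """Return True if the members' allocated units equal the pool maximum.

    member_units: allocated processing units of each partition in the pool.
    pool_max_units: maximum processing units of the shared processor pool.
    Values are converted through str() to avoid binary rounding error.
    """
    total = sum(Decimal(str(units)) for units in member_units)
    return total == Decimal(str(pool_max_units))
```

A pool for which this returns True matches the condition under which, per the note, uncapped partitions may not be able to consume all available cycles.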
|
SV860_180_165 / FW860.60
10/31/18 |
Impact: Availability
Severity: SPE
Power
System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S812
(8284-21A), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power
System E850 (8408-E8E); Power
System E850C (8408-44E); Power System S812L (5148-21L) and Power
System S822L (5148-22L) servers only.
System firmware changes that affect all systems
- A security problem was fixed in the Dynamic Host Configuration
Protocol
(DHCP) client on the service processor for an out-of-bound memory
access flaw that could be used by a malicious DHCP server to crash the
DHCP client process. The Common Vulnerabilities and Exposures
issue
number is CVE-2018-5732.
- A problem was fixed for ipmitool not being able to set the
system power limit when the power limit is not activated with the
standard option. With the fix, the ipmitool user can
activate the power limit with "dcmi power activate" and then set the power
limit with "dcmi power set_limit xxxx", where "xxxx" is the new
power limit in Watts.
- A problem was fixed for the periodic guard reminder
function to not re-post error logs of failed FRUs on each IPL.
Instead, a reminder SRC is created to call home the list of FRUs that
have failed and require service. This returns the system to the
original behavior of posting only one error log for each FRU that has
failed.
- For a HMC managed system, a problem was fixed for a rare,
intermittent NetsCMS core dump that could occur whenever the system is
doing a deferred shutdown power off. There is no impact to normal
operations as the power off completes, but there are extra error logs
with SRC B181EF88 and a service processor dump.
- A problem was fixed for the Redfish "Manager" request
returning duplicate object URIs for the same HMC. This can occur
if the HMC was removed from the managed system and then later added
back in. The Redfish objects for the earlier instances of the
same HMC were never deleted on the remove.
- Hardware data collection performance was improved for
platform-level dumps.
- A problem was fixed for platform dumps failing for HWPROC
checkstops, causing the system to terminate instead of re-IPLing after
the processor failure. To recover, the system can be powered off
and then IPLed. Any problem hardware will be guarded during the
IPL to allow normal system operations.
- A security problem was fixed to detect and prevent Self
Boot Engine (SBE) SEEPROM corruption. The Common
Vulnerabilities and Exposures issue number is CVE-2018-8931.
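For the ipmitool power limit fix in the list above, the two-step sequence (activate, then set the limit) can be scripted. The sketch below only builds the argument lists and does not run anything. The function name is hypothetical, connection options (interface, host, credentials) are omitted, and the "set_limit limit <watts>" spelling follows upstream ipmitool usage, which is an assumption here because the note abbreviates the command.

```python
def dcmi_power_limit_commands(watts, base=("ipmitool",)):
    """Build the two ipmitool invocations described in the note.

    watts: new power limit in Watts (positive integer).
    base:  leading command words; extend it with connection options as needed.
    ASSUMPTION: upstream ipmitool syntax "dcmi power set_limit limit <watts>".
    """
    if not isinstance(watts, int) or watts <= 0:
        raise ValueError("power limit must be a positive integer number of Watts")
    return [
        [*base, "dcmi", "power", "activate"],                      # step 1: activate
        [*base, "dcmi", "power", "set_limit", "limit", str(watts)],  # step 2: set limit
    ]
```

Each list can be passed to subprocess.run() once the connection options for the target service processor have been added to base.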
System firmware changes that affect certain systems
- On systems with
PowerVM firmware, a problem was fixed for certain hypervisor
error logs being slow to report to the OS. The error logs
affected are those created by the hypervisor immediately after the
hypervisor is started and if there are more than 128 error logs from the
hypervisor to be reported. The error logs at the end of the queue
take a long time to be processed, and may make it appear as if error
logs are not being reported to the OS.
- On systems with PowerVM firmware, a problem was fixed
for an enclosure fault LED being stuck on after a repair of a
fan. This problem only occurs after the second concurrent repair
of a fan.
- On systems with PowerVM firmware, a problem was fixed
for a concurrent EMX0 PCIe3 expansion CXP (120 Gb/s 12x Small
Form-factor Pluggable) cable adapter add or repair that fails with a
hypervisor 0x030A error after a previous add or repair failure.
The affected CXP cable adapters have feature codes #EJ05 and
#EJ08. A system IPL will recover from the problem.
- On systems with PowerVM firmware, a problem was fixed
for a dedicated processor partition hanging during a shutdown.
This is a very rare problem with only a small timing window in the
shutdown that can cause the hang.
- On systems with PowerVM firmware, a problem was fixed for a
Novalink enabled partition not being able to release master from the
HMC that results in error HSCLB95B. To resolve the issue, run a
rebuild managed server operation on the HMC and then retry the
release. This occurs when attempting to release master from HMC
after the first boot up of a Novalink enabled partition if Master Mode
was enforced prior to the boot.
- On systems with PowerVM firmware, a problem was fixed for
resource dumps that use the selector "iomfnm" and options "rioinfo" or
"dumpbainfo". This combination of options for resource dumps
always fails without the fix.
- On a system with an AIX partition, a problem was
fixed for a partition time jump that could occur after doing an AIX
Live Update. This problem could occur if the AIX Live Update
happens after a Live Partition Mobility (LPM) migration to the
partition. AIX applications using the timebase facility could
observe a large jump forwards or backwards in the time reported by the
timebase facility. A circumvention to this problem is to
reboot the partition after the LPM operation prior to doing the AIX
Live Update. An AIX fix is also required to resolve this
problem. The issue will no longer occur when this firmware update
is applied on the system that is the target of the LPM operation and
the AIX partition performing the AIX Live Update has the appropriate
AIX updates installed prior to doing the AIX Live Update.
- On systems with PowerVM firmware, a problem was fixed for a
Virtual Network Interface Controller (vNIC) client adapter to prevent a
failover when disabling the adapter from the HMC. A failover to a
new backing device could cause the client adapter to erroneously appear
to be active again when it is actually disabled. This causes
confusion and failures on the OS for the device driver. This
problem can only occur when there is more than a single backing device
for the vNIC adapter and if commands are issued from the HMC to
disable the adapter and enable the adapter.
- On systems with PowerVM firmware, a problem was fixed for
all variants (this was partially fixed in an earlier release) for the
SR-IOV firmware adapter updates using the HMC GUI or CLI to only reboot
one SR-IOV adapter at a time. If multiple adapters are updated at
the same time, the HMC error message HSCF0241E may occur:
"HSCF0241E Could not read firmware information from SR-IOV device
...". This fix prevents the system network from being disrupted
by the SR-IOV adapter updates when redundant configurations are being
used for the network. The problem can be circumvented by using
the HMC GUI to update the SR-IOV firmware one adapter at a time using
the following steps:
https://www.ibm.com/support/knowledgecenter/en/POWER8/p8efd/p8efd_updating_sriov_firmware.htm
- On systems with PowerVM firmware, a problem was fixed for
the callout of SRC BA188002 so it does not display three trailing extra
garbage characters in the location code for the FRU. The string
is correct up to the line ending white space, so the three extra
characters after that should be ignored. This problem is
intermittent and does not occur for all BA188002 error logs.
- On systems with PowerVM firmware, a problem was fixed where,
when booting a large number of LPARs with Virtual Trusted Platform
Module (vTPM) capability, some partitions may post a SRC BA54504D
time-out for taking too long to start. With the fix, the time
allowed to boot a vTPM LPAR is increased. If a time-out occurs,
the partition can be booted again to recover. The problem can be
avoided by auto-starting fewer vTPM LPARs, or booting them a couple at
a time to prevent flooding the vTPM device server with requests that
will slow the boot time while the LPARs wait on the vTPM device server
responses.
- On systems with PowerVM firmware, a problem was fixed for
SMS menus to limit reporting on the NPIV and vSCSI configuration to the
first 511 LUNs. Without the fix, LUN 512 through the last
configured LUN report with invalid data. Configurations in excess
of 511 LUNs are very rare, and it is recommended for performance
reasons (to be able to search for the boot LUN more quickly) that the
number of LUNs on a single target be limited to less than 512.
- On systems with PowerVM firmware, the following two errors
in the SR-IOV adapter firmware were fixed: 1) The adapter
resets and there is a B400FF01 reference code logged. This error
happens in rare cases when there are multiple partitions actively
running traffic through the adapter. System firmware resets the
adapter and recovers the system with no user-intervention required; 2)
SR-IOV VFs with defined VLANs and an assigned PVID are not able to ping
each other.
This fix updates adapter firmware to 10.2.252.1933, for the following
Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K,
EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems with PowerVM firmware, a problem was fixed for
an IPL that ends with the HMC in the "Incomplete" state with SRCs
B182951C and A7001151 logged. Partitions may start and can
continue to run without the HMC services available. In order to
recover the HMC session, a re-IPL of the system is needed
(however, partition workloads could continue running uninterrupted
until the system is intentionally re-IPLed at a scheduled time).
This problem occurs very rarely.
- On systems with PowerVM firmware, a problem was fixed for
Live Partition Mobility (LPM) failing along with other hypervisor
tasks, but the partitions continue to run. This is an extremely
rare failure where a re-IPL is needed to restore HMC or Novalink
connections to the partitions, or to do any system configuration
changes.
- On systems with PowerVM firmware, a problem was fixed for
partition SMS menus to display certain network adapters that were
unviewable and not usable as boot and install devices after a microcode
update. The problem network adapter is still present and usable
at the OS. The adapters with this problem have the following
feature codes: EN0A, EN0B, EN0H, EN0J, EN0K, EN0L, EN15, EN17,
EL5B, EL38, EL3C, EL56, and EL57.
- For a shared memory partition, a problem was fixed
for Live Partition Mobility (LPM) migration hang after a Mover Service
Partition (MSP) failover in the early part of the migration. To
recover from the hang, a migration stop command must be given on the
HMC. Then the migration can be retried.
- For a shared memory partition, a problem was fixed
for Live Partition Mobility (LPM) migration failure to an indeterminate
state. This can occur if the Mover Service Partition (MSP)
has a failover that occurs when the migrating partition is in the state
of "Suspended." To recover from this problem, the partition must
be shut down and restarted.
- On a system attached to a Cloud Management Console (CMC)
via a Cloud Connector on the HMC, a problem was fixed for
Redfish queries to the service processor resulting in memory leaks and
out of memory (OOM) resets of the service processor.
|
SV860_165_165 / FW860.51
05/22/18 |
Impact: Security
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E);
Power System E850C (8408-44E); Power System S812L (5148-21L) and
Power System S822L (5148-22L) servers only.
Response for Recent Security Vulnerabilities
- DISRUPTIVE:
On systems with PowerVM firmware, in response to recently
reported security vulnerabilities, this firmware update is being
released to address Common Vulnerabilities and Exposures issue number
CVE-2018-3639. In addition, Operating System updates are required
in conjunction with this FW level for CVE-2018-3639.
|
SV860_160_056 / FW860.50
05/03/18 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E);
Power System E850C (8408-44E); Power System S812L (5148-21L) and
Power System S822L (5148-22L) servers only.
New features and functions
- On systems with PowerVM firmware, support was added to
allow V9R910 and later HMC levels to query Live Partition Mobility
(LPM) performance data after an LPM operation.
- Support was added to the Advanced System Management
Interface (ASMI) to provide customer control over speculative
execution in response to CVE-2017-5753 and CVE-2017-5715
(collectively known as Spectre) and CVE-2017-5754 (known as
Meltdown). The ASMI "System Configuration/Speculative
Execution Control" provides two options that can only be set when the
system is powered off:
1) Speculative execution controls to mitigate user-to-kernel and
user-to-user side-channel attacks. This mode is designed for
systems that need to mitigate exposures of the hypervisor, operating
systems, and user application data to untrusted code. This
mode is set as the default.
2) Speculative execution fully enabled: This optional mode is
designed for systems where the hypervisor, operating system, and
applications can be fully trusted.
Note: Enabling this option could expose the system to
CVE-2017-5753, CVE-2017-5715, and CVE-2017-5754. This includes
any partitions that are migrated (using Live Partition Mobility) to
this system.
- On systems with PowerVM firmware, support was added to
allow a periodic data capture from the PCIe3 I/O expansion drawer (with
feature code #EMX0) cable card links.
- On systems with PowerVM firmware and an IBM i
partition, support was added for multipliers for IBM i MATMATR
fields that are limited to four characters. When server metrics
are retrieved via IBM i MATMATR calls and a value exceeds 9999 (for
example, a system with more than 9999 GB), MATMATR uses an
architected "multiplier" field: 10,000 GB can be represented as
5,000 GB with a multiplier of 2, so '5000' and '2' are returned in
the quantity and multiplier fields, respectively.
The IBM i OS also requires a PTF to support the MATMATR field
multipliers.
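The quantity/multiplier arithmetic described for the MATMATR fields above can be sketched in Python as follows. This is only an illustration: the function name and the rule used here (smallest multiplier that fits, integer division that drops any remainder) are assumptions, since these notes do not describe how the firmware actually chooses the multiplier.

```python
from typing import Tuple

MAX_QUANTITY = 9999  # largest value that fits in a four-character field


def split_quantity(value_gb: int) -> Tuple[int, int]:
    """Return (quantity, multiplier) with quantity <= MAX_QUANTITY.

    Hypothetical helper: picks the smallest multiplier that makes the
    quantity fit; any remainder of the integer division is dropped.
    """
    if value_gb < 0:
        raise ValueError("value must be non-negative")
    # Integer ceiling of value_gb / MAX_QUANTITY, but never below 1.
    multiplier = max(1, (value_gb + MAX_QUANTITY - 1) // MAX_QUANTITY)
    quantity = value_gb // multiplier
    return quantity, multiplier


if __name__ == "__main__":
    for gb in (8192, 10000, 32768):
        quantity, multiplier = split_quantity(gb)
        print(f"{gb} GB -> quantity={quantity}, multiplier={multiplier}")
```

With this rule, 10,000 GB yields a quantity of 5000 and a multiplier of 2, matching the example in the text. A consumer reconstructs the value as quantity * multiplier.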
System firmware changes that affect all systems
- A problem was fixed in which deconfigured-resource records
can become malformed and cause the loss of service processor for both
redundant and non-redundant service processor systems. These
failures can occur during or after firmware updates to the FW860.40,
FW860.41, or FW860.42 levels. The complete loss of service
processor results in the loss of HMC (or FSP stand-alone) management of
the server and loss of any further error logging. The server
itself will continue to run. Without the fix, the loss of the
service processor could happen within one month of the deconfiguration
records being encountered. It is highly recommended to install
the fix. Recovery from the problem, once encountered, requires a
full server AC power cycle and clearing of deconfiguration records to
avoid reoccurrence. Clearing deconfiguration records exposes the
server to repeat hardware failures and possible unplanned outages.
- A problem was fixed for the guard reminder processing of
guarded FRUs and error logs that can cause a system power off to hang
and time out with a service processor reset.
- A problem was fixed for the wrong Redfish method (PATCH or
POST) passed for a valid Uniform Resource Identifier (URI) causing an
incorrect error message of "501 - Not Implemented". With the
fix, the message returned is "Invalid Method on URI" which is more
helpful to the user.
- A problem was fixed for SRC call home reminders for bad
FRUs causing service processor dumps with SRC B181E911 and
reset/reloads. This occurred if the FRU callout was missing a
CCIN number in the error log. This can happen because some error
logs only have "Symbolic FRUs" and these were not being handled
correctly.
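To illustrate the Redfish method fix in the list above, here is a minimal Python sketch of the distinction it draws: a request to a known URI with an unsupported method is reported as a method problem rather than as "501 - Not Implemented". The route table, the function name, and the "Unknown URI" and "OK" strings are hypothetical; only the "Invalid Method on URI" text comes from these notes, and the methods the service processor really accepts per resource are not stated here.

```python
# Hypothetical route table: URI -> methods assumed to be accepted.
ALLOWED_METHODS = {
    "/redfish/v1/Systems": {"GET"},
    "/redfish/v1/Chassis": {"GET", "PATCH"},
}


def check_request(method: str, uri: str) -> str:
    """Classify a request in the way the fixed behavior is described."""
    allowed = ALLOWED_METHODS.get(uri)
    if allowed is None:
        return "Unknown URI"  # placeholder text, not from the notes
    if method.upper() not in allowed:
        # Valid URI, wrong method: name the method problem directly
        # instead of returning a generic "501 - Not Implemented".
        return "Invalid Method on URI"
    return "OK"


if __name__ == "__main__":
    print(check_request("POST", "/redfish/v1/Systems"))
    print(check_request("GET", "/redfish/v1/Systems"))
```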
System firmware changes that affect certain systems
- DEFERRED: On
systems with PowerVM firmware, a problem was fixed for a PCIe3 I/O
expansion drawer (with feature code #EMX0) where control path stability
issues may cause certain SRCs to be logged. Systems using copper
cables may log SRC B7006A87 or similar SRCs, and the fanout module may
fail to become active. Systems using optical cables may log SRC
B7006A22 or similar SRCs. For this problem, the errant I/O
drawer may be recovered by a re-IPL of the system.
- On systems with PowerVM firmware, a problem was fixed for a
Coherent Accelerator Processor Proxy (CAPP) unit hardware failure that
caused a hypervisor hang with SRC B7000602. This failure is very
rare and can only occur during the early IPL of the hypervisor, before
any partitions are started. A re-IPL will recover from the
problem.
- On systems with PowerVM firmware, a problem was fixed for a
Live Partition Mobility migration hang that could occur if one of its
VIOS Mover Service Partitions (MSPs) goes into a failover at the start
of the LPM operation. This problem is rare because it requires an
MSP error to force an MSP failover at the very start of the LPM
migration to get the LPM timing error. The LPM hang can be
recovered by using the "migrlpar -o s" and "migrlpar -o r" commands on
the HMC.
- On systems with PowerVM firmware, a problem was fixed for
incorrect low affinity scores for a partition reported from the HMC
"lsmemopt" command when a partition has filled an entire drawer.
A low score indicates the placement is poor but in this case the
placement is actually good. More information on affinity scores
for partitions and the Dynamic Platform Optimizer can be found at the
IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hat/p8hat_dpoovw.htm.
- On systems with PowerVM firmware, a problem was fixed to
allow the management console to display the Active Memory Mirroring
(AMM) licensed capability. Without the fix, the AMM licensed
capability of a server will always show as "off" on the management
console, even when it is present.
- On systems with PowerVM firmware, a problem was fixed for a
rare hypervisor hang for systems with shared processors with a sharing
mode of uncapped. If this hang occurs, all partitions of the
system will become unresponsive and the HMC will go to an "Incomplete"
state.
- On systems with PowerVM firmware, a problem was fixed for a
Live Partition Mobility migration abort that could occur if one of its
VIOS Mover Service Partitions (MSPs) goes into a failover during the
LPM operation. This problem is rare because it requires an MSP
error to force an MSP failover during the LPM migration to get the LPM
timing error. The LPM abort can be recovered by retrying the LPM
migration.
- On systems with PowerVM firmware and a shared processor
pool, a very rare problem was fixed for the hypervisor not responding
to partition requests such as power off and Live Partition Mobility
(LPM). This error is caused by a request for a guard of a failed
processor (when there are not any available spare processors) that has
hung.
- On systems using PowerVM firmware with mirrored memory
running IBM i partitions, a problem was fixed for un-mirrored nodal
memory errors in the partition that also caused the system to
crash. With the fix, the memory failure is isolated to the
impacted partition, leaving the rest of the system unaffected.
This fix improves on an earlier fix delivered for IBM i memory
errors in FW840.60 by handling the errors in nodal memory.
- On systems with PowerVM firmware and Huge Page (16 GB)
memory enabled for an AIX partition, a problem was fixed for the
OS failing to boot with an 0607 SRC displayed. This error occurs
on systems with FW860.40, FW860.41 or FW860.42 installed.
To circumvent the problem, disable Huge Pages for the AIX
partition. For information on viewing and setting values for AIX
huge-page memory allocation, see the following link in the IBM
Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hat/p8hat_aixviewhgpgmem.htm
- On systems with PowerVM firmware and an IBM i partition, a
problem was fixed for 64 bytes overwritten in a portion of the IBM i
Main Storage Dump (MSD). Approximately 64 bytes are
overwritten just beyond the 17 MB (0x11000000) address on P8
systems. This problem is cosmetic as the dump is still readable
for problem diagnostics and no customer operations are affected by it.
- On systems with PowerVM firmware and a partition with a
Fibre Channel Adapter (FCA) or a Fibre Channel over Ethernet (FCoE)
adapter, a problem was fixed for bootable disks attached to the
FCA/FCoE adapter not being seen in the System Management Services (SMS)
menus for selection as boot devices. This problem is likely to
occur if the only I/O device in the partition is an FCA or FCoE
adapter. If other I/O devices are present, the problem may still
occur if the FCA or FCoE is the first adapter discovered by
SMS. A work-around to this problem is to define a virtual
Ethernet adapter in the partition profile. The virtual adapter
does not need to have any physical backing device, as just having
the VLAN defined is sufficient to avoid the problem. The FCA has
feature codes #EN0A, #EN0B, #EN0F, #EN0G, #EN0Y, #EN12, #5729, #5774,
#5735, and #5723 and the FCoE adapter has feature codes #5708, #EN0H,
#EN0J, #EN0K, and #EN0L for all but the Linux on Power 8247
models. For the Linux on Power 8247 models, the FCA has feature
codes #5729, #5774, #EL43, #EL58, #EL5B, #EL54, and #EL52 and the
FCoE adapter has feature codes #5708, #EL56, #EL38, #EL57, and #EL3C.
- On systems with PowerVM firmware and a partition with a USB 3.0
controller, a problem was fixed for a partition boot failure.
The USB 3.0 controller may be integrated or an adapter card with feature
code #EC45 or #EC46. The boot failure is triggered by a fault in
the USB controller, but instead of just the USB controller failing,
the entire partition fails. With the fix, the failure is limited
to the USB controller.
- On systems with PowerVM firmware, a problem was fixed for
the FRU callouts for the BA188001 and BA188002 EEH errors to include
the PCI Host Bridge (PHB) FRU which had been excluded. For the P8
systems, these rare errors will more typically isolate to the processor
instead of the adapter or slot planar. In the pre-P8
systems, the I/O planar also included the PHB, but for P8 systems, the
PHB was moved to the processor complex.
- On systems using PowerVM firmware, a problem
was fixed for an internal error in the SR-IOV adapter firmware
that resets the adapter and logs a B400FF01 reference code.
This error happens in rare cases when there are multiple partitions
actively running traffic through the adapter and a subset of the
partitions are shut down hard. The error causes a temporary
disruption of traffic but recovery from the error is automatic with no
user intervention needed.
This fix updates adapter firmware to 10.2.252.1931, for the following
Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K,
EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems using OPAL firmware, Skiboot was updated to
V5.4.9 from V5.4.8, providing the following fix:
- A problem was fixed for a possible incorrect value for
processor frequency from /proc/cpuinfo. The value being returned was
the last frequency requested by the kernel, but may not reflect the
current frequency of the processor.
- On systems with PowerVM firmware, a problem was fixed
for a PCIe3 I/O expansion drawer (with feature code
#EMX0) failing to initialize during the IPL with a SRC B7006A88
logged. The error is infrequent. The errant I/O drawer can
be recovered by a re-IPL of the system.
- On systems with PowerVM firmware, a problem was fixed for
the SR-IOV firmware adapter updates using the HMC GUI or CLI to only
reboot one SR-IOV adapter at a time. If multiple adapters are
updated at the same time, the HMC error message HSCF0241E may
occur: "HSCF0241E Could not read firmware information from SR-IOV
device ...". This fix prevents the system network from being
disrupted by the SR-IOV adapter updates when redundant configurations
are being used for the network. The problem can be circumvented
by using the HMC GUI to update the SR-IOV firmware one adapter at a
time using the following steps: https://www.ibm.com/support/knowledgecenter/en/8247-22L/p8efd/p8efd_updating_sriov_firmware.htm
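For the Skiboot fix above concerning the processor frequency in /proc/cpuinfo, the following Python sketch shows one way to read the reported values per processor so they can be compared against another source. The sample text and the assumed layout (a "processor" line followed by a "clock" line ending in "MHz") are illustrative, with made-up frequencies; the function name is hypothetical.

```python
import re
from typing import Dict

# Illustrative sample only; the values are made up.
SAMPLE_CPUINFO = """\
processor\t: 0
cpu\t\t: POWER8 (raw), altivec supported
clock\t\t: 3425.000000MHz
revision\t: 2.0 (pvr 004d 0200)

processor\t: 8
cpu\t\t: POWER8 (raw), altivec supported
clock\t\t: 2061.000000MHz
revision\t: 2.0 (pvr 004d 0200)
"""


def reported_clocks_mhz(cpuinfo_text: str) -> Dict[int, float]:
    """Map processor number -> clock value (MHz) as listed in the text."""
    clocks: Dict[int, float] = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "clock" and current is not None:
            match = re.match(r"([0-9.]+)\s*MHz", value)
            if match:
                clocks[current] = float(match.group(1))
    return clocks


if __name__ == "__main__":
    print(reported_clocks_mhz(SAMPLE_CPUINFO))
```

On a running Linux host, the same function could be given the contents of /proc/cpuinfo instead of the sample string.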
|
SV860_138_056 / FW860.42
01/09/18 |
Impact: Security
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
New features and functions
- In response to recently reported security vulnerabilities,
this firmware update is being released to address Common
Vulnerabilities and Exposures issue numbers CVE-2017-5715,
CVE-2017-5753 and CVE-2017-5754. Operating System updates are
required in conjunction with this FW level for CVE-2017-5753 and
CVE-2017-5754.
|
SV860_127_056 / FW860.41
12/08/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- On systems using PowerVM firmware that are co-managed with
HMC and PowerVM NovaLink, a problem was fixed for the HMC going
into the Incomplete state after deleting a NovaLink partition or after
using the HMC "chsyscfg powervm_mgmt_capable=0" command to remove
the NovaLink attribute from a partition. Partitions will continue
running but cannot be changed by the management console and Live
Partition Mobility (LPM) will not function in this state. A
power off of the system will remove it from the Incomplete state, but
the NovaLink partition will not have been deleted. To force the
delete of the NovaLink partition or partitions without the fix,
erase the service processor NVRAM and then restore the HMC partition
data.
- On systems using PowerVM firmware with PowerVM NovaLink, a
problem was fixed for the HMC going into the Incomplete state when
restoring HMC profile data after deleting a NovaLink partition.
This fix will prevent but not repair the problem once it has
occurred. Recovery from the problem is to erase the service
processor NVRAM and then restore the HMC partition data.
|
SV860_118_056 / FW860.40
11/08/17 |
Impact: Availability
Severity: SPE
Power System S812L (8247-21L), Power System S822L (8247-22L),
Power System S824L (8247-42L), Power System S812 (8284-21A),
Power System S822 (8284-22A), Power System S814 (8286-41A),
Power System S824 (8286-42A); Power System E850 (8408-E8E) and
Power System E850C (8408-44E) servers only.
System firmware changes that affect all systems
- A problem was fixed
for the "Minimum code level supported" not being shown by the Advanced
System Management Interface (ASMI) when selecting the "System
Configuration/Firmware Update Policy" menu. The message shown is
"Minimum code level supported value has not been set". The
workaround to find this value is to use the ASMI command line interface
with the "registry -l cupd/MinMifLevel" command.
- A problem was fixed for system termination and outage
caused by a corrupted system reset type. For cases where the
system reset type cannot be identified, the service processor will now
do a reset/reload to keep the system running. This is a rare
problem that occurs during an error/recovery situation that
involves a reset of the service processor. This fix replaces a
previous fix attempt (same fix description) that failed to prevent
the system from terminating.
- A problem was fixed for a power supply error log with SRC
B155A4E0 not identifying the FRU location of the failed power
supply. This will happen anytime a power supply fails or is
removed at system runtime. A circumvention for this problem is to
look for other power Predictive Errors in the error log and these will
help identify the location of the failing power supply.
- A problem was fixed for "sh: errl: not found" error
messages to the service processor console whenever the Advanced System
Management Interface (ASMI) was used to display error logs. These
messages did not cause any problems except to clutter the console
output as seen in the service processor traces.
- A problem was fixed for the LineInputVoltage and
LastPowerOutputWatts being displayed in millivolts and milliwatts,
respectively, instead of volts and watts for the output from the
Redfish API for power properties for the chassis. The URL
affected is the following: "https://<fsp ip>/redfish/v1/Chassis/<id>/Power"
- A problem was fixed for a Power Supply Unit (PSU) failure
of SRC 110015xF logged with a power supply fan call out
when doing a hot re-plug of a PSU. The power supply may be
made operational again by doing a dummy replace of the PSU that was
called out (keeping the same PSU for the replace operation). A
re-IPL of the system will also recover the PSU.
- A problem was fixed for the service processor low-level
boot code always running off the same side of the flash image,
regardless of what side has been selected for boot (P-side or
T-side). Because this low-level boot code rarely changes, this
should not cause a problem unless corruption occurs in the flash image
of the boot code. This problem does not affect firmware
side-switches as the service processor initialization code
(higher-level code than the boot code) is running correctly from the
selected side. Without the fix, there is no recovery for boot
corruption for systems with a single service processor as the service
processor must be replaced.
- A problem was fixed for a missing serviceable event from a
periodic call home reminder. This occurred if there was an FRU
deconfigured for the serviceable event.
- A problem was fixed for help text in the Advanced System
Management Interface (ASMI) not informing the user that system fan
speeds would increase if the system Power Mode was changed to "Fixed
Maximum Frequency" mode. If ASMI panel function "System
Configuration->Power Management->Power Mode Setup" "Enable Fixed
Maximum Frequency mode" help is selected, the updated text states
"...This setting will result in the fans running at the maximum speed
for proper cooling."
- A problem was fixed for a degraded PCI link causing a
Predictive SRC for a non-cacheable unit (NCU) store time-out that
occurred with SRC B113E540 or B181E450 and PRD signature
"(NCUFIR[9]) STORE_TIMEOUT: Store timed out on PB". With the fix,
the error is changed to Informational as the problem is not with
the processor core and the processor should not be replaced. The
solution for degraded PCI links is different from the fix for this
problem, but a re-IPL of the CEC or a reset of the PCI adapters could
help to recover the PCI links from their degraded mode.
- A problem was fixed for the IPMI serial over LAN (SOL)
console buffer becoming full without an active ipmitool client causing
a service processor hang to host, resulting in a host initiated
reset/reload of the service processor. The problem causes a
serviceable event and a service processor dump, but otherwise it should
not impact the jobs on the running host.
- A problem was fixed for the IPMI serial over LAN (SOL)
console intermittently dropping a character of data. This
occurred anytime the console data to write size matched the free space
size in the SOL console 4K buffer.
- A problem was fixed for a Redfish Patch on the
"Chassis" "HugeDynamicDMAWindowSlotCount" for the validation of
incorrect values. Without the fix, the user will not get proper
error messages when providing bad values to the patch.
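For the LineInputVoltage and LastPowerOutputWatts fix in the list above, a client that must consume readings from a firmware level without the fix could rescale them itself. The Python sketch below assumes the two properties appear on entries of a "PowerSupplies" array in the chassis Power payload, as in the standard Redfish Power schema; the function name and the sample payload are hypothetical, and the caller is assumed to already know that the firmware predates the fix.

```python
import copy

SCALED_PROPERTIES = ("LineInputVoltage", "LastPowerOutputWatts")


def normalize_power_units(power: dict) -> dict:
    """Return a copy with millivolt/milliwatt readings scaled to V and W.

    Only call this for payloads known to come from firmware without
    the fix; payloads from fixed firmware are already in volts/watts.
    """
    fixed = copy.deepcopy(power)
    for supply in fixed.get("PowerSupplies", []):
        for prop in SCALED_PROPERTIES:
            value = supply.get(prop)
            if isinstance(value, (int, float)):
                supply[prop] = value / 1000.0
    return fixed


if __name__ == "__main__":
    # Made-up sample payload in the pre-fix units.
    sample = {
        "PowerSupplies": [
            {
                "MemberId": "0",
                "LineInputVoltage": 208000,
                "LastPowerOutputWatts": 412000,
            }
        ]
    }
    print(normalize_power_units(sample))
```

The input is copied rather than modified in place so that the raw response remains available for logging or comparison.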
System firmware changes that affect certain systems
- DEFERRED: On
systems using PowerVM firmware, a problem was fixed for
DPO (Dynamic Platform Optimizer) operations taking a very long time and
impacting the server system with a performance degradation. The
problem is triggered by a DPO operation being done on a system with
unlicensed processor cores and a very high I/O load. The fix involves
using a different lock type for the memory relocation activities (to
prevent lock contention between memory relocation threads and partition
threads) that is created at IPL time, so an IPL is needed to activate
the fix. More information on the DPO function can be found at the
IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/8247-42L/p8hat/p8hat_dpoovw.htm
- On systems using PowerVM firmware, a problem was
fixed for an intermittent service processor core dump and a callout for
netsCommonMSGServer with SRC B181EF88. The HMC connection
to the service processor automatically recovers with a new session.
- On systems using PowerVM firmware, a problem was fixed for
a concurrent firmware update failure with HMC error message
"E302F865-PHYPTooBusyToQuiesce". This error can occur when the
error log is full on the hypervisor and it cannot accept more error
logs from the service processor. But the service processor keeps
retrying the send of an error log, resulting in a "denial of service"
scenario where the hypervisor is kept busy rejecting the error logging
attempts. Without the fix, the problem may be circumvented by
starting a logical partition (if none are running) or by purging
the error logs on the service processor.
- On systems using PowerVM firmware with mirrored memory
running IBM i partitions, a problem was fixed for memory failures in the
partition that also caused the system to crash. The system
failure will occur any time that IBM i partition memory towards the
beginning of the partition's assigned memory fails. With the fix,
the memory failure is isolated to the impacted partition, leaving the
rest of the system unaffected.
- On systems using PowerVM firmware, a problem was fixed for
failures deconfiguring SR-IOV Virtual Functions (VFs). This can
occur during Live Partition Mobility (LPM) migrations with HMC error
messages of HSCLAF16, HSCLAF15 and HSCLB602 shown. This results
in an LPM migration failure and a system reboot is required to recover
the VFs for the I/O adapters. This error may occur more
frequently in cases where the I/O adapter has pending I/O at the time
of the deconfigure request for the VF.
- On systems using PowerVM firmware, a problem was fixed for
a vNIC client that has backing devices being assigned an active server
that was not the one intended by an HMC user failover for the client
adapter. This can happen only if the vNIC client adapter had
never been activated. A circumvention is to activate the client
OS and initialize the vNIC device (ifconfig "xxx" up) and an active
backing device will then be selected.
- On systems using PowerVM firmware, a problem was fixed for
partitions with more than 32TB memory failing to IPL with memory space
errors. This can occur if the logical memory block (LMB) size is
small as there is a memory loss associated with each LMB. The
problem can be circumvented by reducing the amount of partition memory
or increasing the LMB size to reduce the total number of LMBs needed
for the memory allocation.
- On systems using PowerVM firmware, a problem was
fixed for the error handling of EEH events for the SR-IOV Virtual
Functions (VFs) that can result in IPL failure with B7006971, B400FF05,
and BA210000 SRCs logged. In these cases, the partition console
stops at an OFDBG prompt. Also, a DLPAR add of a VF may result in
a partition crash due to a 300 DSI exception because of a low-level EEH
event. A circumvention for the problem would be to debug the EEH
events which should be recovered errors and eliminate the cause of the
EEH events. With the fix, the EEH events still log Predictive
Errors but do not cause a partition failure.
- On systems using PowerVM firmware and running IBM i on
stand-alone systems (no HMC attached), a problem was fixed for an
inadvertent Operations Panel function 71 activation that put the system
into "Network Boot" mode and prevented the IBM i from IPLing. A
circumvention is to use Operations Panel function 72 to turn off
"Network Boot" mode. With the fix, the Operations Panel function
71 request will be ignored on IBM i stand-alone systems.
- A problem was fixed for intermittent high-temperature
induced link failures on the 100Gb EDR IB, NIC, and RoCE adapters
caused by system fans running at too low a speed. These
adapters include the PCIe3 1-port and 2-port 100Gb EDR IB x16 adapters
and the PCIe3 2-port 100GbE (NIC and RoCE) QSFP28 x16 adapter with
feature codes EC3E, EC3F, EC3L, EC3M, EC3T, and EC3U. EDR IB
(Enhanced Data Rate InfiniBand), NIC (Network Interface Controller),
and IBTA RoCE (Remote Direct Memory Access (RDMA) over Converged
Ethernet) are the specific network standards supported in the adapters.
This problem was fixed earlier in FW860.31 for the (8284-xxx) and
(8247-xxx) models. The fix has been extended to include the E850
(8408-E8E) and the E850C (8408-44E) models.
- On systems using PowerVM firmware, a problem was fixed for
an invalid date from the service processor causing the customer date
and time to go to the Epoch value (01/01/1970) without a warning or
chance for a correction. With the fix, the first IPL
attempted on an invalid date will be rejected with a message alerting
the user to set the time correctly in the service processor. If
the warning is ignored and the date/time is not corrected, the next IPL
attempt will complete to the OS with the time reverted to the Epoch
time and date. This problem is very rare but it has been known to
occur on service processor replacements when the repair step to set the
date and time on the new service processor was inadvertently skipped by
the service representative.
- On systems using PowerVM firmware with PowerVM NovaLink, a
problem was fixed for a loss of a communications channel between the
hypervisor and the PowerVM NovaLink during a reset of the service
processor. Various NovaLink tasks, including deploy, could fail
with a "No valid host was found" error. With the fix, PowerVM
NovaLink prevents normal operations from being impacted by a reset of
the service processor.
- On systems using PowerVM firmware, a problem was fixed for
a rare system hang caused by a process dispatcher deadlock timing
window. If this problem occurs, the HMC will also go to an
"Incomplete" state for the managed system.
- On systems using PowerVM firmware, a problem
was fixed for communication failures on adapters in SR-IOV shared
mode. This communication failure only occurs when a logical
port's VLAN ID (PVID) is dynamically changed from non-zero to
zero. An SR-IOV logical port is an I/O device created for a
partition or a partition profile using the management console (HMC)
when a user intends for the partition to access an SR-IOV adapter
Virtual Function. The error can be recovered from by a reboot of
the partition.
This fix updates adapter firmware to 10.2.252.1929, for the following
Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K,
EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems using PowerVM firmware, a problem was fixed for
error logs not getting sent to the OS running in a
partition. This problem could occur if the error log buffer
was full in the hypervisor and then a re-IPL of the system
occurred. The error log full condition was persisting across the
re-IPL, preventing further logs from being sent to the OS.
- On systems using OPAL firmware, Skiboot was updated to
V5.4.8 from V5.4.6, providing the following fixes:
- A problem was fixed for an intermittent host freeze during a
reset/reload of the service processor. The host will resume
normal operations after the reset/reload has completed. To have
this error occur, a timing window has to be hit where a
synchronous message from the host is in progress to the service
processor at the same time a reset/reload is initiated.
- A problem was fixed for IPMI Serial Over LAN (SOL) console
disconnects to prevent Host process hangs related to the console
management for output buffers and error logging. If there is a
reset of the service processor and the console was active, the console
session is now closed to free all the console resources.
- A problem was fixed for an "FSP: Unhandled message eb0500" error
message. Message eb0500 is a command sent by the FSP to OPAL to get vNVRAM
statistics. Since OPAL maintains no NVRAM statistics, it now
returns FSP_STATUS_INVALID_SUBCMD with its new handler. Sample of
OPAL log that will no longer occur with the fix:
[16944.384670488,3] FSP: Unhandled message eb0500
[16944.474110465,3] FSP: Unhandled message eb0500
- A problem was fixed for sending false messages for "Reassociating
HVSI console" when the console is not available. These messages
are no longer issued for unavailable consoles:
[ 5013.227994012,7] FSP: Reassociating HVSI console 1
[ 5013.227997540,7] FSP: Reassociating HVSI console 2
- A problem was fixed for a Delayed Power Off (DPO) failure that
occurred if the service processor reset right after the request.
With the fix, the DPO and normal shutdowns will complete on the host
without regard to service processor state changes that occur after the
request.
- On systems using OPAL firmware, Petitboot was updated to
V1.4.4 from V1.4.2, providing the following fixes:
- A problem was fixed for line truncation on the Petitboot screen
occurring for any line that had a multibyte character in it.
- A problem was fixed for the safe mode message not clearing even after
"Rescan Devices" button in safe mode was pressed and re-initialization
completed successfully.
- A problem was fixed for Petitboot configuration for boot order and
network settings being cleared when the user just wanted to clear the
IPMI override. With the fix, the IPMI override is cleared and
safe mode is exited, if active, without modifying the rest of the
configuration.
- On systems using PowerVM firmware, a problem was fixed in
the text for the Firmware License agreement to correct a link that
pointed to a URL that was not specific to microcode licensing.
The message is displayed for a machine during its initial power
on. Once accepted, the message is not displayed again. The
fixed link in the licensing agreement is the following: http://www.ibm.com/support/docview.wss?uid=isg3T1025362.
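The sample OPAL log lines quoted in the Skiboot fixes above share a "[seconds.fraction,level] message" shape. A minimal Python sketch for picking such lines out of a captured log follows. The pattern is inferred only from the samples in these notes, and the names parse_opal_line and is_resolved_noise are hypothetical helpers, not part of Skiboot or any IBM tool.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample lines in these notes (an assumption):
#   [<seconds>.<fraction>,<level>] <message>
_OPAL_LINE = re.compile(r"^\[\s*(\d+)\.(\d+),(\d+)\]\s*(.*)$")


class OpalLogEntry(NamedTuple):
    seconds: int
    fraction: str
    level: int
    message: str


def parse_opal_line(line: str) -> Optional[OpalLogEntry]:
    """Split one log line into its fields, or return None if it does not match."""
    match = _OPAL_LINE.match(line.strip())
    if match is None:
        return None
    return OpalLogEntry(
        int(match.group(1)), match.group(2), int(match.group(3)), match.group(4)
    )


def is_resolved_noise(entry: OpalLogEntry) -> bool:
    """True for the two message families these notes say are no longer issued."""
    return entry.message.startswith(
        ("FSP: Unhandled message eb0500", "FSP: Reassociating HVSI console")
    )


if __name__ == "__main__":
    sample = "[16944.384670488,3] FSP: Unhandled message eb0500"
    entry = parse_opal_line(sample)
    print(entry, is_resolved_noise(entry))
```

A filter like this could be used to confirm that the messages stop appearing after the update; lines that do not match the assumed shape are skipped rather than guessed at.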
|
SV860_109_056 / FW860.31
08/30/17 |
Impact: Availability
Severity: ATT
System
firmware changes that
affect certain systems
- A problem was fixed for intermittent high-temperature
induced link failures on the 100Gb EDR IB, NIC, and RoCE adapters
caused by system fans running at too low a speed. These
adapters include the PCIe3 1-port and 2-port 100Gb EDR IB x16 adapters
and the PCIe3 2-port 100GbE (NIC and RoCE) QSFP28 x16 adapter with
feature codes EC3E, EC3F, EC3L, EC3M, EC3T, and EC3U. EDR IB
(Enhanced Data Rate InfiniBand), NIC (Network Interface Controller),
and IBTA RoCE (Remote Direct Memory Access (RDMA) over Converged
Ethernet) are the specific network standards supported in the adapters.
This problem does not apply
to the E850 (8408-E8E) or the E850C (8408-44E) models.
|
SV860_103_056 / FW860.30
06/30/17 |
Impact: Availability
Severity: SPE
Power
System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S812
(8284-21A), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power
System E850 (8408-E8E) and Power
System E850C (8408-44E) servers
only.
New features and functions
- Support was added for the Redfish API to allow the ISO 8601
extended format for the time and date so that the date/time can be
represented as an offset from UTC (Coordinated Universal Time).
- Support was added for the Redfish API for power and thermal
properties for the chassis. The new URIs are as follows:
https://<fsp ip>/redfish/v1/Chassis/<id>/Power : Provides
power supply data
https://<fsp ip>/redfish/v1/Chassis/<id>/Thermal : Provides
fan data
Only the Redfish GET operation is supported for these resources.
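The two Redfish items above can be sketched from a client's point of view in Python. This is an illustration under stated assumptions, not IBM sample code: the helper names, the address 192.0.2.10, and chassis id "1" are hypothetical, and the use of HTTP Basic authentication with a certificate the client already trusts is assumed rather than taken from these notes.

```python
import base64
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def iso8601_extended(dt: datetime) -> str:
    """Render an aware datetime in ISO 8601 extended format with its UTC offset."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("a timezone-aware datetime is required")
    return dt.isoformat(timespec="seconds")


def chassis_uris(fsp_ip: str, chassis_id: str) -> dict:
    """Build the two chassis URIs named in the notes for one chassis id."""
    base = f"https://{fsp_ip}/redfish/v1/Chassis/{chassis_id}"
    return {"Power": f"{base}/Power", "Thermal": f"{base}/Thermal"}


def redfish_get(url: str, user: str, password: str) -> dict:
    """Issue a GET (the only operation the notes list) and decode the JSON body.

    Assumption: HTTP Basic authentication and a trusted server certificate.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, method="GET", headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    stamp = datetime(2017, 6, 30, 12, 0, 0, tzinfo=timezone(timedelta(hours=-5)))
    print(iso8601_extended(stamp))
    for name, uri in chassis_uris("192.0.2.10", "1").items():
        print(name, uri)
```

The network call is kept in its own function so that the URI construction and the timestamp formatting can be checked without a service processor at hand.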
System firmware changes that affect all systems
- A problem was fixed
for service actions with SRC B150F138 missing an Advanced System
Management Interface (ASMI) Deconfiguration Record. The
deconfiguration records make it easier to organize the repairs that are
needed for the system and they need to be consistent with the periodic
maintenance reminders that are logged for the failed FRUs.
- A problem was fixed for a false 1100026B1 (12V power good
failure) caused by an I2C bus write error for a LED state. This
error can be triggered by the fan LEDs changing state.
- A problem was fixed for a fan LED turning solid amber
when there is no fan fault, or when the fan fault is for a different
fan. This error can be triggered anytime a fan LED needs to
change its state. The fan LEDs can be recovered to a normal state
concurrently by using the steps at the following link for a soft reset of the
service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
- A problem was fixed for a system termination and outage
caused by a corrupted system reset type. For cases where the
system reset type cannot be identified, the service processor will now
do a reset/reload to keep the system running. This is a rare
problem that is occurring during an error/recovery situation that
involves a reset of the service processor.
- A problem was fixed for sporadic blinking amber LEDs for
the system fans with no SRCs logged. There was no problem with
the fans. The LED corruption occurred when two service processor
tasks attempted to update the LED state at the same time. The fan
LEDs can be recovered to a normal state concurrently by using the
steps at the following link for a soft reset of the service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
- A problem was fixed for a Redfish Patch on the "Chassis" or
"IBMEnterpriseComputerSystem" with empty data that caused a "500
Internal Server Error". Validation for the empty data case has
been added to prevent the server error.
- A problem was fixed for the loss of Operations Panel
function 30 (displaying ethernet port HMC1 and HMC2 IP addresses)
after a concurrent repair of the Operations Panel.
Operations Panel function 30 can be restored concurrently by using
the steps at the following link for a soft reset of the service
processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
- A problem was fixed for a core dump of the rtiminit
(service processor time of day) process that logs an SRC B15A3303
and could invalidate the time on the service processor. If the
error occurs while the system is powered on, the hypervisor has the
master time and will refresh the service processor time, so no action
is needed for recovery. If the error occurs while the system is
powered off, the service processor time must be corrected on the
systems having only a single service processor. Use the following
steps from the IBM Knowledge Center to change the UTC time with the
Advanced System Management Interface: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hby/viewtime.htm.
- A problem was fixed for the service processor boot
watch-dog timer expiring too soon during DRAM initialization in the
reset/reload, causing the service processor to go unresponsive.
On systems with a single service processor, the SRC B1817212 was
displayed on the control panel. For systems with redundant
service processors, the failing service processor was
deconfigured. To recover the failed service processor, the system
will need to be powered off with AC power removed during a regularly
scheduled system service action. This problem is intermittent and
very infrequent as most of the reset/reloads of the service processor
will work correctly to restore the service processor to a normal
operating state.
- A problem was fixed for host-initiated resets of the
service processor causing the system to terminate. A prior fix
for this problem did not work correctly because some of the
host-initiated resets were being translated to unknown reset types that
caused the system to terminate. With this new correction for
failed host-initiated resets, the service processor will still be
unresponsive but the system and partitions will continue to run.
On systems with a single service processor, the SRC B1817212 will be
displayed on the control panel. For systems with redundant
service processors, the failing service processor will be
deconfigured. To recover the failed service processor, the system
will need to be powered off with AC power removed during a regularly
scheduled system service action. This problem is intermittent and
very infrequent as most of the host-initiated resets of the service
processor will work correctly to restore the service processor to a
normal operating state.
- A problem was fixed for a service processor reset triggered
by a spurious IIC interrupt request in the kernel. On
systems with a single service processor, the SRC B1817201 is displayed
on the Operator Panel. For systems with redundant service
processors, an error failover to the backup service processor
occurs. The problem is extremely infrequent and does not impact
processes on the running system.
- A problem was fixed for an incorrect Redfish error message
when trying to use the $metadata URI: "The resource at the
URI https://<systemip>/redfish/v1/%24metadata was not found.".
The "%24" is the URL-encoded form of "$" and is not meaningful to the
user, so it has been replaced with a "$" in the error message. The
Redfish $metadata URI is not supported.
- A problem was fixed so that IPMI boot parameters are not
cleared after a service processor reset or loss of AC power to the
system.
- A problem was fixed for serializing concurrent requests for
the IPMI serial over LAN (SOL) console that were causing a service
processor hang with a subsequent Host-Initiated Reset/Reload for
service processor.
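For the Redfish $metadata error message fix in the list above, the relationship between "$" and "%24" is ordinary URL percent-encoding. The short Python sketch below shows the round trip with the standard library. The function name display_uri is a hypothetical helper and says nothing about how the service processor builds its message.

```python
from urllib.parse import quote, unquote


def display_uri(raw_path: str) -> str:
    """Decode percent-escapes so a path reads the way the user typed it."""
    return unquote(raw_path)


if __name__ == "__main__":
    # "$" is not in the unreserved set, so quote() escapes it as %24.
    print(quote("$metadata", safe=""))
    # Decoding restores the readable form for use in a message.
    print(display_uri("/redfish/v1/%24metadata"))
```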
System firmware changes that affect certain systems
- DEFERRED: On
systems using PowerVM firmware, a problem was fixed to improve link
stability for the PCIe3 I/O expansion drawer (#EMX0). The settings
for the continuous time linear equalizers (CTLE) were updated for all
the PCIe adapters for the PCIe links to the expansion drawer. The
system must be re-IPLed for the fix to activate.
- On systems using the OPAL firmware, a problem was fixed for
an IPMI console hang to OPAL that caused the Linux host to be hung for
SSH sessions and for ipmitool commands to fail with "Error in open
session response message : insufficient resources for session" error
messages on the service processor. An error log with
SRC B1818601 is reported for the service processor IPMI failure
and multiple SRC BB822210 error logs are reported for OPAL
message time outs to the service processor. In most cases, this
error can be recovered from by doing a soft reset of the service
processor using the following steps from the IBM Knowledge
Center:
https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
- On systems using
OPAL firmware, a problem was fixed for intermittent long delays in the
NX co-processor for asynchronous requests such as NX 842
compressions. This problem was observed for PowerVM AIX DB2 when
it was doing hardware-accelerated compressions of data but could occur
on any asynchronous request to the NX co-processor. The PowerVM
version of the fix was delivered in FW860.00.
- On systems using PowerVM firmware with a Linux Little
Endian (LE) partition, a problem was fixed for system reset interrupts
returning the wrong values in the debug output for the NIP and MSR
registers. This problem reduces the ability to debug hung Linux
partitions using system reset interrupts. The error occurs every
time a system reset interrupt is used on a Linux LE partition.
- On systems using PowerVM firmware, a problem was fixed for
"Time Power On" enabled partitions not being capable of suspend and
resume operations. This means Live Partition Mobility (LPM) would
not be able to migrate this type of partition. As a workaround,
the partition could be transitioned to a "Non-time Power On" state and
then made capable of suspend and resume operations.
- On systems using PowerVM firmware, a problem was fixed for
manual vNIC failovers (from the HMC, manually "Make the Backing Device
Active") so that the selected server was chosen for the failover,
regardless of its priority. With the problem, the server chosen
for the vNIC failover will be the one with the most favorable
priority.
There are two possible workarounds to the problem:
(1) Disable auto-priority-failover; Change priority to the server that
is needed as the target of the failover; Force the vNIC failover;
Change priority back to original setting.
(2) Or use auto-priority-failover and change the priority so the server
that is needed as the target of the failover is favored.
- On systems using PowerVM firmware, a problem was fixed for
extra error logs in the VIOS due to failovers taking place while the
client vNIC is inactive. The inactive client vNIC failovers are
skipped unless the force flag is on. With the problem occurring,
Enhanced Error Handling (EEH) Freeze/Temporary Error/Recovery logs
posted in the VIOS error log of the client partition boot can be
ignored unless an actual problem is experienced.
- On systems using PowerVM firmware, a problem was fixed for
a Live Partition Mobility (LPM) migration abort and reboot on the
FW860 target CEC caused by a mismatched address space for the
source and target partition. The occurrence of this problem is
very rare and related to performance improvements made in the memory
management on the FW860 system that exposed a timing window in the
partition memory validation for the migration. The reboot of the
migrated partition recovers from the problem as the migration was
otherwise successful.
- On systems using PowerVM firmware, a problem was fixed for
reboot retries for IBM i partitions such that the first load source I/O
adapter (IOA) is retried instead of bypassed after the first failed
attempt. The reboot retries are done for an hour before the
reboot process gives up. This error can occur if there is more
than one known load source, and the IOA of the first load source is
different from the IOA of the last load source. The error can be
circumvented by retrying the boot of the partition after the load
source device has become available.
- On systems using PowerVM firmware, a problem was fixed for
adapters failing to transition to shared SR-IOV mode on the IPL after
changing the adapter from dedicated mode. This intermittent
problem could occur on systems using SR-IOV with very large memory
configurations.
- On systems using PowerVM firmware, a problem
was fixed for SR-IOV adapters in shared mode for a transmission stall
or time out with SRC B400FF01 logged. The time out happens during
Virtual Function (VF) shutdowns and during Function Level Resets (FLRs)
with network traffic running.
This fix updates adapter firmware to 10.2.252.1927, for the following
Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K,
EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems with maximum memory configurations (where every
DIMM slot is populated - size of DIMM does not matter), a problem
has been fixed for systems losing performance and going into Safe mode
(a power mode with reduced processor frequencies intended to protect
the system from overheating and excessive power consumption) with
B1xx2AC3/B1xx2AC4 SRCs logged. This happened because of On-Chip
Controller (OCC) timeout errors when collecting Analog Power Subsystem
Sweep (APSS) data, used by the OCC to tune the processor
frequency. This problem occurs more frequently on systems that
are running heavy workloads. Recovery from Safe mode back to
normal performance can be done with a re-IPL of the system, or
concurrently by using the steps at the following link for a soft reset of the
service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
Checking or validating that Safe mode is not active on the system
requires a dynamic celogin password from IBM Support to use the service
processor command line:
1) Log into ASMI as celogin with dynamic celogin password
generated by IBM Support
2) Select System Service Aids
3) Select Service Processor Command Line
4) Enter "tmgtclient --query_mode_and_function" from the command line
The first line of the output, "currSysPwrMode" should say "NOMINAL" and
this means the system is in normal mode and that Safe mode is not
active.
- A problem has been fixed for systems losing
performance and going into Safe mode (a power mode with reduced
processor frequencies intended to protect the system from overheating
and excessive power consumption) with B1xx2AC3/B1xx2AC4 SRCs
logged. This happened because of an On-Chip Controller (OCC)
internal queue overflow. The problem has only been observed for systems
running heavy workloads with maximum memory configurations (where every
DIMM slot is populated - size of DIMM does not matter), but this may
not be required to encounter the problem. Recovery from Safe mode
back to normal performance can be done with a re-IPL of the system, or
concurrently by using the steps at the following link for a soft reset of the
service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
Checking or validating that Safe mode is not active on the system
requires a dynamic celogin password from IBM Support to use the service
processor command line:
1) Log into ASMI as celogin with dynamic celogin password
generated by IBM Support
2) Select System Service Aids
3) Select Service Processor Command Line
4) Enter "tmgtclient --query_mode_and_function" from the command line
The first line of the output, "currSysPwrMode" should say "NOMINAL" and
this means the system is in normal mode and that Safe mode is not
active.
- On systems using PowerVM firmware, a problem
was fixed for a partition boot from a USB 3.0 device that has an error
log SRC BA210003. The error is triggered by an Open Firmware
entry to the trace buffer during the partition boot. The error
log can be ignored as the boot is successful to the OS.
- On systems using PowerVM firmware, a problem
was fixed for a partition boot fail or hang from a Fibre Channel device
having fabric faults. Some of the fabric errors returned by the
VIOS are not interpreted correctly by the Open Firmware VFC driver,
causing the hang instead of generating helpful error logs.
- On systems using PowerVM firmware, a problem
was fixed for a power off hanging at D200C1FF caused by a vNIC VF
failover error with SRC B200F011. The power off hang error is
infrequent because it requires that a VF failover error has occurred
first. The system can be recovered by using the power off
immediate option from the Hardware Management Console (HMC).
- On systems using PowerVM firmware, a problem was fixed for
the incorrect reporting of the Universally Unique Identifier (UUID) to
the OS, which prevented the tracking of a partition as it moved within
a data center. The UUID value as seen on HMC or the NovaLink did
not match the value as displayed in the OS.
- On systems using the OPAL firmware, Petitboot was updated
to V1.4.2 from V1.2.7, including the following update:
A problem was fixed for the User Interface server connect message to
make it clearer. The current message mentions a "server", which
can give the misleading impression that the user interface is waiting
for a remote network server. The delay is actually in waiting for the
pb-discover process to be ready.
More information for the Petitboot changes can be found at the
following link: http://git.ozlabs.org/?p=petitboot;a=tags
- On systems using OPAL firmware, Skiboot was updated to
V5.4.6 from V5.3.7, including the following updates:
- A problem was fixed for setting the firmware progress sensor. OPAL was
incorrectly setting firmware status on a sensor id "00" which doesn't
exist.
- A problem was fixed for the error log timeout so that it only times out
on the send of the error log to the service processor. This will
significantly reduce false timeout errors.
- A problem was fixed for excessive "Poller recursion detected" error
messages during the skiboot that could require a power off to recover
from the error.
- A problem was fixed for an unnecessary error message when a reset
occurs on an empty PCIe Host Bridge (PHB) - no PCIe adapters
attached. The extra error message occurs anytime the PHBs in the
system go through error recovery.
- A problem was fixed to fence off an errant PCIe Host Bridge (PHB)
during a complete reset to allow the kernel to retry the
operation. This helps the system recovery process by guarding out
the bad hardware to prevent a fatal error loop.
- A problem was fixed for unknown command messages in the OPAL
log after a Host-Initiated Reset/Reload of the service processor.
- A problem was fixed for the I2C bus locking that sometimes caused an OPAL
crash with double unlock() detected.
- A problem was fixed for OPAL kernel lockups when the IPMI SOL console
became unresponsive. The console can become full now and drop
messages but this prevents the lock-up of the Host kernel.
- A problem was fixed for service processor time-out messages being
interpreted as "success" by OPAL, preventing correct error reporting
and recovery actions.
- A problem was fixed for a kernel hang caused by queued messages
needing to be sent to the service processor during a reset/reload of
the service processor. The messages are now cached and sent when
the service processor is ready to receive after a reset/reload.
- A problem was fixed for a soft lockup of the kernel that occurred
because of RTC/TOD clock errors during a Host-initiated Reset/Reload of
the service processor. A frozen process would be seen on the host
system along with this message: "NMI watchdog: BUG: soft
lockup - CPU#57 stuck for 23s!" where the CPU number would vary.
More information on the Skiboot changes can be found at the following
link: https://github.com/open-power/skiboot/tree/master/doc/release-notes.
- For the IBM Power System E850C (8408-44E), a problem was
fixed for the power supply with feature #EB3M and part number
001KU578 for fans spinning too slowly with SRC 110015xf logged,
where x is 1, 2, 3, or 4 depending on which power supply has the failing
fan.
- On systems using PowerVM firmware, a problem was fixed for
an error finding the partition load source that has a GPT format.
GUID Partition Table (GPT) is a standard for the layout of the
partition table on a physical storage device used in the server, such
as a hard disk drive or solid-state drive, using globally unique
identifiers (GUID). Other drives that are working may be using
the older master boot record (MBR) partition table format. This
problem occurs whenever load sources utilizing the GPT format occur in
other than the first entry of the boot table. Without the fix, a
GPT disk drive must be the first entry in the boot table to be able to
use it to boot a partition.
- On systems using PowerVM firmware, a problem was fixed for
an SRC BA090006 serviceable event log occurring whenever an attempt was
made to boot from an ALUA (Asymmetric Logical Unit Access)
drive. These drives are always busy by design and cannot be used
for a partition boot, but no service action is required if a user
inadvertently tries to do that. Therefore, the SRC was changed to
be an informational log.
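The GPT load source item above contrasts the GPT and MBR partition table formats. As a rough illustration of how the two differ on disk, the Python sketch below classifies the first two sectors of a drive image by their well-known signatures: the 0x55 0xAA boot signature at bytes 510 and 511 of sector 0, and the ASCII string "EFI PART" at the start of the GPT header in LBA 1. It is not a description of how the firmware locates a load source; the function name and the 512-byte default sector size are assumptions for the example.

```python
MBR_BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of sector 0
GPT_HEADER_SIGNATURE = b"EFI PART"  # first 8 bytes of the GPT header at LBA 1


def partition_table_type(first_sectors: bytes, sector_size: int = 512) -> str:
    """Classify a disk image from its first two sectors.

    Returns "GPT", "MBR", or "unknown". A GPT disk also carries a
    protective MBR, so the GPT header is checked first.
    """
    if len(first_sectors) < 2 * sector_size:
        raise ValueError("need at least the first two sectors")
    gpt_header = first_sectors[sector_size:sector_size + 8]
    if gpt_header == GPT_HEADER_SIGNATURE:
        return "GPT"
    if first_sectors[510:512] == MBR_BOOT_SIGNATURE:
        return "MBR"
    return "unknown"


if __name__ == "__main__":
    # Synthetic image: protective MBR signature plus a GPT header signature.
    image = bytearray(1024)
    image[510:512] = MBR_BOOT_SIGNATURE
    image[512:520] = GPT_HEADER_SIGNATURE
    print(partition_table_type(bytes(image)))
```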
|
SV860_096_056 / FW860.21
06/07/17 |
Impact: Availability
Severity: ATT
Power
System S812L (8247-21L), Power
System S822L (8247-22L) and Power System S824L (8247-42L)
servers only.
System firmware changes that affect certain systems
- On systems using
the OPAL firmware, a problem was fixed for an IPMI console hang to OPAL
that caused the Linux host to be hung for SSH sessions and for ipmitool
commands to fail with "Error in open session response message :
insufficient resources for session" error messages on the service
processor. An error log with SRC B1818601 is reported
for the service processor IPMI failure and multiple SRC BB822210
error logs are reported for OPAL message time outs to the service
processor. In most cases, this error can be recovered from by
doing a soft reset of the service processor using the following steps
from the IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
|
SV860_082_056 / FW860.20
03/17/17 |
Impact: Availability
Severity: SPE
Power
System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S812
(8284-21A), Power
System S822
(8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A); Power
System E850 (8408-E8E) and Power
System E850C (8408-44E) servers
only.
New features and functions
- Support for the Redfish API for provisioning of Power
Management tunable (EnergyScale) parameters. The Redfish Scalable
Platforms Management API ("Redfish") is a DMTF specification that uses
RESTful interface semantics to perform out-of-band systems
management. (http://www.dmtf.org/standards/redfish).
Redfish service enables platform management tasks to be controlled by
client scripts developed using secure and modern programming paradigms.
For systems with redundant service processors, the Redfish service is
accessible only on the primary service processor. Usage
information for the Redfish service is available at the following
IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hdx/p8_workingwithconsoles.htm.
The IBM Power server supports DMTF Redfish API (DSP0266, version 1.0.3
published 2016-06-17) for systems management.
A copy of the Redfish schema files in JSON format published by the
DMTF (http://redfish.dmtf.org/schemas/v1/)
is packaged in the firmware image.
The schema files are distributed on chip to enable proper functioning
in deployments with no WAN connectivity.
IBM extensions to the Redfish schema are published at http://public.dhe.ibm.com/systems/power/redfish/schemas/v1.
Copyright notices for the DMTF Redfish API and schemas are at: (a) http://www.dmtf.org/about/policies/copyright,
and (b) http://redfish.dmtf.org/schemas/README8010.html.
- Support for the IBM Power System S812 (8284-21A) with a
single partition system running either AIX (FC #EPXQ, 4-core
3.026GHz 130W module, CCIN 54E9) or IBM i (FC #EPXP, 1-core 3.026GHz
130W module, CCIN 54E9) for the operating system.
- Support added to reduce memory usage for shared SR-IOV
adapters.
- Support for the Advanced System Management Interface (ASMI)
was changed to allow the special characters of "I", "O", and "Q" to be
entered for the serial number of the I/O Enclosure under the Configure
I/O Enclosure option. These characters have only been found in an
IBM serial number rarely, so typing in these characters will normally
be an incorrect action. However, the special character entry is
not blocked by ASMI anymore so it is able to support the exception
case. Without the enhancement, the typing of one of the special
characters causes message "Invalid serial number" to be displayed.
- On systems using PowerVM firmware, support was added to
allow the IBM i OS on the Power System S822 (8284-22A) without the need
for a VET code.
System firmware changes that affect all systems
- A problem was fixed
for disabling the periodic notification for a call home
error log SRC B150F138 for Memory Buffer resources (membuf) from the
Advanced System Management Interface (ASMI).
- A problem was fixed for the call home data for the B1xx2A01
SRC to include the min/max/average readings for more values. The
values for processor utilization, memory utilization, and node power
usage were added.
- A problem was fixed for incorrect callouts of the Power
Management Controller (PMC) hardware with SRC B1112AC4 and SRC
B1112AB2 logged. These extra callouts occur when the On-Chip
Controller (OCC) has placed the system in the safe state for a prior
failure that is the real problem that needs to be resolved.
- A problem was fixed for System Vital Product Data (SVPD)
FRUs being guarded but not having a corresponding error log
entry. This is a failure to commit the error log entry that has
occurred only rarely.
- A problem was fixed for the failover to the backup PNOR on
a Hostboot Self Boot Engine (SBE) failure. Without the fix, the
failed SBE causes loss of processors and memory with B15050AD
logged. With the fix, the SBE is able to access the backup PNOR
and IPL successfully by deconfiguring the failing PNOR and calling it
out as a failed FRU.
- A problem was fixed for the OS not being able to detect the
USB connected Uninterruptible Power Supply (UPS) that has feature code
#ECCF. An informational SRC B1814616 is logged from the service
processor and the IBM i OS logs a CPI0961 (Uninterruptible power supply
no longer attached). The error occurs infrequently because it
depends on system timing and system configuration. If a system is
having the error, it might have it on every IPL. The
circumvention is to reseat the USB cable connector for the USB
connected UPS.
- A problem was fixed for the Advanced System Management
Interface (ASMI) "System Service Aids => Error/Event Logs" panel not
showing the "Clear" and "Show" log options and also having a truncated
error log when there are a large number of error logs on the system.
- A problem was fixed for IPMI process core dumps for DCMI
commands used to gather power and thermal data. These dumps occur
intermittently if the DCMI commands are used in a repetitive loop.
- A problem was fixed to allow changing the IPMI channel
authentication capabilities from the OS. The following command
was causing an IPMI core dump "ipmitool channel authcap 1 4" every time
it was run.
- A problem was fixed for a system going into safe mode with SRC
B1502616 logged as informational without a call home
notification. Notification is needed because the system is
running with reduced performance. If there are unrecoverable
error logs and any are marked with reduced performance and the system
has not been rebooted, then the system is probably running in safe mode
with reduced performance. With the fix, the SRC B1502616 is an
Unrecoverable Error (UE).
- A problem was fixed for valid IPv4 static IP addresses not
being allowed to communicate on the network and not being allowed to be
configured.
The Advanced System Management Interface (ASMI) static IPv4
address configuration was not allowing "255" in the IP address
subfields. The corrected range checking is as follows:
Allowed values: x.255.x.x, x.x.255.x, x.255.255.x
Disallowed values: x.x.x.255
The failure for the communication on the network is seen if the
problematic IP addresses are in use prior to a firmware update to
860.00, 860.10, 860.11, or 860.12. After the firmware update, the
service processor is unable to communicate on the network. The
problem can be circumvented by changing the service processor to use
DHCP addressing, or by moving the IP address to a different static IP
range, prior to doing the firmware update.
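The corrected range check described above can be sketched as follows. This is a minimal illustration, not the ASMI implementation: the function name is hypothetical, and the only rule enforced beyond basic dotted-quad syntax is the one stated in this fix (255 is rejected in the last subfield). Any restrictions on the first subfield are not stated here and are not modeled.

```python
def is_allowed_static_ipv4(address: str) -> bool:
    """Return True if a dotted-quad address passes the range check
    described in the fix (hypothetical helper, for illustration only)."""
    parts = address.split(".")
    if len(parts) != 4:
        return False
    octets = []
    for part in parts:
        # Each subfield must be a plain decimal number from 0 to 255.
        if not (part.isascii() and part.isdigit()):
            return False
        value = int(part)
        if value > 255:
            return False
        octets.append(value)
    # Rule from the fix: x.255.x.x, x.x.255.x and x.255.255.x are allowed,
    # but x.x.x.255 is disallowed.
    return octets[3] != 255
```

With this sketch, "10.255.0.1" and "10.255.255.1" are accepted, while "10.0.0.255" is rejected.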
- A problem was fixed for intermittent failures of DCMI commands
when used from the HMC to continuously gather power and thermal
data. The maximum number of IPMI sessions was being exceeded by
the HMC. The number of IPMI sessions has been increased to allow
two HMCs to collect data simultaneously.
- A problem was fixed for an unneeded service action request
for an informational VRM redundant phase fail error logged with SRC
11002701. If reminders for service action with SRC B150F138
are occurring for this problem, then firmware containing the fix needs
to be installed and ASMI error logs need to be cleared in order to stop
the periodic reminder.
System firmware changes that affect certain systems
- On systems using
PowerVM firmware with PowerVM NovaLink, a problem was fixed for
returning to HMC-only management from co-management when a
NovaLink partition holding the master mode is deleted. A
circumvention is to release master mode before deleting the NovaLink
partition and then reconnect the disconnected management console.
Please refer to IBM Knowledge Center link "http://ibm.biz/novalink-kc" for
more information on the PowerVM NovaLink feature and changing the
master authority when doing co-management.
- On systems using PowerVM firmware, a problem was
fixed for a blank SRC in the LPA dump for user-initiated non-disruptive
adjunct dumps. The A2D03004 SRC is needed for problem
determination and dump analysis.
- A problem was fixed for the system VPD showing 4 extra PCIe
slots that are not actually available to the system. When running
an IBM i partition, the IBM i Hardware Service Manager shows twelve
PCIe adapter slots instead of the actual eight that can be used (P1-C2,
P1-C3, P1-C4, and P1-C5 are the extra slots displayed). This
problem only pertains to the IBM Power System S814 (8286-41A).
- On a system using PowerVM firmware with an IBM i partition
and VIOS, a problem was fixed for a Live Partition Mobility
migration for an IBM i partition that fails if there is a VIOS failover
during the migration suspended window.
- On a system using PowerVM firmware and VIOS, a
problem was fixed for an HMC "Incomplete State" after a Live Partition
Mobility migration followed by a VIOS failover. The error is
triggered by a delete operation on a migration adapter on the VIOS that
did the failover. The HMC "Incomplete State" can be recovered
from by doing a re-IPL of the system. This error can also prevent
a VIOS from activating.
- On systems using PowerVM firmware, a problem was fixed with
SR-IOV adapter error recovery where the adapter is left in a failed
state in nested error cases for some adapter errors. The
probability of this occurring is very low since the problem trigger is
multiple low-level adapter failures. With the fix, the adapter is
recovered and returned to an operational state.
- On systems using PowerVM firmware with PCIe adapters
in Single Root I/O Virtualization (SR-IOV) shared mode, a problem was
fixed for the hypervisor SR-IOV adjunct partition failing during the
IPL with SRCs B200F011 and B2009014 logged. The SR-IOV adjunct
partition successfully recovers after it reboots and the system is
operational.
- On systems using PowerVM firmware with PCIe adapters in
Single Root I/O Virtualization (SR-IOV) shared-mode in a PCIe slot with
Enlarged IO Capacity and 2TB or more of system memory, a problem was
fixed for the hypervisor SR-IOV adjunct partition failing during
the IPL with SRCs B200F011 and B2009014 logged. In this
configuration, it is possible the SR-IOV adapter will not become
functional following a system reboot or when an adapter is first
configured into shared-mode. Larger system memory configurations
of 2TB or more are more likely to encounter the problem.
The problem can be avoided by reducing the number of PCIe slots with
Enlarged IO Capacity enabled so it does not include adapters in SR-IOV
shared-mode. Another circumvention option is to move the adapter
to an SR-IOV capable PCIe slot where Enlarged IO Capacity is not
enabled.
- On a system using PowerVM firmware and VIOS, a
problem was fixed for a Live Partition Mobility (LPM) migration for an
Active Memory Sharing (AMS) partition that hangs if there is a VIOS
failover during the migration.
- On systems using PowerVM firmware, a problem was fixed for
the PCIe3 Optical Cable Adapter for the PCIe3 Expansion Drawer failing
with SRC B7006A84 error logged during the IPL. The failed cable
adapter can be recovered by using a concurrent repair operation to
power it off and on. Alternatively, the system can be re-IPLed to
recover the cable adapter. The affected optical cable adapters
have feature codes #EJ05, #EJ06, and #EJ08 with CCINs 2B1C, 6B52, and
2CE2, respectively.
- On systems using PowerVM firmware, the hypervisor "vsp"
macro was enhanced to show the type of the adjunct partition. The
"vsp -longname" macro option was also updated to list the location
codes for the SR-IOV adjunct partitions. The hypervisor macros
are used by IBM support to help debug Power system problems.
- On systems using PowerVM firmware, a problem was fixed for
PCIe Host Bridge (PHB) outages and PCIe adapter failures in the PCIe
I/O expansion drawer caused by error thresholds being exceeded for the
LEM bit [21] errors in the FIR accumulator. These are typically
minor and expected errors in the PHB that occur during adapter updates
and do not warrant a reset of the PHB and the resulting PCIe adapter
failures. Therefore, the threshold LEM[21] error limit has been
increased and the LEM fatal error has been changed to a Predictive
Error to avoid the outages for this condition.
- On systems using PowerVM firmware, a problem was fixed to improve
PCIe3 I/O expansion drawer (#EMX0) link stability. The
settings for the continuous time linear equalizers (CTLE) were updated
for all the PCIe adapters for the PCIe links to the expansion
drawer. The CEC must be re-IPLed for the fix to activate.
- On systems using PowerVM firmware with IBM i partitions, a
problem was fixed for frequent logging of informational B7005120 errors
due to communications path closed conditions during messaging from HMCs
to IBM i partitions. In the majority of cases these errors are due
to normal operating conditions and not due to errors that require
service or attention. The logging of informational errors due to
this specific communications path closed condition that are the result
of normal operating conditions has been removed.
- On a system using PowerVM firmware with an IBM i
partition, a problem was fixed for a D-mode boot failure for IBM
i from a USB RDX cartridge. There is a hang at the LPAR
progress code C2004130 for a period of time and then a failure with SRC
B2004158 logged. There is a USB External Dock (FC #EU04) and
Removable Disk Cartridge (RDX) 63B8-005 attached. The error is
intermittent so the RDX can be powered off and back on to retry the
D-mode boot to recover.
- On systems using the OPAL firmware, Petitboot was updated
to v1.2.7. It is now less verbose during boot - only
error-level messages are printed during Petitboot bootloader
initialization. This means that there will be fewer messages
printed as the system boots. Additionally, the Petitboot user interface
is started earlier in the boot process. This means that the user will
be presented with the user interface sooner, but it may still take
time, potentially up to 30 seconds, for the user interface to be
populated with boot options as storage and network hardware is being
initialized. During this time, Petitboot will show the status
message "Info: Waiting for device discovery". When Petitboot
device discovery is completed, the following status message will be
shown "Info: Connected to pb-discover!".
- On systems using PowerVM firmware, the following
problems were fixed for SR-IOV adapters:
1) Insufficient resources reported for SR-IOV logical port configured
with promiscuous mode enabled and a Port VLAN ID (PVID) when creating
a new interface on the SR-IOV adapters.
2) Spontaneous dumps and reboot of the adjunct partition for SR-IOV
adapters.
3) Adapter enters firmware loop when single bit ECC error is
detected. System firmware detects this condition as an adapter
command time out. System firmware will reset and restart the
adapter to recover the adapter functionality. This condition will
be reported as a temporary adapter hardware failure.
4) vNIC interfaces not being deleted correctly causing SRC B400FF01 to
be logged and Data Storage Interrupt (DSI) errors with
failure on boot of the LPAR.
This set of fixes updates adapter firmware to 10.2.252.1926, for the
following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M,
EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems using PowerVM firmware with an IBM i partition,
a problem was fixed for incorrect maximum performance reports based on
the wrong number of "maximum" processors for the system.
Certain performance reports that can be generated on IBM i systems
contain not only the existing machine information, but also "what-if"
information, such as "how would this system perform if it had all the
processors possible installed in this system". This "what-if"
report was in error because the maximum number of processors possible
was too high for the system.
- On systems using PowerVM firmware, a problem was fixed for
degraded PCIe3 links for the PCIe3 expansion drawer with SRC B7006A8F
not being visible on the HMC. This occurred because the SRC was
informational. The problem occurs when the link attaching a
drawer to the system trains to x8 instead of x16. With the fix,
the SRC has been changed to a B7006A8B permanent error for the
degraded link.
- On systems using PowerVM firmware, a problem was fixed for
a concurrent exchange of a CAPI adapter that left the new adapter in a
deactivated state. The system can be powered off and IPLed
again to recover the new adapter. The CAPI adapters have the
following feature codes: #EC3E, #EC3F, #EC3L, #EC3M, #EC3T,
#EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B.
- On a system using PowerVM firmware with SR-IOV
adapters, a problem was fixed for a DLPAR remove on a Virtual
Function (VF) of a ConnectX-4 (CX4) adapter that failed with AIX error
"0931-013 Unable to isolate the resource". The HMC reported error
is "HSCL12B5 The operation to remove SR-IOV logical port xx
failed because of the following error: HSCL131D The SR-IOV logical port
is still in use by the partition". The failing PCIe3 adapters are
sourced from Mellanox Corporation based on ConnectX-4 technology and
have the following feature codes and CCINs: #EC3E, #EC3F with
CCIN 2CEA; #EC3L and #EC3M with CCIN 2CEC; and #EC3T and #EC3U with
CCIN 2CEB. The issue occurs each time a DLPAR remove operation is
attempted on the VF. Restarting the partition after a failed
DLPAR remove recovers from the error.
- A problem was fixed for the serial port being disabled on
the service processor for the IBM Power System E850
(8408-44E). There is no response when plugging the serial port.
- On systems using PowerVM firmware, a problem was fixed for
NVRAM corruption that can occur when deleting a partition that owns a
CAPI adapter, if that CAPI adapter is not assigned to another partition
before the system is powered off. On a subsequent IPL, the system
will come up in recovery mode if there is NVRAM corruption. To
recover, the partitions must be restored from the HMC. The
frequency of this error is expected to be rare. The CAPI adapters
have the following feature codes: #EC3E, #EC3F, #EC3L, #EC3M,
#EC3T, #EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B.
- On systems using PowerVM firmware, a problem was fixed for
NVRAM corruption and an HMC recovery state when using Simplified Remote
Restart partitions. The failing systems will have at least one
Remote Restart partition and on the failed IPL there will be a
B7005301 SRC with word 7 being 0x00000002.
- On systems using PowerVM firmware, a problem was fixed for
a group of shared processor partitions being able to exceed the
designated capacity placed on a shared processor pool. This error
can be triggered by using the DLPAR move function for the shared
processor partitions, if the pool has already reached its maximum
specified capacity. To prevent this problem from occurring when
making DLPAR changes when the pool is at the maximum capacity, do not
use the DLPAR move operation but instead break it into two steps:
DLPAR remove followed by DLPAR add. This gives enough time for
the DLPAR remove to be fully completed prior to starting the DLPAR add
request.
- On systems using PowerVM firmware, a problem was fixed for
partition boot failures and run time DLPAR failures when adding I/O
that log BA210000, BA210003, and/or BA210005 errors. The fix also
applies to run time failures configuring an I/O adapter following an
EEH recovery that log BA188001 events. The problem can impact
IBM i partitions running in any processor mode or AIX/Linux partitions
running in P7 (or older) processor compatibility modes. The
problem is most likely to occur when the system is configured in the
Manufacturing Default Configuration (MDC) mode. The trigger for
the problem is a race-condition between the hypervisor and the physical
operations panel with a very rare frequency of occurrence.
|
SV860_070_056 / FW860.12
01/13/17 |
Impact: Availability
Severity: SPE
The following pertains to Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- On a system using
PowerVM firmware, a problem was fixed for the System Management
Services (SMS) SAS utility showing very large (incorrect) disk capacity
values depending on the size of the disk or Volume Set/Array. The
problem occurs when the number of blocks on a disk is 2 G or more.
- On a system using PowerVM firmware running a Linux
OS, a problem was fixed for support for Coherent Accelerator
Processor Interface (CAPI) adapters. The CAPI related RTAS
h-calls for the CAPI devices could not be made by the Linux OS,
impacting the CAPI adapter functionality and usability. This
problem involves the following adapters: the PCIe3 LP CAPI
Accelerator Adapter with F/C #EJ16 that is used on the S812L(8247-21L)
and S822L (8247-22L) models; the PCIe3 CAPI FlashSystem
Accelerator Adapter with F/C #EJ17 that is used on the
S814(8286-41A) and S824(8286-42A) models; and the PCIe3 CAPI
FlashSystem Accelerator Adapter with F/C #EJ18 that is used on the
S822(8284-22A), E870(9119-MME), and E880(9119-MHE) models. This
problem does not pertain to PowerVM AIX partitions using CAPI adapters.
- On a system using PowerVM firmware, a problem was fixed for
Live Partition Mobility (LPM) migrations to FW860.10 or FW860.11 from
any other level of firmware (i.e. not FW860.10 or FW860.11) that
caused errors in the output of the AIX "lsattr -El mem0" command and
Dynamic LPAR (DLPAR) operations. The "lsattr" command will report
the partition only has one logical memory block (LMB) of memory
assigned to it, even though there is more memory assigned to the
partition. Also, as a result of this problem, DLPAR operations
will fail with an error indicating the request could not be
completed. This issue affects AIX 5.3, AIX 6.1, AIX 7.1, AIX 7.2
TL 0, and may result in AIX DLPAR error message "0931-032 Firmware
failure. Data may be out of sync and the system may require
a reboot." This issue also affects all levels of Linux. Not
affected by this issue are AIX 7.2 TL 1, VIOS and IBM i
partitions.
In addition, after performing LPM from FW860 to earlier versions of
firmware, the DLPAR of Virtual Adapters will fail with HMC error
message HSCL294C, which contains text similar to the following:
"0931-007 You have specified an invalid drc_name."
Without the fix, a reboot of the migrated partition will correct the
problem.
- On a system using PowerVM firmware, a problem was fixed for
I/O DLPARs that result in partition hangs. To trigger the
problem, the DLPAR operation must be performed on a partition which has
been migrated via a Live Partition Mobility (LPM) operation from a P6
or P7 system to a P8 system. Additionally, DLPAR of I/O will fail
when performed on a partition which has been migrated via an LPM
operation from a P8 system to a P6 or P7 system. The failure will
produce HMC error message HSCL2928, which contains text similar to the
following: "0931-011 Unable to allocate the resource to the
partition." DLPAR operations for memory or CPU are not affected.
This issue affects all Linux and AIX partitions. IBM i partitions
are not affected.
|
SV860_063_056 / FW860.11
12/05/16 |
Impact: Availability
Severity: SPE
The following pertains to Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System E850C (8408-44E) servers only.
System firmware changes that affect certain systems
- DEFERRED: A problem
was fixed for a Field Core Override (FCO) error
that causes a processor chip without functional cores to be guarded
with an SRC B111BA24 error logged and by guard association causes all
the memory and I/O resources behind the processor chip to be lost for
the current IPL. This problem is triggered by a system
being manufactured with one or more feature codes of #2319
(Factory Deconfiguration of 1-core) to assist with optimization of
software licensing. For more information on Field Core Override,
refer to IBM Knowledge Center: http://www.ibm.com/support/knowledgecenter/POWER8/p8hby/fieldcore.htm.
The error only occurs in systems where the total number of active cores
is less than the number of processor chips. When the fix is
applied on a system that has lost memory or I/O resources due to the
errant processor guard, the system must be re-IPLed with the guard
removed from the processor to recover the resources.
Without the fix, the problem may be circumvented by the following four
steps:
1) Power off the system.
2) Use the Field Core Override function to increase the number of
active processor cores in the system. The Advanced System
Management Interface (ASMI) "System Configuration -> Hardware
Deconfiguration -> Field Core Override" panel shows the number of
cores that are active in the system and it can be used to increase the
number of active processor cores in the system.
3) Unguard the failed processor. Use the ASMI "System
Configuration -> Hardware Deconfiguration -> Clear All
Deconfiguration Errors" panel to restore the guarded processor.
4) IPL with the increased number of active processor cores and the
unguarded processor.
This problem does not pertain to the IBM Power System E850 (8408-44E)
model.
|
SV860_056_056 / FW860.10
11/18/16 |
Impact: New
Severity: New
The following pertains to Power System S812L (8247-21L), Power
System S822L (8247-22L), Power System S824L (8247-42L), Power
System S822 (8284-22A), Power System S814 (8286-41A), Power System S824
(8286-42A) and Power System E850C (8408-44E) servers only.
New features and functions
- Support enabled for Live Partition Mobility (LPM)
operations.
- Support enabled for partition Suspend and Resume from the
HMC.
- Support enabled for partition Remote Restart.
- Support enabled for PowerVM vNIC. PowerVM vNIC combines
many of the best features of SR-IOV and PowerVM SEA to provide a
network solution with options for advanced functions such as Live
Partition Mobility along with better performance and I/O efficiency
when compared to PowerVM SEA. In addition, PowerVM vNIC provides
users with bandwidth control (QoS) capability by leveraging SR-IOV
logical ports as the physical interface to the network.
- Support for dynamic setting of the Simplified Remote
Restart VM property, which enables this property to be turned on or off
dynamically with the partition running.
- Support for PowerVM and HMC to get and set the boot
list of a partition.
- Support for PowerVM partition restart in a Disaster
Recovery (DR) environment.
- On systems using PowerVM firmware, support for PCIe3 3D
graphics (F/C #EC51) adapter for Linux boot. Supported Linux OS
distributions are Red Hat Enterprise Linux 7.3 and SLES 12 SP2.
This feature only applies to S822 (8284-22A), S812L (8247-21L), and
S822L (8247-22L) systems.
- Support for concurrent add of a PCIe3 Optical cable card
(#EJ08 and CCIN 2CE2) used to attach the PCIe expansion drawer.
This feature pertains to E850(8408-E8E) and E850 (8408-44E) systems
only.
- Support for concurrent add of a PCIe expansion drawer
(#EMX0) to an existing cable card. This feature pertains to
E850(8408-E8E) and E850 (8408-44E) systems only.
- Support on PowerVM for a partition with 32 TB memory.
AIX, IBM i and Linux are supported but IBM i must be IBM i 7.3 TR1.
IBM i 7.2 has a limit of 16 TB per partition and IBM i 7.1
has a limit of 8 TB per partition. AIX level must be 7.1S or
later. Linux distributions supported are RHEL 7.2 P8, SLES
12 SP1, Ubuntu 16.04 LTS, RHEL 7.3 P8, SLES 12 SP2, Ubuntu
16.04.1, and SLES 11 SP4 for SAP HANA.
- Support for four processors for each IBM i partition with
VIOS (up from limit of two processors) on the IBM Power System S822
(8284-22A).
- Support for PowerVM and PowerNV (non-virtualized or OPAL
bare-metal) booting from a PCIe Non-Volatile Memory express (NVMe)
flash adapter. The adapters include feature codes #EC54 and #EC55
(1.6 TB), and #EC56 and #EC57 (3.2 TB) NVMe flash adapters
with CCIN 58CB and 58CC respectively.
- Support for PowerVM NovaLink V1.0.0.4 which includes the
following features:
- IBM i network boot
- Live Partition Mobility (LPM) support for inactive source VIOS
- Support for SR-IOV configurations, vNIC, and vNIC failover
- Partition support for Red Hat Enterprise Linux
- Support for a decrease in the amount of PowerVM memory
needed to support Huge Dynamic DMA Window (HDDW) for a PCI slot by
using 64K pages instead of 4K pages. The hypervisor only
allocates enough storage for the Enlarged IO Capacity (Huge Dynamic DMA
Window) capable slots to map every page in main storage with 64K pages
rather than 4K pages as was done previously. This affects only
the Linux OS as AIX and IBM i do not use HDDW.
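The saving comes from the number of pages needed to map all of main storage. The arithmetic can be sketched as below; the 1 TiB memory size is a hypothetical value chosen for illustration, and the per-entry size and any fixed overhead of the hypervisor's mapping tables are not stated in these notes, so only the entry counts are compared.

```python
KIB = 1024
TIB = 1024 ** 4


def mapping_entries(memory_bytes: int, page_bytes: int) -> int:
    """Number of pages (one mapping entry each) needed to cover memory."""
    return -(-memory_bytes // page_bytes)  # ceiling division


memory = 1 * TIB  # hypothetical main storage size, for illustration only
entries_4k = mapping_entries(memory, 4 * KIB)
entries_64k = mapping_entries(memory, 64 * KIB)

# Moving from 4K to 64K pages cuts the entry count by 64K / 4K = 16.
print(entries_4k, entries_64k, entries_4k // entries_64k)
```

Under these assumptions the 64K mapping needs one sixteenth of the entries per slot, which is the source of the reduced hypervisor memory requirement.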
- Support was enhanced for the Power Linux models to increase
the default number of slots for I/O Adapter Enlarged Capacity PCI slots
from 4 to 13. In 860.10, the new default of 13 Enlarged I/O slots
will use approximately 1.5 GB of storage (which is a factor of 10 less
than what would have been previously required for this many slots,
benefiting by the PowerVM change to 64K pages from 4K pages for HDDW).
Huge DMA is a PCIe slot capability on IBM Power Systems servers that
enables a DMA window to be wider, possibly allowing all the partition
memory to be mapped for DMA. This feature avoids increased system usage
when DMA mappings are requested by the adapter driver, because all the
system memory assigned to the partition is already mapped.
Consequently, this feature enables the data transfer between the I/O
card that is placed in this slot and the system memory to be more
efficient and with lower latency. The performance benefit will vary
based on the operating system and adapter being used. Linux performance
information can be found in the 64-bit DMA performance benefit topic in
the performance section of the IBM Knowledge Center: http://www.ibm.com/support/knowledgecenter/linuxonibm/liabm/liabmconcepts.htm.
This feature enhancement only pertains to the IBM Power System S812L
(8247-21L), S822L (8247-22L) and S824L (8247-42L) models.
- Support added to reduce the number of error logs and
call homes for the non-critical FRUs for the power and thermal faults
of the system.
- Support for redundancy in the transfer of partition
state for Live Partition Mobility (LPM) migration operations.
Redundant VIOS Mover Service Partitions (MSPs) can be defined along with
redundant network paths at the VIOS/MSP level. When redundant MSP
pairs are used, the migrating memory pages of the logical partition are
transferred from the source system to the target system by using two
MSP pairs simultaneously. If one of the MSP pairs fails, the migration
operation continues by using the other MSP pair. In some scenarios,
where a common shared Ethernet adapter is not used, use redundant MSP
pairs to improve performance and reliability.
Note: For an LPM migration for a partition using Active Memory
Sharing (AMS) in a dual (redundant) MSP configuration, the LPM operation
may hang if the MSP connection fails during the LPM migration. To avoid
this issue, which applies only to AMS partitions, AMS
migrations should only be done from the HMC command line using the
migrlpar command and specifying --redundantmsps 0 to disable the
redundant MSPs.
Note: To use redundant MSP pairs, all VIOS MSPs must be at version
2.2.5.00 or later, the HMC at version 8.6.0 or later, and the firmware
level FW860 or later.
For more information on LPM and VIOS supported levels and restrictions,
refer to the following links on the IBM Knowledge Center:
http://www.ibm.com/support/knowledgecenter/PurePower/p8hc3/p8hc3_firmwaresupportmatrix.htm
https://www.ibm.com/support/knowledgecenter/HW4L4/p8eeo/p8eeo_ipeeo_main.htm
- Support for failover capability for vNIC client adapters in
the PowerVM hypervisor, rather than requiring the failover
configuration to be done in the client OS. To create a redundant
connection, the HMC adds another vNIC server with the same remote lpar
ID and remote DRC as the first, giving each server its own priority.
- Support for SAP HANA with Solution edition with feature
code #EPVR on 3.65 GHz processors and 12-core activations and 512 GB
memory activations on SUSE Linux. SAP HANA is an in-memory
platform for processing high volumes of data in real-time. HANA allows
data analysts to query large volumes of data in real-time. HANA's
in-memory database infrastructure frees analysts from having to load or
write-back data.
- Support for the Hardware Management Console (HMC) to
access the service processor IPMI credentials and to retrieve
Performance and Capacity Monitor (PCM) data for viewing in a tabular
format or for exporting as CSV values. The enhanced HMC interface can
now start and stop VIOS Shared Storage Pool (SSP) monitoring from the
HMC and start and stop SSP historical data aggregation.
- Support for the Advanced System Management Interface (ASMI)
was changed to not create VPD deconfiguration records and call home
alerts for hardware FRUs that have one VPD chip of a redundant pair
broken or inaccessible. The backup VPD chip for the FRU allows
continued use of the hardware resource. The notification of the
need for service for the FRU VPD is not provided until both of the
redundant VPD chips have failed for a FRU.
System firmware changes that affect all systems
- A problem was fixed
for a failed IPL with SRC UE BC8A090F that does not have a hardware
callout or a guard of the failing hardware. The system may be
recovered by guarding out the processor associated with the error and
re-IPLing the system. With the fix, the bad processor core is
guarded and the system is able to IPL.
- A problem was fixed for an Operations Panel Function 04
(Lamp test) during an IPL causing the IPL to fail. With the fix,
the lamp test request is rejected during the IPL until the hypervisor
is available. The lamp test can be requested without problems
anytime after the system is powered on to hypervisor ready or an OS is
running in a partition.
- A problem was fixed for On-Chip Controller (OCC) errors
that had excessive callouts for processor FRUs. Many of the OCC
errors are recoverable and do not require that the processor be called
out and guarded. With the fix, the processors will only be called
out for OCC errors if there are three or more OCC failures during a
time period of a week.
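The callout threshold described above can be pictured as a count of failures within a time window. The sketch below is an illustration only: the class and method names are hypothetical, timestamps are assumed to arrive in non-decreasing order, and a rolling seven-day window is an assumption, since these notes do not say whether the firmware uses a rolling or a fixed period.

```python
from collections import deque
from datetime import datetime, timedelta


class OccFailureTracker:
    """Hypothetical tracker: call out a processor only after `threshold`
    OCC failures fall within `window` (assumed to be a rolling window)."""

    def __init__(self, threshold: int = 3, window: timedelta = timedelta(days=7)):
        self.threshold = threshold
        self.window = window
        self._failures = deque()

    def record_failure(self, when: datetime) -> bool:
        """Record one failure; return True if a callout is warranted."""
        self._failures.append(when)
        cutoff = when - self.window
        # Drop failures that are older than the window.
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) >= self.threshold
```

In this sketch, three failures on consecutive days trigger a callout on the third failure, while three failures spread over ten days do not.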
- A problem was fixed for the On-Chip Controller (OCC)
incorrectly calling out processors with SRC B1112A16 for L4 Cache DIMM
failures with SRC B124E504. This false error logging can occur if
the DIMM slot that is failing is adjacent to two unoccupied DIMM slots.
- A problem was fixed for device time outs during an IPL
logged with an SRC B18138B4. This error is intermittent and no
action is needed for the error log. The service processor
hardware server now allots more time for the device transactions to
allow the transactions to complete without a time-out error.
- Support for 6 core processor with FC #8A2225 and CCIN
54E1 extended for use in the Power System S822L (8247-22L).
Support was already in place for this processor since FW810.20 for the
S822 (8284-22A).
- For the IBM Power System E850 (8408-44E) system, a problem
was fixed for the incorrect values for the Idle Power Saver (IPS) mode
call home data. The call home "max" value is reported as a much lower
number than what the On-chip Controllers (OCC) read for the IPS.
This problem only affects 4-socket systems as it is caused by an
integer overflow of the summation of the IPS value from all OCCs in the
system.
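The effect of an integer overflow in a summation can be illustrated as below. The 16-bit width and the per-OCC value of 20000 are assumptions chosen for illustration; these notes do not state the actual field width or the magnitude of the IPS values.

```python
def sum_with_wraparound(values, bits=16):
    """Sum values in an unsigned accumulator of the given width,
    wrapping on overflow (the width is an assumption for illustration)."""
    mask = (1 << bits) - 1
    total = 0
    for value in values:
        total = (total + value) & mask
    return total


per_occ = 20000  # hypothetical per-OCC reading
two_socket = sum_with_wraparound([per_occ] * 2)   # fits in 16 bits
four_socket = sum_with_wraparound([per_occ] * 4)  # exceeds 65535 and wraps
print(two_socket, four_socket)
```

Under these assumptions the two-socket sum is correct, while the four-socket sum wraps to a much smaller number, which matches the symptom of a "max" value that is too low only on 4-socket systems.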
System firmware changes that affect certain systems
- DISRUPTIVE: On systems using the PowerVM firmware, a problem was
fixed for an "Incomplete" state caused by initiating a resource dump
with selector macros from NovaLink (vio -dump -lp 1 -fr). The failure
causes the stack frame size of a communication process,
HVHMCCMDRTRTASK, to be exceeded with a hypervisor page fault that
disrupts the NovaLink and/or HMC communications. The recovery action
is to re-IPL the CEC but that will need to be done without the
assistance of the management console. For each partition that has an
OS running on the system, shut down the partition from the OS. Then
from the Advanced System Management Interface (ASMI), power off the
managed system. Alternatively, the system power button may also be
used to do the power off. If the management console Incomplete state
persists after the power off, the managed system should be rebuilt
from the management console. For more information on management
console recovery steps, refer to this IBM Knowledge Center link:
https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm.
The fix is disruptive because the size of the PowerVM hypervisor must
be increased to accommodate the over-sized stack frame of the failing
task.
- DEFERRED: On systems using the PowerVM firmware, a problem was
fixed for a CAPI function unavailable condition on a system with the
maximum number of CAPI adapters and partitions. Not enough bytes were
allocated for CAPI for the maximum configuration case. The problem
may be circumvented by reducing the number of active partitions or
CAPI adapters. The fix is deferred because the size of the hypervisor
must be increased to provide the additional CAPI space.
- DEFERRED: On systems using PowerVM firmware, a problem was fixed
for cable card capable PCI slots that fail during the IPL. Hypervisor
I/O Bus Interface UE B7006A84 is reported for each cable card capable
PCI slot that doesn't contain a PCIe3 Optical Cable Adapter for the
PCIe Expansion Drawer (feature code #EJ05). PCI slots containing a
cable card will not report an error but will not be functional. The
problem can be resolved by performing an AC cycle of the system. The
trigger for the failure is that the I2C devices used to detect the
cable cards do not come out of the power on reset process in the
correct state due to a race condition.
- On systems using PowerVM firmware, a problem was fixed for
network issues, causing critical situations for customers, when an
SR-IOV logical port or vNIC is configured with a non-zero Port VLAN ID
(PVID). This fix updates adapter firmware to 10.2.252.1922, for
the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EL38,
EN0M, EN0N, EN0K, EN0L, and EL3C.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems using the PowerVM firmware, a problem was fixed
for a Live Partition Mobility migration that resulted in the source
managed system going to the management console Incomplete state after
the migration to the target system was completed. This problem is
very rare and has only been detected once. The problem trigger is that
the source partition does not halt execution after the migration to the
target system. The management console went to the
Incomplete state for the source managed system when it failed to delete
the source partition because the partition would not stop
running. When this problem occurred, the customer network was
running very slowly and this may have contributed to the failure.
The recovery action is to re-IPL the source system but that will need
to be done without the assistance of the management console. For
each partition that has an OS running on the source system, shut down
each partition from the OS. Then from the Advanced System
Management Interface (ASMI), power off the managed system.
Alternatively, the system power button may also be used to do the power
off. If the management console Incomplete state persists after
the power off, the managed system should be rebuilt from the management
console. For more information on management console recovery
steps, refer to this IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm
- On systems using the PowerVM firmware, a fix was made to
provide an option to change the ordering of PCIe Host Bridge (PHB)
devices on Power 8 systems to match the discovery order on Power 7
systems.
- On systems using PowerVM firmware, a problem was
fixed for a shared processor pool partition showing an incorrect zero
"Available Pool Processor" (APP) value after a concurrent firmware
update. The zero APP value means that no idle cycles are present
in the shared processor pool but in this case it stays zero even when
idle cycles are available. This value can be displayed using the
AIX "lparstat" command. If this problem is encountered, the
partitions in the affected shared processor pool can be dynamically
moved to a different shared processor pool. Before the dynamic
move, the "uncapped" partitions should be changed to "capped" to
avoid a system hang. The old affected pool would continue to have the
APP error until the system is re-IPLed.
- On systems using PowerVM firmware, a problem was fixed for
a latency time of about 2 seconds being added to a target Live
Partition Mobility (LPM) migration system when there is a latency time
check failure. With the fix, in the case of a latency time check
failure, a much smaller default latency is used instead of two
seconds. This error would not be noticed if the customer system
is using an NTP time server to maintain the time.
- On systems with OPAL firmware, a problem was fixed for
misaligned mapped interrupts to virtual PCI devices that could cause a
PB_CENT_CRESP_ADDR_ERROR checkstop.
- On systems with OPAL firmware, a problem was fixed for a
PXE (Preboot eXecution Environment) boot (also known as network boot)
hang that occurred when a network server was down. With the fix,
the boot is able to recover so that alternative methods of booting can
be selected using petitboot menu items.
- A problem was fixed for PCI Host Bridge (PHB) "link
down" Endpoint Recoverable errors that became fatal exceptions
when not handled by the CAPI adapters. With the fix, the
recoverable errors are now detected by the CAPI adapters to allow for
run-time link recovery.
- On systems using PowerVM firmware, a rare problem was
fixed for a system hang that can occur when dynamically moving
"uncapped" partitions to a different shared processor pool. To
prevent a system hang, the "uncapped" partitions should be changed to
"capped" before doing the move.
- On systems using the PowerVM firmware, support was added
for a new utility option for the System Management Services (SMS)
menus. This is the SMS SAS I/O Information Utility. It has
been introduced to allow a user to get additional information about
the attached SAS devices. The utility is accessed by selecting
option 3 (I/O Device Information) from the main SMS menu, and then
selecting the option for "SAS Device Information".
- On systems using the PowerVM hypervisor firmware and
NovaLink, a problem was fixed for a NovaLink installation error where
the hypervisor was unable to get the maximum logical memory buffer
(LMB) size from the service processor. The maximum supported LMB
size should be 0xFFFFFFFF but in some cases it was initialized to a
value that was less than the amount of configured memory, causing the
service processor read failure with error code 0X00000134.
- On systems using the PowerVM hypervisor firmware and CAPI
adapters, a problem was fixed for CAPI adapter error recovery.
When the CAPI adapter goes into the error recovery state, the Memory
Mapped I/O (MMIO) traffic to the adapter from the OS continues,
disrupting the recovery. With the fix, the MMIO and DMA traffic
to the adapter are now frozen until the CAPI adapter is fully
recovered. If the adapter becomes unusable because of this
error, it can be recovered using concurrent maintenance steps from the
HMC, keeping the adapter in place during the repair. The error
has a low frequency since it only occurs when the adapter has failed
for another reason and needs recovery.
- On systems using the PowerVM hypervisor firmware, when
using affinity groups, if the group includes a VIOS, ensure the group
is placed in the same drawer where the VIOS physical I/O is
located. Prior to this change, if the VIOS was in an
affinity group with other partitions, the partition placement could
override the VIOS adapter placement rules and the VIOS could end up in
a different drawer from the I/O adapters.
- On systems using PowerVM firmware, a problem was
fixed to improve error recovery when attempting to boot an iSCSI target
backed by a drive formatted with a block size other than 512
bytes. Instead of stopping on this error, the boot attempt fails
and then continues with the next potential boot device.
Information regarding the reason for the boot failure is available in
an error log entry. The 512 byte block size for backing devices
for iSCSI targets is a partition firmware requirement.
- On systems using PowerVM firmware, a problem was fixed for
a false thermal alarm in the active optical cables (AOC) for the PCIe3
expansion drawer with SRCs B7006AA6 and B7006AA7 being logged every 24
hours. The AOC cables have feature codes of #ECC6 through #ECC9,
depending on the length of the cable. The SRCs should be ignored
as they call for the replacement of the cable, cable card, or the
expansion drawer module. With the fix, the false AOC thermal
alarms are no longer reported.
- On systems using PowerVM firmware that have an attached
HMC, a problem was fixed for a Live Partition Mobility migration
that resulted in a system hang when an EEH error occurred
simultaneously with a request for a page migration operation. On
the HMC, it shows an incomplete state for the managed system with
reference code A181D000. The recovery action is to re-IPL the
source system but that will need to be done without the assistance of
the HMC. From the Advanced System Management Interface
(ASMI), power off the managed system. Alternatively, the
system power button may also be used to do the power off. If the
HMC Incomplete state persists after the power off, the managed system
should be rebuilt from the HMC. For more information on HMC
recovery steps, refer to this IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm
- On systems using the OPAL firmware, a problem was fixed for
fundamental PCI resets at boot time causing the PCI adapters to not be
usable in the Linux OS. No errors occur in the skiboot but the
adapters are not configurable once the OS is reached.
- On systems using the OPAL firmware, a problem was fixed for
time-out errors during the power off of PCI slots with a "Timeout
powering off slot ... FIRENZE-PCI: Wrong state 00000000 on slot" error
message during a power off of the system.
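
The iSCSI boot recovery behavior described in the list above (skip a boot target whose backing drive does not use 512-byte blocks, record the reason, and continue with the next potential boot device) can be sketched as follows. This is a minimal Python illustration, not partition firmware code; the `BootDevice` type, the `select_boot_device` function, and the device names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The 512-byte requirement comes from the release note; everything else
# in this sketch is a hypothetical illustration.
REQUIRED_BLOCK_SIZE = 512


@dataclass
class BootDevice:
    name: str
    block_size: int  # bytes


def select_boot_device(
    candidates: List[BootDevice],
) -> Tuple[Optional[BootDevice], List[str]]:
    """Return the first usable boot device and the error log entries made."""
    error_log: List[str] = []
    for device in candidates:
        if device.block_size != REQUIRED_BLOCK_SIZE:
            # Fail this attempt, log the reason, and move on instead of stopping.
            error_log.append(
                f"{device.name}: unsupported block size {device.block_size}, "
                f"expected {REQUIRED_BLOCK_SIZE}"
            )
            continue
        return device, error_log
    return None, error_log


boot_list = [BootDevice("iscsi-target-0", 4096), BootDevice("local-disk-1", 512)]
chosen, log = select_boot_device(boot_list)
print(chosen.name if chosen else "no bootable device")
for entry in log:
    print(entry)
```

With this hypothetical boot list, the first target is skipped with one log entry and the second device is selected.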
|
SV860_039_039 / FW860.00
11/02/16 |
Impact: New
Severity: New
The following pertains to Power System E850C (8408-44E) servers only.
New Features and Functions
NOTE:
- GA Level
Four FW840 features that have been disabled for the 860.00 GA are
listed below. These will be re-enabled for the 860.10 service
pack:
1. Support disabled for Live Partition Mobility (LPM) operations.
2. Support disabled for partition Suspend and Resume from the HMC.
3. Support disabled for partition Remote Restart.
4. Support disabled for PowerVM vNIC. PowerVM vNIC combined many of the
best features of SR-IOV and PowerVM SEA to provide a network solution
with options for advanced functions such as Live Partition Mobility
along with better performance and I/O efficiency when compared to
PowerVM SEA. In addition, PowerVM vNIC provided users with bandwidth
control (QoS) capability by leveraging SR-IOV logical ports as the
physical interface to the network.
- New features that have been disabled: vNIC failover; new
redundant path LPM function; and PCIe cable recovery
on a link to the PCIe3 expansion drawer.
- Do not use the following functions. They are not
disabled but should not be used as the implementations and testing have
not been completed for 860.00:
1. SMS SAS I/O Information utility. If a non-SCDD (Self
Configuring Device Data) drive is attached to a controller and the
utility is used to look at devices attached to the controller, a
Default Catch condition will occur due to a partition firmware data
stack underflow. This utility is accessed by selecting option 3
(I/O Device Information) from the main SMS menu, and then selecting
option 2 (SAS Device Information).
2. 32TB Max Memory Enablement for partitions.
3. PowerVM NovaLink enhancements. For more information, refer to
IBM Knowledge Center: http://www.ibm.com/support/knowledgecenter/POWER8/p8eig/p8eig_kickoff.htm
4. PowerVM change to support HDDW using 64K pages
5. IBM Power System E850 (8408-44E) concurrent add of the PCIe expansion
drawer (#EMX0).
6. IBM Power System E850 (8408-84E) concurrent add of PCIe3 Optical
Cable Adapter for PCIe3 Expansion Drawer (F/C #EJ08)
7. Enforcement of limits to IBM i support on IBM Power System S822
(8284-22A)
8. Dynamic TCE memory allocation for SR-IOV adapters
9. Dynamic Toggle of SRR
10. Power Boot List Management Platform Support
11. SAP HANA (#EPVR) enhancements - Solution edition for SAP HANA 3.65
GHz + 12 Activations
12. HMC new GUI enhancements
13. LPAR DR Restart
14. HMC override for Port vs LUN level validation
15. SNMP traps for system state
16. HMC Option to boot without IPv6 Support
17. PCIe3 3D Graphics Adapter x16 (#EC51) boot support (for Linux only)
18. Non-volatile Memory Express (NVMe) boot
19. Service processor security updates
20. vHMC support for DHCP server configuration
- Support for the IBM Power System E850 (8408-44E).
Similar in many respects to the 8408-E8E but upgraded with faster
processors (4.223GHz, 10C 3.957GHz, 12C 3.658GHz) with a maximum of 48
cores and an upgrade in memory to DDR4 with expanded capacity to 4 TB
with 128 GB DIMMs available. As with the 8408-E8E, there is no IBM
i or OPAL support. Operating System offerings for PowerVM
partitions are AIX and Linux (RHEL, SLES, and Ubuntu).
|