Troubleshooting Platform Analytics

Some or all of the Vertica nodes fail to start up due to a memory error

Some or all of the Vertica nodes may fail to start up with the following error:

Large:Memory(KB) Exceeded: Requested = number, Free = number

This error occurs because of an issue with the Resource Manager in Vertica. To resolve this issue, disable the Resource Manager before starting the database, then re-enable it after the database has started. The resolution method depends on whether all Vertica nodes failed to start up, or only some of them.

Start the Vertica database if all nodes fail to start up

  1. Manually disable the Resource Manager on all Vertica nodes.

    Perform the following steps on each host in the database cluster.

    1. Log into a host in the database cluster.
    2. Navigate to the directory containing the vertica.conf file.

      This is the catalog directory of the database that you want to start up, that is, the Catalog pathname that you were prompted to specify when you first created the database.

    3. Edit the vertica.conf file and add the following line to the end of the file:

      EnableResourceManager=0

  2. Start the Vertica database on all database nodes.
  3. Re-enable the Resource Manager.
    1. Log into a host in the database cluster.
    2. Run the following SQL statement from the vsql command line:

      SELECT SET_CONFIG_PARAMETER('EnableResourceManager', '1');
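The vertica.conf edit in step 1 can be sketched as follows. This is a minimal sketch: the catalog path below is a demo placeholder, so substitute the real Catalog pathname of your database before running it on each node.

```shell
# Demo catalog path -- substitute the real Catalog pathname of your database.
CATALOG_DIR=/tmp/vertica-catalog-demo
mkdir -p "$CATALOG_DIR"
CONF="$CATALOG_DIR/vertica.conf"
touch "$CONF"

# Append the override only if it is not already present,
# so repeated runs do not duplicate the line.
grep -q '^EnableResourceManager=0$' "$CONF" || \
    echo 'EnableResourceManager=0' >> "$CONF"

tail -n 1 "$CONF"
```

The idempotent grep guard matters because the procedure is performed on every host in the cluster and may be re-run after a failed startup attempt.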

Start the Vertica database if some nodes fail to start up

  1. Log into one of the Vertica nodes that are still running.
  2. Disable the Resource Manager.

    Run the following SQL statement from the vsql command line:

    SELECT SET_CONFIG_PARAMETER('EnableResourceManager', '0');

  3. Start the Vertica database on all the database nodes that failed to start up.
  4. Re-enable the Resource Manager.
    1. Log into a host in the database cluster.
    2. Run the following SQL statement from the vsql command line:

      SELECT SET_CONFIG_PARAMETER('EnableResourceManager', '1');

Platform Analytics node does not send events after installation if it is started before the Platform Analytics server

After installing Platform Analytics using a clean database, if you start the Platform Analytics node before starting the Platform Analytics server, the node will not send events. This problem only occurs the first time after installation.

The EVENT_MANAGER_CONF table for the event locator is not initialized until you start the Platform Analytics server for the first time. Therefore, if you start the Platform Analytics node first after an initial installation with a clean database, the event sender cannot access the EVENT_MANAGER_CONF table and does not send events until the server is started.

To resolve this issue, restart the Platform Analytics node after you start the Platform Analytics server.

FLEXnet usage data loader could not obtain license usage data due to insufficient swap space

If you have a Platform Analytics node running on a UNIX host, the FLEXnet usage data loader (flexlicusageloader) log may report "Failed to obtain license usage from the license server" and "Not enough space" errors. This problem does not apply to Windows hosts.

This error occurs when the host has insufficient disk space allocated to swap. To work around this issue, extend the swap space on that host to at least 2 GB of free space before starting the Platform Analytics node on the host.
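A quick way to check whether the host meets the 2 GB threshold is to read the SwapFree field from /proc/meminfo (a Linux-specific sketch; other UNIX systems report swap differently):

```shell
# Report free swap in MB and warn when it is below the 2 GB threshold.
free_swap_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
free_swap_mb=$((free_swap_kb / 1024))
echo "Free swap: ${free_swap_mb} MB"
if [ "$free_swap_mb" -lt 2048 ]; then
    echo "WARNING: less than 2 GB of free swap"
fi
```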

In certain configurations, the Platform Analytics Console shows that the loader controller is down, but perfadmin shows it is running

In the Platform Analytics Console, if you click Data Collection Nodes, the loader controller may be shown as Down. However, if you examine the loader controller service (plc) on the Platform Analytics node (using perfadmin list), the service is STARTED.

This issue may occur if you defined the loopback IP address (127.0.0.1) as the name of your host rather than localhost in the /etc/hosts file, or if your host has multiple network interface cards (NICs).

To fix this problem, you need to change the loopback IP address and NSS (Name Service Switch) configuration.

Change the loopback IP address and NSS configuration

  1. Change the loopback IP address to localhost.
    1. Edit the /etc/hosts file.
    2. Navigate to the line where you defined the loopback IP address (127.0.0.1).

      If this IP address is not defined as localhost, you need to change the definition.

      For example, if your host is hostA in the example.com domain, you need to navigate to and change the following line:

      127.0.0.1       hostA   hostA.example.com

    3. Either delete the line or change the definition to localhost and save the file.

      For example, either delete the line or change it to the following:

      127.0.0.1       localhost   localhost.localdomain

  2. If your host has multiple network interface cards (NICs), change the NSS (Name Service Switch) configuration to look up NIS before looking up the file for host names and numbers.
    1. Edit the /etc/nsswitch.conf file.
    2. Navigate to the line with the definition for hosts.

      For example, by default, this line is as follows:

      hosts:      files nis dns

    3. Change the line so "nis" appears before "files" and save the file.

      For example,

      hosts:      nis files dns

  3. Restart the services on the Platform Analytics node.

    perfadmin stop all

    perfadmin start all
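A quick check for the misconfiguration described in step 1 is to scan the hosts file for loopback entries whose name is not localhost. This sketch runs against a sample file for illustration; point it at /etc/hosts on the real host.

```shell
# Build a sample hosts file showing the misconfiguration.
cat > /tmp/hosts.sample <<'EOF'
127.0.0.1       hostA   hostA.example.com
192.0.2.10      hostB
EOF

# Flag any 127.0.0.1 line whose first name is not localhost.
awk '$1 == "127.0.0.1" && $2 !~ /^localhost/ {print "suspect:", $0}' \
    /tmp/hosts.sample
```

The hostA line is flagged as suspect; after the fix described above, the same scan should print nothing.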

Platform Analytics fails to install

A crashed InstallShield database may cause the Platform Analytics installation to fail. If the installation fails, you may need to manually remove the InstallShield Multi-Platform (ISMP) database.

Remove the ISMP database from the following directories:

  • Windows: C:\Program Files\Common Files\InstallShield\Universal\common

  • UNIX: ~/InstallShield
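On UNIX, the removal amounts to deleting the directory. The path below is a demo placeholder; substitute the UNIX path listed above.

```shell
# Demo directory -- substitute ~/InstallShield (UNIX) or the
# InstallShield Universal common directory (Windows).
ISMP_DIR=/tmp/InstallShield-demo
mkdir -p "$ISMP_DIR/common"

rm -rf "$ISMP_DIR"
[ -d "$ISMP_DIR" ] || echo "ISMP database directory removed"
```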

The Cluster Capacity and Workload Statistics workbook displays only the first execution host in the execution host list for parallel jobs

This issue applies only to 7.x clusters.

The Cluster Capacity and Workload Statistics workbook displays the execution hosts of a parallel job as a single host and takes the data from the first execution host, even though the parallel job runs on different hosts. For example, if a parallel job's execution host list is "3*hostA 4*hostB", the cluster capacity data transformer assumes that all 7 slots are occupied by hostA.

Number of down slots reported is not correct

If the number of job slots is defined using "!" in the lsb.hosts file and a host is down, the number of down slots reported is not correct. To work around this issue, define an explicit number of slots for each host in the cluster in lsb.hosts.
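The workaround can be sketched as an lsb.hosts Host section fragment. The host names and slot counts below are placeholders; the point is that the MXJ column holds an explicit number for every host instead of "!":

```
Begin Host
HOST_NAME      MXJ     r1m     pg     ls     tmp    DISPATCH_WINDOW
hostA          4       ()      ()     ()     ()     ()
hostB          8       ()      ()     ()     ()     ()
End Host
```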

The license usage data collected for the license vendor daemons is not accurate

If you have multiple license vendor daemons on a license server sharing the same port, the license usage data for those license vendor daemons may not be correct. To work around this issue, download the older version of lmutil from the Platform FTP site.

Download the lmutil binary from patches/lsf_analytics/8.0/FLEXlm9.2/<platform> and move it to ANALYTICS_TOP/license/7.0/<platform>/...

Cannot install the Platform Analytics node

The Platform Analytics node installation fails when the LSF_VERSION defined in the lsf.conf file is not the actual version number.

To resolve this issue, edit lsf.conf to set LSF_VERSION to the actual version before you install the Platform Analytics node. For example, if the actual LSF version is 7.x but LSF_VERSION in lsf.conf is set to "active", change LSF_VERSION to 7.0 before you install the node. After installing the node, change LSF_VERSION back to "active".

Third-party issues and how you can troubleshoot

A message "Out of memory" is displayed after clicking the Data tab

This error message is displayed when you try to view a large amount of data (more than 4 GB). To avoid the error, either narrow down the data range or increase the memory of the host.

Average data on the Cluster Usage table is not as accurate as the data on the Cluster Usage graph

Data shown in the table is not accurate at some roll-up levels because it averages over only the sampling points that contain data instead of the whole time period.

For example, the following table shows sampled slot counts for different slot statuses:

Sampling points for different slot status:

  Status   10:00   10:10   10:20   10:30   10:40   10:50
  RUN      1       1       -       -       -       -
  DOWN     2       2       2       2       2       2
In the Cluster Usage table, the average number of RUN slots rolled up to hour 10:00 is (1 + 1) / 2 = 1. The graph shows the correct value, which is (1 + 1) / 6 = 0.33.

As a workaround, refer to the Cluster Usage graph for more accurate data.
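The arithmetic behind the discrepancy can be reproduced directly: the table divides the RUN sample sum by the 2 points that have data, while the graph divides by all 6 sampling points in the hour.

```shell
awk 'BEGIN {
    run_sum = 1 + 1                        # the two RUN samples
    printf "table: %.2f\n", run_sum / 2    # points with data only
    printf "graph: %.2f\n", run_sum / 6    # all sampling points
}'
# prints table: 1.00 and graph: 0.33
```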

The Projects dashboard of the Workload Accounting report throws a session busy error when sorting large amounts of data

In the Projects dashboard of the Workload Accounting report, if you select a large amount of data and try to sort it, the reporting server may display the following error: 'Unexpected Server Error: Session busy, please try later.' For example, if you select more than 3 years of data, drill down to a specific year that has more than 20,000 projects, and sort by project name, you see this error.

To avoid the error, narrow down the data range or view the data using Platform Analytics Designer.