(For Oracle users only) Prepare the Oracle database for Analytics

Follow these instructions only if you are using Oracle database. Create tablespaces and data schema, then configure the Oracle database to prepare for the Platform Analytics installation.

What you need to do

Check the following:

  • The Oracle database host is properly configured and running.

  • You have a user name, password, and URL to access the database.

  • The Oracle database and the database host meet the Platform Analytics system requirements.

    See Prepare for the Platform Analytics installation.

  1. Create the Oracle tablespaces for the database schema
  2. Create an Oracle database schema
  3. Optimize the Oracle database

After preparing the database for Platform Analytics, prepare the Platform Analytics server host.

Create the Oracle tablespaces for the database schema

Contact your Oracle database administrator (DBA) to create the tablespaces according to these specifications.

Create tablespaces to prepare the Oracle database for the Platform Analytics database schema. You may create partitioned.

Partitioned data schema maximizes Platform Analytics performance, reduces database administration complexity, and allows the data loaders and data transformers to work more efficiently with the data volume at a larger cluster scale.

For optimal database performance, we recommend that you use partitioned data schema and create seven tablespaces for Platform Analytics.

  1. Determine the scale of your cluster, based on the number of hosts, and the corresponding initial tablespace size.

    Refer to the following table to determine the scale of your cluster and the initial tablespace.


    Number of hosts in the cluster

    Scale of the cluster

    Initial tablespace size in total

    Less than 100

    Tiny

    4 GB (4096 MB)

    100 - 1000

    Small

    20 GB (20480 MB)

    1000 - 3000

    Medium

    200 GB (204800 MB)

    Greater than 3000

    Large

    500 GB (512000 MB)


  2. Create tablespaces with the "AUTOEXTEND" and "EXTENT MANAGEMENT LOCAL" options with the initial tablespace size according to the scale of your cluster.
    Note:

    AUTOEXTEND specifies that the tablespaces are automatically extended with a specified size for the NEXT parameter. Specifying the NEXT parameter assures that data files grow with consistent, reusable extend sizes.

    EXTENT MANAGEMENT LOCAL allows you to specify a locally-managed tablespace. You should generally not specify a dictionary-managed tablespace.

    CREATE TABLESPACE tablespace_name DATAFILE ’data_file_path’ SIZE tablespace_sizeM AUTOEXTEND ON 2048M EXTENT MANAGEMENT LOCAL;

    where

    • tablespace_name is the name of the tablespace

    • data_file_path is the file path to the .ora Oracle data file.

    • tablespace_size is the initial size of the tablespace according to the scale of your cluster, in MB. For example, for small clusters (100 to 1000 hosts), this is 20480.

    For example, for small clusters (100 to 1000 hosts),

    CREATE TABLESPACE TS_DATA_01 DATAFILE ’/oradata/ts_data_01_01.ora’ SIZE 20480M AUTOEXTEND ON 2048M EXTENT MANAGEMENT LOCAL;

    and repeat this command for the remaining tablespaces (for example, tablespaces TS_DATA_02 to TS_DATA_07 for partitioned data schema

Create an Oracle database schema

Check the following:

  • The Platform Analytics database user must have the following:

    • Roles: Connect and resource.

    • System privileges: Create view.

  1. Log into any host with access to the database host as the Platform Analytics database user.
  2. Extract the database schema package.
  3. Navigate to the directory containing your database schema scripts.
    • If you are creating a partitioned data schema, navigate to schema/Oracle/partitioned

      You must be using Oracle Enterprise Edition to work with partitioned databases.

  4. Use sqlplus to run the grant_privilege.sql script with DBA privileges to grant additional, required privileges to the Platform Analytics database user.

    sqlplus "sys/password@connect_string as sysdba" @grant_privilege.sql user_name

    where
    • password is the password for the sys user on the database.

    • connect_string is the named SQLNet connection for this database (optional).

    • user_name is the name of the Platform Analytics database user.

  5. Use sqlplus to run the create_pa8.0.2_schema.sql script to create a database schema.
    • To create a partitioned data schema:

      sqlplus user_name/password@connect_string @create_pa8.0.2_schema.sql data_tablespace index_tablespace data_tablespace,index_tablespace,partitioned_tablespace_list cluster_scale

    where
    • user_name is the user name on the database.

    • password is the password for this user name on the database.

    • connect_string is the named SQLNet connection for this database (optional).

    • data_tablespace is the name of the tablespace where you intend to store the table schema.

    • index_tablespace is the name of the tablespace where you intend to store the index.

    • partitioned_tablespace_list is a comma-separated list of the remaining tablespaces where you intend to store the partitioned tables and indexes. Spaces are not allowed in this list.

    • cluster_scale is a letter representing the scale of your cluster (T - Tiny, S - Small, M - Medium, L - Large).

    For example, if you have a new small cluster (100 to 1000 hosts),

    If you created seven tablespaces for a partitioned data schema,

    sqlplus Platform Analytics_owner/mypasswd @create_pa8.0.2_schema.sql TS_DATA_01 TS_DATA_02 TS_DATA_01,TS_DATA_02,TS_DATA_03,TS_DATA_04,TS_DATA_05,TS_DATA_06,TS_DATA_07 S

  6. Partitioned databases only. Wait for thirty minutes after you created the database schema, then verify that the partitions for all tables have been created correctly.

    Use a SQL query to check the sys_partition_events table for any partitioning error messages.

    SELECT * FROM sys_partition_events

    If there is an error message due to insufficient tablespace size, you need to expand the tablespace size and create the partitions again by running the following command on the SQL Plus command line:

    EXEC CREATE_PARTITIONS;

Optimize the Oracle database

Optional. Modify the Oracle database for optimal performance with Platform Analytics.

  1. Modify the Oracle database initialization parameters.

    The following table describes the parameters you need to modify:


    Parameter name

    Value

    sga_target (Oracle 10g or newer only)

    Total memory × (35 to 40%)

    pga_aggregate_target

    Total memory × (35 to 40%)

    open_cursors

    300

    processes

    300

    db_block_size

    8192

    db_file_multiblock_read_count

    16

    optimizer_index_cost_adj

    30

    undo_retention

    28800


    If you are using Oracle 9i, you also need to modify the following parameters:


    Parameter name

    Value

    log_buffer

    1048576

    sga_max_size

    Total memory × (35 to 40%)

    large_pool_size

    104857600

    shared_pool_size

    314572800

    java_pool_size

    0

    db_cache_size

    sga_max_size - large_pool_size - shared_pool_size - java_pool_size - log_buffer

    sort_area_size

    1048576


  2. Increase the size of the redo log files to decrease log file synchronization time.

    Increase the number of redo log groups to four and increase the size of each of the redo log groups to 1.5 GB.

    In addition, you should redistribute these redo log files to a physical hard disk other than the one containing the operating system and the tablespaces for Platform Analytics data.

  3. Increase the UNDO and TEMP tablespaces.

    Change the size of the UNDO and TEMP tablespaces to 20 GB, or as large as possible.

    For example, if your cluster has a large volume of records (such as a full workload scale LSF cluster consisting of 5000 hosts and 3 million finished jobs a day with 40 000 license usage combinations), you may need to set the TEMP and UNDO tablespace size to 40 GB or more.

  4. Change the default Oracle statistics gathering job (GATHER_STATS_JOB) to weekly base.

    To change the job to weekly base, run the following PL/SQL block in SQL*Plus with DBA privileges:

    BEGIN
      BEGIN
        DBMS_SCHEDULER.DROP_JOB(JOB_NAME => 'GATHER_STATS_JOB', FORCE => TRUE);
      EXCEPTION
        WHEN OTHERS THEN
          NULL;
      END;
      DBMS_SCHEDULER.CREATE_JOB(JOB_NAME        => 'GATHER_STATS_JOB',
        PROGRAM_NAME    => 'GATHER_STATS_PROG',
        JOB_CLASS       => 'AUTO_TASKS_JOB_CLASS',
        START_DATE      => SYSTIMESTAMP,
        REPEAT_INTERVAL => 'freq=daily;byday=SAT;byhour=0;byminute=0;bysecond=0',
        END_DATE        => NULL,
        ENABLED         => TRUE,
        COMMENTS        => 'Oracle defined automatic optimizer statistics collection job.');
    END;
    /