Hash lookup

Hash lookup obtains a value from a lookup table, according to a hashed value derived from a source column and places it in a destination column. With hash lookup you can consistently mask data in any environment when using the same source value and lookup table.

Hash lookup hashes a value in a source column, which need not be a column that is replaced, to derive a number used to match a value in the key column for the lookup table. Values from the selected row in the lookup table are inserted at the destination. If a source value is found without a corresponding number in the lookup table, a conversion error occurs.

Hash lookup is in two forms, single column and multiple column. In both forms, the source column values are hashed. You can use the multiple column form for any source column to be replaced by lookup table values, but you must edit the column map to remove the names of remaining source columns that are also replaced.

The lookup table is typically indexed and must include a numeric key column that contains sequential number values without any gaps. The remaining columns contain replacement values. Certain source column values (NULL, spaces in CHAR and VARCHAR columns, and zero-length VARCHAR) are not hashed. The lookup table must include a row for each of these reserved values, so that a lookup value can be obtained for each.

Source Value Lookup Table Key
NULL -1
spaces (CHAR or VARCHAR) -2
zero-length VARCHAR -3

The syntax is:

HASH_LOOKUP( { sourcesearchcol | SRCSEARCH=(sourcecol1,...,sourcecoln) }
 [ ,trim=( [ char1char2... ] [ \u ] ) ][ ,dest=( col1,...,coln ) ]
 ,lktablename ( lktablekeycol, { valuecolname | values=( col1,...,coln ) } )
 [ ,ALGO={ SHA256 | DEFault } ][ ,cache | ,nocache ]
 [ ,PRESERVE=( [ NOT_FOUND, ] colname1 ( SPACES, NULL, ZERO_LEN ) ...
	           ,  colnamen ( SPACES, NULL, ZERO_LEN ) ]
 [ ,seed= { n | HMAC } ] )
sourcesearchcol
For as single column search, the name of the source column from which hashed values are derived. If not specified, the name of the destination column is used. The source column need not be a column that is replaced by lookup table values.
SRCSEARCH=
For a multiple column search, the names of the source columns that contain the search values. The list of column names must be enclosed in parentheses with names separated by commas. Specify a maximum of 16 columns to search.
trim=
List of characters trimmed from the source value before it is hashed. You can also use trim to convert the source value to uppercase before it is hashed.
char1char2. . .
Characters trimmed from the source value before it is hashed. For example, if trimming commas from the source, the values Smith, John, and Smith John hash to the same value. The list is case-sensitive. You can specify a space or comma as a character. After the initial occurrence of a character, any additional occurrences in the list are ignored. If the value is NULL or all spaces after characters are trimmed, the source value is not hashed and is assigned the appropriate reserved value (-1 or -2).

To specify a backslash “\” or a right parentheses “)”, you must precede the character with a backslash escape character. For example, to specify a right parentheses, enter: trim=(\)). You can use the escape character only with a backslash, a right parentheses, or as part of the uppercase indicator.

\u
Converts the source value to uppercase before it is hashed. Any trimmed characters are also converted to uppercase. Upper and lower case letters hash differently. For example, John and JOHN hash to different values. If you want upper and lower case values to hash to the same value, use trim=\u to convert the source value to upper case before it is hashed.
dest=
Names of the destination table columns in which values from the lookup table are inserted. (Required for multiple column lookup.)
col1, . . . ,coln
List of destination table columns, separated by commas and enclosed in parentheses. The order of names must correspond to the order of lookup table column names in values=.
lktablename
Name of the lookup table. Specify the lookup table name as dbalias.creatorid.tablename, creatorid.tablename, or tablename. If you do not fully qualify the table name, the qualifiers for the destination table are used.
lktablekeycol
The name of the column in the lookup table that contains the sequentially numbered keys to match with values hashed from the source column or columns.
valuecolname
Name of the column in the lookup table that contains the translated search value to be inserted at the destination. (Required for single column lookup.)
values=
For a multiple column lookup, the names of lookup table columns that contain values to be inserted at the destination.
col1, . . . ,coln
List of lookup table columns, separated by commas and enclosed in parentheses. The order of the column names must correspond to the order of destination table column names in dest=.
ALGO=
Method or algorithm used to hash the source value.
SHA256
Use the SHA-256 algorithm. If SEED= is not specified or if SEED=HMAC and an HMAC key exit is present, the HMAC key is used. If SEED=HMAC and an exit is not present, an error occurs. Sample hash exits (ioqhashexit.dll for Windows, libioqhashexit.so for UNIX, and libioqhashexit.sl for HPUX PA-RISC) are provided in the bin folder of the Optim installation. The source code for the hash exit is provided in the SAMPLES\ODPP_HASHEXIT folder.
DEFault
Use the default hash algorithm (default).
cache | nocache
Specify cache (default) to maintain a table of found lookup values in memory or nocache to discard found values. Using cache is faster when retrieving a value many times, but requires more memory than nocache.
PRESERVE=
Values inserted at the destination if the lookup value is not found or the source contains null or spaces or is a zero-length varchar.
NOT_FOUND
Insert the source value in the destination, if no match is found for the source column row.
colname
A column for which specified source values are inserted at the destination. The source values are:
NULL
SPACES
ZERO-LENGTH VARCHAR
seed=
Use SEED= to vary the hashing algorithm calculation.
n
A value from 1 to 2,000,000,000. The hashed value from the source column and the SEED value are matched with a sequential number in the lookup table to obtain the replacement value for the destination column. This seed value can be used as the key for the HMAC algorithm in the absence of an HMAC key exit.
HMAC
The seed value is provided by the HMAC key exit, which must be present. Use HMAC with ALGO=SHA256 only.

Single column example

Use hash lookup to insert values from a column in a lookup table into a destination table column, based on a value hashed from a source column.

For example, assume the source column, FIRST_NAME, contains first names and the destination column is to include replacement first names from the lookup table. A lookup table, NAME_LOOKUP, contains a column (FIRST) with first names and a column (SEQ) containing sequential values.

To obtain values for the destination column using the NAME_LOOKUP table, specify:

HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ, FIRST))

The values hashed from the FIRST_NAME column in the source table are matched with values in the SEQ column in the NAME_LOOKUP table. When a match is found, the corresponding value from the FIRST column in NAME_LOOKUP is inserted into the destination column.

Preserve example

The following statement extends the single column example to insert any source NULL and SPACES values rather than lookup table values:

HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ, FIRST),PRESERVE=(FIRST_NAME(NULL,SPACES)))

NoCache example

This statement extends the single column example to avoid maintaining a table of found lookup values in memory:

HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ, FIRST),NOCACHE)

Trim example

Extending the single column example to trim spaces and commas from the source value and convert the source value to uppercase before it is hashed:

HASH_LOOKUP(FIRST_NAME, TRIM=( ,\u),NAME_LOOKUP(SEQ,FIRST))



Feedback