Hash lookup obtains a value from a lookup table, according
to a hashed value derived from a source column and places it in a
destination column. With hash lookup you can consistently mask data
in any environment when using the same source value and lookup table.
Hash lookup hashes a value in a source
column, which need not be a column that is replaced, to derive a number
used to match a value in the key column for the lookup table. Values
from the selected row in the lookup table are inserted at the destination.
If a source value is found without a corresponding number in the lookup
table, a conversion error occurs.
Hash lookup is in two forms,
single column and multiple column. In both forms, the source column
values are hashed. You can use the multiple column form for any source
column to be replaced by lookup table values, but you must edit the
column map to remove the names of remaining source columns that are
also replaced.
The lookup table is typically
indexed and must include a numeric key column that contains sequential
number values without any gaps. The remaining columns contain replacement
values. Certain source column values (NULL, spaces in CHAR and VARCHAR
columns, and zero-length VARCHAR) are not hashed. The lookup table
must include a row for each of these reserved values, so that a lookup
value can be obtained for each.
| Source Value |
Lookup Table Key |
| NULL |
-1 |
| spaces (CHAR or VARCHAR) |
-2 |
| zero-length VARCHAR |
-3 |
The syntax is:
HASH_LOOKUP( { sourcesearchcol | SRCSEARCH=(sourcecol1,...,sourcecoln) }
[ ,trim=( [ char1char2... ] [ \u ] ) ][ ,dest=( col1,...,coln ) ]
,lktablename ( lktablekeycol, { valuecolname | values=( col1,...,coln ) } )
[ ,ALGO={ SHA256 | DEFault } ][ ,cache | ,nocache ]
[ ,PRESERVE=( [ NOT_FOUND, ] colname1 ( SPACES, NULL, ZERO_LEN ) ...
, colnamen ( SPACES, NULL, ZERO_LEN ) ]
[ ,seed= { n | HMAC } ] )
- sourcesearchcol
- For as single column search, the name of the source column from
which hashed values are derived. If not specified, the name of the
destination column is used. The source column need not be a column
that is replaced by lookup table values.
- SRCSEARCH=
- For a multiple column search,
the names of the source columns that contain the search values. The
list of column names must be enclosed in parentheses with names separated
by commas. Specify a maximum of 16 columns to search.
- trim=
- List of characters trimmed from the source value
before it is hashed. You can also use trim to convert the source value
to uppercase before it is hashed.
- char1char2. . .
- Characters trimmed from the source value before
it is hashed. For example, if trimming commas from the source, the
values Smith, John, and Smith John hash to the same value. The list
is case-sensitive. You can specify a space or comma as a character.
After the initial occurrence of a character, any additional occurrences
in the list are ignored. If the value is NULL or all spaces after
characters are trimmed, the source value is not hashed and is assigned
the appropriate reserved value (-1 or -2).
To specify
a backslash “\” or a right parentheses “)”,
you must precede the character with a backslash escape character.
For example, to specify a right parentheses, enter: trim=(\)). You
can use the escape character only with a backslash, a right parentheses,
or as part of the uppercase indicator.
- \u
- Converts the source value to uppercase before it
is hashed. Any trimmed characters are also converted to uppercase.
Upper and lower case letters hash differently. For example, John and
JOHN hash to different values. If you want upper and lower case values
to hash to the same value, use trim=\u to convert the source value
to upper case before it is hashed.
- dest=
- Names of the destination table columns in which
values from the lookup table are inserted. (Required for multiple
column lookup.)
- col1, . . . ,coln
- List of destination table columns, separated by
commas and enclosed in parentheses. The order of names must correspond
to the order of lookup table column names in values=.
- lktablename
- Name of the lookup table.
Specify the lookup table name as dbalias.creatorid.tablename, creatorid.tablename, or tablename. If you do not fully qualify the table name, the qualifiers for
the destination table are used.
- lktablekeycol
- The name of the column in the lookup table that contains the sequentially
numbered keys to match with values hashed from the source column or
columns.
- valuecolname
- Name of the column in the lookup table that contains
the translated search value to be inserted at the destination. (Required
for single column lookup.)
- values=
- For a multiple column lookup, the names of lookup
table columns that contain values to be inserted at the destination.
- col1, . . . ,coln
- List of lookup table columns, separated by commas
and enclosed in parentheses. The order of the column names must correspond
to the order of destination table column names in dest=.
- ALGO=
- Method or algorithm used to hash the source value.
- SHA256
- Use the SHA-256 algorithm. If SEED= is not specified or if SEED=HMAC
and an HMAC key exit is present, the HMAC key is used. If SEED=HMAC
and an exit is not present, an error occurs. Sample hash exits (ioqhashexit.dll
for Windows, libioqhashexit.so for UNIX, and libioqhashexit.sl for
HPUX PA-RISC) are provided in the bin folder of the Optim installation.
The source code for the hash exit is provided in the SAMPLES\ODPP_HASHEXIT
folder.
- DEFault
- Use the default hash algorithm (default).
- cache | nocache
- Specify cache (default) to maintain a table of
found lookup values in memory or nocache to discard found values.
Using cache is faster when retrieving a value many times, but requires
more memory than nocache.
- PRESERVE=
- Values inserted at the destination if the lookup value is not
found or the source contains null or spaces or is a zero-length varchar.
- NOT_FOUND
- Insert the source value in the destination, if no match is found
for the source column row.
- colname
- A column for which specified source values are inserted at the
destination. The source values are:
- NULL
- SPACES
- ZERO-LENGTH VARCHAR
- seed=
- Use SEED= to vary the hashing algorithm calculation.
- n
- A value from 1 to 2,000,000,000. The hashed value from the source
column and the SEED value are matched with a sequential number in
the lookup table to obtain the replacement value for the destination
column. This seed value can be used as the key for the HMAC algorithm
in the absence of an HMAC key exit.
- HMAC
- The seed value is provided by the HMAC key exit, which must be
present. Use HMAC with ALGO=SHA256 only.
Single column example
Use hash lookup to insert values from a column in a
lookup table into a destination table column, based on a value hashed
from a source column.
For example, assume the
source column, FIRST_NAME, contains first names and the destination
column is to include replacement first names from the lookup table.
A lookup table, NAME_LOOKUP, contains a column (FIRST) with first
names and a column (SEQ) containing sequential values.
To obtain values for the destination column using the
NAME_LOOKUP table, specify:
HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ,
FIRST))
The values hashed from the FIRST_NAME
column in the source table are matched with values in the SEQ column
in the NAME_LOOKUP table. When a match is found, the corresponding
value from the FIRST column in NAME_LOOKUP is inserted into the destination
column.
Preserve example
The following statement extends the single column example to insert
any source NULL and SPACES values rather than lookup table values:
HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ,
FIRST),PRESERVE=(FIRST_NAME(NULL,SPACES)))
NoCache example
This statement extends the single column example to avoid maintaining
a table of found lookup values in memory:
HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ, FIRST),NOCACHE)
Trim example
Extending the single column example to trim spaces and commas from
the source value and convert the source value to uppercase before
it is hashed:
HASH_LOOKUP(FIRST_NAME,
TRIM=( ,\u),NAME_LOOKUP(SEQ,FIRST))