Use the affinity privacy provider to mask data while maintaining
the format and character types of the source values. For example,
the provider can maintain the format of data such as account numbers
or driver's license numbers while masking character data with character
data and numeric data with numeric data.
If the source data is uppercase, alphabetic characters,
the affinity privacy provider generates uppercase, alphabetic characters
at the destination. The provider masks alphabetic and numeric characters,
but copies other characters in the source data to the destination.
For example, a credit card number in the format "nnnn nnnn
nnnn nnnn" is masked to a different number that includes
spaces at the same intervals, while one formatted with dashes is masked
as "nnnn-nnnn-nnnn-nnnn". Using this provider,
you can generate unique values, a different value for each occurrence
of the same source, and values with a length different from the source.
The provider includes two algorithms
for masking data, a default algorithm and a format preserving encryption
(FPE) algorithm.
The FPE algorithm is encryption-based
and offers stronger masking capability than the default. The FPE algorithm
is based on the Advanced Encryption Standard 256-bit (AES-256) algorithm,
can optionally use an encryption key that is supplied by the user,
and can produce masked values that are unique. The same user-supplied
key produces repeatable masked values. For this reason, knowledge
of the key should be secured from unauthorized users to prevent reverse
engineering to discover the original values.
The
FPE algorithm also produces outputs that are extremely varied and
without a discernible pattern. For example, two close or similar source
values (such as 001 and 002) are masked with values that are not similar
(such as 196 and 837), thus masking any pattern these values may have
had in the source data.
The TWEAK parameter,
available with the FPE algorithm only, can mask identical strings
within source values by using other strings within each source value
to influence the masking process. For example, for two source values
such as ABC-123 and DEF-123, the TWEAK parameter can mask the 123
string in each value by using the preceding string to produce the
masked values ABC-981 and DEF-704.
While the
FPE algorithm offers an industry standard level of encryption, consider
using the default algorithm when your data does not require such encryption.
The most significant differences between the two algorithms are the
degree of varied masking patterns, the time that is required to complete
the masking effort, and the strength of the masked results. The default
algorithm usually outperforms the FPE algorithm in processing time,
as the FPE algorithm in most cases takes more time to complete. The
complexity of the input data can add more time to the masking operation
of the FPE algorithm, depending on the amount of parallel processing
and I/O being performed.
Examples
- Account numbers
- Masking account numbers might require that the account number
format is maintained while producing repeatable results for testing.
To meet these requirements, you might use the following syntax:
pro=aff, algo=fpe, mtd=rep, key="Xyz123", rule=num, whenmatch=prefix(“33”),
flddef1=(name=acctnbr, datatype=varchar_sz )
This example
uses the following parameters:
- ALGO=FPE
- The FPE algorithm offers a high level of encryption and produces
masked values that are much less likely to reveal the original source
value than the default.
- MTD=REP
- This parameter produces consistent target values when the same
data is processed multiple times.
- KEY="fpekey123"
- This parameter provides the case-sensitive encryption key for
the FPE algorithm. To produce repeatable results, use the same key
value.
- RULE=NUM
- This parameter ensures that numeric source values produce numeric
target values.
- WHENMATCH=PREFIX
- This parameter prevents identical source and target values by
adding a prefix to target values that match the source.
This syntax produces following results:
| Source values |
Masked values |
| acct12345 |
uaho84506 |
| acct12345 |
uaho84506 |
| acct56789 |
uaho66769 |
- Driver's license numbers
- Masking drivers license numbers might require that the format
is maintained, as the format might be unique to the issuing agency.
To meet these requirements, you might use the following syntax:
pro=aff, algo=FPE, mtd=rep, key="fpekey123", flddef1=(name=driver_number, dt=wvarchar_sz)
This
example uses the following parameters:
- ALGO=FPE
- The FPE algorithm offers a high level of encryption and produces
masked values that are much less likely to reveal the original source
value than the default.
- MTD=REP
- This parameter produces consistent target values when the same
data is processed multiple times.
- KEY="fpekey123"
- This parameter provides the case-sensitive encryption key for
the FPE algorithm. To produce repeatable results, use the same key
value.
This syntax produces following results:
| Source values |
Masked values |
| M2267 89890 34567 |
O0280 50902 54225 |
| M2267 89890 34567 |
O0280 50902 54225 |
| P2267 89890 34567 |
D9254 86713 09303 |
- Passport numbers
- Passport number data might contain identical strings that occur
among multiple values. To produce masked data
that does not include such strings, use the TWEAKS parameter. This
data might also contain hyphens that can be removed from masked output
by using the REMOVE parameter. To meet these requirements, you might
use the following syntax:
pro=aff, algo=fpe, mtd=rep, key="fpekey123", iterations=20, tweaks=yes,
flddef1=(name=passport_nbr, dt=varchar_sz )
This example uses
the following parameters:
- ALGO=FPE
- The FPE algorithm offers a high level of encryption and produces
masked values that are much less likely to reveal the original source
value than the default.
- MTD=REP
- This parameter produces consistent target values when the same
data is processed multiple times.
- KEY="fpekey123"
- This parameter provides the case-sensitive encryption key for
the FPE algorithm. To produce repeatable results, use the same key
value.
- ITERATIONS=20
- This parameter determines the number of times the FPE algorithm
processes each value to ensure uniqueness.
- TWEAKS=YES
- This parameter ensures that identical strings are masked with
unique strings.
This syntax produces following results:
| Source values |
Masked values |
| US20139999 |
YT03385676 |
| US20139999 |
YT03385676 |
| CA11233388 |
AU96923919 |
Syntax
The affinity privacy provider uses
the following syntax:
- Masking
parameters
PROVIDER = AFFINITY ,
[ ALGORITHM = { DEFAULT | FPE } ] ,
METHOD = { REPEATABLE | HASH } ,
- ALGO=FPE
parameters
[ KEY = { "literal" | @environment-variable-name } ] ,
[ ITERATIONS = iterations-value ] ,
[ TWEAKS = { YES | NO } ] ,
- ALGO=DEF
parameters
[ SEED = { "literal" | @variable | RANDOM } ] ,
- CHAR parameters
[ LENGTH = { n | MAX } ] ,
[ COPY = (copy-start,copy-len [,"copy-lit"]) … ] ,
- Numeric
parameters
[ RULE = { NUMERIC | BINARY | SCALED } ] ,
- Source
value parameters
[ CASE = UPPER ] ,
[ REMOVE = “remove-chars“ ] ,
[ TRIM = RIGHT ] ,
[ WHENMATCH = { REVERSE | SHUFFLE | PREFIX( { "literal" | SEQ(low,high) | RAND(low,high) } |
SUFFIX( { "literal" | SEQ(low,high) | RAND(low,high) } |
OVERLAY( n, m, "literal" ) } ] ,
- Processing
parameters
[ DISCARDLIMIT = discard-limit-value ] ,
- Data
definition parameters
FLDDEFn = ( NAME = field-name ,
DATATYPE = datatype-value ,
[ PRECISION = field-precision-value ] ,
[ SCALE = field-scale-value ] ,
[ LENGTH = field-length-value ] ,
[ CODEPAGE = codepage-value ] ,
[ CPTYPE = { DB2ZOS | DB2LUW | ORACLE | SYBASE | ODBC | INFORMIX |
NETEZZA | SQLSERVER | TERADATA | ANY | NONE } ]
) ,
[ CODEPAGE = codepage-value ] ,
[ CPTYPE = { DB2ZOS | DB2LUW | ORACLE | SYBASE | ODBC | INFORMIX |
NETEZZA | SQLSERVER | TERADATA | ANY | NONE } ]
Masking parameters
Parameters
that determine how to mask data.
- PROVIDER (or PRO)
- Required. Enter the provider name, AFFINITY (or AFF).
Note: The PRO parameter must be first in the masking string. All other parameters can appear in any order.
- ALGORITHM (or ALGO)
- The algorithm to use. Enter one of the following options:
- DEFAULT (or DEF)
- Mask data by using the default algorithm.
ALGO=DEF is not compatible
with the parameters KEY, ITERATIONS, and TWEAKS.
- FPE
- Mask data by using the FPE algorithm. Use the KEY parameter to
enter the encryption key.
ALGO=FPE is not compatible with the parameters
SEED and MTD=HASH.
- METHOD (or MTD)
- The masking method to use. For small or less complex data sets,
the HASH method has a better strength of masking compared to the REPEATABLE
method. The HASH method might provide slightly slower processing compared
to the REPEATABLE method.
Enter one of the following options:
- REPEATABLE (or REP)
- Generates consistently repeatable and unique output values. The
same masked value is generated for every instance of a given source
value, and each unique source value has a unique output value.
- HASH
- Generates output values by using a hash algorithm. The output
values might not be unique nor repeatable between masking operations.
MTD=HASH
is not compatible with the parameter ALGO=FPE.
ALGO=FPE parameters
Parameters
for use with ALGO=FPE only.
- KEY
- The encryption key. To produce repeatable masked data, use the
same key value.
Enter one of the following options:
- "literal"
- A 1 - 32 character alphanumeric string (including special characters
such as $, @, and #). A NULL value is not valid.
- @environment-variable-name
- Name of an environment variable that provides the encryption key.
The environment variable name must be prefixed with the at sign (@)
and assigned a value before invoking the masking provider.
- ITERATIONS
- The number of times the FPE algorithm processes each source value
to avoid creating a masked value that matches the source value. Enter
an integer in the range 1 - 99. The default is 12. A value less than
12 has a negligible improvement on performance.
- TWEAKS
- If two values contain identical strings, determines whether to
mask the identical strings differently for each combination of non-identical
characters.
TWEAKS is not applicable to numeric data types or with
the parameters ALGO=DEF and RULE.
Enter one of the following
options:
- YES
- Default. Use tweaks to mask identical strings in source values
by using other strings within each source value to influence the masking
process. For example, if TWEAKS=YES, the provider might mask values
AB-1234 and CD-1234 to DX-4795 and NR-3687, rather than DX-4795 and
NR-4795.
- NO
- Do not use tweaks.
ALGO=DEF parameters
Parameters
for use with ALGO=DEF only.
- SEED
- Use the SEED parameter to influence the masking process and to
produce repeatable or non-repeatable output values. To produce repeatable
masked data, use the same seed value.
Enter one of the following
options:
- Parameters for producing repeatable values:
- "literal"
- A character string, within enclosing double quotation marks. Any
special characters in the string are ignored. The string is case-sensitive.
Note that there is no particular advantage to providing a lengthy
string. Using a literal potentially exposes the seed value and might
allow reverse engineering to discover the original values.
- @variable
- The name of an environment variable, preceded by the at sign (@).
To prevent reverse engineering to discover the original values, the
variable can be created by the Optim™ administrator
or other authorized person and secured from unauthorized access.
- Parameter for producing non-repeatable values:
- RANDOM (or RAN)
- Produces a non-repeatable output value. Although this option might
produce duplicate values each time the process is run, this option
can be used to generate test data from a limited set of source data
by combining the output of multiple runs.
- RANDOM (or RAN)
- Specifies a random seed value, which produces a non-repeatable
output value. This option can be used as a test data generator by
concatenating the output of multiple runs. This option might produce
duplicate values each time the process is run.
CHAR parameters
Parameters
for use with CHAR character types.
- LENGTH (or LEN)
- For character data types only. Specifies a fixed length for each
destination value, regardless of the length of the source value. The
source value is repeated or truncated to match the LENGTH value before
processing. If this parameter is not specified, the length of the
destination value is the same as the source value.
The trailing
blanks of an unfilled CHAR input value are considered part of a source
value, resulting in an output value that also has trailing blanks.
To include only non-blank characters in the output value, also specify
TRIM=RIGHT.
The destination value might not be unique when the
LENGTH parameter is specified. For example, if LEN=2 and the unique
values "test1" and "test2" are input, the provider masks only the
first two characters in each value, "te", and generates duplicate
output values.
Specify MTD=HASH when using LENGTH with ALGO=DEF.
To specify LENGTH with ALGO=FPE, use MTD=REP.
LEN is not compatible
with the parameter RULE.
Enter one of the following options:
- n
- An integer value that specifies the destination value length.
The value cannot exceed the length of the destination field.
- MAX
- Generate a value that matches the length of the destination field.
- COPY
- For character data types only. Specifies one or more pairs of
substrings to copy to the destination value. The provider can also
replace characters with a literal string. Note that this parameter
might cause non-unique destination values.
If the literal string
is shorter than the substring, the provider replicates the literal
until the lengths match. If the literal is longer than the substring,
it is truncated.
Spaces are supported within literals only (for
example, “X Y”).
COPY is not compatible with the
parameter RULE.
Enter the following options:
- copy-start
- The starting position of the substring to copy to the destination.
- copy-len
- The length of the substring to copy to the destination.
- copy-lit
- A literal string that replaces the source characters in the specified
positions. If the literal string is shorter than the substring, the
provider replicates the literal until the lengths match. If the literal
is longer than the substring, it is truncated.
Numeric parameters
Parameters
for use with numbers and numeric data types.
- RULE
- Specifies processing rules to ensure that numeric strings are
properly masked as numeric values.
Enter one of the following options:
- NUMERIC (or NUM)
- For numeric values within character data types only, specifies
that numeric values generate masked numeric values instead of other
characters. Without this option, the value 12345 in a character type
could produce a masked value that contains alphabetic characters.
If a source value is not a number, the provider processes the value
normally without an error or warning message. Use this option for
the following situations:
- To transform integers in a character data type field to match
the values of a numeric data type field. This usage maintains DBMS-type
foreign key integrity across differing data types.
- To correctly transform floating point numbers in character data
type fields. Without this parameter, the exponent character E (or
e) is transformed into another letter.
- To maintain leading zeros that might otherwise be removed. For
example, to prevent a value such as 00015 from being masked as a two-digit
value.
RULE=NUM is not compatible with the parameters COPY and LEN.
- BINARY (or BIN)
- Specifies that source values of a numeric data type are transformed
into values that conform to the provider data type restrictions. The
output value might differ from the source value in order of magnitude
and sign. Without this parameter, source values in numeric data type
fields are converted to character strings and then converted back
to the original data type.
Without RULE=BIN, there is a discrepancy
between the number of digits that are required to represent a numeric
value in character format and the maximum value that a given number
of digits can express. For example, an unsigned short integer needs
five-digit positions to hold the maximum unsigned value of 65535,
but five-digit positions can express a maximum string value of 99999,
which when converted back to an unsigned short integer results in
an overflow. Without RULE=BIN, boundary source values in numeric data
type fields might be transformed into values that use this discrepancy
and do not conform to the storage constraints.
RULE=BIN is not
compatible with the parameters COPY, LEN, TRIM, REMOVE, and CASE.
- SCALED (or SCA)
- Specifies that the output value has the same sign and approximately
the same magnitude as the source value of the numeric data type. For
example, when masking data such as salary figures, use this option
to ensure that a value such as 12,345.67 does not produce a value
such as -3.1.
Source value parameters
Parameters
for handling source values.
- CASE
- Specifies that the provider converts the source values to uppercase
before processing.
CASE is not compatible with the parameters RULE=BIN
and RULE=SCALED.
Enter the following option:
- UPPER (or UP)
- Convert the source values to uppercase before processing.
- REMOVE
- Specifies the characters, within enclosing double quotation marks,
to remove from the source value before processing.
REMOVE is not
compatible with the parameters RULE=BIN and RULE=SCALED.
- TRIM
- Specifies to remove trailing spaces from the source value before
processing.
TRIM is not compatible with the parameters RULE=BIN
and RULE=SCALED.
Enter the following option:
- RIGHT
- Right-trim the source value to remove trailing spaces before processing.
- WHENMATCH
- If the source value matches the masked value, determines how to
ensure that the masked value does not match the source. Spaces are
not supported, except within literals (for example, “X Y”).
WHENMATCH
might create masked values that are not unique. Also, format, length,
or both might be changed when using WHENMATCH.
Enter one of
the following options:
- REVERSE
- Change a matching value by reversing the order. For example, 1234
is changed to 4321.
- SHUFFLE
- Change a matching value by shuffling the order. For example, 1234
is changed to 4213.
- PREFIX
- Change a matching value by including a prefix. If the prefix creates
a masked value that exceeds the length of the data type field, the
row is skipped. Enter one of the following options:
- "literal"
- Enter a literal string to use as a prefix.
For numeric data
types, the string must be numeric. For all other data types, use alphanumeric
and special characters.
For character data types, the length
of the string cannot exceed the field length. For numeric data types,
if the literal value exceeds the data type limit, the row is skipped.
- SEQ(low,high)
- Use a sequential value for a prefix, where low is
the starting value and high is the highest possible
value. Each sequential number is incremented by one. Both values must
be integers no greater than 2,000,000,000. The low value
must be greater than 0 and less than the high value.
When the high value is reached, the sequence repeats.
- RAND(low,high)
- Use a random value for a prefix, where low is
the lowest possible random value and high is the
highest possible random value. Both values must be integers no greater
than 2,000,000,000. The low value must be greater
than 0 and less than the high value.
- SUFFIX
- Change a matching value by including a suffix. If the suffix creates
a masked value that exceeds the length of the data type field, the
row is skipped. Enter one of the following options:
- "literal"
- Enter a literal string to use as a suffix.
For numeric data
types, the string must be numeric. For all other data types, use alphanumeric
and special characters.
For character data types, the length
of the string cannot exceed the field length. For numeric data types,
if the literal value exceeds the data type limit, the row is skipped.
- SEQ(low,high)
- Use a sequential value for a suffix, where low is
the starting value and high is the highest possible
value. Each sequential number is incremented by one. Both values must
be integers no greater than 2,000,000,000. The low value
must be greater than 0 and less than the high value.
When the high value is reached, the sequence repeats.
- RAND(low,high)
- Use a random value for a suffix, where low is
the lowest possible random value and high is the
highest possible random value. Both values must be integers no greater
than 2,000,000,000. The low value must be greater
than 0 and less than the high value.
- OVERLAY(n,m,"literal")
- Replace consecutive characters with a literal value, where n is
the position of the first character, m is the number
of consecutive characters to replace, and "literal" is
the literal value. For example, OVERLAY(1,3,"ABC") would
mask the value 1234 as ABC4.
If the value of m differs
from the length of the "literal" value, the length
of the masked value might differ from the length of the source value.
For example, OVERLAY(1,4,"ABC") would mask the value
123456 as ABC56, and OVERLAY(1,1,"ABC") would mask
the value 123456 as ABC23456.
Processing parameters
Parameters
for managing provider processes.
- DISCARDLIMIT (or DLIM)
- Specifies the number of failed rows to discard before the provider stops processing.
Data definition
parameters
Parameters for defining source and target
data. For further information see, supported data
types.
- FLDDEF
- Required. Specifies the attributes of input values
to use for processing. See Field definition parameter.
- CODEPAGE (or CP)
- An integer value that specifies the codepage or character-set identifier of the source fields. The default is UTF-8. The CP parameter within the FLDDEF parameter overrides this value.
- CPTYPE (or CPT)
- The codepage type of the source fields. The CPT parameter within the FLDDEF parameter overrides this value.
When the origin of the data is DBMS-specific but not tied to any one DBMS, specify the value as ANY. When the origin of the data is from a non-DBMS source, specify the value as NONE. As there are no DBMS-specific code pages for Netezza®, a specification of NONE is implied when Netezza is specified.
Enter one of the following values:
| Value |
Description |
| DBZ (or DB2zOS) |
DB2® for z/OS® |
| DB2 (or DB2LUW) |
DB2 for Linux, UNIX, and Windows |
| IFX (or INFORMIX) |
Informix® |
| MSS (or SQLSERVER) |
Microsoft SQL Server |
| NZ or NETEZZA |
Netezza |
| ODBC |
ODBC |
| ORA (or ORACLE) |
Oracle |
| SYB (or SYBASE) |
Sybase |
| TD or TERADATA |
Teradata |
| ANY |
Any DBMS |
| NONE |
No DBMS |
Supported data types
The
provider supports the following data types. Note that some data types
are not compatible with RULE=BIN and RULE=SCALED.
| Data type |
Description |
RULE=BIN and RULE=SCALED |
| CHAR |
Fixed size character data that is left justified
and space padded. |
No |
| WCHAR |
Fixed size wide character data that is left
justified and space padded. |
No |
| VARCHAR_SZ |
Character data string which is terminated
by a NULL character. |
No |
| WVARCHAR_SZ |
Wide character data string that is terminated
by a NULL character. |
No |
| DOUBLE |
Double precision floating point number in
the range 1.7E +/- 308 (15 digits). |
Yes |
| FLOAT |
Floating point number in the range 3.4E +/-
38 (7 digits). |
Yes |
| LONG_LONG |
An 8 byte signed numeric value in the range
-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. |
Yes |
| U_LONG_LONG |
An 8 byte unsigned numeric value in the
range 0 to 18,446,744,073,709,551,615. |
Yes |
| INTEGER |
A 4 byte signed integer value in the range
-2,147,483,648 to 2,147,483,647. |
Yes |
| U_INTEGER |
A 4 byte unsigned integer value in the
range 0 to 4,294,967,295. |
Yes |
| SMALLINT |
A 2 byte signed integer value in the range
-32,768 to 32,767. |
Yes |
| U_SMALLINT |
A 2 byte unsigned integer value in the
range 0 to 65,535. |
Yes |
| TINYINT |
A single byte signed integer value in the
range -128 to 127. |
No |
| U_TINYINT |
A single byte unsigned integer value in
the range 0 to 255. |
Yes |
| DECIMAL_370 |
Packed decimal encoded buffer. |
No |
Using the provider in the data privacy application
CLI
Use the following syntax to include the affinity provider
in the data privacy application command-line interface (CLI): MASK="PRO=affinity,
algo=...".
The following example shows the data privacy
application CLI syntax for an affinity provider that uses the FPE
algorithm to transform values.
MASK"pro=aff, algo=fpe, mtd=rep, key="foo", iterations=5, whenmatch=shuffle,
flddef1=(name=col, dt=varchar_sz)"
Using the provider in a column map
Use the
following syntax to include the affinity provider in a column map: TRANS
PRO=affinity, algo=....
The following example shows
the column map syntax for an affinity provider that uses the FPE algorithm
to transform values.
TRANS PRO=aff, algo=fpe, mtd=rep, key="foo", iterations=5, whenmatch=shuffle,
flddef1=(name=col, dt=varchar_sz)