Numeric Masks
Numeric masks work specifically with number data types.
- Random Number
Generates a random integer/decimal between two numbers - supports triangular or uniform distribution - Random Boolean
Generates a random true/false or 1/0 value - Numeric Bucket
Generates replacement numbers whilst retaining specified ranges
Random Number (from_random_number)
This mask replaces the column value with a random number. The default number type is
integer. If decimal_places is set greater than 0, a decimal number will be
generated.
Note: For compatibility with some databases, the generated value is actually of string type. Most database engines automatically convert the string to the appropriate numeric type for the column being masked, so no typecast is required. For file masking however, if you need a numeric value, use a typecast mask with
typecast_asset tointeger,float, ordecimalas appropriate.
Parameters
max(required): The generated number is guaranteed to be less than or equal to this maximum value.min(optional): This is the minimum value for the generated number. Defaults to0.mode(optional): If this parameter is supplied, the number will be chosen from a triangular distribution where 'mode' is the peak. Useful for street numbers etc. (e.g. withminandmodeboth set to1to generate lower street numbers more frequently).decimal_places(optional): Number of decimal places to generate. Defaults to 0.
Example (integer)
This example replaces the quantity column in the products table with a random
integer between 1 and 200.
version: '1.0'
tasks:
- type: mask_table
table: products
key: id
rules:
- column: quantity
masks:
- type: from_random_number
min: 1
max: 200
Show result
| Before | After |
|
|
|---|
Example (decimal)
This example replaces the price column in the products table with a random
number to 2 decimal places between 1 and 200.
version: '1.0'
tasks:
- type: mask_table
table: products
key: id
rules:
- column: price
masks:
- type: from_random_number
min: 1
max: 200
decimal_places: 2
Show result
| Before | After |
|
|
|---|
Random boolean (from_random_boolean)
This mask replaces the column value with a random boolean value (1/0 or
true/false). from_random_boolean is effectively equivalent to a
from_random_number mask with options max: 1 and min: 0.
Note The return type is an integer which can be automatically cast to the appropriate boolean type in most databases. For file masking and Amazon DynamoDB, literal
1or0will be written to the file, which may not be the intended behaviour.
- If literal
true/falseboolean values are required, please chainfrom_random_booleanwith atypecast:masks: - type: chain masks: - type: from_random_boolean - type: typecast typecast_as: boolean
- If
"true"/"false"strings are required please use thefrom_choicesmask with"true"or"false"values (note the quoting around the values). For more information please refer tofrom_choices
Example
This example replaces the active column in the products table with random
boolean values.
version: '1.0'
tasks:
- type: mask_table
table: products
key: id
rules:
- column: active
masks:
- type: from_random_boolean
Show result
| Before | After |
|
|
|---|
Numeric Bucket (numeric_bucket)
This mask generates random integers from within the same "bucket" (numeric range) as the value to be masked. The mask is set up using an array of integers that define the lower bound of each bucket.
buckets Intro
Buckets are a series of defined lower bounds from which replacement values are randomly generated.
For example, an age column that should mask values into certain age buckets: 1-17, 18-24, 25-64, 65+. buckets: [18,25,65]
are specified, so if a value from the column is 16, the replacement value is randomly generated within the 1-17 range,
as opposed to any number.
Capping minimum and maximum values
The numeric bucket mask will generate numbers for the lowest bucket from 0/1 to <bucket> - 1. Similarly, for the highest
bucket, numbers will be generated in the range bucket to 231 - 1. This default behaviour may cause undesired
outputs which are outside the normal range for the data.
For example, consider a column containing ages. All ages currently in the column are 18 or older. Ages are to be retained in buckets 35 or younger, 36-65, 65+.
A ruleset could be defined with buckets: [36, 65], but this would not function correctly. Ages 35 or lower could be masked
to any value from 0-35, therefore the masked age could be below 18, which does not match the application rules. A similar
problem exists with ages over 65, the masked value could be any value from 65 to 231 - 1.
To solve this issue, lower and upper bounds should be specified. This requires an understanding of the existing data. Choosing
the upper bound would mean selecting a reasonable value that your ages could have. A better ruleset definition could be: buckets: [18, 36, 65, 100].
This would limit the lower age to 18 and upper age to 100.
Parameters
buckets(required): A series of numbers representing the lower bounds of the buckets, including the number e.g.[5, 10, 15 20]would result in ranges 1-4, 5-9, 10-14, 15-19, 20+. Note:bucketsshould not contain duplicates and should be in ascending order.force_change(optional): Since the replacement values are chosen randomly, it is possible that the number might be randomly replaced with the same one (for example,45is chosen as a replacement for45). Setforce_changetotrueto ensure the replacement value is not the same as the original value. Note that this makes the output slightly less random as the number of possible replacements is reduced by one.include_zero(optional): When set totrue, will lower the first lower bound to include 0, otherwise the lowest bound will be 1.scale_to(optional): Provide a number that better represents the scale of your data in the target column to which replacement values will be scaled. e.g., If the scale of your data is 1000 (values are multiples of 1000), then 4583 is not an appropriate replacement for 2000. Setscale_toto1000so the replacement values will be multiples of1000(4583 -> 5000).scale_tomust be a multiple of 10, and the values inbucketsmust also be multiples ofscale_to.
Note: If
bucketswere defined with a size of one (e.g.buckets: [1,3,5]) with theforce_changeparameter set totruethis would cause an infinite loop. Instead, an error will be raised. If the target column is of a type with a smaller range than 0 - 231-1 (smallint, tinyint, etc.), add a maximum value to thebucketsas replacement values could be larger than the column can store.numeric_bucketrequires a numeric value as the input. Atypecastmask can be used to convert any string values before masking withnumeric_bucket. For more information please refer to Typecast.
Example
This example replaces the age column in the users table with random values from the specified buckets.
version: '1.0'
tasks:
- type: mask_table
table: users
key: id
rules:
- column: age
masks:
- type: numeric_bucket
buckets: [16,18,25,65,110]
force_change: true
Show result
| Before | After |
|
|
|---|