Ruleset Library Tutorial
This tutorial guides you through creating and using a ruleset library to eliminate duplicated YAML in your rulesets.
You will learn how to create a library containing a reusable mask and a reusable rule, and then write a ruleset that uses the library rather than duplicating the YAML. It is a hands-on tutorial involving masking a small amount of test data so you can verify that using the library produces the same result as writing the ruleset out in full.
- Why use libraries?
- Tutorial prerequisites
- Test data
- Setup
- Part 1: A simple library and ruleset
- Part 2: Reusing a rule across tables
- Next Steps
Why use libraries?
When multiple tables or rulesets share the same masking logic, you end up duplicating YAML. Libraries let you define that logic once and reference it from anywhere. If you later need to change the logic, you update it in one place, and every ruleset that uses it automatically picks up the change.
The library format is very flexible, allowing you to deduplicate anything from a single mask all the way up to a full task definition. In this tutorial, we'll write a library with one mask and one rule. Each is then referenced twice (once per table), so each definition is written once rather than twice.
Tutorial prerequisites
- DataMasque installed and licensed
- Access to the DataMasque web application
- A supported RDBMS engine running and connected to DataMasque
- Access to the RDBMS console (terminal or GUI client)
Test data
This tutorial is based on a very simple marketing management system with two tables:
customers and promoters.
Both tables store a person's name and a marketing score,
so the same masking logic applies to both.
This makes it a good candidate for a ruleset library —
rather than writing the same masks twice,
we define them once and reference them from both tables.
The tables in this tutorial database are deliberately over-simplified, in terms of the number of both columns and rows, for the purposes of clarity. A production ruleset library would typically cover many more columns and tables.
Connect to your RDBMS console and run the following SQL to create the tables and insert sample rows. If this is a shared instance, create a new database or schema for the tutorial data, which you can safely delete later.
Note: This DDL is written for PostgreSQL but should be compatible with most RDBMS engines.
CREATE TABLE customers
(
    id int not null primary key,
    name varchar(100),
    marketing_score float
);
CREATE TABLE promoters
(
    id int not null primary key,
    promoter_name varchar(100),
    marketing_score float
);
INSERT INTO customers (id, name, marketing_score)
VALUES
    (1, 'Alice', 30.0),
    (2, 'Bob', 50.0),
    (3, 'Charlie', 25.5);
INSERT INTO promoters (id, promoter_name, marketing_score)
VALUES
    (1, 'Dave', 10.0),
    (2, 'Evelyn', 43.5),
    (3, 'Fred', 75.5);
Both tables have a name column (name in customers, promoter_name in promoters) and a marketing_score column.
These shared columns are what make a library useful here:
rather than writing the same masking logic twice,
we can define it once in a library and reference it from both tasks.
Setup
- Open the DataMasque web application and log in.
- Find the connection in the list on the left side of the Database Masking page, and click the pencil icon to edit. Alternatively, create a new connection by clicking the plus icon.
- Verify that the connection is configured to use the database and schema in which you created the test data.
- Ensure DataMasque can reach the RDBMS by clicking the Test Connection button.
Part 1: A simple library and ruleset
In this part we'll create a library with a single reusable mask, write a ruleset that references it, and run a masking job to verify it works.
Creating the library
- From the left menu, select the Ruleset Libraries page.
- Click on Add Library at the bottom right.
- At the top of the editor page, give the new library the name tutorial.
- Leave the namespace field blank.
The library editor is prepopulated with a template containing placeholders for all seven sections. Delete the template and paste the following YAML in its place:
version: "1.0"
masks:
  name_mask:
    type: concat
    glue: ' '
    masks:
      - type: from_file
        seed_file: DataMasque_firstNames_mixed.csv
        seed_column: firstname-mixed
      - type: from_file
        seed_file: DataMasque_lastNames_v3.csv
        seed_column: lastnames
This defines a mask called name_mask
that generates a realistic full name
by concatenating a random first name and last name from seed files.
Click Save and Exit to save the library. Verify that the library's validation status shows a green tick on the libraries page.
Creating the ruleset
Select the Database Masking page in the left menu.
In the Rulesets panel on the right side, click the + button, then click Skip to YAML Editor.
Paste the following ruleset:
version: "1.0"
imports:
  - tutorial
tasks:
  - type: mask_table
    table: customers
    key: id
    rules:
      - column: name
        masks:
          - $ref: "tutorial#masks/name_mask"
Two things to note:
- The imports block declares which libraries this ruleset uses. Any library referenced in the ruleset must be listed here.
- $ref: "tutorial#masks/name_mask" means "look up the value at path masks/name_mask in the tutorial library and insert it here". The format is <library-name>#<path>, or <namespace>/<library-name>#<path> if the library has a namespace.
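To make the reference format concrete, here is a minimal Python sketch that splits a reference string into its namespace, library name, and path. It only illustrates the format described above and is not how DataMasque itself parses references; the function name parse_ref, the LibraryRef type, and the namespace team_a are hypothetical.

```python
from typing import NamedTuple, Optional


class LibraryRef(NamedTuple):
    namespace: Optional[str]  # None when the library has no namespace
    library: str
    path: list                # path segments inside the library


def parse_ref(ref: str) -> LibraryRef:
    """Split '<namespace>/<library-name>#<path>' into its parts (illustrative only)."""
    target, sep, path = ref.partition("#")
    if not sep or not target or not path:
        raise ValueError(f"expected '<library>#<path>', got {ref!r}")
    # Only the part before '#' may carry a namespace prefix.
    namespace, slash, library = target.rpartition("/")
    return LibraryRef(namespace if slash else None, library, path.split("/"))


print(parse_ref("tutorial#masks/name_mask"))
# 'team_a' is a made-up namespace used only for this example.
print(parse_ref("team_a/tutorial#database_rules/marketing_score"))
```

The first call yields no namespace, the library tutorial, and the path segments masks and name_mask, which matches the reference used in the ruleset above.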
Click Save and Exit to save the ruleset.
Verify that the ruleset shows a green tick in the rulesets list, indicating it is valid.
If it shows an error, check the syntax
and ensure the imports block is included at the top.
Run masking and verify
Select the connection and ruleset and click Preview Run.
The run preview shows that the customers table will be masked.
Click Start Run to begin masking.
Once the run completes, query the table in your RDBMS console:
SELECT * FROM customers;
The name column should now contain random names,
while marketing_score remains unchanged.
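If you would rather check the result programmatically than by eye, the following Python sketch compares masked rows against the original sample data from this tutorial. It assumes you have copied the query output into a list of (id, name, marketing_score) tuples yourself; the function name check_part1 and the sample masked names are made up for illustration.

```python
# Original rows inserted earlier in this tutorial: id -> (name, marketing_score).
ORIGINAL_CUSTOMERS = {1: ("Alice", 30.0), 2: ("Bob", 50.0), 3: ("Charlie", 25.5)}


def check_part1(masked_rows):
    """Return a list of problems; an empty list means the rows look right."""
    problems = []
    for row_id, name, score in masked_rows:
        original_name, original_score = ORIGINAL_CUSTOMERS[row_id]
        if name == original_name:
            problems.append(f"id {row_id}: name was not masked")
        if score != original_score:
            problems.append(f"id {row_id}: marketing_score changed")
    return problems


# Made-up example of masked output; the names in your database will differ.
sample = [(1, "Jane Doe", 30.0), (2, "Sam Lee", 50.0), (3, "Ana Ruiz", 25.5)]
print(check_part1(sample))  # prints []
```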
Part 2: Reusing a rule across tables
Now let's mask marketing_score as well, and add the promoters table.
Since both tables share the same marketing_score column,
we'll define the masking logic once in the library and reference it from both tasks.
Expanding the library
Navigate to the Ruleset Libraries page and open the tutorial library for editing.
Add a database_rules section below the existing masks section:
database_rules:
  marketing_score:
    column: marketing_score
    masks:
      - type: from_random_number
        min: 0.0
        max: 100.0
        decimal_places: 2
      - type: typecast
        typecast_as: float
This rule randomizes marketing_score with a value between 0 and 100.
from_random_number outputs a string,
so a typecast mask converts it to a float to match the column's data type.
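The two-step chain can be pictured with a small Python sketch. The functions below are hypothetical stand-ins that mimic the behaviour described above (a random number rendered as text, then converted to a float); they are not DataMasque's implementation, and the real masks may differ in details such as the random distribution.

```python
import random


def from_random_number(min_value: float, max_value: float, decimal_places: int) -> str:
    """Stand-in for the first mask: a random number rendered as a string."""
    value = random.uniform(min_value, max_value)
    return f"{value:.{decimal_places}f}"


def typecast_float(text: str) -> float:
    """Stand-in for the typecast mask: convert the string back to a float."""
    return float(text)


raw = from_random_number(0.0, 100.0, 2)  # a string with two decimal places
score = typecast_float(raw)              # a float suitable for a float column
print(type(raw).__name__, "->", type(score).__name__)  # prints: str -> float
```

Masks in a rule's list are applied in order, which is why the typecast entry comes second.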
Unlike a masks entry (which defines a single mask),
a database_rules entry is a complete rule -
it includes the target column and can be dropped directly into a task's rules list.
The full library should now look like this:
version: "1.0"
masks:
  name_mask:
    type: concat
    glue: ' '
    masks:
      - type: from_file
        seed_file: DataMasque_firstNames_mixed.csv
        seed_column: firstname-mixed
      - type: from_file
        seed_file: DataMasque_lastNames_v3.csv
        seed_column: lastnames
database_rules:
  marketing_score:
    column: marketing_score
    masks:
      - type: from_random_number
        min: 0.0
        max: 100.0
        decimal_places: 2
      - type: typecast
        typecast_as: float
Click Save and Exit to save the updated library. Verify the library's validation status shows a green tick.
Updating the ruleset
Navigate to the Database Masking page and open the ruleset you created in Part 1 for editing.
Replace the ruleset contents with:
version: "1.0"
imports:
  - tutorial
tasks:
  - type: mask_table
    table: customers
    key: id
    rules:
      - column: name
        masks:
          - $ref: "tutorial#masks/name_mask"
      - $ref: "tutorial#database_rules/marketing_score"
  - type: mask_table
    table: promoters
    key: id
    rules:
      - column: promoter_name
        masks:
          - $ref: "tutorial#masks/name_mask"
      - $ref: "tutorial#database_rules/marketing_score"
Notice how the two references work at different levels:
- tutorial#masks/name_mask references a single mask, so it appears inside a masks list (preceded by -).
- tutorial#database_rules/marketing_score references an entire rule (including the target column), so it appears directly in the rules list.
Both definitions are written once in the library and referenced twice in the ruleset - once per table.
For comparison, the equivalent ruleset without a library is:
version: "1.0"
tasks:
  - type: mask_table
    table: customers
    key: id
    rules:
      - column: name
        masks:
          - type: concat
            glue: ' '
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
              - type: from_file
                seed_file: DataMasque_lastNames_v3.csv
                seed_column: lastnames
      - column: marketing_score
        masks:
          - type: from_random_number
            min: 0.0
            max: 100.0
            decimal_places: 2
          - type: typecast
            typecast_as: float
  - type: mask_table
    table: promoters
    key: id
    rules:
      - column: promoter_name
        masks:
          - type: concat
            glue: ' '
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
              - type: from_file
                seed_file: DataMasque_lastNames_v3.csv
                seed_column: lastnames
      - column: marketing_score
        masks:
          - type: from_random_number
            min: 0.0
            max: 100.0
            decimal_places: 2
          - type: typecast
            typecast_as: float
With the library, those duplicated blocks are each replaced by a single $ref line.
If you later need to change the masking logic (for example, using a different seed file for names), you update it in the library and every referencing ruleset automatically picks up the change.
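The substitution can be modelled in a few lines of Python. This is a simplified sketch that assumes a reference is a mapping whose only key is $ref and that it is replaced wholesale by the value found in the library; it ignores namespaces and field overrides, and it is not DataMasque's actual resolver. The names LIBRARIES, lookup, and expand are hypothetical.

```python
import copy

# The tutorial library from this page, written as plain Python data.
LIBRARIES = {
    "tutorial": {
        "masks": {
            "name_mask": {
                "type": "concat",
                "glue": " ",
                "masks": [
                    {"type": "from_file", "seed_file": "DataMasque_firstNames_mixed.csv",
                     "seed_column": "firstname-mixed"},
                    {"type": "from_file", "seed_file": "DataMasque_lastNames_v3.csv",
                     "seed_column": "lastnames"},
                ],
            }
        },
        "database_rules": {
            "marketing_score": {
                "column": "marketing_score",
                "masks": [
                    {"type": "from_random_number", "min": 0.0, "max": 100.0,
                     "decimal_places": 2},
                    {"type": "typecast", "typecast_as": "float"},
                ],
            }
        },
    }
}


def lookup(ref):
    """Follow '<library-name>#<path>' into LIBRARIES and return a copy of the value."""
    library_name, _, path = ref.partition("#")
    node = LIBRARIES[library_name]
    for key in path.split("/"):
        node = node[key]
    return copy.deepcopy(node)


def expand(node):
    """Recursively replace every {'$ref': ...} mapping with the referenced value."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return expand(lookup(node["$ref"]))
        return {key: expand(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(item) for item in node]
    return node


# The rules list of the customers task from the ruleset above.
customers_rules = [
    {"column": "name", "masks": [{"$ref": "tutorial#masks/name_mask"}]},
    {"$ref": "tutorial#database_rules/marketing_score"},
]

expanded = expand(customers_rules)
print(expanded[0]["masks"][0]["type"])  # prints: concat
print(expanded[1]["column"])            # prints: marketing_score
```

After expansion, the rules list has the same shape as the customers rules in the written-out ruleset: the mask reference becomes one entry in a masks list, and the rule reference becomes a whole rule.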
Click Save and Exit to save the ruleset. Verify that the ruleset shows a green tick.
Run masking again and verify
Select the ruleset and click Preview Run. The run preview should show that both tables will be masked. Click Start Run to begin masking.
Once the run completes, query both tables in your RDBMS console:
SELECT * FROM customers;
SELECT * FROM promoters;
You should see:
- name and promoter_name are masked with random names
- marketing_score is randomized in both tables
Next Steps
This tutorial demonstrated the masks and database_rules sections.
Libraries also support columns, tabular_file_rules, tasks, file_rules, and other sections
for different levels of reuse.
- Refer to Library Structure for details about all seven sections and what each is for.
- Refer to Referencing for the full $ref syntax, including how to override fields and how a library can reference other parts of itself.