
Ruleset Library Tutorial

This tutorial guides you through creating and using a ruleset library to eliminate duplicated YAML in your rulesets.

You will learn how to create a library containing a reusable mask and a reusable rule, and then write a ruleset that uses the library rather than duplicating the YAML. The tutorial is hands-on: you will mask a small amount of test data so you can verify that a ruleset built on the library behaves the same as one written out in full.

Why use libraries?

When multiple tables or rulesets share the same masking logic, you end up duplicating YAML. Libraries let you define that logic once and reference it from anywhere. If you later need to change the logic, you update it in one place, and every ruleset that uses it automatically picks up the change.

The library format is flexible, allowing you to deduplicate anything from a single mask all the way up to a full task definition. In this tutorial, we'll write a library with one mask and one rule. Each is then referenced twice (once per table), which roughly halves the amount of masking YAML you would otherwise need to write.

Tutorial prerequisites

Test data

This tutorial is based on a very simple marketing management system with two tables: customers and promoters. Both tables store a person's name and a marketing score, so the same masking logic applies to both. This makes it a good candidate for a ruleset library: rather than writing the same masks twice, we define them once and reference them from both tables.

The tables in this tutorial database are deliberately simplified, with only a few columns and rows, for clarity. A production ruleset library would typically cover many more columns and tables.

Connect to your RDBMS console and run the following SQL to create the tables and insert sample rows. If this is a shared instance, create a new database or schema for the tutorial data, which you can safely delete later.

Note: This SQL is written for PostgreSQL but should be compatible with most RDBMS engines.

CREATE TABLE customers
(
    id              int not null primary key,
    name            varchar(100),
    marketing_score float
);

CREATE TABLE promoters
(
    id              int not null primary key,
    promoter_name   varchar(100),
    marketing_score float
);

INSERT INTO customers (id, name, marketing_score)
VALUES
(1, 'Alice', 30.0),
(2, 'Bob', 50.0),
(3, 'Charlie', 25.5);

INSERT INTO promoters (id, promoter_name, marketing_score)
VALUES
(1, 'Dave', 10.0),
(2, 'Evelyn', 43.5),
(3, 'Fred', 75.5);

Both tables have a column holding a person's name (name in customers, promoter_name in promoters) and a marketing_score column. This shared structure is what makes a library useful here: rather than writing the same masking logic twice, we can define it once in a library and reference it from both tasks.

Setup

  • Open the DataMasque web application and log in.
  • Find the connection in the list on the left side of the Database Masking page, and click the pencil icon to edit. Alternatively, create a new connection by clicking the plus icon.
  • Verify that the connection is configured to use the database and schema in which you created the test data.
  • Ensure DataMasque can reach the RDBMS by clicking the Test Connection button.

Part 1: A simple library and ruleset

In this part we'll create a library with a single reusable mask, write a ruleset that references it, and run a masking job to verify it works.

Creating the library

  • From the left menu, select the Ruleset Libraries page.
  • Click on Add Library at the bottom right.
  • At the top of the editor page, give the new library the name tutorial.
  • Leave the namespace field blank.

The library editor is prepopulated with a template containing placeholders for all seven sections. Delete the template and paste the following YAML in its place:

version: "1.0"

masks:
  name_mask:
    type: concat
    glue: ' '
    masks:
      - type: from_file
        seed_file: DataMasque_firstNames_mixed.csv
        seed_column: firstname-mixed
      - type: from_file
        seed_file: DataMasque_lastNames_v3.csv
        seed_column: lastnames

This defines a mask called name_mask that generates a realistic full name by concatenating a random first name and last name from seed files.
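As a rough illustration of what the concat mask above does, the following Python sketch joins one randomly chosen first name and one randomly chosen last name with the glue string. It is not DataMasque code: the name lists are hypothetical stand-ins for the seed-file columns, and the function names are made up for this example.

```python
import random

# Hypothetical stand-ins for the values held in the two seed-file columns.
FIRST_NAMES = ["Aroha", "Mateo", "Priya"]
LAST_NAMES = ["Nguyen", "Okafor", "Smith"]


def concat(parts, glue):
    """Join the outputs of the sub-masks with the glue string."""
    return glue.join(parts)


def name_mask(rng):
    """Pick one value per 'seed column' and glue them into a full name."""
    return concat([rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)], glue=" ")


rng = random.Random(0)  # seeded only so repeated runs of this sketch match
print(name_mask(rng))
```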

Click Save and Exit to save the library. Verify that the library's validation status shows a green tick on the libraries page.

Creating the ruleset

Select the Database Masking page in the left menu. In the Rulesets panel on the right side, click the + button, then click Skip to YAML Editor.

Paste the following ruleset:

version: "1.0"
imports:
  - tutorial

tasks:
  - type: mask_table
    table: customers
    key: id
    rules:
      - column: name
        masks:
          - $ref: "tutorial#masks/name_mask"

Two things to note:

  • The imports block declares which libraries this ruleset uses. Any library referenced in the ruleset must be listed here.
  • $ref: "tutorial#masks/name_mask" means "look up the value at path masks/name_mask in the tutorial library and insert it here". The format is <library-name>#<path>, or <namespace>/<library-name>#<path> if the library has a namespace.
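To make the reference format concrete, here is a minimal Python sketch that splits a $ref string into its namespace, library name, and path. It only illustrates the <namespace>/<library-name>#<path> layout described above and is not how DataMasque itself parses references. The namespace marketing is invented for the example, and the sketch assumes that everything before the last slash on the left of the # is the namespace.

```python
from typing import List, NamedTuple, Optional


class Ref(NamedTuple):
    namespace: Optional[str]  # None when the library has no namespace
    library: str
    path: List[str]           # e.g. ["masks", "name_mask"]


def parse_ref(ref: str) -> Ref:
    """Split '<namespace>/<library-name>#<path>' into its parts."""
    target, sep, path = ref.partition("#")
    if not sep or not target or not path:
        raise ValueError(f"expected '<library>#<path>', got {ref!r}")
    # Only the part before '#' can carry a namespace; the path also uses '/'.
    namespace, slash, library = target.rpartition("/")
    return Ref(namespace if slash else None, library, path.split("/"))


print(parse_ref("tutorial#masks/name_mask"))
# 'marketing' is a made-up namespace used only for illustration.
print(parse_ref("marketing/tutorial#masks/name_mask"))
```

The first call yields no namespace, the library tutorial, and the path masks then name_mask, which matches the reference used in this ruleset.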

Click Save and Exit to save the ruleset. Verify that the ruleset shows a green tick in the rulesets list, indicating it is valid. If it shows an error, check the syntax and ensure the imports block is included at the top.

Run masking and verify

Select the connection and ruleset and click Preview Run. The run preview shows that the customers table will be masked. Click Start Run to begin masking.

Once the run completes, query the table in your RDBMS console:

SELECT * FROM customers;

The name column should now contain random names, while marketing_score remains unchanged.

Part 2: Reusing a rule across tables

Now let's mask marketing_score as well, and add the promoters table. Since both tables share the same marketing_score column, we'll define the masking logic once in the library and reference it from both tasks.

Expanding the library

Navigate to the Ruleset Libraries page and open the tutorial library for editing.

Add a database_rules section below the existing masks section:

database_rules:
  marketing_score:
    column: marketing_score
    masks:
      - type: from_random_number
        min: 0.0
        max: 100.0
        decimal_places: 2
      - type: typecast
        typecast_as: float

This rule replaces marketing_score with a random value between 0 and 100, to two decimal places. from_random_number outputs a string, so a typecast mask then converts it to a float to match the column's data type.

Unlike a masks entry (which defines a single mask), a database_rules entry is a complete rule: it includes the target column and can be dropped directly into a task's rules list.
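The two-step chain in this rule can be pictured with a small Python sketch: the first step produces the number as text, and the second converts that text to a float. The function names mirror the mask types for readability only; they are illustrative stand-ins, not DataMasque's implementation.

```python
import random


def from_random_number(minimum: float, maximum: float, decimal_places: int) -> str:
    """Toy stand-in: return a random number in range, rendered as text."""
    value = random.uniform(minimum, maximum)
    return f"{value:.{decimal_places}f}"


def typecast_as_float(text: str) -> float:
    """Toy stand-in for the typecast step: text in, float out."""
    return float(text)


as_text = from_random_number(0.0, 100.0, 2)
as_float = typecast_as_float(as_text)  # output of the first step feeds the second
print(repr(as_text), repr(as_float))
```

Without the second step, the value handed to the float column would still be a string, which is the mismatch the typecast mask is there to avoid.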

The full library should now look like this:

version: "1.0"

masks:
  name_mask:
    type: concat
    glue: ' '
    masks:
      - type: from_file
        seed_file: DataMasque_firstNames_mixed.csv
        seed_column: firstname-mixed
      - type: from_file
        seed_file: DataMasque_lastNames_v3.csv
        seed_column: lastnames

database_rules:
  marketing_score:
    column: marketing_score
    masks:
      - type: from_random_number
        min: 0.0
        max: 100.0
        decimal_places: 2
      - type: typecast
        typecast_as: float

Click Save and Exit to save the updated library. Verify the library's validation status shows a green tick.

Updating the ruleset

Navigate to the Database Masking page and open the ruleset you created in Part 1 for editing.

Replace the ruleset contents with:

version: "1.0"
imports:
  - tutorial

tasks:
  - type: mask_table
    table: customers
    key: id
    rules:
      - column: name
        masks:
          - $ref: "tutorial#masks/name_mask"
      - $ref: "tutorial#database_rules/marketing_score"

  - type: mask_table
    table: promoters
    key: id
    rules:
      - column: promoter_name
        masks:
          - $ref: "tutorial#masks/name_mask"
      - $ref: "tutorial#database_rules/marketing_score"

Notice how the two references work at different levels:

  • tutorial#masks/name_mask references a single mask, so it appears inside a masks list (preceded by -).
  • tutorial#database_rules/marketing_score references an entire rule (including the target column), so it appears directly in the rules list.

Both definitions are written once in the library and referenced twice in the ruleset, once per table.

For comparison, here is the equivalent ruleset written without a library:

version: "1.0"

tasks:
  - type: mask_table
    table: customers
    key: id
    rules:
      - column: name
        masks:
          - type: concat
            glue: ' '
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
              - type: from_file
                seed_file: DataMasque_lastNames_v3.csv
                seed_column: lastnames
      - column: marketing_score
        masks:
          - type: from_random_number
            min: 0.0
            max: 100.0
            decimal_places: 2
          - type: typecast
            typecast_as: float
  - type: mask_table
    table: promoters
    key: id
    rules:
      - column: promoter_name
        masks:
          - type: concat
            glue: ' '
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
              - type: from_file
                seed_file: DataMasque_lastNames_v3.csv
                seed_column: lastnames
      - column: marketing_score
        masks:
          - type: from_random_number
            min: 0.0
            max: 100.0
            decimal_places: 2
          - type: typecast
            typecast_as: float

With the library, those duplicated blocks are each replaced by a single $ref line.
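The substitution can be sketched in a few lines of Python. This toy resolver is an assumption-laden illustration, not DataMasque's resolver: it represents the tutorial library as plain Python data, handles only entries that consist of a $ref key and nothing else (so it ignores field overrides and namespaces), and replaces each one with a copy of the referenced library content.

```python
import copy

# The tutorial library from this page, written as plain Python data.
LIBRARIES = {
    "tutorial": {
        "masks": {
            "name_mask": {
                "type": "concat",
                "glue": " ",
                "masks": [
                    {"type": "from_file",
                     "seed_file": "DataMasque_firstNames_mixed.csv",
                     "seed_column": "firstname-mixed"},
                    {"type": "from_file",
                     "seed_file": "DataMasque_lastNames_v3.csv",
                     "seed_column": "lastnames"},
                ],
            }
        },
        "database_rules": {
            "marketing_score": {
                "column": "marketing_score",
                "masks": [
                    {"type": "from_random_number",
                     "min": 0.0, "max": 100.0, "decimal_places": 2},
                    {"type": "typecast", "typecast_as": "float"},
                ],
            }
        },
    }
}


def lookup(ref):
    """Follow '<library>#<path>' into LIBRARIES and return a copy."""
    library, _, path = ref.partition("#")
    node = LIBRARIES[library]
    for key in path.split("/"):
        node = node[key]
    return copy.deepcopy(node)


def expand(node):
    """Recursively replace every {'$ref': ...} entry with its target."""
    if isinstance(node, list):
        return [expand(item) for item in node]
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return expand(lookup(node["$ref"]))
        return {key: expand(value) for key, value in node.items()}
    return node


# The rules list of the customers task from the ruleset above.
customers_rules = [
    {"column": "name", "masks": [{"$ref": "tutorial#masks/name_mask"}]},
    {"$ref": "tutorial#database_rules/marketing_score"},
]

expanded = expand(customers_rules)
print(expanded[0]["masks"][0]["type"])  # concat
print(expanded[1]["column"])            # marketing_score
```

The mask-level reference expands inside the masks list, while the rule-level reference expands into a whole rule with its own column, mirroring the two levels of reference used in the ruleset.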

If you later need to change the masking logic (for example, using a different seed file for names), you update it in the library and every referencing ruleset automatically picks up the change.

Click Save and Exit to save the ruleset. Verify that the ruleset shows a green tick.

Run masking again and verify

Select the connection and ruleset and click Preview Run. The run preview should show that both tables will be masked. Click Start Run to begin masking.

Once the run completes, query both tables in your RDBMS console:

SELECT * FROM customers;
SELECT * FROM promoters;

You should see:

  • name and promoter_name are masked with random names
  • marketing_score is randomized in both tables

Next steps

This tutorial demonstrated the masks and database_rules sections. Libraries also support columns, tabular_file_rules, tasks, file_rules, and other sections for different levels of reuse.

  • Refer to Library Structure for details about all seven sections and what each is for.
  • Refer to Referencing for the full $ref syntax, including how to override fields and how a library can reference other parts of itself.