Introduction to Unstructured Masking

Unstructured masking enables the detection and masking of sensitive information in text documents and text columns in databases.

Original: Mr P. Sherman lives at 42 Wallaby Way, Sydney.
Masked:   Mr M. Adrian lives at 27 Kiwi Drive, Auckland.

The unstructured_text mask works with all of DataMasque's supported databases and file types. It provides five different methods of detecting sensitive data in unstructured text, ranging:

  • from pattern-based matching with regex and checksum,
  • to reference-based matching with context_sources and seed_files.
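To make the two families of matching more concrete, the Python sketch below contrasts them. It is a hypothetical illustration, not DataMasque's implementation. The Luhn algorithm stands in for "a checksum" (this page does not say which checksums the checksum matcher supports), and the plain known_values list stands in for a reference source such as a seed file. The function names luhn_valid, pattern_with_checksum and reference_match are invented for this example.

```python
import re

# Hypothetical illustration only; not DataMasque's implementation.
# Luhn is used as an example checksum; known_values stands in for a seed file.


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def pattern_with_checksum(text: str) -> list[str]:
    """Pattern-based: a regex finds candidates, a checksum filters them."""
    candidates = re.findall(r"\b\d{13,16}\b", text)
    return [c for c in candidates if luhn_valid(c)]


def reference_match(text: str, known_values: list[str]) -> list[str]:
    """Reference-based: report each known value that appears in the text."""
    return [v for v in known_values if v in text]


print(pattern_with_checksum("Card 4111111111111111 and 4111111111111112 on file."))
# ['4111111111111111']
print(reference_match("Mr P. Sherman lives at 42 Wallaby Way, Sydney.", ["Sherman", "Adrian"]))
# ['Sherman']
```

The checksum lets a pattern-based matcher reject strings that merely have the right shape, while a reference-based matcher can only find values it has been told about.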

Enabling the optional DataMasque AI Engine grants access to the ai_detect matcher, which provides AI-powered entity detection using Amazon Bedrock.

Note: For a preview build of the DataMasque AI Engine, please contact support@datamasque.com.

How Unstructured Masking Works

Unstructured masking follows a match-then-mask pattern:

  1. Match - Matchers detect entities (e.g., "P_001") and assign each one a label (e.g., PATIENT_ID).
  2. Mask - Each label is assigned one or more masks, which generate the replacement for the matched text.
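The two steps can be sketched in Python as a pair of functions: one that collects labelled spans, and one that replaces each span using the mask assigned to its label. This is an illustrative sketch under stated assumptions, not DataMasque's implementation or ruleset syntax. The NAME label, its "Sherman" pattern and both fixed replacements are hypothetical, and overlapping matches are not handled.

```python
import re
from typing import Callable

# Illustrative sketch of match-then-mask; not DataMasque ruleset syntax.
# The NAME label and its replacement are hypothetical examples.

# Step 1 (Match): each matcher pairs a label with a regex.
MATCHERS = [
    ("PATIENT_ID", re.compile(r"P_\d{3}")),
    ("NAME", re.compile(r"\bSherman\b")),
]

# Step 2 (Mask): each label maps to a function producing the replacement.
MASKS: dict[str, Callable[[str], str]] = {
    "PATIENT_ID": lambda value: "[REDACTED]",
    "NAME": lambda value: "Adrian",
}


def match(text: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) spans for every matcher hit."""
    spans = []
    for label, pattern in MATCHERS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def mask(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each labelled span; assumes spans do not overlap.

    Spans are processed right to left so earlier offsets stay valid.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + MASKS[label](text[start:end]) + text[end:]
    return text
```

For example, mask(note, match(note)) turns "P_001 was seen by Dr Sherman." into "[REDACTED] was seen by Dr Adrian.". Keeping detection and replacement separate means a label's mask can change without touching how the entity is found.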

The following ruleset matches patient IDs consisting of "P_" followed by three digits (such as "P_001") and replaces each match with [REDACTED].

- column: clinical_notes
  masks:
    - type: unstructured_text

      # Match
      matchers:
        regex:
          - label: "PATIENT_ID"
            # Single quotes stop YAML from treating \d as an escape sequence.
            pattern: 'P_\d{3}'

      # Mask
      masks:
        - label: "PATIENT_ID"
          masks:
            - type: from_fixed
              value: "[REDACTED]"
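As a rough check of what this ruleset does to a single clinical_notes value, the sketch below applies the same pattern and fixed value with Python's re module. It assumes DataMasque's regex matcher uses ordinary search semantics like re; apply_ruleset and STRICT are hypothetical names. Because the pattern is not anchored, an ID with more than three digits would be only partly masked. Adding word boundaries (\b) is one possible way to avoid that, assuming the regex engine in use supports them.

```python
import re

# Stand-in for the ruleset above, assuming standard regex search semantics.
PATIENT_ID = re.compile(r"P_\d{3}")


def apply_ruleset(note: str) -> str:
    # Every PATIENT_ID match is replaced with the from_fixed value.
    return PATIENT_ID.sub("[REDACTED]", note)


print(apply_ruleset("Patient P_001 reviewed; follow-up booked for P_042."))
# Patient [REDACTED] reviewed; follow-up booked for [REDACTED].

# The pattern is not anchored, so a longer ID is only partially masked.
print(apply_ruleset("Transferred from P_1234."))
# Transferred from [REDACTED]4.

# Hypothetical tighter pattern: word boundaries reject longer IDs entirely.
STRICT = re.compile(r"\bP_\d{3}\b")
print(STRICT.sub("[REDACTED]", "Transferred from P_1234."))
# Transferred from P_1234.
```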

This ruleset is a basic example with a single matcher and a single mask. See the Matchers and Labels guide for more information on the available matchers and their parameters.

The following pages are intended to be read in order, with each one building on the unstructured masking concepts introduced before it.

Unstructured Masking Documentation