Introduction to Unstructured Masking
Unstructured masking enables the detection and masking of sensitive information in text documents and text columns in databases.
| Original | Masked |
|---|---|
| Mr P. Sherman lives at 42 Wallaby Way, Sydney. | Mr M. Adrian lives at 27 Kiwi Drive, Auckland. |
The unstructured_text mask works within all of DataMasque's supported databases and file types.
It provides five methods of detecting sensitive data in unstructured text:
- pattern-based matching with regex and checksum,
- reference-based matching with context_sources and seed_files,
- AI-based entity detection with ai_detect.
The ai_detect matcher becomes available when the optional DataMasque AI Engine, which is powered by AWS Bedrock, is enabled.
Note: For a preview build of the DataMasque AI Engine, please contact support@datamasque.com.
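The difference between pattern-based and reference-based detection can be illustrated in plain Python. This is only a conceptual sketch, not DataMasque's implementation: the helper names are hypothetical, and the use of the Luhn algorithm is an assumption chosen as a familiar example of a checksum.

```python
import re

def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Pattern-based: a regex proposes candidates, a checksum filters out false positives.
CARD_PATTERN = re.compile(r"\b\d{16}\b")

def find_card_numbers(text: str) -> list[str]:
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]

# Reference-based: look for values that are already known to be sensitive.
def find_known_values(text: str, known_values: list[str]) -> list[str]:
    return [value for value in known_values if value in text]

text = "Card 4111111111111111 on file, ref 1234567812345678."
print(find_card_numbers(text))
print(find_known_values("Mr P. Sherman lives at 42 Wallaby Way", ["P. Sherman", "Nemo"]))
```

The regex alone would flag both 16-digit numbers. Adding the checksum keeps only the one that is a plausible card number, which is the general reason to pair the two techniques.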
How Unstructured Masking Works
Unstructured masking follows a match-then-mask pattern:
- Match - Matchers detect entities (e.g., "P_001") and assign labels (e.g., PATIENT_ID).
- Mask - Each label is assigned a Mask.
The following ruleset matches every patient ID of the form P_ followed by three digits (for example, P_001) and masks it by replacing it with [REDACTED].
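    - column: clinical_notes
      masks:
        - type: unstructured_text
          # Match: label every occurrence of the pattern as PATIENT_ID
          matchers:
            regex:
              - label: "PATIENT_ID"
                # Single quotes keep \d literal; it is not a valid escape in a double-quoted YAML string
                pattern: 'P_\d{3}'
          # Mask: replace every PATIENT_ID entity with a fixed value
          masks:
            - label: "PATIENT_ID"
              masks:
                - type: from_fixed
                  value: "[REDACTED]"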
This ruleset is a basic example with a single matcher and a single mask. More information on the available matchers and their parameters can be found in the Matchers and Labels guide.
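The match-then-mask flow can also be sketched in a few lines of Python. This is a conceptual illustration only, not how DataMasque is implemented: MATCHERS, MASKS, match and mask are hypothetical names that stand in for the matchers and masks sections of a ruleset.

```python
import re

# Hypothetical stand-ins for the "matchers" and "masks" sections of a ruleset.
MATCHERS = {"PATIENT_ID": re.compile(r"P_\d{3}")}
MASKS = {"PATIENT_ID": lambda original: "[REDACTED]"}

def match(text):
    """Step 1 (Match): find entities and tag each one with a label."""
    found = []
    for label, pattern in MATCHERS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label))
    return sorted(found)

def mask(text, entities):
    """Step 2 (Mask): replace each entity with the output of its label's mask."""
    pieces = []
    cursor = 0
    for start, end, label in entities:
        pieces.append(text[cursor:start])                # text before the entity
        pieces.append(MASKS[label](text[start:end]))     # masked replacement
        cursor = end
    pieces.append(text[cursor:])                         # trailing text
    return "".join(pieces)

note = "Patient P_001 was referred by P_042."
print(mask(note, match(note)))
# prints: Patient [REDACTED] was referred by [REDACTED].
```

Keeping the two steps separate means the choice of mask depends only on the label, not on which matcher found the entity.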
The following pages are intended to be read in order, with each page building on the concepts introduced by the previous ones.
Unstructured Masking Documentation
- Configuring AI Engine (Optional)
  Setting up DataMasque with the DataMasque AI Engine (if available).
- Matchers and Labels
  A guide to some key concepts behind the unstructured_text mask.
- Deterministic Masking
  A guide to driving consistency with deterministic masking.
- Examples
  Examples of common scenarios alongside rulesets.
- Ruleset YAML Specification
  Ruleset specification for the unstructured_text mask.
- Word Document Masking
  Supported content, limitations, and image_handling for Word documents (.docx).
- Troubleshooting
  Solving problems with unstructured masking.