DataMasque Portal

Troubleshooting

This guide helps you diagnose and resolve common issues when using DataMasque's subsetting feature. For each problem, you'll find a clear description, common causes, and specific steps to resolve it.

Configuration Issues

Problem: "Include patterns must match at least one table"

Error message:

ERROR: Include patterns must match at least one table in the database

Cause: None of your include patterns matched any tables in the database. This could happen if:

  • Pattern syntax is incorrect (missing schema prefix, wrong wildcards)
  • Table/schema names don't match the pattern
  • Case sensitivity issues

Solution:

  1. Verify your patterns use fully-qualified schema.table format:

    include:
      - glob: public.*              # Correct: includes schema
      - glob: customers             # Wrong: missing schema
    
  2. Check table and schema names in your database:

    • Review actual table names
    • Verify schema names match your patterns
    • Check case sensitivity (quoted vs unquoted identifiers)
  3. Test with a simple exact match first:

    include:
      - table: public.customers     # Start with exact match to verify
    
  4. For regex patterns, verify syntax with escaped dots:

    include:
      - regex: "public\\.customer_.*"    # Correct: escaped dot
      - regex: "public.customer_.*"      # Wrong: dot matches any character
    

Problem: "Pattern must have explicit schema prefix"

Error message:

ERROR: Pattern must have an explicit schema prefix (e.g., 'schema.table')
ERROR: Wildcard schemas are not allowed in glob/regex patterns

Cause: Your glob or regex pattern does not have a fully-qualified schema prefix, or uses wildcards in the schema portion of the pattern.

Solution:

  1. Ensure all glob and regex patterns have an explicit schema prefix:

    include:
      - glob: public.customer_*         # Correct: explicit schema
      - glob: customer_*                # Wrong: missing schema
      - regex: "orders\\.order_.*"      # Correct: explicit schema
      - regex: "order_.*"               # Wrong: missing schema
    
  2. Do not use wildcards or pattern matching in the schema portion:

    include:
      - glob: public.temp_*             # Correct: schema is explicit
      - glob: "*.temp_*"                # Wrong: wildcard schema
      - regex: "public\\..*_backup$"    # Correct: schema is explicit
      - regex: ".*\\..*_backup$"        # Wrong: wildcard schema
    
  3. If you need to match tables across multiple schemas, use separate patterns:

    include:
      - glob: schema1.customer_*
      - glob: schema2.customer_*
      - glob: schema3.customer_*
    

Problem: "Pattern precedence conflicts"

Error message:

ERROR: Table 'public.customers' matches multiple patterns with conflicting settings

Cause: A table matches multiple patterns that have conflicting configuration (different target_percent values, or both include and skip patterns).

Solution:

  1. Review Pattern Precedence Rules to understand how DataMasque resolves conflicts.

  2. Use exact table patterns for highest precedence:

    include:
      - glob: public.*              # Matches all public tables at 10%
      - table: public.customers     # Exact match overrides glob
        target_percent: 15
    
  3. Make patterns more specific to avoid unintended matches:

    include:
      - glob: public.customer_*     # More specific than public.*
      - glob: public.order_*
    
  4. Use skip patterns to exclude specific tables:

    include:
      - glob: public.*
    skip:
      - table: public.temp_data     # Explicitly skip this table
    

Table Primary Key Issues

Problem: "Table has no primary key or unique index"

Error message:

ERROR: Table 'logs.user_events' has no primary key or unique index

Cause: DataMasque requires tables to have a primary key or unique index for hash-based row selection.

Note for Oracle databases: Oracle tables without a primary key or unique index can still be subsetted using Oracle's built-in ROWID pseudo-column. This fallback is automatic and requires no additional configuration. However, because ROWID values are non-deterministic, repeated subsetting runs may select different rows.

Solution (for PostgreSQL and SQL Server): Choose one of the following.

  1. Add a primary key to the table.

  2. Create a unique index that can serve as a primary key.

  3. Exclude the table from subsetting using a skip pattern (this will only work if it is not a child table of another table that is included in subsetting):

    include:
      - glob: logs.*
    skip:
      - table: logs.user_events  # Skip table without primary key
    

Circular Foreign Key Relationships

Circular FKs Are Supported

DataMasque subsetting automatically handles circular foreign key relationships, including:

  • Multi-table cycles (e.g., DEPARTMENTSEMPLOYEES)
  • Self-referencing tables (e.g., EMPLOYEES.manager_idEMPLOYEES.id)

How it works: When circular FKs are detected, subsetting uses strongly connected component (SCC) analysis to identify circular relationships. During copy-back operations, tables within circular groups are processed using graph condensation to maintain a valid ordering, whilst FK constraints remain disabled throughout the operation.

Note: FK constraints are re-enabled and validated at the end of subsetting. If data integrity issues exist in the source data, they will be detected at this point.

Self-Referential Tables and Target Percentage

Behaviour: For tables with self-referential foreign keys (also known as recursive foreign keys), the actual retention percentage may exceed the configured subset_percent_goal.

Example: Consider an EMPLOYEES table with 10 rows where manager_id references EMPLOYEES.id:

  • subset_percent_goal is set to 20% (2 rows)
  • Hash-based selection picks 2 employees
  • Upward cascade adds their managers to maintain FK integrity
  • Downward cascade adds their direct reports
  • Actual result: 4-6 rows retained (40-60%)

Why this happens: Self-referential foreign keys require bidirectional cascading:

  1. Downward cascade: Includes subordinates (e.g., employees who report to selected employees)
  2. Upward cascade: Includes the management chain up to the top of the hierarchy

This ensures all foreign key constraints are satisfied, but means more rows are retained than the target percentage.

What to expect:

  • The subsetting operation will complete successfully
  • A warning will be logged indicating the actual retention percentage exceeds the target
  • The run will finish with a "finished_with_warnings" status
  • All foreign key constraints will be satisfied

Workaround: If you need to achieve a specific target size for self-referential tables, reduce the subset_percent_goal and run iteratively until the desired final size is reached. For example, to achieve 20% retention, you might need to set subset_percent_goal to 10%.

DROP and TRUNCATE Issues

Problem: "Cannot drop table due to foreign key constraints"

Error message:

ERROR: Cannot drop table 'staging.temp_data' because other objects depend on it

Cause: You attempted to drop a table that has foreign key references from other tables that are not being dropped.

Solution:

  1. Review foreign key relationships using database tools to identify dependent tables.

  2. Drop dependent tables as well:

    drop:
      - table: staging.temp_data        # Parent table
      - table: staging.temp_data_items  # Child table with FK to temp_data
    
  3. Or use a glob pattern to drop an entire schema:

    drop:
      - glob: staging.*  # Drops all tables in schema
    
  4. Alternatively, move the table to skip instead of drop if you want to preserve it:

    include:
      - glob: staging.*
    skip:
      - table: staging.temp_data  # Skip instead of drop
    

Problem: "TRUNCATE failed due to foreign key references"

Error message:

ERROR: Cannot truncate table 'public.audit_log' because other tables reference it via foreign key constraints

Cause: Database does not allow truncating tables with foreign key references (behavior varies by database type).

Solution:

  1. Truncate child tables as well:

    truncate:
      - table: public.audit_log       # Parent table
      - table: public.audit_details   # Child table
    
  2. Or use a glob pattern to truncate related tables:

    truncate:
      - glob: public.audit_*  # Truncates all audit tables
    
  3. For databases that support it, foreign key constraints are temporarily disabled during truncate operations. If this fails, check database-specific limitations.

Insufficient Disk Space

Problem: Insufficient disk space during processing

Error message (Oracle examples):

ORA-01652: unable to extend temp segment by <size> in tablespace <tablespace_name>
ORA-01653: unable to extend table <table_name> by <size> in tablespace <tablespace_name>
ORA-01654: unable to extend index <index_name> by <size> in tablespace <tablespace_name>

Cause: The subsetting process creates temporary tables during processing, which requires additional disk space beyond the original data size.

Solution:

  1. Free up disk space on the database server before running the subset operation.

  2. Reduce the target percentage to require less temporary storage:

    defaults:
      target_percent: 5  # Reduce from higher percentage
    
  3. Configure additional temporary storage space if available.

Getting Help

If you cannot resolve an issue using this guide:

  1. Check the DataMasque application logs for additional error details in the web interface
  2. Contact DataMasque support with:
    • Complete subsetting configuration (YAML)
    • Full error message from the application
    • Database type and version information
    • Available disk space and system resources
    • All application logs

Prevention Best Practices

  • Test with higher percentages first: Start with 25-50% subsets to validate relationships and identify issues, then reduce to your target percentage.

  • Verify database access: Ensure your database connection has appropriate permissions before running large subsets.

  • Check disk space: Verify adequate disk space is available - plan for at least 2x your target subset size in temporary storage.

  • Review table structures: Ensure all tables have primary keys or unique indexes before attempting subsetting.