Performance Optimisation
This document provides guidelines for performance optimisation, focusing on the use of multiple processes to allow parallel execution and, consequently, improved performance.
- Masking using parallelism and workers
- Masking a table with multiple workers
- Performing tasks in parallel
- Worker and parallel task count selection
- If performance does not increase
- Masking large text data
- Snowflake recommendations
Masking using parallelism and workers
When planning how to parallelize tasks, always bear in mind the total number of worker processes that could be running concurrently. The total number of worker processes is equal to the number of parallel tasks times the number of workers per task.
For example:
1 task in parallel * 2 workers = 2 worker processes
or
2 tasks in parallel * 2 workers = 4 worker processes
DataMasque can run a maximum of 10 parallel tasks simultaneously; however, each task may have multiple workers (thus allowing more than 10 worker processes).
The total number of worker processes should not exceed twice the number of CPUs available to the DataMasque instance. For example, if your virtual machine has four CPUs, the total number of worker processes should not exceed eight.
For Linux installations on an EC2 instance or similar VM, ensure your system has at least 1 GiB of free RAM per concurrent worker process. DataMasque itself and the operating system together use about 5 GiB of RAM on a typical server. We also recommend keeping a buffer of 1 GiB of free RAM to avoid the machine becoming unresponsive. Thus, for example, if your machine has 16 GiB of RAM, limit the number of concurrent worker processes to 10:
- 16 GiB total memory
- Deduct 5 GiB for the OS and DataMasque
- Keep a 1 GiB buffer
- 10 GiB remaining, divided by a 1 GiB allowance per process, yields a maximum of 10 workers.
Masking a table with multiple workers
To improve masking performance on a single table, you can enable parallelism, which allows multiple processes to work
together simultaneously to mask a single table. This is achieved simply by specifying a number of
workers greater than 1 for a task.
It is also recommended to increase the batch size in addition to increasing the number of workers to achieve optimal performance. Increasing the batch size increases the number of rows that are fetched, masked, and updated in a single operation. This will reduce the database operation overhead at the cost of increasing the memory usage of DataMasque. More details on batch size can be found under the Database Run Options guide.
When using multiple workers (workers > 1),
each worker process operates on a separate batch of rows, and these worker processes run simultaneously.
This can reduce masking run time,
and hence improve performance, as more rows of the table are masked at once.
Note: Increasing the number of workers will increase the amount of memory used (as well as CPU consumption). It is recommended to monitor resource usage when using parallelism.
In the ruleset specification below, workers: 4 is specified,
so four worker processes will be used to mask the users table simultaneously.
```yaml
version: "1.0"
tasks:
  - type: mask_table
    table: users
    workers: 4
    key: id
    rules:
      - column: last_name
        masks:
          - type: from_fixed
            value: 'redacted last name'
```
The following diagram describes how multiple worker processes work in the example ruleset above.

Notes:
- Number of rows in each buffer is set by the batch size parameter/run option.
- As each worker finishes masking a batch of rows, it will move on to the next unmasked batch of rows.
Performing tasks in parallel
When using the parallel task type, DataMasque performs masking using multiple processes which allows masking
to run in parallel across multiple tables at once. Parallel tasks can reduce the time needed to mask a
database when compared to performing masking on individual tables sequentially.
Below is an example ruleset showing how mask_table tasks can be set up to run in parallel.
Three tables are masked simultaneously in each parallel task block.
Once the first three tables are masked,
the next parallel task block is executed,
until all three parallel task blocks are complete and all tables in the ruleset are masked.
```yaml
version: "1.0"
tasks:
  - type: parallel
    tasks:
      - type: mask_table
        table: table_1
        ...
      - type: mask_table
        table: table_2
        ...
      - type: mask_table
        table: table_3
        ...
  - type: parallel
    tasks:
      - type: mask_table
        table: table_4
        ...
      - type: mask_table
        table: table_5
        ...
      - type: mask_table
        table: table_6
        ...
  - type: parallel
    tasks:
      - type: mask_table
        table: table_7
        ...
      - type: mask_table
        table: table_8
        ...
      - type: mask_table
        table: table_9
        ...
```
Note: mask_unique_key tasks are not allowed to be run in parallel.
The following diagram describes how parallel execution works in the example ruleset shown above.

Worker and parallel task count selection
As a general guideline, DataMasque should be configured to execute two workers or parallel tasks per CPU (or vCPU).
For example, a virtual machine with two vCPUs could run:
- A single mask_table task with four workers; or,
- Two mask_table tasks in parallel, each with two workers; or,
- Four mask_table tasks in parallel, each with one worker, etc.
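For instance, the second configuration (two tasks in parallel, each with two workers, suiting a two-vCPU machine) can be sketched as follows. The table names are illustrative placeholders:

```yaml
version: "1.0"
tasks:
  - type: parallel
    tasks:
      - type: mask_table
        table: customers    # illustrative table name
        workers: 2
        ...
      - type: mask_table
        table: orders       # illustrative table name
        workers: 2
        ...
```

This runs four worker processes in total (2 tasks x 2 workers), matching the guideline of two workers per vCPU.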
Increasing the worker or parallel count will also increase the amount of memory that DataMasque consumes. Therefore, it is crucial to monitor memory usage carefully when adjusting these settings.
We recommend experimenting with different worker counts, while monitoring memory usage, to find the optimal configuration for your environment.
Memory consumption will increase if any of the following increase:
- Worker count.
- The number and types of columns being masked (database masking).
- Batch size (database masking).
- The size of individual files being masked (file masking).
Memory consumption does not increase based on the number of rows in a table, or the number of files being masked.
Once you have a stable number of workers for a particular table or file structure, the worker count should not need to be adjusted as the number of rows in the table grows or number of files to mask increases.
If performance does not increase
If you find that the performance of DataMasque does not improve as you add more workers, then you will need to identify the bottleneck.
First, make sure the DataMasque instance itself has enough free resources. Some masking configurations or certain tables/files may use more CPU and memory than others. Even if you are under the recommended number of workers per vCPU, check that the DataMasque instance has not exhausted its resources.
If the DataMasque instance has CPU, memory and IOPS capacity, it may indicate that the bottleneck lies elsewhere.
Consider the following possibilities:
- Database Server Limitations: Your database server might need additional resources, such as more CPU power, higher IOPS, or increased memory, to handle the increased load efficiently.
- Network Speed: For file masking tasks, particularly when dealing with cloud storage, performance is heavily dependent on your network speed. To a lesser extent, network speed can also affect database masking tasks, especially if there is high latency. Check for network saturation, and try to host your database(s) and DataMasque instance in the same region and availability zone if possible.
- Disk I/O: The speed of the mounted file share can also impact file masking tasks. If you are working with on-premise or mounted storage, consider upgrading to faster disks or optimizing the storage configuration.
In these cases, adjusting the resources allocated to your database server or network infrastructure may yield better results. We recommend continuing to monitor resource usage closely and experimenting with these settings to find the optimal configuration for your environment.
Masking large text data
To mask data, DataMasque loads data into memory in batches. The memory consumed depends on the size of each row and the number of rows processed at a time.
For smaller data types, such as integers, floats, dates, and small VARCHAR columns,
memory usage is typically minimal.
Since these data types result in small row sizes, usually only a few kilobytes,
they generally do not cause memory issues.
However, when masking larger data types, memory usage becomes more significant.
Large text fields, for instance, can use up to 4 times their size in memory when loaded into DataMasque.
For example, a single TEXT row of 100 MB would consume 400 MB of memory.
If you are masking 20 rows, each 100 MB in size, the memory requirement becomes: 20 rows x 100 MB x 4 = 8000 MB.
This would exceed the available memory in an 8 GB DataMasque instance.
To avoid memory exhaustion, you can reduce the batch_size to a lower value
based on the size of the rows being processed.
For example, masking 5 rows instead of 20 would significantly reduce memory usage.
Additionally, certain mask types, such as xml, json, and imitate, require extra memory during masking operations.
These types can consume up to 20 times the size of the data being masked.
When working with large text data, be cautious to avoid memory overload.
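As a concrete sketch, the hypothetical `documents` table below holds a large TEXT column, so the worker count is kept low; assuming your deployment exposes batch size as a database run option (see the Database Run Options guide), it should also be reduced to a handful of rows rather than the default:

```yaml
version: "1.0"
tasks:
  - type: mask_table
    table: documents    # illustrative: rows contain TEXT values up to ~100 MB
    workers: 1          # keep concurrency low for very large rows
    key: id
    rules:
      - column: body
        masks:
          - type: from_fixed
            value: '[large text removed]'
```

With rows of ~100 MB and a 4x in-memory expansion, even a batch size of 5 implies roughly 2,000 MB of memory per batch, so size batches against the instance's free RAM.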
Snowflake Recommendations
Introduction
Snowflake differs from other RDBMS systems like Postgres in two significant ways.
- It is designed for very large volumes of data: writing more data in fewer requests is very efficient. For example, writing twenty times the amount of data might only take five times as long.
- It has a major limitation: due to table locking, only one batch of masked data can be written back to the target table at a time.
Together, these imply that using multiple workers to mask one table has only a limited effect: the workers will sit idle, waiting for their turn to write back the masked data. Using larger batch sizes is more effective: it reduces the average write time per row, reduces the number of batches to mask, and means each worker spends more time masking rather than waiting.
Provided the machine running DataMasque has sufficient RAM, it is strongly recommended to use larger batch sizes. As an example, during internal testing, 8 workers did not perform faster than 4 workers with the default batch size of 50,000 rows. However, large performance gains were achieved with 8 workers by increasing the batch size to 500,000 rows.
Recommendations
The following is a list of recommendations for masking Snowflake databases.
- Where possible, create the table(s) to be masked as a zero-copy clone of the original production data (using CREATE TABLE <table to be masked> CLONE <production table>).
- We strongly recommend creating a warehouse solely for use by DataMasque. This not only avoids contention, but also allows you to clearly see the compute costs incurred by DataMasque.
- A warehouse size of Small or Medium is more than adequate for masking. Unless you are masking a lot of tables in parallel, you are unlikely to get faster performance with a larger warehouse, as the bottleneck is not the warehouse, but the machine running DataMasque.
- Use large batch sizes. A good value is on the order of 250,000 to 1,000,000 rows, depending on the number of columns, their data types, and available RAM.
- Mask multiple tables in parallel to avoid table lock contention. See Parallel tasks.
- Use between 4 and 12 workers per table. Fewer than 4 workers is not efficient, while too many workers results in lock contention.
Note: The DataMasque UI defaults the batch size to 250,000 when a Snowflake database is selected as the target for masking.
Note: Use of very large batch sizes with multiple workers and/or parallel tasks can exhaust the RAM on the machine running DataMasque and render it unresponsive. Select a batch size and number of workers that maximise masking speed while retaining at least 1 GiB of free memory.
Note: With larger batch sizes, the workers are mostly CPU-bound performing the actual masking, rather than waiting for Snowflake or performing I/O. This maximises masking efficiency, but means CPU usage will be much higher than for other databases. The maximum number of workers (across all parallel tasks) should therefore be limited to 1.5x the number of vCPUs at the very most, rather than the above suggested value of twice the number of vCPUs. For example, on a machine with 16 vCPUs, limit the number of concurrent workers to 24.
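Putting these recommendations together, a Snowflake-oriented ruleset might mask several tables in parallel with a mid-range worker count per table. This is a sketch only: the table names are placeholders, and the large batch size (on the order of 250,000 to 1,000,000 rows) is assumed to be set via the database run options rather than in the ruleset itself:

```yaml
version: "1.0"
tasks:
  - type: parallel
    tasks:
      - type: mask_table
        table: customers      # placeholder table name
        workers: 4
        ...
      - type: mask_table
        table: transactions   # placeholder table name
        workers: 4
        ...
```

This runs eight workers in total; on a machine with 16 vCPUs, that is well under the Snowflake-specific ceiling of 1.5x the vCPU count (24 workers).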