DataMasque Installation on Red Hat Openshift on AWS (ROSA)
DataMasque supports deployment to ROSA clusters, using Amazon Elastic File System (EFS) as the persistent volume storage attached to the cluster. DataMasque images are pushed from a local Docker installation to your account's private container registry (ECR in this document).
- DataMasque Installation on Red Hat Openshift on AWS (ROSA)
- Supported Versions and Instance Types
- Installation
- Prerequisites
- Instance Metadata Service (IMDS)
- Cluster Configuration
- DataMasque Role Setup
- DataMasque Deployment
- Increasing the number of masque-agent pods
- Upgrading DataMasque on ROSA
- Upgrading from DataMasque v2.19.1 or Earlier
- Upgrading DataMasque v2.20.0 or Newer
Supported Versions and Instance Types
DataMasque supports ROSA version 4.17.
DataMasque supports ROSA worker nodes of EC2 instance type c5.2xlarge or larger. The minimum number of nodes is 2.
DataMasque on ROSA does not support masking files in Mounted Share connections. File masking on AWS S3 or Azure Blob Storage is supported.
Installation
At a high level, the installation process is:
- A. Configure the EFS Operator in the ROSA cluster.
- B. Configure an EFS instance and access point(s).
- C. Load DataMasque Docker images into a local Docker installation, then push them to your container registry.
- D. Update the Helm Chart variables.
- E. Deploy the configuration to the ROSA cluster with helm.
Steps A and B are executed using rosa, ccoctl and aws commands on the command line.
Example configuration files and commands are provided in this guide.
Step C uses a script to push the DataMasque Docker packages to ECR.
Step D requires creating and editing the values.yaml for Helm, based on steps A and B.
Step E is to deploy the configuration using helm.
Prerequisites
Before performing installation, the following tools must be installed on the machine where the deployment instructions are being followed:
- rosa
- openshift-cli (oc)
- ccoctl
- kubectl
- helm
- aws CLI, configured with authentication tokens (if authentication is saved in a specific profile, ensure that profile is in use)
- jq CLI tool
- DataMasque Helm Chart, downloaded from the DataMasque Portal
- Docker (use the Docker installation instructions as a guide for installing the Docker packages only; it is not necessary to install DataMasque into Docker)
Instance Metadata Service (IMDS)
The EC2 nodes for ROSA must have access to the AWS Instance Metadata Service. For IMDSv1, no additional configuration is required. For IMDSv2, make sure the IMDS hop limit is set to at least 2.
Refer to the DataMasque AWS installation guide for more information on configuring IMDS.
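The hop limit can be raised with the AWS CLI. A minimal sketch, assuming a hypothetical worker node instance ID (substitute your own):

```shell
set_imds_hop_limit() {
  # Raise the IMDS hop limit to 2 so containerised workloads can reach IMDSv2.
  # $1 is the EC2 instance ID of a ROSA worker node.
  # Note: --http-tokens required enforces IMDSv2; if your nodes rely on
  # IMDSv1, leave that flag at its existing value instead.
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-put-response-hop-limit 2 \
    --http-tokens required
}

# Example (requires AWS credentials; instance ID is hypothetical):
# set_imds_hop_limit i-0123456789abcdef0
```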
Cluster Configuration
DataMasque requires EFS storage attached to the ROSA cluster; to provide this, it is necessary to set up the Red Hat AWS EFS CSI Driver Operator.
- Identify the OpenID provider ARN used by the ROSA cluster
$ aws iam list-open-id-connect-providers
The output should be something similar to: arn:aws:iam::
$ rosa list oidc-config
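Since list-open-id-connect-providers returns JSON, the ARN can be picked out with jq (one of the prerequisites). A sketch, assuming the account's first listed provider belongs to this cluster:

```shell
oidc_provider_arn() {
  # Extract the first OIDC provider ARN from the AWS CLI's JSON output.
  # If the account has several providers, add a jq select() filter matching
  # the cluster's OIDC issuer instead of taking the first entry.
  aws iam list-open-id-connect-providers \
    | jq -r '.OpenIDConnectProviderList[0].Arn'
}

# Example (requires AWS credentials):
# OIDC_ROSA_ARN=$(oidc_provider_arn)
```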
- Next, create a directory named credrequests containing the credentials file named CredentialsRequest.yaml. The policy in the file below allows access to any EFS object; for security reasons, it's possible to restrict access to a specific resource.
# credrequests/CredentialsRequest.yaml
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-aws-efs-csi-driver
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
      - action:
          - elasticfilesystem:*
        effect: Allow
        resource: '*'
  secretRef:
    name: aws-efs-cloud-credentials
    namespace: openshift-cluster-csi-drivers
  serviceAccountNames:
    - aws-efs-csi-driver-operator
    - aws-efs-csi-driver-controller-sa
- Then, create the EFS operator role using the ccoctl tool. The OIDC provider ARN identified in step 1 must be assigned to the --identity-provider-arn parameter. A subdirectory named manifests is created in the current path with the output of the command.
$ ccoctl aws create-iam-roles --name=<EFS_ROLE_PREFIX> --region=<REGION> --credentials-requests-dir=credrequests --identity-provider-arn=<OIDC_ROSA_ARN_HERE>
Example output of the ccoctl command:
2025/03/14 14:53:04 Role arn:aws:iam::897875381524:role/rosa-efs-openshift-cluster-csi-drivers-aws-efs-cloud-credentials created
2025/03/14 14:53:04 Saved credentials configuration to: /Users/admin/rosa-setup/efs-operator/manifests/openshift-cluster-csi-drivers-aws-efs-cloud-credentials-credentials.yaml
2025/03/14 14:53:04 Updated Role policy for Role rosa-efs-openshift-cluster-csi-drivers-aws-efs-cloud-credentials
Access the OpenShift web console as cluster-admin and install the AWS EFS CSI Driver Operator (by Red Hat): click Operators > OperatorHub, then locate the operator by typing AWS EFS CSI in the filter box. Select the latest 4.17 version. In the Role ARN field, enter the ARN of the role created previously in Step 3 of this guide. Leave All namespaces on the cluster (default) selected; Installed Namespace is set to openshift-cluster-csi-drivers.
Deploy the CSI driver. Click Administration > CustomResourceDefinitions > ClusterCSIDriver. On the Instances tab, click Create ClusterCSIDriver and paste the YAML below:
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  managementState: Managed
Then click Create.
Wait for the following Conditions to change to a "True" status:
AWSEFSDriverNodeServiceControllerAvailable
AWSEFSDriverControllerServiceControllerAvailable
Validate the AWS EFS CSI Driver Operator installation using the command below:
$ oc get clusterserviceversions -n openshift-operators
The output should contain the role ARN and the version that was installed, for example:
aws-efs-csi-driver-operator.v4.17.0-202503101104 AWS EFS CSI Driver Operator 4.17.0-202503101104 Succeeded
- Now that the EFS operator is installed, the EFS can be created. The EFS is used to persist data across pod restarts as well as share data between pods.
First, retrieve the ID of the VPC in which the cluster was created, so that the EFS can be created in the same VPC.
$ VPC_ID=$(aws eks describe-cluster \
--name $CLUSTER_NAME \
--query "cluster.resourcesVpcConfig.vpcId" \
--output text)
The VPC ID is now in the VPC_ID variable and can be used in subsequent commands.
Next, the CIDR range of the VPC is required, which is fetched with this command and stored in the CIDR_RANGE variable:
$ CIDR_RANGE=$(aws ec2 describe-vpcs \
--vpc-ids $VPC_ID \
--query "Vpcs[].CidrBlock" \
--output text \
--region $AWS_REGION_ID)
Next, a security group is created that allows NFS ingress to the EFS (port 2049).
First, create the security group, saving its ID into the SECURITY_GROUP_ID variable:
$ SECURITY_GROUP_ID=$(aws ec2 create-security-group \
--group-name DmEfsSecurityGroup \
--description "EFS security group for DataMasque" \
--vpc-id $VPC_ID \
--output text)
Then grant ingress on port 2049 to the CIDR range:
$ aws ec2 authorize-security-group-ingress \
--group-id $SECURITY_GROUP_ID \
--protocol tcp \
--port 2049 \
--cidr $CIDR_RANGE
Next the file system can be created.
The file system ID is stored in the FILE_SYSTEM_ID variable and is required later when configuring the Helm values.
$ FILE_SYSTEM_ID=$(aws efs create-file-system \
--region $AWS_REGION_ID --encrypted \
--performance-mode generalPurpose \
--tags Key=Name,Value=datamasque-eks-efs-file-system \
--query 'FileSystemId' \
--output text)
Display the file system ID using echo:
$ echo $FILE_SYSTEM_ID
fs-11223344556677889
Retain this value for later.
- Mount targets for the EFS need to be created. These are IP addresses assigned to the EFS instance in each subnet where it should be made available.
To get a list of available subnets, use the describe-subnets command (this is not necessary if you already know the
subnets you want to add the EFS to).
$ aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$VPC_ID" \
--query 'Subnets[*].{SubnetId: SubnetId,AvailabilityZone: AvailabilityZone,CidrBlock: CidrBlock}' \
--output table
This command assumes that the VPC_ID variable from Step 7 is still in scope.
After you know the IDs of the subnets in which the EFS should be available,
execute create-mount-target for each subnet.
$ aws efs create-mount-target \
--file-system-id $FILE_SYSTEM_ID \
--subnet-id <subnet_id> \
--security-groups $SECURITY_GROUP_ID
The variables FILE_SYSTEM_ID and SECURITY_GROUP_ID are expected to still be in scope from Step 7.
This command must be run once for each subnet to which the EFS should be added
(which may not be every subnet listed by the previous command).
For example, for subnets subnet-11111111111111111 and subnet-22222222222222222, execute:
$ aws efs create-mount-target \
--file-system-id $FILE_SYSTEM_ID \
--subnet-id subnet-11111111111111111 \
--security-groups $SECURITY_GROUP_ID
$ aws efs create-mount-target \
--file-system-id $FILE_SYSTEM_ID \
--subnet-id subnet-22222222222222222 \
--security-groups $SECURITY_GROUP_ID
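When the EFS must be available in several subnets, the repeated create-mount-target calls can be wrapped in a small loop. A sketch, assuming FILE_SYSTEM_ID and SECURITY_GROUP_ID are still in scope and using the hypothetical subnet IDs from the example above:

```shell
create_mount_targets() {
  # Create one EFS mount target per subnet ID passed as an argument.
  for subnet in "$@"; do
    aws efs create-mount-target \
      --file-system-id "$FILE_SYSTEM_ID" \
      --subnet-id "$subnet" \
      --security-groups "$SECURITY_GROUP_ID"
  done
}

# Example (requires AWS credentials; subnet IDs are hypothetical):
# create_mount_targets subnet-11111111111111111 subnet-22222222222222222
```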
- Create an access point for the EFS. The user ID and group ID are both set to 1000 to match the user inside the DataMasque containers.
$ FILE_SYSTEM_AP_ID=$(aws efs create-access-point \
--file-system-id $FILE_SYSTEM_ID \
--posix-user Uid=1000,Gid=1000 \
--root-directory Path='/datamasque,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=777}' \
--output text \
--query 'AccessPointId')
Display the file system access point ID using echo:
$ echo $FILE_SYSTEM_AP_ID
fsap-00112233445566778
Retain this value for use when setting up the Helm values.yaml
(later in the deployment configuration).
DataMasque Role Setup
For DataMasque to access other AWS services such as S3 buckets, it is necessary to create a role to be assumed by the DataMasque service account. Access can be restricted to specific resources.
- Identify the ARN of the identity provider used by the ROSA cluster in AWS using the following commands (the OpenID provider ARN will be used in step 3):
$ aws iam list-open-id-connect-providers
The output should be something similar to: arn:aws:iam::
$ rosa list oidc-config
- Next, create a directory named credrequests, and create the credentials file named CredentialsRequest.yaml. The policy in the file below allows access to any S3 bucket; for security reasons, it's possible to refine access to a specific object. Make sure to keep the same serviceAccount as in the Helm Chart values.yaml file.
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: datamasque-cred
  namespace: datamasque
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
      - action:
          - s3:*
        effect: Allow
        resource: '*'
  secretRef:
    name: datamasque-secret-cred
    namespace: datamasque
  serviceAccountNames:
    - datamasque-sa
- Create the DataMasque role using the ccoctl tool; the role ARN will be used in the Helm Chart. The tool creates the role using the supplied name as a prefix. The OIDC provider ARN identified in step 1 must be assigned to the --identity-provider-arn parameter. A directory named manifests is created with the output of the command.
$ ccoctl aws create-iam-roles --name=<DM_ROLE_PREFIX> --region=<REGION> --credentials-requests-dir=credrequests --identity-provider-arn=<ROSA_OIDC_PROVIDER_ARN>
The output is something similar to the following:
2025/03/15 14:55:29 Role arn:aws:iam::123456789012:role/rosa-dm-datamasque-datamasque-secret-cred created
2025/03/15 14:55:29 Saved credentials configuration to: /Users/admin/rosa-setup/efs-operator/manifests/datamasque-datamasque-secret-cred-credentials.yaml
2025/03/15 14:55:29 Updated Role policy for Role rosa-dm-datamasque-datamasque-secret-cred
The ROSA and EFS configuration is now complete, and DataMasque can be deployed to the cluster.
DataMasque Deployment
In this section, DataMasque images are uploaded to ECR,
then the values for Helm Chart need to be inserted into the Helm values.yaml file.
DataMasque can then be installed with Helm.
This corresponds to steps C to E in the high level overview.
Ensure Docker is installed on the machine executing these instructions.
You must have your AWS account's ECR host name,
which is normally in the format <AWS account id>.dkr.ecr.<AWS region>.amazonaws.com,
for example: 123456789012.dkr.ecr.us-east-1.amazonaws.com.
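The host name follows directly from the account ID and region. A minimal sketch composing it (the values are the examples from this guide, not real credentials):

```shell
# Compose the ECR host name from the account ID and region.
AWS_ACCOUNT_ID=123456789012   # example account ID used in this guide
AWS_REGION=us-east-1
ECR_HOST="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
echo "$ECR_HOST"
# → 123456789012.dkr.ecr.us-east-1.amazonaws.com
```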
You will also need:
- The ARN of the DataMasque role (created in Step 2 above).
- The EFS file system ID (from Step 7 above). It begins with fs-.
- The EFS access point ID (from Step 9 above). It begins with fsap-.
- Extract the DataMasque Docker package and cd into the installation directory.
$ tar -xvzf datamasque-docker-v<version>.pkg
$ cd datamasque/<version>/
- Before pushing to ECR, docker must be authenticated to your account's ECR host name. The AWS private registry authentication guide has instructions for setting up authentication. In general the command is in this format:
$ docker login -u AWS -p $(aws ecr get-login-password) <ecr_host>
For example:
$ docker login -u AWS -p $(aws ecr get-login-password) 123456789012.dkr.ecr.us-east-1.amazonaws.com
Note that if you normally require sudo to execute docker, then prepend it to the above command, e.g:
$ sudo docker login -u AWS -p $(aws ecr get-login-password) <ecr_host>
After authenticating, the images can be loaded and pushed to ECR.
- The DataMasque images must be loaded into Docker on the local machine before they can be pushed to ECR.
The ecr-image-push.sh script performs both of these steps. It must be called with your ECR host as the first and only argument. For example:
$ ./ecr-image-push.sh 123456789012.dkr.ecr.us-east-1.amazonaws.com
This script will load the images into the local Docker and then push them to the specified ECR, tagged with the current DataMasque version and build number. It may take a few minutes depending on the internet connection speed.
After pushing, the images' tag will be shown in the console; this is required for the next step.
IMPORTANT: If AWS ECR is used as the container registry, the AmazonEC2ContainerRegistryReadOnly policy must be attached to the worker role of the ROSA cluster.
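Attaching that managed policy can be done with the AWS CLI. A sketch, assuming a hypothetical worker role name (substitute your cluster's actual worker role):

```shell
attach_ecr_readonly() {
  # Attach AWS's managed ECR read-only policy to the ROSA worker IAM role.
  # $1 is the worker role name (find it in the IAM console).
  aws iam attach-role-policy \
    --role-name "$1" \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
}

# Example (requires AWS credentials; role name is hypothetical):
# attach_ecr_readonly my-rosa-cluster-worker-role
```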
- Logged in as cluster-admin with the OpenShift CLI (oc), create the datamasque project:
$ oc new-project datamasque
- Add the anyuid policy to the datamasque-sa service account in the datamasque project:
$ oc adm policy add-scc-to-user anyuid -z datamasque-sa -n datamasque
- Download the DataMasque Helm Chart from the DataMasque Portal and extract it so that the datamasque-helmchart directory is in the current directory.
- Use helm to create the values.yaml file using the following command:
$ helm show values ./datamasque-helmchart > values.yaml
- The variables for Helm are in the values.yaml that was just created. Please refer to datamasque-helmchart/README.md for the values that need to be replaced and where they should be inserted into the values.yaml file. Ignore values that are not valid for deployment on ROSA, as they are only applicable to EKS. For example, createNamespace is not valid on ROSA, as the namespace was already created in step 4. Note that you should not edit the datamasque-helmchart/values.yaml file as this is a template only.
For example, you will need your ECR repository, the file system and access point IDs (Steps 7 and 9 above), the DataMasque version tag (step 3), the DataMasque role ARN, and more.
Note: The repository specified in values.yaml should include /datamasque at the end, not just the host name. For example, if using 123456789012.dkr.ecr.us-east-1.amazonaws.com in the ecr-image-push.sh command, then specify 123456789012.dkr.ecr.us-east-1.amazonaws.com/datamasque as the repository in values.yaml.
- After populating the values in values.yaml, deploy DataMasque with helm:
$ helm install datamasque ./datamasque-helmchart -f values.yaml
- After deployment, check if the pods are ready:
$ kubectl get pods --all-namespaces
If the pods take more than five minutes to enter Running status,
these commands can help to troubleshoot.
- To find more information about why a pod is stuck in Pending status, use the describe pod command. For example, to see information about the pod admin-db-0:
$ kubectl describe --namespace <namespace-name> pod admin-db-0
- To see ROSA events, use kubectl get events. Use grep to filter for events for a particular pod. For example, to see events just for admin-db-0:
$ kubectl get events | grep admin-db-0
- If any pods are unable to find persistent volume claims (PVCs), you will see errors regarding PVCs when describing the pod or in the event list. Use the get pvc command to check whether all PVCs have been bound.
$ kubectl get pvc --all-namespaces
This should show the EBS and EFS volumes created during cluster setup (and may also contain any other volumes already attached to the cluster).
For example:
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
default ebs-claim Bound pvc-00000000-1111-2222-3333-444444444444 20Gi RWO ebs-sc 11s
default efs-claim Bound efs-pv 40Gi RWX efs-sc 11s
The ebs-claim ID will differ.
Both volumes should have the status Bound. If they do not, check the IAM roles and permissions assigned during the cluster configuration steps above.
Also check that the correct IAM roles were used when setting the Helm values.
- For more information about why a PVC is not Bound, use the describe pvc command. For example, to describe the ebs-claim PVC:
$ kubectl describe pvc ebs-claim
This will give more detailed information about any errors with the PVC.
Once all DataMasque pods are available, you can continue with installation.
- The IP address of the cluster can be found by checking the IP address(es) of the EC2 node instances for the cluster. Visit this IP address in a browser (e.g. https://<Node IP>) to finish the Post-Installation Setup.
Retaining values.yaml For Upgrade
Your values.yaml is required to upgrade DataMasque to a newer version.
Please be sure to keep a backup of it in a safe place.
Increasing the number of masque-agent pods
DataMasque can perform masking runs faster by executing tasks using multiple workers, or by running multiple tasks in parallel (see the performance optimisation documentation). On ROSA, worker tasks can be balanced across multiple nodes by starting multiple agent pods.
By default, values.yaml runs two masque-agent instances.
If your cluster has enough resources, the number of replicas can be increased.
Each additional masque-agent pod requires 1200m of CPU and 5Gi of memory.
To increase the number of replicas, locate the masqueagents variable in the values.yaml
file.
masqueagents: 2
Increase the number of replicas, for example, to 3:
masqueagents: 3
Then re-apply the Helm chart:
$ helm upgrade datamasque ./datamasque-helmchart -f values.yaml
This will add more masque-agent workers without affecting the other pods.
You can also change the number of replicas to 1 to scale down the pods;
however you should not change the number of replicas while a masking run is in progress.
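The extra capacity needed when increasing masqueagents can be estimated from the per-pod figures above (1200m of CPU and 5Gi of memory per additional agent). A small arithmetic sketch:

```shell
# Estimate the additional resources needed beyond the default of 2 agents.
AGENTS=3                   # desired masqueagents value
EXTRA=$((AGENTS - 2))      # pods beyond the default of 2
CPU_M=$((EXTRA * 1200))    # additional CPU in millicores
MEM_GI=$((EXTRA * 5))      # additional memory in GiB
echo "${CPU_M}m CPU, ${MEM_GI}Gi memory"
# → 1200m CPU, 5Gi memory
```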
Upgrading DataMasque on ROSA
Upgrading from DataMasque v2.19.1 or Earlier
DataMasque versions v2.19.1 and earlier used kubectl for deployment,
and required manual cluster configuration before deployment.
This deployment method is incompatible with Helm,
which is used from DataMasque v2.20.0 onwards.
Therefore, upgrading from v2.19.1 or earlier requires
manual steps to ensure your cluster is correctly configured, and cannot be done using Helm.
For assistance with upgrading in this scenario, please contact DataMasque Support at support@datamasque.com.
Upgrading DataMasque v2.20.0 to newer versions can use Helm, as described below.
Upgrading DataMasque v2.20.0 or Newer
To upgrade to a newer version of DataMasque, a new cluster does not need to be created.
The upgrade is performed by pushing new versions of the images to ECR,
then performing a helm upgrade with the values.yaml file that was used for installation.
Provided the same EFS volume is used, all data will be retained on upgrade.
- Extract the new DataMasque package version.
$ tar -xvzf datamasque-docker-v<version>.pkg
$ cd datamasque/<version>/
- Push the new DataMasque images to ECR, using the
ecr-image-push.shscript. Provide your ECR address as the argument.
$ ./ecr-image-push.sh 123456789012.dkr.ecr.us-east-1.amazonaws.com
Be sure that Docker has been authenticated to push to ECR before running this command.
This will output the new DataMasque tag, which is needed in step 4.
- Copy the values.yaml file that was used during initial installation into the DataMasque extracted package directory (i.e. the same directory that contains the ecr-image-push.sh file).
- Retrieve the current DataMasque internal database password from the Kubernetes secrets, using the following commands:
$ export DB_PASSWORD=$(kubectl get secret --namespace datamasque datamasque-db-secret -o jsonpath="{.data.postgres-password}" | base64 --decode)
$ echo "Internal DB password: $DB_PASSWORD"
This will output something like:
Internal DB password: T3hxV1N6RnJ6ck5ubzlHSQ==
You will need the password in the next step.
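Kubernetes secrets are stored base64-encoded, which is why the command above pipes the value through base64 --decode. A local illustration of the round trip (the value is the example from this guide, not a real password):

```shell
# Decode the base64-encoded secret value, as kubectl's jsonpath output requires.
ENCODED="T3hxV1N6RnJ6ck5ubzlHSQ=="
DECODED=$(printf '%s' "$ENCODED" | base64 --decode)
# Re-encoding the decoded value yields the original string.
REENCODED=$(printf '%s' "$DECODED" | base64)
echo "$REENCODED"
```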
- Update the values.yaml file and change the tag to the newest DataMasque tag that was output by the ecr-image-push.sh script in step 2. Also add the database password from step 4 into the password configuration of the db section:
db:
  # db.user -- DataMasque DB username
  user: postgres
  # db.password -- DataMasque DB password, leave empty to auto-generate
  password: "T3hxV1N6RnJ6ck5ubzlHSQ=="
- Deploy DataMasque using the following command:
$ helm upgrade datamasque ./datamasque-helmchart -f values.yaml
Wait for the pods to be ready:
$ kubectl get pods --all-namespaces
After the pods are ready, connect to DataMasque with the same IP address or hostname as used previously. If the upgrade was successful, you will see the new DataMasque version in the bottom right of the web UI.