Mask Module
The Mask module applies masking strategies to sensitive fields before downstream processing or export.
Masking behavior is implemented in the core execution engine. Configuration support exists, but runtime enforcement should be validated per execution mode.
Purpose
Apply masking strategies to sensitive fields, replacing or obscuring values before they are exported or delivered to external systems. Masking is destructive by design unless you opt into tokenization with a local vault.
When It Runs
Pipeline Position: After Validate (optional)
Parse -> Validate -> **Mask** -> Transform -> Profile
Masking occurs before transformation and export, ensuring sensitive data is protected before any downstream operations.
Inputs
| Input | Type | Description |
|---|---|---|
| Data | Arrow IPC | Validated Arrow RecordBatch stream |
| Config | MaskConfig | Masking rules per field |
Outputs
| Output | Type | Description |
|---|---|---|
| Masked data | Arrow IPC | Masked Arrow RecordBatch stream |
Configuration
MaskConfig Structure
type MaskConfig = {
  autoDetectPII?: boolean;
  defaultStrategy?: MaskStrategy;
  projectSecret?: string;
  emailStrategy?: MaskStrategy;
  phoneStrategy?: MaskStrategy;
  cardStrategy?: MaskStrategy;
  ipStrategy?: MaskStrategy;
  ssnStrategy?: MaskStrategy;
  passportStrategy?: MaskStrategy;
  columnRules?: Record<string, MaskStrategy | { strategy: MaskStrategy; options?: MaskRuleOptions }>;
};

type MaskRuleOptions = {
  first?: number;
  last?: number;
  fixed?: string;
  projectSecret?: string;
  pattern?: string;
  replacement?: string;
};
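
As an illustration of how these types fit together, here is a minimal sketch of a config object. The types are redeclared locally so the snippet stands alone; the MaskStrategy union is an assumption built from the strategy names documented on this page, and the column names (notes, account_id, status_label) are hypothetical.

```typescript
// Assumption: MaskStrategy is a string union of the documented strategy names.
type MaskStrategy =
  | "none" | "redact" | "hash" | "deterministic" | "partial" | "fixed"
  | "null" | "shuffle" | "email" | "phone" | "ssn" | "creditcard"
  | "regex" | "tokenize";

type MaskRuleOptions = {
  first?: number;
  last?: number;
  fixed?: string;
  projectSecret?: string;
  pattern?: string;
  replacement?: string;
};

type MaskConfig = {
  autoDetectPII?: boolean;
  defaultStrategy?: MaskStrategy;
  projectSecret?: string;
  emailStrategy?: MaskStrategy;
  phoneStrategy?: MaskStrategy;
  cardStrategy?: MaskStrategy;
  ipStrategy?: MaskStrategy;
  ssnStrategy?: MaskStrategy;
  passportStrategy?: MaskStrategy;
  columnRules?: Record<string, MaskStrategy | { strategy: MaskStrategy; options?: MaskRuleOptions }>;
};

// Hypothetical example config: column names are made up for illustration.
const maskConfig: MaskConfig = {
  autoDetectPII: true,
  defaultStrategy: "redact",
  emailStrategy: "email",
  columnRules: {
    // Shorthand form: just a strategy name.
    notes: "null",
    // Long form: strategy plus options.
    account_id: { strategy: "partial", options: { first: 2, last: 2 } },
    status_label: { strategy: "fixed", options: { fixed: "[HIDDEN]" } },
  },
};
```

The two forms accepted by columnRules (a bare strategy name, or an object with strategy and options) come straight from the type definition above. How per-type settings such as emailStrategy interact with columnRules is not specified here and should be checked against the engine.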
MaskStrategy Options
Basic Strategies
| Strategy | Description | Example |
|---|---|---|
| none | No masking (pass-through) | john@example.com -> john@example.com |
| redact | Replace with static placeholder | john@example.com -> **** |
| hash | One-way SHA-256 hash (optional secret) | john@example.com -> a8d91c... |
| deterministic | Hash using projectSecret | john@example.com -> a8d91c... |
| partial | Mask portion of value (configurable) | secret123 -> se***23 |
| fixed | Replace with fixed string | secret -> [HIDDEN] |
| null | Replace with null value | john@example.com -> null |
| shuffle | Deterministically shuffle characters | secret -> teserc |
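
To make the partial strategy concrete, the following is a rough sketch, not the engine's implementation. It assumes that first and last give the number of leading and trailing characters left visible, that both default to 0, and that the hidden middle is always written as three asterisks (which is what the secret123 example in the table suggests). The function name maskPartial is hypothetical.

```typescript
type PartialOptions = { first?: number; last?: number };

// Sketch of a partial mask: keep `first` leading and `last` trailing characters,
// and replace everything in between with a fixed "***" marker (assumed).
function maskPartial(value: string, options: PartialOptions = {}): string {
  const first = Math.max(0, options.first ?? 0);
  const last = Math.max(0, options.last ?? 0);

  // If the visible parts would cover the whole value, reveal nothing.
  if (first + last >= value.length) {
    return "***";
  }

  const head = value.slice(0, first);
  const tail = value.slice(value.length - last);
  return `${head}***${tail}`;
}

console.log(maskPartial("secret123", { first: 2, last: 2 })); // se***23
```

Hiding the whole value when it is too short to mask partially is a design choice made for this sketch; the engine may handle short values differently.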
Semantic Strategies (Pro+ Tier)
These strategies automatically format output based on the data type:
| Strategy | Description | Example |
|---|---|---|
| email | Mask local part, keep domain | john@example.com -> j***@example.com |
| phone | Keep last 4 digits | 555-123-4567 -> ***-***-4567 |
| ssn | SSN format with last 4 visible | 123-45-6789 -> ***-**-6789 |
| creditcard | Card format with last 4 visible | 4111111111111111 -> ****-****-****-1111 |
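
The format-preserving behavior in the table can be approximated as follows. This is a sketch written to reproduce the four examples above, not the engine's code; the function names are hypothetical, and the handling of malformed input (for example, a value without an @) is an assumption.

```typescript
// Keep the first character of the local part and the full domain.
function maskEmail(value: string): string {
  const at = value.lastIndexOf("@");
  if (at <= 0) return "***"; // Assumed: hide values that are not email-shaped.
  return `${value.charAt(0)}***${value.slice(at)}`;
}

// Replace every digit with "*" except the last `keep` digits; separators are preserved.
// Covers both the phone and ssn examples.
function maskDigitsKeepLast(value: string, keep = 4): string {
  let seen = 0;
  let out = "";
  for (let i = value.length - 1; i >= 0; i--) {
    const ch = value.charAt(i);
    if (ch >= "0" && ch <= "9") {
      seen += 1;
      out = (seen <= keep ? ch : "*") + out;
    } else {
      out = ch + out;
    }
  }
  return out;
}

// Emit a fixed card layout with only the last four digits visible.
function maskCard(value: string): string {
  const digits = value.replace(/\D/g, "");
  return `****-****-****-${digits.slice(-4)}`;
}

console.log(maskEmail("john@example.com"));      // j***@example.com
console.log(maskDigitsKeepLast("555-123-4567")); // ***-***-4567
console.log(maskDigitsKeepLast("123-45-6789"));  // ***-**-6789
console.log(maskCard("4111111111111111"));       // ****-****-****-1111
```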
Regex Strategy (Scale+ Tier)
| Strategy | Description |
|---|---|
| regex | Replace matches using pattern and replacement template |
Regex strategy requires pattern and replacement options:
{
  strategy: "regex",
  options: {
    // The last group is captured so the replacement can reference it as $1.
    pattern: "\\d{4}-\\d{4}-\\d{4}-(\\d{4})",
    replacement: "****-****-****-$1"
  }
}
Replacement templates support capture groups ($1, $2, etc.).
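
A numbered reference such as $1 only resolves if the pattern contains a matching parenthesized group. The sketch below shows this using JavaScript's built-in String.prototype.replace; whether the engine's regex dialect and template syntax match JavaScript's exactly is an assumption, and maskWithRegex is a hypothetical helper name.

```typescript
// Apply a pattern/replacement pair to every match in the value.
// Assumption: the engine's templates behave like JavaScript replacement strings.
function maskWithRegex(value: string, pattern: string, replacement: string): string {
  return value.replace(new RegExp(pattern, "g"), replacement);
}

const maskedCard = maskWithRegex(
  "card 4111-2222-3333-4444 on file",
  "\\d{4}-\\d{4}-\\d{4}-(\\d{4})", // one capture group: the last four digits
  "****-****-****-$1",
);

console.log(maskedCard); // card ****-****-****-4444 on file
```

Text that does not match the pattern passes through unchanged, so a regex rule should be tested against representative values before relying on it.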
Specific strategy availability depends on tier and field type.
Tokenize Strategy (Scale+ Tier)
| Strategy | Description |
|---|---|
| tokenize | Replace values with vault-backed tokens |
Tokenize requires a projectSecret (or a vault secret provided by the host):
{
  strategy: "tokenize",
  options: {
    projectSecret: "import-session-secret"
  }
}
Tokens are deterministic per secret and recorded in the local token vault.
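
One way to get tokens that are deterministic per secret and reversible only through a local vault is a keyed hash plus an in-memory lookup table. The sketch below illustrates that idea with HMAC-SHA-256 from Node's crypto module. The class name LocalTokenVault, the tok_ prefix, and the token length are all hypothetical; the actual token format and vault storage used by the engine are not documented here.

```typescript
import { createHmac } from "node:crypto";

// Illustrative local vault: tokens are derived from the value and a secret,
// and the token-to-value mapping never leaves this object.
class LocalTokenVault {
  private readonly secret: string;
  private readonly tokenToValue = new Map<string, string>();

  constructor(secret: string) {
    this.secret = secret;
  }

  // Same value + same secret always yields the same token.
  tokenize(value: string): string {
    const digest = createHmac("sha256", this.secret).update(value).digest("hex");
    const token = `tok_${digest.slice(0, 24)}`; // hypothetical token format
    this.tokenToValue.set(token, value);
    return token;
  }

  // Reversal is only possible for tokens recorded in this vault.
  detokenize(token: string): string | undefined {
    return this.tokenToValue.get(token);
  }
}

const vault = new LocalTokenVault("import-session-secret");
const token = vault.tokenize("john@example.com");
console.log(token === vault.tokenize("john@example.com")); // true
console.log(vault.detokenize(token)); // john@example.com
```

Because the mapping lives only in the vault, anyone holding just the masked export cannot recover the original values, which is consistent with the statement below that no server-side reidentification is provided.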
Tier Behavior
Mask features are gated by tier:
| Feature | Free | Pro | Scale | Enterprise |
|---|---|---|---|---|
| Basic strategies (none/redact/hash/partial/null) | Yes | Yes | Yes | Yes |
| Deterministic + fixed | No | Yes | Yes | Yes |
| Semantic masking (email/phone/ssn/creditcard) | No | Yes | Yes | Yes |
| Shuffle + regex | No | No | Yes | Yes |
| Tokenization vault | No | No | Yes | Yes |
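
The table can be read as a minimum tier per strategy. The sketch below encodes it that way so a config can be checked before it is submitted. It is a reading aid built from the table, not the engine's gating logic; the names Tier, MIN_TIER, and isStrategyAllowed are hypothetical.

```typescript
type Tier = "free" | "pro" | "scale" | "enterprise";

// Tiers in ascending order; a higher tier includes everything below it.
const TIER_ORDER: Tier[] = ["free", "pro", "scale", "enterprise"];

// Minimum tier per strategy, transcribed from the table above.
const MIN_TIER = new Map<string, Tier>([
  ["none", "free"], ["redact", "free"], ["hash", "free"],
  ["partial", "free"], ["null", "free"],
  ["deterministic", "pro"], ["fixed", "pro"],
  ["email", "pro"], ["phone", "pro"], ["ssn", "pro"], ["creditcard", "pro"],
  ["shuffle", "scale"], ["regex", "scale"], ["tokenize", "scale"],
]);

function isStrategyAllowed(strategy: string, tier: Tier): boolean {
  const required = MIN_TIER.get(strategy);
  if (required === undefined) return false; // Unknown strategies are denied.
  return TIER_ORDER.indexOf(tier) >= TIER_ORDER.indexOf(required);
}

console.log(isStrategyAllowed("hash", "free"));     // true
console.log(isStrategyAllowed("regex", "pro"));     // false
console.log(isStrategyAllowed("tokenize", "scale")); // true
```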
What This Module Does Not Do
- Does not run ML-based PII detection: full detection lives in the pii-detect module; Mask uses config + heuristics
- Does not guarantee identical enforcement across all execution modes: validate worker vs pipeline usage in your app
- Does not provide server-side reidentification: tokenization is reversible only with the local vault
- Does not mask based on content analysis alone: masking applies to configured fields and semantic hints
Constraints
Module Status
The public API exports type definitions. Actual masking implementation resides in the core execution engine.
Execution Mode Differences
Masking behavior may differ between execution modes:
| Aspect | Dashboard-Assisted | Headless |
|---|---|---|
| Implementation | Engine layer | Pipeline layer |
| Configuration | Schema-embedded | Pipeline config |
| Enforcement | Should be validated | Should be validated |
Runtime enforcement should be validated per execution mode. Do not assume identical behavior without testing.
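
One way to carry out this validation is to feed the same sample rows through each execution mode and compare the masked output. The harness below is a generic sketch: the runner functions stand in for however your application invokes masking in each mode, and none of the names (MaskRunner, findModeMismatches) come from the module's API.

```typescript
type Row = Record<string, string | null>;

// A runner wraps one execution mode; how it calls the engine is up to the app.
type MaskRunner = (rows: Row[]) => Row[];

// Run every mode on the same input and report modes whose output differs
// from the first one. Comparison is by serialized output, so key order matters.
function findModeMismatches(rows: Row[], runners: Record<string, MaskRunner>): string[] {
  const results = Object.entries(runners).map(([mode, run]) => ({
    mode,
    output: JSON.stringify(run(rows)),
  }));

  const [baseline, ...rest] = results;
  if (baseline === undefined) return [];

  return rest
    .filter((result) => result.output !== baseline.output)
    .map((result) => `${result.mode} differs from ${baseline.mode}`);
}
```

In a test suite, an empty result means the modes agreed on that sample; any entry names the mode to investigate.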
Fail-Closed Behavior
In tested paths, masking failures throw MaskingFailedError and halt the pipeline. No fallback to unmasked data is provided.
Failure Modes
| Failure | Behavior |
|---|---|
| Configuration error | MaskingFailedError thrown; pipeline halts |
| Unsupported strategy | MaskingFailedError thrown; pipeline halts |
| Invalid field reference | Error thrown; pipeline halts |
Masking failures are fail-closed: no partial or unmasked results are produced.
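
The fail-closed contract can be pictured as follows: if any row fails to mask, the whole batch is abandoned and nothing is returned. This is a sketch of the pattern only. MaskingFailedError is declared locally as a stand-in because its import path is not given here, and maskRowsFailClosed is a hypothetical function name.

```typescript
// Local stand-in for the error named in this document.
class MaskingFailedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "MaskingFailedError";
  }
}

type Row = Record<string, string | null>;

// Mask every row or fail the whole batch: no partial and no unmasked output.
function maskRowsFailClosed(rows: Row[], maskRow: (row: Row) => Row): Row[] {
  const masked: Row[] = [];
  for (const row of rows) {
    try {
      masked.push(maskRow(row));
    } catch (cause) {
      const reason = cause instanceof Error ? cause.message : String(cause);
      // Rows masked so far are discarded with the local array when we throw.
      throw new MaskingFailedError(`masking failed: ${reason}`);
    }
  }
  return masked;
}
```

Callers should treat this error as terminal for the batch rather than retrying with masking disabled, since that would reintroduce the unmasked fallback the module is designed to avoid.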
Observed Status
Fully implemented. Masking is executed via WASM in the pipeline and engine layers, with:
- Core strategies (none/redact/hash/deterministic/partial/fixed/null/shuffle)
- Semantic strategies (email/phone/ssn/creditcard)
- Regex-based masking with capture group support
- Tokenization with a local token vault
- Tier-gated feature access and fail-closed error handling
Type definitions are exported publicly from @rowops/schema.