pii-detect API
Client-side PII detection using regex and heuristic scans.
npm install @rowops/pii-detect
Worker Factory
createPiiDetectWorker
Creates a web worker for PII detection.
import { createPiiDetectWorker } from "@rowops/pii-detect";
const worker = createPiiDetectWorker();
// Attach the handler before posting so no response can be missed.
worker.onmessage = (event) => {
  const result = event.data.payload;
  console.log("Detected PII:", result.columns);
};
worker.postMessage({
  requestId: "1",
  type: "detect",
  payload: {
    ipcBytes: arrowIpcBuffer,
    config: { sampleSize: 1000 },
  },
});
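When several detect requests share one worker, each response has to be matched to the request that produced it. The sketch below shows one way to do that with a map of pending promises. It assumes that responses echo the `requestId` of their request, which this page does not confirm. `WorkerLike`, `DetectResponse`, and `createDetectClient` are hypothetical names introduced for illustration and are not part of `@rowops/pii-detect`.

```typescript
// ASSUMPTION: each response carries the requestId of the request it answers.
interface DetectResponse {
  requestId: string;
  payload: unknown;
}

// Small structural stand-in for the parts of a worker used in this sketch.
interface WorkerLike {
  postMessage(message: unknown): void;
  onmessage: ((event: { data: DetectResponse }) => void) | null;
}

// Hypothetical helper: wraps a worker and returns a promise per request.
function createDetectClient(worker: WorkerLike) {
  const pending = new Map<string, (payload: unknown) => void>();
  let counter = 0;

  worker.onmessage = (event) => {
    const { requestId, payload } = event.data;
    const resolve = pending.get(requestId);
    if (!resolve) return; // Response for an unknown or already settled request.
    pending.delete(requestId);
    resolve(payload);
  };

  return {
    detect(ipcBytes: Uint8Array, config: { sampleSize?: number } = {}): Promise<unknown> {
      const requestId = String(++counter);
      return new Promise<unknown>((resolve) => {
        pending.set(requestId, resolve);
        worker.postMessage({ requestId, type: "detect", payload: { ipcBytes, config } });
      });
    },
  };
}
```

With a real worker, the object returned by `createPiiDetectWorker()` would be passed in (possibly with a type cast), and each `detect` call would resolve independently even if responses arrive out of order.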
Configuration
PiiDetectConfig
interface PiiDetectConfig {
  /** Max values to sample per column (default: 1000) */
  sampleSize?: number;
  /** Min confidence to report (default: 0.7) */
  confidenceThreshold?: number;
  /** Custom regex patterns */
  customPatterns?: Record<string, string>;
}
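To illustrate how the documented defaults could combine with a partial config, here is a minimal sketch. `resolveConfig` and `DEFAULT_CONFIG` are hypothetical names, and the validation rules are assumptions rather than documented library behavior. The interface is repeated so the block is self-contained.

```typescript
interface PiiDetectConfig {
  sampleSize?: number;
  confidenceThreshold?: number;
  customPatterns?: Record<string, string>;
}

// Defaults taken from the comments in PiiDetectConfig above.
const DEFAULT_CONFIG: Required<PiiDetectConfig> = {
  sampleSize: 1000,
  confidenceThreshold: 0.7,
  customPatterns: {},
};

// Hypothetical helper: fills in defaults and rejects out-of-range values.
function resolveConfig(config: PiiDetectConfig = {}): Required<PiiDetectConfig> {
  const resolved: Required<PiiDetectConfig> = {
    // `??` keeps the default when a field is missing or explicitly undefined.
    sampleSize: config.sampleSize ?? DEFAULT_CONFIG.sampleSize,
    confidenceThreshold: config.confidenceThreshold ?? DEFAULT_CONFIG.confidenceThreshold,
    customPatterns: config.customPatterns ?? DEFAULT_CONFIG.customPatterns,
  };
  // ASSUMPTION: a sample size must be a positive integer.
  if (!Number.isInteger(resolved.sampleSize) || resolved.sampleSize <= 0) {
    throw new RangeError("sampleSize must be a positive integer");
  }
  // Confidence is documented as a value between 0.0 and 1.0.
  if (resolved.confidenceThreshold < 0 || resolved.confidenceThreshold > 1) {
    throw new RangeError("confidenceThreshold must be between 0 and 1");
  }
  return resolved;
}
```

Calling `resolveConfig({ sampleSize: 200 })` keeps the custom sample size and falls back to the default threshold of 0.7.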
Types
PiiType
Supported PII types.
type PiiType =
  | "email"
  | "phone"
  | "ssn"
  | "credit_card"
  | "ip_address"
  | "passport"
  | "drivers_license"
  | "date_of_birth"
  | "bank_account"
  | "custom";
PiiAnnotation
Detection result for a column.
interface PiiAnnotation {
  column: string;
  piiType: PiiType;
  confidence: number;  // 0.0 - 1.0
  sampleCount: number; // Matched values
  totalCount: number;  // Total sampled
  matchRate: number;   // sampleCount / totalCount
}
PiiDetectionResult
Full detection result.
interface PiiDetectionResult {
  columns: PiiAnnotation[];
  scanDurationMs: number;
  rowsScanned: number;
}
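The following sketch shows how the annotation fields relate to each other and how a consumer might filter a result by confidence. `buildAnnotation` and `reportable` are hypothetical helpers, `piiType` is simplified to `string`, and the zero-sample behavior is an assumption.

```typescript
// Simplified copy of PiiAnnotation (piiType widened to string for brevity).
interface PiiAnnotation {
  column: string;
  piiType: string;
  confidence: number;
  sampleCount: number;
  totalCount: number;
  matchRate: number;
}

// Hypothetical helper: derives matchRate as sampleCount / totalCount.
function buildAnnotation(
  column: string,
  piiType: string,
  confidence: number,
  sampleCount: number,
  totalCount: number
): PiiAnnotation {
  // ASSUMPTION: an empty sample yields a match rate of 0 instead of NaN.
  const matchRate = totalCount > 0 ? sampleCount / totalCount : 0;
  return { column, piiType, confidence, sampleCount, totalCount, matchRate };
}

// Hypothetical helper: keeps annotations at or above the threshold,
// most confident first. The input array is left unmodified.
function reportable(annotations: PiiAnnotation[], threshold = 0.7): PiiAnnotation[] {
  return annotations
    .filter((a) => a.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
}
```

The default threshold of 0.7 mirrors the documented default for `confidenceThreshold`.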
Utility Functions
isHighSensitivity
Check if PII type requires strict handling.
import { isHighSensitivity } from "@rowops/pii-detect";
isHighSensitivity("ssn"); // true
isHighSensitivity("credit_card"); // true
isHighSensitivity("email"); // false
High-sensitivity types: ssn, credit_card, bank_account, passport
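For readers who want to see the classification spelled out, a set lookup over the four listed types behaves the same way as the examples above. This is a sketch and not the library's source. `isHighSensitivitySketch` and `columnsNeedingStrictHandling` are hypothetical names.

```typescript
type PiiType =
  | "email" | "phone" | "ssn" | "credit_card" | "ip_address"
  | "passport" | "drivers_license" | "date_of_birth" | "bank_account" | "custom";

// The four high-sensitivity types listed in this section.
const HIGH_SENSITIVITY: ReadonlySet<PiiType> = new Set<PiiType>([
  "ssn",
  "credit_card",
  "bank_account",
  "passport",
]);

// Sketch of the check; the real isHighSensitivity may be implemented differently.
function isHighSensitivitySketch(piiType: PiiType): boolean {
  return HIGH_SENSITIVITY.has(piiType);
}

// Hypothetical usage: list the columns whose detected type needs strict handling.
function columnsNeedingStrictHandling(
  annotations: { column: string; piiType: PiiType }[]
): string[] {
  return annotations
    .filter((a) => isHighSensitivitySketch(a.piiType))
    .map((a) => a.column);
}
```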
isContactInfo
Check if PII type is contact information.
import { isContactInfo } from "@rowops/pii-detect";
isContactInfo("email"); // true
isContactInfo("phone"); // true
isContactInfo("ip_address"); // true
isContactInfo("ssn"); // false
getSuggestedMaskStrategy
Get recommended masking strategy for a PII type.
import { getSuggestedMaskStrategy } from "@rowops/pii-detect";
getSuggestedMaskStrategy("email"); // "hash"
getSuggestedMaskStrategy("phone"); // "partial"
getSuggestedMaskStrategy("ssn"); // "hash"
getSuggestedMaskStrategy("credit_card"); // "partial"
getSuggestedMaskStrategy("passport"); // "redact"
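The strategy names returned above are labels. What each one does to a value is not specified on this page. As a rough illustration, the sketch below applies the three strategies seen in the examples to a single string. `MaskStrategy` and `applyMask` are hypothetical, the "keep the last four characters" rule and the `[REDACTED]` placeholder are assumptions, and hashing is delegated to a caller-supplied function so that no particular hash algorithm is implied.

```typescript
// Strategy names as they appear in the examples above; the library's real
// return type may include other values.
type MaskStrategy = "hash" | "partial" | "redact";

// Hypothetical helper: masks one value according to a strategy.
function applyMask(
  value: string,
  strategy: MaskStrategy,
  hashFn: (input: string) => string
): string {
  switch (strategy) {
    case "hash":
      // The caller decides which hash to use (for example SHA-256).
      return hashFn(value);
    case "partial": {
      // ASSUMPTION: keep the last 4 characters, mask other letters and digits.
      const keep = 4;
      if (value.length <= keep) return "*".repeat(value.length);
      const head = value.slice(0, value.length - keep);
      const tail = value.slice(value.length - keep);
      return head.replace(/[A-Za-z0-9]/g, "*") + tail;
    }
    case "redact":
      // ASSUMPTION: a fixed placeholder replaces the whole value.
      return "[REDACTED]";
  }
}
```

Separators such as dashes are preserved by the partial strategy, so a masked card number keeps its familiar shape.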
Detection Patterns
Built-in Patterns
| Type | Pattern | Examples |
|---|---|---|
| email | RFC 5322 | john@example.com |
| phone | Various formats | +1-555-123-4567, (555) 123-4567 |
| ssn | US SSN | 123-45-6789 |
| credit_card | Luhn-validated | 4111-1111-1111-1111 |
| ip_address | IPv4/IPv6 | 192.168.1.1 |
| date_of_birth | Date formats | 1990-01-15, 01/15/1990 |
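The table describes credit card detection as Luhn-validated. The Luhn checksum itself is a standard algorithm, sketched below. `passesLuhn` is a hypothetical name, and the accepted separators and the 13 to 19 digit length range are assumptions about what a detector might allow, not documented behavior.

```typescript
// Returns true when the digits in `input` satisfy the Luhn checksum.
function passesLuhn(input: string): boolean {
  // ASSUMPTION: spaces and dashes are the only separators to strip.
  const digits = input.replace(/[\s-]/g, "");
  // ASSUMPTION: card numbers are 13 to 19 digits long.
  if (!/^\d{13,19}$/.test(digits)) return false;

  let sum = 0;
  let double = false; // The rightmost digit is not doubled.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // Digit character to number.
    if (double) {
      d *= 2;
      if (d > 9) d -= 9; // Same as adding the two digits of the product.
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}
```

The example value from the table, `4111-1111-1111-1111`, passes this check, while changing its last digit to `2` makes it fail.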
Custom Patterns
const config: PiiDetectConfig = {
  customPatterns: {
    employee_id: "^EMP-\\d{6}$",
    internal_code: "^[A-Z]{2}-\\d{4}-[A-Z]{2}$",
  },
};
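Custom patterns are plain regex source strings, so it can help to check them against sample values before handing them to the worker. The sketch below compiles a pattern map and measures how many sample values each pattern matches. `compilePatterns` and `patternMatchRate` are hypothetical helpers, and how the library itself compiles or scores custom patterns is not described on this page.

```typescript
// Hypothetical helper: compiles each pattern string into a RegExp.
// `new RegExp` throws a SyntaxError for an invalid pattern, which surfaces
// mistakes before the config is sent to the worker.
function compilePatterns(patterns: Record<string, string>): Map<string, RegExp> {
  const compiled = new Map<string, RegExp>();
  for (const [name, source] of Object.entries(patterns)) {
    compiled.set(name, new RegExp(source));
  }
  return compiled;
}

// Hypothetical helper: fraction of values matched by the pattern.
function patternMatchRate(values: string[], pattern: RegExp): number {
  if (values.length === 0) return 0;
  const matched = values.filter((v) => pattern.test(v)).length;
  return matched / values.length;
}

const compiled = compilePatterns({
  employee_id: "^EMP-\\d{6}$",
  internal_code: "^[A-Z]{2}-\\d{4}-[A-Z]{2}$",
});
```

For example, `patternMatchRate(["EMP-123456", "EMP-12"], compiled.get("employee_id")!)` reports that half of the values match.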
Usage Example
import { createPiiDetectWorker, getSuggestedMaskStrategy } from "@rowops/pii-detect";
import type { PiiDetectionResult, MaskConfig } from "@rowops/pii-detect";
async function detectAndBuildMaskConfig(
  arrowIpcBytes: Uint8Array
): Promise<MaskConfig> {
  const worker = createPiiDetectWorker();
  const result = await new Promise<PiiDetectionResult>((resolve, reject) => {
    worker.onmessage = (e) => {
      resolve(e.data.payload);
      worker.terminate();
    };
    // Reject on worker errors so the promise cannot hang forever.
    worker.onerror = (e) => {
      reject(new Error(e.message));
      worker.terminate();
    };
    worker.postMessage({
      requestId: "1",
      type: "detect",
      payload: { ipcBytes: arrowIpcBytes, config: { sampleSize: 1000 } },
    });
  });
  // Build mask config from detections, keeping only confident matches
  const columnRules: Record<string, string> = {};
  for (const annotation of result.columns) {
    if (annotation.confidence >= 0.8) {
      columnRules[annotation.column] = getSuggestedMaskStrategy(annotation.piiType);
    }
  }
  return { columnRules };
}
Tier Requirements
PII detection requires Pro tier or above.
| Feature | Free | Pro | Scale | Enterprise |
|---|---|---|---|---|
| PII Detection | No | Yes | Yes | Yes |
| Custom patterns | No | Yes | Yes | Yes |
| Full column scan | No | No | Yes | Yes |
Security Notes
- All detection runs client-side - No data leaves the browser
- Only metadata returned - Actual values never exposed
- Regex-based - No ML models or external API calls
- Deterministic - Same input produces same results