Validate Module
The Validate module applies schema-defined validation rules to data using WASM-accelerated processing, producing explicit valid/invalid classifications with structured error information.
Purpose
Apply schema validation rules to Arrow IPC data and produce a validation result that separates valid rows from invalid rows with detailed, structured error information.
When It Runs
Pipeline Position: After Parse
Parse → **Validate** → Mask → Transform → Profile
Validation executes on parsed tabular data before any masking or transformation is applied.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Hero Layer (@rowops/validate) │
│ validateData() / validateArrowData() │
│ - JSON → Arrow IPC conversion │
│ - Worker lifecycle management │
│ - Hook event routing │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Worker Layer │
│ validator.worker.ts │
│ - VALIDATE_ARROW protocol │
│ - Hook event emission │
│ - Cross-field validation │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ WASM Engine │
│ validate_rows_arrow() │
│ - Arrow IPC in, Arrow IPC out │
│ - Vectorized validation │
└─────────────────────────────────────────────────────────────┘
Inputs
| Input | Type | Description |
|---|---|---|
| Data | Arrow IPC (Uint8Array) | Arrow IPC bytes from Parse stage |
| Schema | ValidateSchemaVersion | Schema with field definitions and validation config |
| Options | ValidateDataOptions | Mapping, regex settings, cross-field rules |
Schema Field Configuration
Each field in the schema can define validation rules:
interface ValidateSchemaField {
key: string;
label?: string;
type?: 'string' | 'number' | 'boolean' | 'date' | 'enum';
required?: boolean;
enumValues?: string[];
regex?: string;
minLength?: number;
maxLength?: number;
min?: number;
max?: number;
semanticType?: string; // 'email', 'phone', 'url', etc.
}
Cross-Field Validation Rules
Define conditional validation between fields:
interface ValidateCrossFieldRule {
when: {
field: string;
equals?: unknown;
exists?: boolean;
};
then: {
field: string;
required?: boolean;
equals?: unknown;
};
code?: string;
message?: string;
}
Example: If country equals "US", then zipCode is required:
{
when: { field: 'country', equals: 'US' },
then: { field: 'zipCode', required: true },
code: 'US_ZIP_REQUIRED',
message: 'ZIP code is required for US addresses'
}
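To make the when/then semantics concrete, here is a minimal sketch of how a single rule could be evaluated against one row. It is not the worker's actual implementation: the `CrossFieldRule` and `Row` types, the `isPresent` emptiness test, and the fallback to the `CROSS_FIELD` code when a rule has no `code` are all assumptions made for illustration.

```typescript
// Illustrative copy of the rule shape shown above.
interface CrossFieldRule {
  when: { field: string; equals?: unknown; exists?: boolean };
  then: { field: string; required?: boolean; equals?: unknown };
  code?: string;
  message?: string;
}

type Row = Record<string, unknown>;

// Assumption: null, undefined, and "" all count as "missing".
const isPresent = (v: unknown): boolean => v !== null && v !== undefined && v !== "";

// True when the rule's `when` clause applies to this row.
function conditionHolds(rule: CrossFieldRule, row: Row): boolean {
  const value = row[rule.when.field];
  if (rule.when.exists !== undefined && isPresent(value) !== rule.when.exists) return false;
  if ("equals" in rule.when && value !== rule.when.equals) return false;
  return true;
}

// Returns an error object when the rule is violated, otherwise null.
function checkCrossFieldRule(rule: CrossFieldRule, row: Row, rowIndex: number) {
  if (!conditionHolds(rule, row)) return null;
  const target = row[rule.then.field];
  const missing = rule.then.required === true && !isPresent(target);
  const mismatch = "equals" in rule.then && target !== rule.then.equals;
  if (!missing && !mismatch) return null;
  return {
    rowIndex,
    field: rule.then.field,
    code: rule.code ?? "CROSS_FIELD", // assumed default
    message: rule.message ?? `Cross-field rule failed for ${rule.then.field}`,
    value: target,
  };
}

const usZipRule: CrossFieldRule = {
  when: { field: "country", equals: "US" },
  then: { field: "zipCode", required: true },
  code: "US_ZIP_REQUIRED",
  message: "ZIP code is required for US addresses",
};
```

Under these assumptions, a row with `country: 'US'` and an empty `zipCode` yields one error on `zipCode`, while a row with `country: 'CA'` is skipped because the `when` clause does not apply.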
Outputs
| Output | Type | Description |
|---|---|---|
| totalRows | number | Total rows validated |
| validCount | number | Rows that passed all validation |
| invalidCount | number | Rows with validation errors |
| errors | ValidateFieldError[] | Structured errors with row index |
| errorsByField | Record<string, number> | Error counts grouped by field |
| errorsByCode | Record<string, number> | Error counts grouped by error code |
ValidateFieldError Structure
interface ValidateFieldError {
rowIndex: number; // Zero-based row index
field: string; // Field key that failed
code: string; // Error code (REQUIRED, TYPE_MISMATCH, etc.)
message: string; // Human-readable description
value?: unknown; // The failing value
}
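The summary outputs can be derived from the flat error list. The sketch below shows one way to do that. The `summarize` helper and the `FieldError` and `Summary` names are hypothetical, and it assumes that a row with several errors counts once toward `invalidCount`, which this document does not state explicitly.

```typescript
// Illustrative copy of the error shape shown above.
interface FieldError {
  rowIndex: number;
  field: string;
  code: string;
  message: string;
  value?: unknown;
}

interface Summary {
  totalRows: number;
  validCount: number;
  invalidCount: number;
  errorsByField: Record<string, number>;
  errorsByCode: Record<string, number>;
}

// Hypothetical helper: folds a list of errors into the summary counters.
function summarize(totalRows: number, errors: FieldError[]): Summary {
  const errorsByField: Record<string, number> = {};
  const errorsByCode: Record<string, number> = {};
  const invalidRows = new Set<number>(); // distinct failing row indexes

  for (const e of errors) {
    errorsByField[e.field] = (errorsByField[e.field] ?? 0) + 1;
    errorsByCode[e.code] = (errorsByCode[e.code] ?? 0) + 1;
    invalidRows.add(e.rowIndex);
  }

  return {
    totalRows,
    validCount: totalRows - invalidRows.size,
    invalidCount: invalidRows.size,
    errorsByField,
    errorsByCode,
  };
}
```

Counting distinct row indexes rather than errors keeps `validCount + invalidCount` equal to `totalRows` even when one row fails several rules.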
High-Level API
validateData()
Validate JSON rows against a schema. Handles Arrow IPC conversion internally.
import { validateData } from '@rowops/validate';
const result = await validateData(
[
{ email: 'test@example.com', age: 25 },
{ email: 'invalid-email', age: -5 },
],
{
fields: [
{ key: 'email', type: 'string', semanticType: 'email' },
{ key: 'age', type: 'number', min: 0, max: 120 },
],
},
{ allowRegex: true, timeout: 30000 },
{
onProgress: (p) => console.log(`${p.percent}% complete`),
onComplete: (r) => console.log(`${r.validCount} valid, ${r.invalidCount} invalid`),
}
);
console.log(`Validated ${result.totalRows} rows in ${result.durationMs}ms`);
console.log(`Errors by field:`, result.errorsByField);
validateArrowData()
Validate Arrow IPC data directly (skip JSON conversion overhead).
import { validateArrowData } from '@rowops/validate';
const result = await validateArrowData(
ipcBytes, // Uint8Array from previous pipeline stage
schema,
{ crossFieldRules: myRules },
{ onProgress: (p) => updateProgress(p.percent) }
);
Validation Hooks
Observe validation lifecycle with hooks:
interface ValidateDataHooks {
onStart?: (meta: {
totalRows: number;
schemaFields: number;
regexEnabled: boolean;
crossFieldRulesCount: number;
}) => void;
onProgress?: (meta: {
stage: 'initializing' | 'validating' | 'cross-field' | 'finishing';
percent: number;
rowsProcessed: number;
totalRows: number;
validCount: number;
invalidCount: number;
message?: string;
}) => void;
onWarning?: (warning: {
code: string;
message: string;
field?: string;
}) => void;
onComplete?: (result: {
totalRows: number;
validCount: number;
invalidCount: number;
errorCount: number;
warningCount: number;
durationMs: number;
errorsByField: Record<string, number>;
errorsByCode: Record<string, number>;
}) => void;
onError?: (error: {
code: string;
message: string;
stage?: 'init' | 'chunk' | 'cross-field' | 'finalize';
}) => void;
}
React Component
RowOpsValidate
Standalone validation UI with progress and error display:
import { RowOpsValidate } from '@rowops/validate-react';
function ValidationPanel({ data, schema }) {
return (
<RowOpsValidate
rows={data}
schema={schema}
columnMapping={{ 'Email Address': 'email' }}
showSummary={true}
showErrors={true}
maxErrors={100}
autoValidate={true}
hooks={{
onValidateStart: (meta) => console.log(`Starting: ${meta.totalRows} rows`),
onProgress: (p) => console.log(`${p.percent}% - ${p.stage}`),
onValidateComplete: (r) => console.log(`Done: ${r.validCount} valid`),
onWarning: (w) => console.warn(w.message),
onError: (e) => console.error(e.message),
}}
onValidate={(result) => {
console.log(`Valid: ${result.validCount}, Invalid: ${result.invalidCount}`);
}}
onError={(errors) => {
console.log(`Found ${errors.length} validation errors`);
}}
onRowClick={(rowIndex) => {
scrollToRow(rowIndex);
}}
/>
);
}
Worker Protocol
The validation worker uses the VALIDATE_ARROW protocol:
// Request
{
type: 'VALIDATE_ARROW',
version: 1,
requestId: string,
timestamp: number,
payload: {
ipcBytes: Uint8Array,
schemaVersion: ValidateSchemaVersion,
allowRegex?: boolean,
mapping?: Record<string, string>
}
}
// Response
{
type: 'VALIDATE_COMPLETE',
version: 1,
requestId: string,
timestamp: number,
success: true,
payload: {
totalRows: number,
validCount: number,
invalidCount: number,
invalid: ValidateFieldError[]
}
}
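The request and response are correlated through `requestId`. As a rough sketch of how a caller might build a request envelope and recognize its matching response, consider the following. The `buildRequest` and `isCompletionFor` helpers and the `validate-N` id format are hypothetical, and `schemaVersion` is typed as `unknown` here only to keep the example self-contained.

```typescript
// Illustrative copy of the request envelope shown above.
interface ValidateArrowRequest {
  type: "VALIDATE_ARROW";
  version: 1;
  requestId: string;
  timestamp: number;
  payload: {
    ipcBytes: Uint8Array;
    schemaVersion: unknown; // ValidateSchemaVersion in the real package
    allowRegex?: boolean;
    mapping?: Record<string, string>;
  };
}

let requestCounter = 0;

// Hypothetical helper: wraps payload fields in a VALIDATE_ARROW envelope
// with a fresh requestId (the id format is an assumption).
function buildRequest(
  ipcBytes: Uint8Array,
  schemaVersion: unknown,
  allowRegex = false,
): ValidateArrowRequest {
  requestCounter += 1;
  return {
    type: "VALIDATE_ARROW",
    version: 1,
    requestId: `validate-${requestCounter}`,
    timestamp: Date.now(),
    payload: { ipcBytes, schemaVersion, allowRegex },
  };
}

// Hypothetical helper: true when `msg` is the VALIDATE_COMPLETE response
// that belongs to the given requestId.
function isCompletionFor(msg: unknown, requestId: string): boolean {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as Record<string, unknown>;
  return (
    m["type"] === "VALIDATE_COMPLETE" &&
    m["version"] === 1 &&
    m["requestId"] === requestId
  );
}
```

Checking `requestId` before handling a response matters when a pooled worker serves several validations, since responses for other requests may arrive on the same channel.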
Hook Events
The worker emits lifecycle events:
| Event | Description |
|---|---|
| VALIDATE_HOOK_START | Validation beginning |
| VALIDATE_HOOK_PROGRESS | Progress update with stage and percent |
| VALIDATE_HOOK_CHUNK_START | Chunk processing started |
| VALIDATE_HOOK_CHUNK_COMPLETE | Chunk processing finished |
| VALIDATE_HOOK_WARNING | Non-fatal warning |
| VALIDATE_HOOK_COMPLETE | Validation finished successfully |
| VALIDATE_HOOK_ERROR | Fatal error occurred |
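The hero layer is described as routing these events to the caller's hooks. One plausible way to express that routing is a lookup table from event type to hook name, sketched below. The `{ type, payload }` event shape, the `routeHookEvent` helper, and the choice to ignore the two chunk events (which have no counterpart in `ValidateDataHooks`) are assumptions, not the package's documented behavior.

```typescript
type HookFn = (payload: unknown) => void;

// Simplified stand-in for ValidateDataHooks with untyped payloads.
interface Hooks {
  onStart?: HookFn;
  onProgress?: HookFn;
  onWarning?: HookFn;
  onComplete?: HookFn;
  onError?: HookFn;
}

// Assumed wire shape of a worker hook event.
interface HookEvent {
  type: string;
  payload?: unknown;
}

// Event names come from the table above; the pairing is an assumption.
const EVENT_TO_HOOK: Record<string, keyof Hooks> = {
  VALIDATE_HOOK_START: "onStart",
  VALIDATE_HOOK_PROGRESS: "onProgress",
  VALIDATE_HOOK_WARNING: "onWarning",
  VALIDATE_HOOK_COMPLETE: "onComplete",
  VALIDATE_HOOK_ERROR: "onError",
};

// Calls the matching hook if one is registered.
// Returns true when a hook was invoked, false otherwise.
function routeHookEvent(event: HookEvent, hooks: Hooks): boolean {
  const hookName = EVENT_TO_HOOK[event.type];
  if (hookName === undefined) return false; // e.g. chunk events
  const hook = hooks[hookName];
  if (hook === undefined) return false; // caller did not register this hook
  hook(event.payload);
  return true;
}
```

A table-driven dispatcher keeps unknown or unhandled event types harmless: they fall through without throwing.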
Configuration
Supported Validation Rules
| Rule | Field Property | Behavior |
|---|---|---|
| Required | required: true | Field must have a non-empty value |
| Type | type: 'number' | Value must match expected type |
| Regex | regex: '^[A-Z]+' | Value must match pattern |
| Enum | enumValues: ['A', 'B'] | Value must be in allowed list |
| Min Length | minLength: 5 | String must be at least N chars |
| Max Length | maxLength: 100 | String must be at most N chars |
| Min Value | min: 0 | Number must be greater than or equal to value |
| Max Value | max: 100 | Number must be less than or equal to value |
| Semantic Type | semanticType: 'email' | Value must match semantic format |
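The rules in this table can be read as independent checks on a single value. The sketch below expresses most of them in plain TypeScript so the intended behavior is easy to compare. It is not the WASM engine: the `checkField` helper is hypothetical, it omits semantic-type, date, and boolean checks, and treating `null`, `undefined`, and `''` as empty is an assumption.

```typescript
// Trimmed copy of ValidateSchemaField with only the properties used here.
interface Field {
  key: string;
  type?: "string" | "number" | "boolean" | "date" | "enum";
  required?: boolean;
  enumValues?: string[];
  regex?: string;
  minLength?: number;
  maxLength?: number;
  min?: number;
  max?: number;
}

// Hypothetical helper: returns the error codes a value triggers for a field.
function checkField(field: Field, value: unknown, allowRegex = true): string[] {
  // Assumption: these three values count as empty.
  const empty = value === null || value === undefined || value === "";
  if (empty) return field.required ? ["REQUIRED"] : [];

  if (field.type === "number" && typeof value !== "number") return ["TYPE_MISMATCH"];

  const codes: string[] = [];
  if (typeof value === "number") {
    if (field.min !== undefined && value < field.min) codes.push("MIN_VALUE");
    if (field.max !== undefined && value > field.max) codes.push("MAX_VALUE");
  }
  if (typeof value === "string") {
    if (field.minLength !== undefined && value.length < field.minLength) codes.push("MIN_LENGTH");
    if (field.maxLength !== undefined && value.length > field.maxLength) codes.push("MAX_LENGTH");
    if (field.enumValues && field.enumValues.indexOf(value) === -1) codes.push("ENUM_MISMATCH");
    // Regex rules only run when enabled, mirroring the allowRegex option.
    if (allowRegex && field.regex && !new RegExp(field.regex).test(value)) {
      codes.push("REGEX_MISMATCH");
    }
  }
  return codes;
}
```

Note that the bounds are inclusive, matching the table: a value equal to `min` or `max` passes.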
Cross-Field Rules
Cross-field validation runs after per-field validation:
const schema = {
fields: [...],
validationConfig: {
crossFieldRules: [
{
when: { field: 'hasDiscount', equals: true },
then: { field: 'discountCode', required: true },
code: 'DISCOUNT_CODE_REQUIRED',
message: 'Discount code required when hasDiscount is true'
}
],
regexRulesEnabled: true
}
};
What This Module Does Not Do
- Does not modify row data: Validation is read-only; it classifies but does not alter values
- Does not silently drop invalid rows: Invalid rows are explicitly marked and accessible
- Does not automatically correct values: No auto-fix or fuzzy correction
- Does not halt on validation errors: Pipeline continues; invalid rows are collected
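Because validation only classifies rows, a consumer that wants separate valid and invalid sets has to split them itself using the reported `rowIndex` values. A minimal sketch, assuming a hypothetical `partitionRows` helper and the zero-based `rowIndex` from `ValidateFieldError`:

```typescript
// Hypothetical helper: splits rows into valid and invalid lists using the
// row indexes reported in validation errors. The input array is not
// modified and the row objects themselves are not copied or altered.
function partitionRows<T>(
  rows: readonly T[],
  errors: readonly { rowIndex: number }[],
): { valid: T[]; invalid: T[] } {
  const invalidIndexes = new Set<number>();
  for (const e of errors) invalidIndexes.add(e.rowIndex);

  const valid: T[] = [];
  const invalid: T[] = [];
  rows.forEach((row, index) => {
    (invalidIndexes.has(index) ? invalid : valid).push(row);
  });
  return { valid, invalid };
}
```

Both output lists preserve the original row order, and a row with several errors still appears only once in `invalid`.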
Constraints
Execution Mode
Validation behavior is consistent across dashboard-assisted and headless modes. The same WASM engine executes in both environments.
Tier-Gated Features
| Feature | Tier |
|---|---|
| Basic field validation | All tiers |
| Regex validation | All tiers |
| Cross-field validation | All tiers |
| Set-level rules (uniqueness, FK) | Scale+ |
Performance
- Validation executes via WASM for vectorized performance
- Large datasets are processed in chunks
- Worker pooling reduces overhead for repeated validations
- No network calls during validation
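To illustrate the chunked processing mentioned above, the sketch below splits a row count into half-open `[start, end)` ranges and derives a progress percentage of the kind reported through `onProgress`. The `chunkRanges` and `percentComplete` helpers are hypothetical, and the chunk size is a free parameter here; the document does not state the size the worker actually uses.

```typescript
// Hypothetical helper: splits [0, totalRows) into consecutive
// half-open ranges of at most `chunkSize` rows each.
function chunkRanges(totalRows: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalRows; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, totalRows)]);
  }
  return ranges;
}

// Hypothetical helper: whole-number percent of rows processed so far.
// An empty dataset is reported as complete.
function percentComplete(rowsProcessed: number, totalRows: number): number {
  if (totalRows === 0) return 100;
  return Math.floor((rowsProcessed * 100) / totalRows);
}
```

The last range is shortened rather than padded, so the ranges cover every row exactly once.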
Error Codes
Common validation error codes:
| Code | Description |
|---|---|
| REQUIRED | Required field is missing or empty |
| TYPE_MISMATCH | Value does not match expected type |
| REGEX_MISMATCH | Value does not match regex pattern |
| ENUM_MISMATCH | Value is not in allowed enum list |
| INVALID_FORMAT | Value format is invalid for type |
| MIN_LENGTH | String shorter than minLength |
| MAX_LENGTH | String longer than maxLength |
| MIN_VALUE | Number less than min |
| MAX_VALUE | Number greater than max |
| CROSS_FIELD | Cross-field validation rule failed |
Implementation Status
| Feature | Status |
|---|---|
| WASM validation engine | Implemented |
| Arrow IPC protocol | Implemented |
| Structured errors | Implemented |
| Validation hooks | Implemented |
| Cross-field validation | Implemented |
| Hero layer API | Implemented |
| React component | Implemented |
| Worker pooling | Implemented |