
Validate Module

The Validate module applies schema-defined validation rules to data using WASM-accelerated processing, producing explicit valid/invalid classifications with structured error information.


Purpose

Apply schema validation rules to Arrow IPC data and produce a validation result that separates valid rows from invalid rows with detailed, structured error information.


When It Runs

Pipeline Position: After Parse

Parse → **Validate** → Mask → Transform → Profile

Validation executes on parsed tabular data before any masking or transformation is applied.


Architecture

```text
┌────────────────────────────────────────────┐
│ Hero Layer (@rowops/validate)              │
│ validateData() / validateArrowData()       │
│ - JSON → Arrow IPC conversion              │
│ - Worker lifecycle management              │
│ - Hook event routing                       │
└────────────────────────────────────────────┘
                      │
                      ▼
┌────────────────────────────────────────────┐
│ Worker Layer                               │
│ validator.worker.ts                        │
│ - VALIDATE_ARROW protocol                  │
│ - Hook event emission                      │
│ - Cross-field validation                   │
└────────────────────────────────────────────┘
                      │
                      ▼
┌────────────────────────────────────────────┐
│ WASM Engine                                │
│ validate_rows_arrow()                      │
│ - Arrow IPC in, Arrow IPC out              │
│ - Vectorized validation                    │
└────────────────────────────────────────────┘
```

Inputs

| Input | Type | Description |
|---|---|---|
| Data | Arrow IPC (`Uint8Array`) | Arrow IPC bytes from the Parse stage |
| Schema | `ValidateSchemaVersion` | Schema with field definitions and validation config |
| Options | `ValidateDataOptions` | Mapping, regex settings, cross-field rules |

Schema Field Configuration

Each field in the schema can define validation rules:

```typescript
interface ValidateSchemaField {
  key: string;
  label?: string;
  type?: 'string' | 'number' | 'boolean' | 'date' | 'enum';
  required?: boolean;
  enumValues?: string[];
  regex?: string;
  minLength?: number;
  maxLength?: number;
  min?: number;
  max?: number;
  semanticType?: string; // 'email', 'phone', 'url', etc.
}
```

Cross-Field Validation Rules

Define conditional validation between fields:

```typescript
interface ValidateCrossFieldRule {
  when: {
    field: string;
    equals?: unknown;
    exists?: boolean;
  };
  then: {
    field: string;
    required?: boolean;
    equals?: unknown;
  };
  code?: string;
  message?: string;
}
```

Example: If country equals "US", then zipCode is required:

```typescript
{
  when: { field: 'country', equals: 'US' },
  then: { field: 'zipCode', required: true },
  code: 'US_ZIP_REQUIRED',
  message: 'ZIP code is required for US addresses'
}
```
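To make the `when`/`then` semantics concrete, the sketch below checks one cross-field rule against one row. It is an illustration, not the worker's implementation, and it rests on assumptions: a row is a plain object keyed by field, "present" means not `null`, `undefined`, or an empty string, conditions inside `when` are combined with AND (omitted ones are ignored), and a failing rule without its own `code` falls back to `CROSS_FIELD`. `evaluateCrossFieldRule` and `Row` are hypothetical names.

```typescript
// Shapes copied from the ValidateCrossFieldRule / ValidateFieldError interfaces.
interface ValidateCrossFieldRule {
  when: { field: string; equals?: unknown; exists?: boolean };
  then: { field: string; required?: boolean; equals?: unknown };
  code?: string;
  message?: string;
}

interface ValidateFieldError {
  rowIndex: number;
  field: string;
  code: string;
  message: string;
  value?: unknown;
}

type Row = Record<string, unknown>; // hypothetical row shape

// Assumption: "present" means not null, undefined, or an empty string.
function isPresent(value: unknown): boolean {
  return value !== null && value !== undefined && value !== '';
}

// Hypothetical helper: returns an error when the rule fires and its `then` clause fails.
function evaluateCrossFieldRule(
  rule: ValidateCrossFieldRule,
  row: Row,
  rowIndex: number,
): ValidateFieldError | null {
  const trigger = row[rule.when.field];
  // Conditions in `when` are ANDed; omitted conditions are ignored (assumption).
  if ('equals' in rule.when && trigger !== rule.when.equals) return null;
  if (rule.when.exists !== undefined && isPresent(trigger) !== rule.when.exists) return null;

  const target = row[rule.then.field];
  const missing = rule.then.required === true && !isPresent(target);
  const mismatched = 'equals' in rule.then && target !== rule.then.equals;
  if (!missing && !mismatched) return null;

  return {
    rowIndex,
    field: rule.then.field,
    code: rule.code ?? 'CROSS_FIELD', // fallback code is an assumption
    message: rule.message ?? `Cross-field rule on "${rule.then.field}" failed`,
    value: target,
  };
}

const usZipRule: ValidateCrossFieldRule = {
  when: { field: 'country', equals: 'US' },
  then: { field: 'zipCode', required: true },
  code: 'US_ZIP_REQUIRED',
  message: 'ZIP code is required for US addresses',
};

const rows: Row[] = [
  { country: 'US', zipCode: '94105' },
  { country: 'US', zipCode: '' },
  { country: 'CA' },
];

const errors = rows
  .map((row, i) => evaluateCrossFieldRule(usZipRule, row, i))
  .filter((e): e is ValidateFieldError => e !== null);
// errors holds one entry: rowIndex 1, code 'US_ZIP_REQUIRED'
```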

Outputs

| Output | Type | Description |
|---|---|---|
| `totalRows` | `number` | Total rows validated |
| `validCount` | `number` | Rows that passed all validation |
| `invalidCount` | `number` | Rows with validation errors |
| `errors` | `ValidateFieldError[]` | Structured errors with row index |
| `errorsByField` | `Record<string, number>` | Error counts grouped by field |
| `errorsByCode` | `Record<string, number>` | Error counts grouped by error code |

ValidateFieldError Structure

```typescript
interface ValidateFieldError {
  rowIndex: number; // Zero-based row index
  field: string;    // Field key that failed
  code: string;     // Error code (REQUIRED, TYPE_MISMATCH, etc.)
  message: string;  // Human-readable description
  value?: unknown;  // The failing value
}
```
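The summary outputs above can all be derived from the structured error list. A minimal sketch, assuming `invalidCount` counts distinct rows with at least one error (a row with two failing fields is still one invalid row); `summarizeErrors` is a hypothetical helper, not an export of `@rowops/validate`, and the sample errors are illustrative.

```typescript
interface ValidateFieldError {
  rowIndex: number;
  field: string;
  code: string;
  message: string;
  value?: unknown;
}

interface ValidationSummary {
  totalRows: number;
  validCount: number;
  invalidCount: number;
  errorsByField: Record<string, number>;
  errorsByCode: Record<string, number>;
}

// Hypothetical helper: derives summary counts from a flat error list.
function summarizeErrors(totalRows: number, errors: ValidateFieldError[]): ValidationSummary {
  const invalidRows = new Set<number>();
  const errorsByField: Record<string, number> = {};
  const errorsByCode: Record<string, number> = {};

  for (const err of errors) {
    invalidRows.add(err.rowIndex); // a row counts once, however many fields fail
    errorsByField[err.field] = (errorsByField[err.field] ?? 0) + 1;
    errorsByCode[err.code] = (errorsByCode[err.code] ?? 0) + 1;
  }

  return {
    totalRows,
    validCount: totalRows - invalidRows.size,
    invalidCount: invalidRows.size,
    errorsByField,
    errorsByCode,
  };
}

const summary = summarizeErrors(3, [
  { rowIndex: 1, field: 'email', code: 'INVALID_FORMAT', message: 'Not a valid email' },
  { rowIndex: 1, field: 'age', code: 'MIN_VALUE', message: 'Must be >= 0' },
  { rowIndex: 2, field: 'age', code: 'REQUIRED', message: 'age is required' },
]);
// summary.validCount === 1, summary.invalidCount === 2, summary.errorsByField.age === 2
```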

High-Level API

validateData()

Validate JSON rows against a schema. Handles Arrow IPC conversion internally.

```typescript
import { validateData } from '@rowops/validate';

const result = await validateData(
  [
    { email: 'test@example.com', age: 25 },
    { email: 'invalid-email', age: -5 },
  ],
  {
    fields: [
      { key: 'email', type: 'string', semanticType: 'email' },
      { key: 'age', type: 'number', min: 0, max: 120 },
    ],
  },
  { allowRegex: true, timeout: 30000 },
  {
    onProgress: (p) => console.log(`${p.percent}% complete`),
    onComplete: (r) => console.log(`${r.validCount} valid, ${r.invalidCount} invalid`),
  }
);

console.log(`Validated ${result.totalRows} rows in ${result.durationMs}ms`);
console.log(`Errors by field:`, result.errorsByField);
```
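`validateData()` reports failures by `rowIndex` rather than as separate row arrays. If you need the valid and invalid rows themselves, a sketch like the following splits the original input using each error's zero-based `rowIndex`. `partitionRows` is a hypothetical helper, and the error objects in the usage are illustrative rather than actual output of the call above.

```typescript
interface ValidateFieldError {
  rowIndex: number;
  field: string;
  code: string;
  message: string;
  value?: unknown;
}

// Hypothetical helper: splits input rows using the zero-based rowIndex on each error.
function partitionRows<T>(
  rows: T[],
  errors: ValidateFieldError[],
): { valid: T[]; invalid: { row: T; errors: ValidateFieldError[] }[] } {
  // Group errors by the row they belong to.
  const byRow = new Map<number, ValidateFieldError[]>();
  for (const err of errors) {
    const list = byRow.get(err.rowIndex) ?? [];
    list.push(err);
    byRow.set(err.rowIndex, list);
  }

  const valid: T[] = [];
  const invalid: { row: T; errors: ValidateFieldError[] }[] = [];
  rows.forEach((row, index) => {
    const rowErrors = byRow.get(index);
    if (rowErrors) invalid.push({ row, errors: rowErrors });
    else valid.push(row);
  });
  return { valid, invalid };
}

const inputRows = [
  { email: 'test@example.com', age: 25 },
  { email: 'invalid-email', age: -5 },
];
const { valid, invalid } = partitionRows(inputRows, [
  { rowIndex: 1, field: 'email', code: 'INVALID_FORMAT', message: 'Not a valid email' },
  { rowIndex: 1, field: 'age', code: 'MIN_VALUE', message: 'Must be >= 0' },
]);
// valid holds the first row; invalid[0].errors has two entries
```

Invalid rows stay attached to their errors, which matches the module's rule that invalid rows are marked rather than dropped.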

validateArrowData()

Validate Arrow IPC bytes directly, skipping the JSON-to-Arrow conversion step.

```typescript
import { validateArrowData } from '@rowops/validate';

const result = await validateArrowData(
  ipcBytes, // Uint8Array from previous pipeline stage
  schema,
  { crossFieldRules: myRules },
  { onProgress: (p) => updateProgress(p.percent) }
);
```

Validation Hooks

Observe the validation lifecycle with hooks:

```typescript
interface ValidateDataHooks {
  onStart?: (meta: {
    totalRows: number;
    schemaFields: number;
    regexEnabled: boolean;
    crossFieldRulesCount: number;
  }) => void;

  onProgress?: (meta: {
    stage: 'initializing' | 'validating' | 'cross-field' | 'finishing';
    percent: number;
    rowsProcessed: number;
    totalRows: number;
    validCount: number;
    invalidCount: number;
    message?: string;
  }) => void;

  onWarning?: (warning: {
    code: string;
    message: string;
    field?: string;
  }) => void;

  onComplete?: (result: {
    totalRows: number;
    validCount: number;
    invalidCount: number;
    errorCount: number;
    warningCount: number;
    durationMs: number;
    errorsByField: Record<string, number>;
    errorsByCode: Record<string, number>;
  }) => void;

  onError?: (error: {
    code: string;
    message: string;
    stage?: 'init' | 'chunk' | 'cross-field' | 'finalize';
  }) => void;
}
```

React Component

RowOpsValidate

Standalone validation UI with progress and error display:

```tsx
import { RowOpsValidate } from '@rowops/validate-react';

function ValidationPanel({ data, schema }) {
  return (
    <RowOpsValidate
      rows={data}
      schema={schema}
      columnMapping={{ 'Email Address': 'email' }}
      showSummary={true}
      showErrors={true}
      maxErrors={100}
      autoValidate={true}
      hooks={{
        onValidateStart: (meta) => console.log(`Starting: ${meta.totalRows} rows`),
        onProgress: (p) => console.log(`${p.percent}% - ${p.stage}`),
        onValidateComplete: (r) => console.log(`Done: ${r.validCount} valid`),
        onWarning: (w) => console.warn(w.message),
        onError: (e) => console.error(e.message),
      }}
      onValidate={(result) => {
        console.log(`Valid: ${result.validCount}, Invalid: ${result.invalidCount}`);
      }}
      onError={(errors) => {
        console.log(`Found ${errors.length} validation errors`);
      }}
      onRowClick={(rowIndex) => {
        // scrollToRow is your application's own handler
        scrollToRow(rowIndex);
      }}
    />
  );
}
```
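Both the `columnMapping` prop here and the `mapping` field of the worker request map source column names to schema keys. The sketch below shows one plausible way such a mapping could be applied before validation. The pass-through behavior for unmapped columns and the `applyColumnMapping` name are assumptions, not documented behavior of `@rowops/validate-react`.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper: renames source columns to schema field keys.
// Assumption: columns without a mapping entry keep their original name.
function applyColumnMapping(rows: Row[], mapping: Record<string, string>): Row[] {
  return rows.map((row) => {
    const out: Row = {};
    for (const [column, value] of Object.entries(row)) {
      const key = Object.prototype.hasOwnProperty.call(mapping, column)
        ? mapping[column]
        : column;
      out[key] = value;
    }
    return out;
  });
}

const mapped = applyColumnMapping(
  [{ 'Email Address': 'test@example.com', Age: 25 }],
  { 'Email Address': 'email' },
);
// mapped[0] is { email: 'test@example.com', Age: 25 }
```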

Worker Protocol

The validation worker uses the VALIDATE_ARROW protocol:

```typescript
// Request
{
  type: 'VALIDATE_ARROW',
  version: 1,
  requestId: string,
  timestamp: number,
  payload: {
    ipcBytes: Uint8Array,
    schemaVersion: ValidateSchemaVersion,
    allowRegex?: boolean,
    mapping?: Record<string, string>
  }
}

// Response
{
  type: 'VALIDATE_COMPLETE',
  version: 1,
  requestId: string,
  timestamp: number,
  success: true,
  payload: {
    totalRows: number,
    validCount: number,
    invalidCount: number,
    invalid: ValidateFieldError[]
  }
}
```
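A client of this protocol has to stamp each request with a unique `requestId` and accept only the `VALIDATE_COMPLETE` message carrying the same id. The sketch below uses the field names from the messages above. The `ValidateArrowRequest` type name, the counter-based id scheme, the `createValidateArrowRequest`/`answers` helpers, and the stubbed `ValidateSchemaVersion` are all hypothetical. A real client might send the request with `worker.postMessage()` and pass the IPC buffer in the transfer list to avoid copying it.

```typescript
// Stub: the real ValidateSchemaVersion shape is defined by @rowops/validate.
type ValidateSchemaVersion = Record<string, unknown>;

// Hypothetical type mirroring the VALIDATE_ARROW request shape.
interface ValidateArrowRequest {
  type: 'VALIDATE_ARROW';
  version: 1;
  requestId: string;
  timestamp: number;
  payload: {
    ipcBytes: Uint8Array;
    schemaVersion: ValidateSchemaVersion;
    allowRegex?: boolean;
    mapping?: Record<string, string>;
  };
}

let nextId = 0; // illustrative id source; any unique id scheme works

// Hypothetical helper: builds a VALIDATE_ARROW request with a fresh requestId.
function createValidateArrowRequest(
  ipcBytes: Uint8Array,
  schemaVersion: ValidateSchemaVersion,
  options: { allowRegex?: boolean; mapping?: Record<string, string> } = {},
): ValidateArrowRequest {
  nextId += 1;
  return {
    type: 'VALIDATE_ARROW',
    version: 1,
    requestId: `validate-${nextId}`,
    timestamp: Date.now(),
    payload: { ipcBytes, schemaVersion, ...options },
  };
}

// Accept only completion messages that answer this particular request.
function answers(request: ValidateArrowRequest, message: { type: string; requestId?: string }): boolean {
  return message.type === 'VALIDATE_COMPLETE' && message.requestId === request.requestId;
}

const request = createValidateArrowRequest(new Uint8Array([1, 2, 3]), { fields: [] }, { allowRegex: true });
const response = {
  type: 'VALIDATE_COMPLETE',
  version: 1,
  requestId: request.requestId,
  timestamp: Date.now(),
  success: true,
  payload: { totalRows: 0, validCount: 0, invalidCount: 0, invalid: [] },
};
// answers(request, response) === true
```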

Hook Events

The worker emits lifecycle events:

| Event | Description |
|---|---|
| `VALIDATE_HOOK_START` | Validation is starting |
| `VALIDATE_HOOK_PROGRESS` | Progress update with stage and percent |
| `VALIDATE_HOOK_CHUNK_START` | Chunk processing started |
| `VALIDATE_HOOK_CHUNK_COMPLETE` | Chunk processing finished |
| `VALIDATE_HOOK_WARNING` | Non-fatal warning |
| `VALIDATE_HOOK_COMPLETE` | Validation finished successfully |
| `VALIDATE_HOOK_ERROR` | A fatal error occurred |
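The Hero layer is responsible for routing these events to the `ValidateDataHooks` callbacks. A minimal sketch of that routing, under stated assumptions: each event arrives as `{ type, payload }` with the payload already shaped for the matching hook, and the chunk-level events have no public hook, so they are ignored. The message shape, the simplified `Hooks` type, and `routeHookEvent` are hypothetical.

```typescript
// Simplified hook set; the real ValidateDataHooks payloads are typed per hook.
interface Hooks {
  onStart?: (meta: unknown) => void;
  onProgress?: (meta: unknown) => void;
  onWarning?: (warning: unknown) => void;
  onComplete?: (result: unknown) => void;
  onError?: (error: unknown) => void;
}

// Assumed message shape for worker hook events.
interface HookEvent {
  type: string;
  payload: unknown;
}

// Map each worker event type to the hook that should receive it.
const HOOK_FOR_EVENT = new Map<string, keyof Hooks>([
  ['VALIDATE_HOOK_START', 'onStart'],
  ['VALIDATE_HOOK_PROGRESS', 'onProgress'],
  ['VALIDATE_HOOK_WARNING', 'onWarning'],
  ['VALIDATE_HOOK_COMPLETE', 'onComplete'],
  ['VALIDATE_HOOK_ERROR', 'onError'],
]);

// Hypothetical router: returns true if a hook was invoked.
function routeHookEvent(event: HookEvent, hooks: Hooks): boolean {
  const hookName = HOOK_FOR_EVENT.get(event.type);
  if (hookName === undefined) return false; // e.g. chunk start/complete events
  const hook = hooks[hookName];
  if (!hook) return false; // caller did not register this hook
  hook(event.payload);
  return true;
}

const seen: string[] = [];
const hooks: Hooks = {
  onProgress: (meta) => seen.push(`progress:${(meta as { percent: number }).percent}`),
  onComplete: () => seen.push('complete'),
};
routeHookEvent({ type: 'VALIDATE_HOOK_PROGRESS', payload: { percent: 50 } }, hooks);
routeHookEvent({ type: 'VALIDATE_HOOK_CHUNK_START', payload: {} }, hooks);
routeHookEvent({ type: 'VALIDATE_HOOK_COMPLETE', payload: {} }, hooks);
// seen is ['progress:50', 'complete']
```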

Configuration

Supported Validation Rules

| Rule | Field property | Behavior |
|---|---|---|
| Required | `required: true` | Field must have a non-empty value |
| Type | `type: 'number'` | Value must match the expected type |
| Regex | `regex: '^[A-Z]+'` | Value must match the pattern |
| Enum | `enumValues: ['A', 'B']` | Value must be in the allowed list |
| Min length | `minLength: 5` | String must be at least `minLength` characters |
| Max length | `maxLength: 100` | String must be at most `maxLength` characters |
| Min value | `min: 0` | Number must be greater than or equal to `min` |
| Max value | `max: 100` | Number must be less than or equal to `max` |
| Semantic type | `semanticType: 'email'` | Value must match the semantic format |
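The rule table translates directly into a per-value check. This is a plain TypeScript reference sketch, not the WASM engine, and it assumes: an empty value (`null`, `undefined`, or `''`) yields only `REQUIRED` (or nothing when the field is optional) and skips the other rules; `type: 'number'` accepts only finite numbers; regex and enum checks compare the value's string form; and `date` types, `semanticType` formats, and the regex enable flag are left out. Codes are taken from the Error Codes table below; `checkValue` is a hypothetical name.

```typescript
// Subset of ValidateSchemaField used by this sketch.
interface ValidateSchemaField {
  key: string;
  type?: 'string' | 'number' | 'boolean' | 'date' | 'enum';
  required?: boolean;
  enumValues?: string[];
  regex?: string;
  minLength?: number;
  maxLength?: number;
  min?: number;
  max?: number;
}

// Hypothetical reference check: returns the error codes a single value triggers.
function checkValue(field: ValidateSchemaField, value: unknown): string[] {
  const empty = value === null || value === undefined || value === '';
  if (empty) return field.required ? ['REQUIRED'] : []; // empty values skip other rules

  const codes: string[] = [];
  if (field.type === 'number' && (typeof value !== 'number' || !Number.isFinite(value))) {
    codes.push('TYPE_MISMATCH');
  }
  if (field.type === 'boolean' && typeof value !== 'boolean') codes.push('TYPE_MISMATCH');

  const text = String(value); // regex and enum compare the string form
  if (field.regex !== undefined && !new RegExp(field.regex).test(text)) codes.push('REGEX_MISMATCH');
  if (field.enumValues && !field.enumValues.includes(text)) codes.push('ENUM_MISMATCH');

  if (typeof value === 'string') {
    if (field.minLength !== undefined && value.length < field.minLength) codes.push('MIN_LENGTH');
    if (field.maxLength !== undefined && value.length > field.maxLength) codes.push('MAX_LENGTH');
  }
  if (typeof value === 'number') {
    if (field.min !== undefined && value < field.min) codes.push('MIN_VALUE');
    if (field.max !== undefined && value > field.max) codes.push('MAX_VALUE');
  }
  return codes;
}

const ageField: ValidateSchemaField = { key: 'age', type: 'number', required: true, min: 0, max: 120 };
// checkValue(ageField, 25)  -> []
// checkValue(ageField, -5)  -> ['MIN_VALUE']
// checkValue(ageField, '')  -> ['REQUIRED']
// checkValue(ageField, 'x') -> ['TYPE_MISMATCH']
```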

Cross-Field Rules

Cross-field validation runs after per-field validation:

```typescript
const schema = {
  fields: [/* field definitions */],
  validationConfig: {
    crossFieldRules: [
      {
        when: { field: 'hasDiscount', equals: true },
        then: { field: 'discountCode', required: true },
        code: 'DISCOUNT_CODE_REQUIRED',
        message: 'Discount code required when hasDiscount is true'
      }
    ],
    regexRulesEnabled: true
  }
};
```

What This Module Does Not Do

  • Does not modify row data: Validation is read-only; it classifies but does not alter values
  • Does not silently drop invalid rows: Invalid rows are explicitly marked and accessible
  • Does not automatically correct values: No auto-fix or fuzzy correction
  • Does not halt on validation errors: Pipeline continues; invalid rows are collected

Constraints

Execution Mode

Validation behavior is consistent across dashboard-assisted and headless modes. The same WASM engine executes in both environments.

Tier-Gated Features

| Feature | Tier |
|---|---|
| Basic field validation | All tiers |
| Regex validation | All tiers |
| Cross-field validation | All tiers |
| Set-level rules (uniqueness, foreign keys) | Scale+ |

Performance

  • Validation executes via WASM for vectorized performance
  • Large datasets are processed in chunks
  • Worker pooling reduces overhead for repeated validations
  • No network calls during validation
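Chunked processing implies a merge step: per-chunk results have to be combined into one result with global row indices. A minimal sketch, assuming each chunk reports chunk-local row indices together with its global start offset; `chunkRanges`, `mergeChunkResults`, and the chunk size in the usage are hypothetical, not the engine's actual values.

```typescript
interface ChunkError {
  rowIndex: number; // assumed chunk-local index
  field: string;
  code: string;
}

interface ChunkResult {
  start: number; // global index of the chunk's first row
  rowCount: number;
  errors: ChunkError[];
}

// Split [0, totalRows) into half-open ranges of at most chunkSize rows.
function chunkRanges(totalRows: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalRows; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, totalRows)]);
  }
  return ranges;
}

// Merge chunk results, converting chunk-local row indices to global ones.
function mergeChunkResults(chunks: ChunkResult[]) {
  const errors: ChunkError[] = [];
  const invalidRows = new Set<number>();
  let totalRows = 0;
  for (const chunk of chunks) {
    totalRows += chunk.rowCount;
    for (const err of chunk.errors) {
      const rowIndex = chunk.start + err.rowIndex;
      errors.push({ ...err, rowIndex });
      invalidRows.add(rowIndex);
    }
  }
  return {
    totalRows,
    validCount: totalRows - invalidRows.size,
    invalidCount: invalidRows.size,
    errors,
  };
}

// chunkRanges(2500, 1000) -> [[0, 1000], [1000, 2000], [2000, 2500]]
const merged = mergeChunkResults([
  { start: 0, rowCount: 1000, errors: [{ rowIndex: 3, field: 'email', code: 'INVALID_FORMAT' }] },
  { start: 1000, rowCount: 1000, errors: [{ rowIndex: 3, field: 'age', code: 'MIN_VALUE' }] },
  { start: 2000, rowCount: 500, errors: [] },
]);
// merged.errors[1].rowIndex === 1003, merged.invalidCount === 2
```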

Error Codes

Common validation error codes:

| Code | Description |
|---|---|
| `REQUIRED` | Required field is missing or empty |
| `TYPE_MISMATCH` | Value does not match the expected type |
| `REGEX_MISMATCH` | Value does not match the regex pattern |
| `ENUM_MISMATCH` | Value is not in the allowed enum list |
| `INVALID_FORMAT` | Value format is invalid for the type |
| `MIN_LENGTH` | String is shorter than `minLength` |
| `MAX_LENGTH` | String is longer than `maxLength` |
| `MIN_VALUE` | Number is less than `min` |
| `MAX_VALUE` | Number is greater than `max` |
| `CROSS_FIELD` | Cross-field validation rule failed |
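Because cross-field rules can set their own `code`, `errorsByCode` may mix the built-in codes above with custom ones such as `US_ZIP_REQUIRED`. Assuming custom codes appear as their own keys, a type guard like the one sketched here can separate them. The list is copied from the table, which describes these codes as common rather than exhaustive, and `splitCodeCounts` is a hypothetical helper.

```typescript
// Codes from the Error Codes table; treat the list as non-exhaustive.
const BUILT_IN_CODES = [
  'REQUIRED', 'TYPE_MISMATCH', 'REGEX_MISMATCH', 'ENUM_MISMATCH', 'INVALID_FORMAT',
  'MIN_LENGTH', 'MAX_LENGTH', 'MIN_VALUE', 'MAX_VALUE', 'CROSS_FIELD',
] as const;

type BuiltInErrorCode = (typeof BUILT_IN_CODES)[number];

// Type guard: narrows a string to one of the listed built-in codes.
function isBuiltInCode(code: string): code is BuiltInErrorCode {
  return (BUILT_IN_CODES as readonly string[]).includes(code);
}

// Hypothetical helper: splits errorsByCode into built-in and custom codes.
function splitCodeCounts(errorsByCode: Record<string, number>) {
  const builtIn: Partial<Record<BuiltInErrorCode, number>> = {};
  const custom: Record<string, number> = {};
  for (const [code, count] of Object.entries(errorsByCode)) {
    if (isBuiltInCode(code)) builtIn[code] = count;
    else custom[code] = count;
  }
  return { builtIn, custom };
}

const { builtIn, custom } = splitCodeCounts({ REQUIRED: 4, MIN_VALUE: 1, US_ZIP_REQUIRED: 2 });
// builtIn is { REQUIRED: 4, MIN_VALUE: 1 }; custom is { US_ZIP_REQUIRED: 2 }
```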

Implementation Status

| Feature | Status |
|---|---|
| WASM validation engine | Implemented |
| Arrow IPC protocol | Implemented |
| Structured errors | Implemented |
| Validation hooks | Implemented |
| Cross-field validation | Implemented |
| Hero layer API | Implemented |
| React component | Implemented |
| Worker pooling | Implemented |