Transform Module
The Transform module provides client-side ETL for normalization, enrichment, and reshaping of data. All transforms are compiled to bytecode and executed by a Rust engine compiled to WASM, so row-level work stays out of JavaScript.
Purpose
Apply ETL transformations to row data, including:
- Type coercion and casting
- Value derivation with expressions
- Column renaming
- Row filtering
- String, numeric, and date manipulations
- Conditional transformations
Transforms execute after masking and before profiling in the pipeline.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Hero Layer (@rowops/transform) │
│ transformData() / transformArrowData() │
│ - JSON → Arrow IPC conversion │
│ - Worker lifecycle management │
│ - Hook event routing │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Worker Layer │
│ transform.worker.ts │
│ - TRANSFORM_INIT / TRANSFORM_RUN_CHUNK_ARROW protocol │
│ - Hook event emission │
│ - DSL compilation to bytecode │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ WASM Engine │
│ run_transform_arrow() │
│ - Bytecode VM execution │
│ - Arrow IPC in, Arrow IPC out │
└─────────────────────────────────────────────────────────────┘
All row-level operations execute in WASM. TypeScript handles only DSL construction and result handling.
Installation
npm install @rowops/transform
# For React integration:
npm install @rowops/transform-react
High-Level API
transformData()
Transform JSON rows using a TransformDSL. Handles Arrow IPC conversion internally.
import { transformData, TransformBuilder, col, upper } from '@rowops/transform';
import { resolveBrowserLicense } from "@rowops/import-core";

const transform = new TransformBuilder()
  .derive("name_upper", upper(col("name")))
  .cast("price", "number")
  .build();

const { tierGateInit } = await resolveBrowserLicense({
  projectId: "proj_xxx",
  entitlementToken: "eyJ...",
});

const result = await transformData(
  [
    { name: 'alice', price: '10.5' },
    { name: 'bob', price: '20.0' },
  ],
  transform,
  { tierGate: tierGateInit },
  {
    onProgress: (p) => console.log(`${p.percent}% - ${p.stage}`),
    onComplete: (r) => console.log(`${r.outputRows} rows transformed`),
  }
);

console.log(result.rows);
// [{ name: 'alice', name_upper: 'ALICE', price: 10.5 }, ...]
transformArrowData()
Transform Arrow IPC data directly, skipping the JSON conversion step.
import { transformArrowData } from '@rowops/transform';
import { resolveBrowserLicense } from "@rowops/import-core";

const { tierGateInit } = await resolveBrowserLicense({
  projectId: "proj_xxx",
  entitlementToken: "eyJ...",
});

const result = await transformArrowData(
  ipcBytes, // Uint8Array from previous pipeline stage
  transform, // TransformDSL built with TransformBuilder, as in the transformData() example
  { tierGate: tierGateInit },
  { onProgress: (p) => updateProgress(p.percent) } // updateProgress: your own UI callback
);
Transform Hooks
Observe the transform lifecycle with hooks:
interface TransformDataHooks {
  onStart?: (meta: {
    totalRows: number;
    operationCount: number;
    hasFilter: boolean;
    hasLookup: boolean;
  }) => void;
  onProgress?: (meta: {
    stage: 'compiling' | 'executing' | 'finishing';
    percent: number;
    rowsProcessed: number;
    totalRows: number;
    currentOperation: number;
    totalOperations: number;
    message?: string;
  }) => void;
  onStepStart?: (meta: {
    stepIndex: number;
    operationKind: string;
    targetColumn?: string;
  }) => void;
  onStepComplete?: (meta: {
    stepIndex: number;
    operationKind: string;
    durationMs: number;
    rowsAffected: number;
  }) => void;
  onComplete?: (result: {
    totalRows: number;
    outputRows: number;
    filteredCount: number;
    errorCount: number;
    warningCount: number;
    durationMs: number;
    operationsApplied: number;
  }) => void;
  onWarning?: (warning: { code: string; message: string; rowIndex?: number; column?: string }) => void;
  onError?: (error: { code: string; message: string; stage?: string; stepIndex?: number }) => void;
}
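Because every hook is an optional callback, several of them can be combined into one small recorder. The sketch below is self-contained: it copies two payload shapes from the interface above, and `createHookRecorder` is a hypothetical helper written for illustration, not part of @rowops/transform.

```typescript
// Local copies of two hook payload shapes from TransformDataHooks.
type StepCompleteMeta = {
  stepIndex: number;
  operationKind: string;
  durationMs: number;
  rowsAffected: number;
};
type WarningMeta = { code: string; message: string; rowIndex?: number; column?: string };

// Hypothetical helper: collects per-step timings and warnings so they can
// be inspected once the run has finished.
function createHookRecorder() {
  const steps: StepCompleteMeta[] = [];
  const warnings: WarningMeta[] = [];
  return {
    hooks: {
      onStepComplete: (meta: StepCompleteMeta): void => {
        steps.push(meta);
      },
      onWarning: (warning: WarningMeta): void => {
        warnings.push(warning);
      },
    },
    // Slowest step first, for quick profiling.
    slowestSteps: (): StepCompleteMeta[] =>
      [...steps].sort((a, b) => b.durationMs - a.durationMs),
    warningCount: (): number => warnings.length,
  };
}
```

Assuming the payload shapes match, the `hooks` object from the recorder could be passed as the hooks argument of transformData(), and `slowestSteps()` read after the call resolves.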
React Integration
RowOpsTransform Component
import { RowOpsTransform } from '@rowops/transform-react';
import { TransformBuilder, col, upper } from '@rowops/transform';

const transform = new TransformBuilder()
  .derive("name_upper", upper(col("name")))
  .build();

<RowOpsTransform
  rows={data}
  transform={transform}
  showSummary={true}
  showPreview={true}
  showOperations={true}
  showBeforeAfter={true} // Show cell-level before/after comparison
  maxPreviewRows={10}
  hooks={{
    onProgress: (p) => console.log(`${p.percent}% - ${p.stage}`),
    onTransformComplete: (r) => console.log(`${r.outputRows} rows`),
    onStepStart: (s) => console.log(`Step ${s.stepIndex}: ${s.operationKind}`),
    onWarning: (w) => console.warn(w.message),
    onError: (e) => console.error(e.message),
  }}
  onTransform={(result) => console.log(result)}
  onError={(errors) => console.log(`${errors.length} errors`)}
  onRowClick={(idx, row) => scrollToRow(idx)}
/>
Component Features
- Progress bar: Visual progress with stage and percent
- Operations list: Shows all transform operations with descriptions
- Summary: Input/output row counts, filtered count, duration
- Before/After Preview: Cell-level diff showing what changed
- Error table: Per-row errors with row index and message
- Worker pooling: Reuses workers across component instances
TransformBuilder
Build transforms declaratively with the fluent builder API:
import {
  TransformBuilder,
  col, lit, upper, mul, gt
} from "@rowops/transform";

const transform = new TransformBuilder()
  .cast("price", "number")
  .cast("quantity", "number")
  .derive("name_upper", upper(col("name")))
  .derive("total", mul(col("price"), col("quantity")))
  .filter(gt(col("quantity"), lit(0)))
  .rename("price", "unit_price")
  .build();
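To make the fluent pattern concrete, here is a toy builder that only records operations in call order. It is a sketch under stated assumptions: `ToyBuilder` and the `ToyOp` shapes are invented for illustration, and the DSL produced by the real TransformBuilder may look different.

```typescript
// Toy model only. The operation shapes below are assumptions made for
// illustration; they are not the library's actual DSL.
type ToyOp =
  | { kind: "cast"; target: string; to: string }
  | { kind: "rename"; from: string; to: string }
  | { kind: "derive"; target: string; expr: unknown }
  | { kind: "filter"; predicate: unknown };

class ToyBuilder {
  private readonly ops: ToyOp[] = [];

  cast(target: string, to: string): this {
    this.ops.push({ kind: "cast", target, to });
    return this; // returning `this` is what makes the calls chainable
  }
  rename(from: string, to: string): this {
    this.ops.push({ kind: "rename", from, to });
    return this;
  }
  derive(target: string, expr: unknown): this {
    this.ops.push({ kind: "derive", target, expr });
    return this;
  }
  filter(predicate: unknown): this {
    this.ops.push({ kind: "filter", predicate });
    return this;
  }
  // Snapshot the list so later builder calls cannot mutate a built plan.
  build(): { operations: ToyOp[] } {
    return { operations: [...this.ops] };
  }
}

const plan = new ToyBuilder()
  .cast("price", "number")
  .derive("total", "price * quantity") // placeholder, not a real expression
  .rename("price", "unit_price")
  .build();

console.log(plan.operations.map((op) => op.kind).join(", ")); // cast, derive, rename
```

If operations are applied in the order they were added, as the step indices in the hooks suggest, a rename placed last lets earlier steps keep referring to the original column name.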
Transform Operations
Cast
Convert a column to a different type.
builder.cast("age", "number")
builder.cast("active", "boolean")
builder.cast("created", "date")
Target Types: string, number, boolean, date, null
Rename
Rename a column.
builder.rename("old_name", "new_name")
Derive
Create a new column from an expression.
builder.derive("full_name", concat(col("first"), lit(" "), col("last")))
builder.derive("price_rounded", round(col("price"), 2))
Filter
Filter rows based on a predicate expression.
builder.filter(gt(col("quantity"), lit(0)))
builder.filter(and(isNull(col("deleted")), eq(col("status"), lit("active"))))
Lookup
Lookup values from a registered lookup table.
builder.lookup("country_code", tableId, "null")
OnMissing options: "null", "error", "keep"
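The source does not spell out what each OnMissing option does, so the following is only one plausible reading, sketched in plain TypeScript: "null" yields null, "keep" passes the original value through, and "error" reports a failure. `resolveLookup` is a hypothetical function, and throwing is used here purely to keep the sketch short.

```typescript
type OnMissing = "null" | "error" | "keep";

// Hypothetical model of a single lookup; the engine's real behavior for
// each option is an assumption here.
function resolveLookup(
  table: Map<string, string>,
  value: string,
  onMissing: OnMissing
): string | null {
  const hit = table.get(value);
  if (hit !== undefined) return hit;
  switch (onMissing) {
    case "null":
      return null; // replace the unmatched value with null
    case "keep":
      return value; // leave the unmatched value untouched
    case "error":
      throw new Error(`no lookup entry for "${value}"`);
  }
}
```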
Conditional
Apply different operations based on a condition.
builder.conditional(
  gt(col("amount"), lit(1000)),
  [{ kind: "derive", target: "tier", expr: lit("premium") }],
  [{ kind: "derive", target: "tier", expr: lit("standard") }]
)
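For intuition, the conditional above corresponds roughly to the following per-row logic in plain TypeScript. This is an illustrative equivalent, not how the engine runs it, and it assumes `amount` is already numeric; `assignTier` is a made-up name.

```typescript
type Row = { amount: number; tier?: string };

// Rows matching the condition get the first operation list ("premium"),
// all other rows get the second ("standard").
function assignTier(row: Row): Row {
  const tier = row.amount > 1000 ? "premium" : "standard";
  return { ...row, tier };
}
```

Note that gt is a strict comparison, so a row with an amount of exactly 1000 falls into the "standard" branch.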
Expression Functions
Literals and References
| Function | Description | Example |
|---|---|---|
| lit(value) | Literal value | lit(42), lit("hello") |
| col(name) | Column reference | col("price") |
| colIndex(n) | Column by index | colIndex(0) |
Arithmetic
| Function | Description | Example |
|---|---|---|
| add(a, b) | Addition | add(col("a"), col("b")) |
| sub(a, b) | Subtraction | sub(col("total"), col("discount")) |
| mul(a, b) | Multiplication | mul(col("price"), col("qty")) |
| div(a, b) | Division | div(col("amount"), lit(100)) |
| mod(a, b) | Modulo | mod(col("index"), lit(2)) |
| neg(a) | Negation | neg(col("value")) |
Comparison
| Function | Description | Example |
|---|---|---|
| eq(a, b) | Equal | eq(col("status"), lit("active")) |
| ne(a, b) | Not equal | ne(col("type"), lit("deleted")) |
| lt(a, b) | Less than | lt(col("age"), lit(18)) |
| le(a, b) | Less than or equal | le(col("score"), lit(100)) |
| gt(a, b) | Greater than | gt(col("price"), lit(0)) |
| ge(a, b) | Greater than or equal | ge(col("quantity"), lit(1)) |
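Expression helpers such as col, lit, mul, and gt compose into a tree that is evaluated once per row. The sketch below models that idea with a tiny interpreter. Everything in it is a local stand-in: the node shapes, the `evaluate` function, and the null-propagation rule are assumptions, and the real engine compiles to bytecode rather than walking a tree.

```typescript
type Value = number | string | boolean | null;
type Row = Record<string, Value>;
type Expr =
  | { kind: "lit"; value: Value }
  | { kind: "col"; name: string }
  | { kind: "mul" | "gt"; a: Expr; b: Expr };

// Local stand-ins that mirror the names used in the tables above.
const lit = (value: Value): Expr => ({ kind: "lit", value });
const col = (name: string): Expr => ({ kind: "col", name });
const mul = (a: Expr, b: Expr): Expr => ({ kind: "mul", a, b });
const gt = (a: Expr, b: Expr): Expr => ({ kind: "gt", a, b });

function evaluate(expr: Expr, row: Row): Value {
  switch (expr.kind) {
    case "lit":
      return expr.value;
    case "col":
      return row[expr.name] ?? null; // missing columns read as null here
    default: {
      const a = evaluate(expr.a, row);
      const b = evaluate(expr.b, row);
      // This sketch returns null for non-numeric operands.
      if (typeof a !== "number" || typeof b !== "number") return null;
      return expr.kind === "mul" ? a * b : a > b;
    }
  }
}

const sampleRow: Row = { price: 2.5, qty: 4 };
console.log(evaluate(mul(col("price"), col("qty")), sampleRow)); // 10
console.log(evaluate(gt(col("qty"), lit(0)), sampleRow)); // true
```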
Logical
| Function | Description | Example |
|---|---|---|
| and(a, b) | Logical AND | and(col("active"), col("verified")) |
| or(a, b) | Logical OR | or(col("admin"), col("moderator")) |
| not(a) | Logical NOT | not(col("deleted")) |
Null Handling
| Function | Description | Example |
|---|---|---|
| isNull(a) | Check if null | isNull(col("email")) |
| coalesce(a, b) | First non-null | coalesce(col("nickname"), col("name")) |
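"First non-null" differs from JavaScript's `||` fallback, which also skips empty strings, 0, and false. A minimal plain-TypeScript model of that semantics follows; `coalesceValue` is a local illustration, not the library function.

```typescript
type Value = number | string | boolean | null;

// Returns the first argument unless it is null; falsy values such as ""
// and 0 are kept because they are not null.
function coalesceValue(a: Value, b: Value): Value {
  return a !== null ? a : b;
}
```

Under this reading, a row with an empty-string nickname would keep the empty string rather than falling back to name.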
String Functions
| Function | Description | Example |
|---|---|---|
| upper(a) | Uppercase | upper(col("name")) |
| lower(a) | Lowercase | lower(col("email")) |
| trim(a) | Trim whitespace | trim(col("input")) |
| concat(a, b) | Concatenate | concat(col("first"), col("last")) |
| substr(a, start, len?) | Substring | substr(col("code"), 0, 3) |
| normalizeWhitespace(a) | Collapse multiple spaces | normalizeWhitespace(col("text")) |
Numeric Functions
| Function | Description | Example |
|---|---|---|
| round(a, decimals?) | Round to decimals | round(col("price"), 2) |
| floor(a) | Round down | floor(col("rating")) |
| ceil(a) | Round up | ceil(col("quantity")) |
| abs(a) | Absolute value | abs(col("delta")) |
Date Functions
| Function | Description | Example |
|---|---|---|
| parseDate(a, format?) | Parse string to date | parseDate(col("date_str")) |
| formatDate(a, format) | Format date to string | formatDate(col("created"), "%Y-%m-%d") |
| addDays(date, days) | Add days to date | addDays(col("start"), lit(30)) |
| dateDiff(a, b) | Difference in days | dateDiff(col("end"), col("start")) |
| now() | Current timestamp | now() |
Date formats: Uses strftime-style format strings (e.g., %Y-%m-%d, %H:%M:%S)
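For readers unfamiliar with strftime-style directives, the sketch below expands the six used in the examples above (%Y, %m, %d, %H, %M, %S) against a JavaScript Date. `formatDateUtc` is a hypothetical helper, and its use of UTC is an assumption; the engine's time zone handling is not documented here.

```typescript
// Minimal strftime-style formatter supporting only %Y %m %d %H %M %S and
// the literal %%. All fields are read in UTC.
function formatDateUtc(date: Date, format: string): string {
  const pad = (n: number, width = 2): string => String(n).padStart(width, "0");
  const fields: Record<string, string> = {
    Y: pad(date.getUTCFullYear(), 4),
    m: pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    d: pad(date.getUTCDate()),
    H: pad(date.getUTCHours()),
    M: pad(date.getUTCMinutes()),
    S: pad(date.getUTCSeconds()),
  };
  return format.replace(/%([YmdHMS%])/g, (_match: string, code: string) =>
    code === "%" ? "%" : fields[code] ?? ""
  );
}

const sampleDate = new Date(Date.UTC(2024, 0, 5, 9, 3, 7));
console.log(formatDateUtc(sampleDate, "%Y-%m-%d")); // 2024-01-05
console.log(formatDateUtc(sampleDate, "%H:%M:%S")); // 09:03:07
```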
Array Functions
| Function | Description | Example |
|---|---|---|
| split(a, delimiter) | Split string to array | split(col("tags"), ",") |
| join(a, delimiter) | Join array to string | join(col("items"), "; ") |
Type Conversion
| Function | Description | Example |
|---|---|---|
| castToString(a) | Convert to string | castToString(col("id")) |
| castToNumber(a) | Convert to number | castToNumber(col("amount")) |
| castToBool(a) | Convert to boolean | castToBool(col("active")) |
Worker Protocol
The transform worker uses these message types:
| Message | Description |
|---|---|
| TRANSFORM_INIT | Compile DSL and create transform plan |
| TRANSFORM_RUN_CHUNK_ARROW | Execute plan on Arrow IPC data |
| TRANSFORM_REGISTER_LOOKUP | Register a lookup table |
| TRANSFORM_DISPOSE | Free plan and cleanup |
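Since TRANSFORM_RUN_CHUNK_ARROW executes a plan that TRANSFORM_INIT creates and TRANSFORM_DISPOSE frees, the messages imply an ordering. The sketch below models the message types as a discriminated union and checks that ordering. The payload fields (`dsl`, `tableId`, `ipc`) and the `ProtocolGuard` class are assumptions for illustration; only the message names come from the table above.

```typescript
// Message names are from the protocol table; payload fields are assumed.
type TransformRequest =
  | { type: "TRANSFORM_INIT"; dsl: unknown }
  | { type: "TRANSFORM_REGISTER_LOOKUP"; tableId: string }
  | { type: "TRANSFORM_RUN_CHUNK_ARROW"; ipc: Uint8Array }
  | { type: "TRANSFORM_DISPOSE" };

// Tracks whether a plan exists and rejects a run request that arrives
// before INIT or after DISPOSE.
class ProtocolGuard {
  private hasPlan = false;

  accept(msg: TransformRequest): void {
    switch (msg.type) {
      case "TRANSFORM_INIT":
        this.hasPlan = true;
        return;
      case "TRANSFORM_DISPOSE":
        this.hasPlan = false;
        return;
      case "TRANSFORM_RUN_CHUNK_ARROW":
        if (!this.hasPlan) {
          throw new Error("TRANSFORM_RUN_CHUNK_ARROW received without an active plan");
        }
        return;
      case "TRANSFORM_REGISTER_LOOKUP":
        return; // no ordering constraint is assumed for lookups
    }
  }
}
```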
Hook Events
| Event | Description |
|---|---|
| TRANSFORM_HOOK_START | Transform beginning |
| TRANSFORM_HOOK_PROGRESS | Progress update with stage and percent |
| TRANSFORM_HOOK_STEP_START | Individual operation starting |
| TRANSFORM_HOOK_STEP_COMPLETE | Individual operation completed |
| TRANSFORM_HOOK_WARNING | Non-fatal warning |
| TRANSFORM_HOOK_COMPLETE | Transform finished successfully |
| TRANSFORM_HOOK_ERROR | Fatal error occurred |
Tier Restrictions
| Feature | Free | Pro | Scale | Enterprise |
|---|---|---|---|---|
| Basic operations | Yes | Yes | Yes | Yes |
| Derive expressions | Yes | Yes | Yes | Yes |
| Conditionals | No | Yes | Yes | Yes |
| Row filtering | No | Yes | Yes | Yes |
| Lookup tables | No | No | Yes | Yes |
| Max operations | 10 | 20 | 50 | Unlimited |
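The table can be encoded as data to pre-check a transform before it is submitted. This is a sketch, not the library's tier gate: `tierViolations` and the operation kind names are hypothetical, and treating cast and rename as "basic operations" is an assumption.

```typescript
type Tier = "free" | "pro" | "scale" | "enterprise";
type OpKind = "cast" | "rename" | "derive" | "filter" | "lookup" | "conditional";

// Values transcribed from the tier table above.
const TIER_RULES: Record<Tier, { maxOps: number; denied: OpKind[] }> = {
  free: { maxOps: 10, denied: ["conditional", "filter", "lookup"] },
  pro: { maxOps: 20, denied: ["lookup"] },
  scale: { maxOps: 50, denied: [] },
  enterprise: { maxOps: Infinity, denied: [] },
};

// Returns a list of human-readable problems; an empty list means the
// operation list fits the tier.
function tierViolations(tier: Tier, ops: OpKind[]): string[] {
  const { maxOps, denied } = TIER_RULES[tier];
  const problems: string[] = [];
  if (ops.length > maxOps) {
    problems.push(`too many operations: ${ops.length} > ${maxOps}`);
  }
  for (const kind of Array.from(new Set(ops))) {
    if (denied.includes(kind)) {
      problems.push(`${kind} is not available on the ${tier} tier`);
    }
  }
  return problems;
}
```

A client-side check like this only gives earlier feedback; the tier violation error described under Failure Modes remains the authoritative gate.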
Constraints
Row Count Invariants
- Non-filter transforms must preserve row count exactly
- Filter transforms may reduce row count (tracked via filteredCount)
- Violations trigger a TRANSFORM_ROW_COUNT_MISMATCH error
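The invariants above amount to a small arithmetic check on the counts reported by onComplete. A minimal sketch, assuming outputRows should equal totalRows minus filteredCount; `assertRowCount` is a hypothetical helper, not a library function.

```typescript
// Field names follow the onComplete payload.
type RunCounts = { totalRows: number; outputRows: number; filteredCount: number };

function assertRowCount(counts: RunCounts, hasFilter: boolean): void {
  // Without a filter operation, no rows may be dropped.
  if (!hasFilter && counts.filteredCount !== 0) {
    throw new Error("rows were filtered by a transform with no filter operation");
  }
  // Assumption: every input row is either emitted or counted as filtered.
  const expected = counts.totalRows - counts.filteredCount;
  if (counts.outputRows !== expected) {
    throw new Error(`row count mismatch: expected ${expected}, got ${counts.outputRows}`);
  }
}
```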
Limits
| Limit | Value |
|---|---|
| Max operations per transform | 65,535 |
| Max string length | 65,535 bytes |
| Max expression depth | 100 |
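Two of these limits are easy to misjudge by eye: string length is given in bytes, not characters, and expression depth grows with nesting rather than with the number of operations. The helpers below are a sketch for pre-checking inputs. They assume the byte limit refers to UTF-8 encoding and that a leaf expression counts as depth 1; the `ExprNode` shape with an `args` array is invented for illustration.

```typescript
// Hypothetical expression shape: a node with optional child expressions.
type ExprNode = { args?: ExprNode[] };

// Depth of a leaf is 1; each level of nesting adds 1.
function exprDepth(node: ExprNode): number {
  const children = node.args ?? [];
  return 1 + children.reduce((max, child) => Math.max(max, exprDepth(child)), 0);
}

// UTF-8 byte length, which exceeds the character count for non-ASCII text.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("héllo")); // 6, although the string has 5 characters
```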
Fail-Closed Behavior
Transform failures throw TransformCompileError or TransformFailedError and halt the pipeline. No partial results are produced.
Failure Modes
| Failure | Behavior |
|---|---|
| Invalid expression | TransformCompileError thrown |
| Type coercion failure | Row error recorded |
| Division by zero | Row error recorded |
| Date parse failure | Row error recorded |
| Tier violation | Error thrown before execution |
Implementation Status
| Feature | Status |
|---|---|
| WASM transform engine | Implemented |
| Arrow IPC protocol | Implemented |
| Transform hooks | Implemented |
| Hero layer API | Implemented |
| React component | Implemented |
| Worker pooling | Implemented |
| Before/after preview | Implemented |