Transform Module

The Transform module provides client-side ETL for normalizing, enriching, and reshaping row data. All transforms are compiled to bytecode and executed in Rust/WASM rather than in JavaScript.


Purpose

Apply ETL transformations to row data, including:

  • Type coercion and casting
  • Value derivation with expressions
  • Column renaming
  • Row filtering
  • String, numeric, and date manipulations
  • Conditional transformations

Transforms execute after masking and before profiling in the pipeline.


Architecture

┌────────────────────────────────────────────────────────────┐
│ Hero Layer (@rowops/transform)                             │
│ transformData() / transformArrowData()                     │
│ - JSON → Arrow IPC conversion                              │
│ - Worker lifecycle management                              │
│ - Hook event routing                                       │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ Worker Layer                                               │
│ transform.worker.ts                                        │
│ - TRANSFORM_INIT / TRANSFORM_RUN_CHUNK_ARROW protocol      │
│ - Hook event emission                                      │
│ - DSL compilation to bytecode                              │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ WASM Engine                                                │
│ run_transform_arrow()                                      │
│ - Bytecode VM execution                                    │
│ - Arrow IPC in, Arrow IPC out                              │
└────────────────────────────────────────────────────────────┘

All row-level operations execute in WASM. TypeScript handles only DSL construction and result handling.


Installation

npm install @rowops/transform
# For React integration:
npm install @rowops/transform-react

High-Level API

transformData()

Transform JSON rows using a TransformDSL. Handles Arrow IPC conversion internally.

import { transformData, TransformBuilder, col, upper } from '@rowops/transform';
import { resolveBrowserLicense } from "@rowops/import-core";

const transform = new TransformBuilder()
  .derive("name_upper", upper(col("name")))
  .cast("price", "number")
  .build();

const { tierGateInit } = await resolveBrowserLicense({
  projectId: "proj_xxx",
  entitlementToken: "eyJ...",
});

const result = await transformData(
  [
    { name: 'alice', price: '10.5' },
    { name: 'bob', price: '20.0' },
  ],
  transform,
  { tierGate: tierGateInit },
  {
    onProgress: (p) => console.log(`${p.percent}% - ${p.stage}`),
    onComplete: (r) => console.log(`${r.outputRows} rows transformed`),
  }
);

console.log(result.rows);
// [{ name: 'alice', name_upper: 'ALICE', price: 10.5 }, ...]

transformArrowData()

Transform Arrow IPC data directly, skipping the JSON conversion overhead.

import { transformArrowData } from '@rowops/transform';
import { resolveBrowserLicense } from "@rowops/import-core";

const { tierGateInit } = await resolveBrowserLicense({
  projectId: "proj_xxx",
  entitlementToken: "eyJ...",
});

// `transform` is a built TransformDSL, as in the transformData() example.
const result = await transformArrowData(
  ipcBytes, // Uint8Array from previous pipeline stage
  transform,
  { tierGate: tierGateInit },
  { onProgress: (p) => updateProgress(p.percent) }
);

Transform Hooks

Observe the transform lifecycle with hooks:

interface TransformDataHooks {
  onStart?: (meta: {
    totalRows: number;
    operationCount: number;
    hasFilter: boolean;
    hasLookup: boolean;
  }) => void;

  onProgress?: (meta: {
    stage: 'compiling' | 'executing' | 'finishing';
    percent: number;
    rowsProcessed: number;
    totalRows: number;
    currentOperation: number;
    totalOperations: number;
    message?: string;
  }) => void;

  onStepStart?: (meta: {
    stepIndex: number;
    operationKind: string;
    targetColumn?: string;
  }) => void;

  onStepComplete?: (meta: {
    stepIndex: number;
    operationKind: string;
    durationMs: number;
    rowsAffected: number;
  }) => void;

  onComplete?: (result: {
    totalRows: number;
    outputRows: number;
    filteredCount: number;
    errorCount: number;
    warningCount: number;
    durationMs: number;
    operationsApplied: number;
  }) => void;

  onWarning?: (warning: { code: string; message: string; rowIndex?: number; column?: string }) => void;
  onError?: (error: { code: string; message: string; stage?: string; stepIndex?: number }) => void;
}
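
As an illustration of how these hooks might be consumed, the sketch below accumulates per-operation timings and counts warnings. It redeclares a small subset of the hook payloads locally so that it runs without the package installed; `createStepTimer` and `StepTimerHooks` are hypothetical names, not part of `@rowops/transform`.

```typescript
// Local subset of the TransformDataHooks payloads shown above.
interface StepCompleteMeta {
  stepIndex: number;
  operationKind: string;
  durationMs: number;
  rowsAffected: number;
}

interface StepTimerHooks {
  onStepComplete: (meta: StepCompleteMeta) => void;
  onWarning: (warning: { code: string; message: string }) => void;
}

// Hypothetical helper: sums durationMs per operation kind and counts warnings.
function createStepTimer() {
  const durationByKind = new Map<string, number>();
  let warnings = 0;

  const hooks: StepTimerHooks = {
    onStepComplete: (meta) => {
      const previous = durationByKind.get(meta.operationKind) ?? 0;
      durationByKind.set(meta.operationKind, previous + meta.durationMs);
    },
    onWarning: () => {
      warnings += 1;
    },
  };

  return {
    hooks,
    totalMs: (kind: string): number => durationByKind.get(kind) ?? 0,
    warningCount: (): number => warnings,
  };
}
```

Because every hook in the interface is optional, an object like `hooks` here should be usable wherever a hooks argument is accepted, though that has not been checked against the package's types.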

React Integration

RowOpsTransform Component

import { RowOpsTransform } from '@rowops/transform-react';
import { TransformBuilder, col, upper } from '@rowops/transform';

const transform = new TransformBuilder()
  .derive("name_upper", upper(col("name")))
  .build();

<RowOpsTransform
  rows={data}
  transform={transform}
  showSummary={true}
  showPreview={true}
  showOperations={true}
  showBeforeAfter={true} // Show cell-level before/after comparison
  maxPreviewRows={10}
  hooks={{
    onProgress: (p) => console.log(`${p.percent}% - ${p.stage}`),
    onTransformComplete: (r) => console.log(`${r.outputRows} rows`),
    onStepStart: (s) => console.log(`Step ${s.stepIndex}: ${s.operationKind}`),
    onWarning: (w) => console.warn(w.message),
    onError: (e) => console.error(e.message),
  }}
  onTransform={(result) => console.log(result)}
  onError={(errors) => console.log(`${errors.length} errors`)}
  onRowClick={(idx, row) => scrollToRow(idx)}
/>

Component Features

  • Progress bar: Visual progress with stage and percent
  • Operations list: Shows all transform operations with descriptions
  • Summary: Input/output row counts, filtered count, duration
  • Before/After Preview: Cell-level diff showing what changed
  • Error table: Per-row errors with row index and message
  • Worker pooling: Reuses workers across component instances

TransformBuilder

Build transforms declaratively with the fluent builder API:

import {
  TransformBuilder,
  col, lit, upper, mul, gt
} from "@rowops/transform";

const transform = new TransformBuilder()
  .cast("price", "number")
  .cast("quantity", "number")
  .derive("name_upper", upper(col("name")))
  .derive("total", mul(col("price"), col("quantity")))
  .filter(gt(col("quantity"), lit(0)))
  .rename("price", "unit_price")
  .build();

Transform Operations

Cast

Convert a column to a different type.

builder.cast("age", "number")
builder.cast("active", "boolean")
builder.cast("created", "date")

Target Types: string, number, boolean, date, null

Rename

Rename a column.

builder.rename("old_name", "new_name")

Derive

Create a new column from an expression.

builder.derive("full_name", concat(concat(col("first"), lit(" ")), col("last")))
builder.derive("price_rounded", round(col("price"), 2))

Filter

Filter rows based on a predicate expression.

builder.filter(gt(col("quantity"), lit(0)))
builder.filter(and(isNull(col("deleted")), eq(col("status"), lit("active"))))

Lookup

Lookup values from a registered lookup table.

builder.lookup("country_code", tableId, "null")

OnMissing options: "null", "error", "keep"
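
The three OnMissing modes can be pictured with a small standalone model. This is a sketch of what the option names suggest, not the engine's implementation: the exact behaviour of each mode (in particular whether "error" throws or records a row error) is an assumption, and `lookupValue` is a hypothetical function.

```typescript
type OnMissing = "null" | "error" | "keep";

// Illustrative model of one lookup: replace `key` with its mapped value.
// The handling of a miss is assumed from the option names above.
function lookupValue(
  key: string,
  table: Map<string, string>,
  onMissing: OnMissing,
): string | null {
  const hit = table.get(key);
  if (hit !== undefined) return hit;

  switch (onMissing) {
    case "null":
      return null; // assumed: a miss becomes a null cell
    case "keep":
      return key; // assumed: a miss leaves the original value in place
    case "error":
      throw new Error(`lookup miss for key "${key}"`); // assumed: a miss is an error
  }
}

const countries = new Map([["US", "United States"]]);
const known = lookupValue("US", countries, "null");
const kept = lookupValue("ZZ", countries, "keep");
```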

Conditional

Apply different operations based on a condition.

builder.conditional(
  gt(col("amount"), lit(1000)),
  [{ kind: "derive", target: "tier", expr: lit("premium") }],
  [{ kind: "derive", target: "tier", expr: lit("standard") }]
)
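
To make the branching concrete, here is a plain TypeScript model of the example above: rows matching the condition get the first operation list, all other rows get the second. It does not use the library; `applyConditional` and `deriveConst` are hypothetical stand-ins, and it assumes the condition is evaluated per row.

```typescript
type Row = Record<string, unknown>;
type RowOp = (row: Row) => Row;

// Stand-in for a derive operation that writes a constant into `target`.
const deriveConst = (target: string, value: unknown): RowOp =>
  (row) => ({ ...row, [target]: value });

// Applies `thenOps` to rows matching the predicate and `elseOps` to the rest.
// The row count is unchanged: every input row produces one output row.
function applyConditional(
  rows: Row[],
  predicate: (row: Row) => boolean,
  thenOps: RowOp[],
  elseOps: RowOp[],
): Row[] {
  return rows.map((row) =>
    (predicate(row) ? thenOps : elseOps).reduce((acc, op) => op(acc), row),
  );
}

const tiered = applyConditional(
  [{ amount: 1500 }, { amount: 200 }],
  (row) => Number(row["amount"]) > 1000,
  [deriveConst("tier", "premium")],
  [deriveConst("tier", "standard")],
);
```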

Expression Functions

Literals and References

| Function | Description | Example |
|---|---|---|
| lit(value) | Literal value | lit(42), lit("hello") |
| col(name) | Column reference | col("price") |
| colIndex(n) | Column by index | colIndex(0) |

Arithmetic

| Function | Description | Example |
|---|---|---|
| add(a, b) | Addition | add(col("a"), col("b")) |
| sub(a, b) | Subtraction | sub(col("total"), col("discount")) |
| mul(a, b) | Multiplication | mul(col("price"), col("qty")) |
| div(a, b) | Division | div(col("amount"), lit(100)) |
| mod(a, b) | Modulo | mod(col("index"), lit(2)) |
| neg(a) | Negation | neg(col("value")) |

Comparison

| Function | Description | Example |
|---|---|---|
| eq(a, b) | Equal | eq(col("status"), lit("active")) |
| ne(a, b) | Not equal | ne(col("type"), lit("deleted")) |
| lt(a, b) | Less than | lt(col("age"), lit(18)) |
| le(a, b) | Less than or equal | le(col("score"), lit(100)) |
| gt(a, b) | Greater than | gt(col("price"), lit(0)) |
| ge(a, b) | Greater than or equal | ge(col("quantity"), lit(1)) |

Logical

| Function | Description | Example |
|---|---|---|
| and(a, b) | Logical AND | and(col("active"), col("verified")) |
| or(a, b) | Logical OR | or(col("admin"), col("moderator")) |
| not(a) | Logical NOT | not(col("deleted")) |
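
The functions in the tables above compose into expression trees that are evaluated once per row. The sketch below shows that idea with a tiny tree-walking evaluator. It is only a conceptual model: the real module compiles expressions to bytecode and runs them in WASM, and the node shapes, the local `lit`/`col`/`mul`/`gt`/`and` stand-ins, and the null-to-zero coercion are all assumptions made for illustration.

```typescript
type Value = number | string | boolean | null;
type Row = Record<string, Value>;
type BinaryOp = "add" | "mul" | "gt" | "eq" | "and";

// Illustrative node shapes; the library's DSL representation may differ.
type Expr =
  | { kind: "lit"; value: Value }
  | { kind: "col"; name: string }
  | { kind: "binary"; op: BinaryOp; left: Expr; right: Expr };

// Local stand-ins that mirror the function names in the tables above.
const lit = (value: Value): Expr => ({ kind: "lit", value });
const col = (name: string): Expr => ({ kind: "col", name });
const binary = (op: BinaryOp) => (left: Expr, right: Expr): Expr =>
  ({ kind: "binary", op, left, right });
const mul = binary("mul");
const gt = binary("gt");
const and = binary("and");

// Number(null) is 0 here; the real engine may treat nulls differently.
const BINARY_OPS: Record<BinaryOp, (l: Value, r: Value) => Value> = {
  add: (l, r) => Number(l) + Number(r),
  mul: (l, r) => Number(l) * Number(r),
  gt: (l, r) => Number(l) > Number(r),
  eq: (l, r) => l === r,
  and: (l, r) => Boolean(l) && Boolean(r),
};

// Recursively evaluates an expression against one row.
function evaluate(expr: Expr, row: Row): Value {
  if (expr.kind === "lit") return expr.value;
  if (expr.kind === "col") return row[expr.name] ?? null;
  return BINARY_OPS[expr.op](evaluate(expr.left, row), evaluate(expr.right, row));
}

const total = mul(col("price"), col("quantity"));
const keepRow = and(gt(col("quantity"), lit(0)), gt(total, lit(5)));
```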

Null Handling

| Function | Description | Example |
|---|---|---|
| isNull(a) | Check if null | isNull(col("email")) |
| coalesce(a, b) | First non-null | coalesce(col("nickname"), col("name")) |

String Functions

| Function | Description | Example |
|---|---|---|
| upper(a) | Uppercase | upper(col("name")) |
| lower(a) | Lowercase | lower(col("email")) |
| trim(a) | Trim whitespace | trim(col("input")) |
| concat(a, b) | Concatenate | concat(col("first"), col("last")) |
| substr(a, start, len?) | Substring | substr(col("code"), 0, 3) |
| normalizeWhitespace(a) | Collapse multiple spaces | normalizeWhitespace(col("text")) |

Numeric Functions

| Function | Description | Example |
|---|---|---|
| round(a, decimals?) | Round to decimals | round(col("price"), 2) |
| floor(a) | Round down | floor(col("rating")) |
| ceil(a) | Round up | ceil(col("quantity")) |
| abs(a) | Absolute value | abs(col("delta")) |

Date Functions

| Function | Description | Example |
|---|---|---|
| parseDate(a, format?) | Parse string to date | parseDate(col("date_str")) |
| formatDate(a, format) | Format date to string | formatDate(col("created"), "%Y-%m-%d") |
| addDays(date, days) | Add days to date | addDays(col("start"), lit(30)) |
| dateDiff(a, b) | Difference in days | dateDiff(col("end"), col("start")) |
| now() | Current timestamp | now() |

Date formats use strftime-style format strings (e.g., %Y-%m-%d, %H:%M:%S).
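
For readers unfamiliar with strftime directives, the following sketch formats a JavaScript `Date` using the six directives that appear in the examples above. It is a standalone illustration, not the engine's formatter: `formatUtc` is a hypothetical helper, it always formats in UTC (the module's time zone handling is not documented here), and it covers only this subset of directives.

```typescript
// Formats a Date with a small subset of strftime directives, in UTC.
function formatUtc(date: Date, format: string): string {
  const pad = (n: number, width = 2): string => String(n).padStart(width, "0");
  const directives: Record<string, string> = {
    "%Y": pad(date.getUTCFullYear(), 4), // four-digit year
    "%m": pad(date.getUTCMonth() + 1), // month, 01-12
    "%d": pad(date.getUTCDate()), // day of month, 01-31
    "%H": pad(date.getUTCHours()), // hour, 00-23
    "%M": pad(date.getUTCMinutes()), // minute, 00-59
    "%S": pad(date.getUTCSeconds()), // second, 00-59
  };
  // Directives outside the subset are left in place rather than guessed at.
  return format.replace(/%[A-Za-z]/g, (token) => directives[token] ?? token);
}

const stamp = formatUtc(new Date(Date.UTC(2024, 0, 5, 9, 3, 7)), "%Y-%m-%d %H:%M:%S");
// stamp === "2024-01-05 09:03:07"
```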

Array Functions

| Function | Description | Example |
|---|---|---|
| split(a, delimiter) | Split string to array | split(col("tags"), ",") |
| join(a, delimiter) | Join array to string | join(col("items"), "; ") |

Type Conversion

| Function | Description | Example |
|---|---|---|
| castToString(a) | Convert to string | castToString(col("id")) |
| castToNumber(a) | Convert to number | castToNumber(col("amount")) |
| castToBool(a) | Convert to boolean | castToBool(col("active")) |

Worker Protocol

The transform worker uses these message types:

| Message | Description |
|---|---|
| TRANSFORM_INIT | Compile DSL and create transform plan |
| TRANSFORM_RUN_CHUNK_ARROW | Execute plan on Arrow IPC data |
| TRANSFORM_REGISTER_LOOKUP | Register a lookup table |
| TRANSFORM_DISPOSE | Free plan and cleanup |
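
The message names above suggest a simple lifecycle: initialize a plan, run it on one or more chunks, then dispose. The sketch below models that ordering with a discriminated union and a toy handler. Only the message names come from the table; the payload fields (`dsl`, `tableId`, `entries`, `ipcBytes`), the reply strings, and the `FakeTransformWorker` class are assumptions, and nothing here compiles or executes a transform.

```typescript
// Message names come from the table above; payload fields are assumed.
type WorkerRequest =
  | { type: "TRANSFORM_INIT"; dsl: unknown }
  | { type: "TRANSFORM_REGISTER_LOOKUP"; tableId: string; entries: [string, string][] }
  | { type: "TRANSFORM_RUN_CHUNK_ARROW"; ipcBytes: Uint8Array }
  | { type: "TRANSFORM_DISPOSE" };

// Toy stand-in for the worker: tracks whether a plan exists and rejects
// a run request that arrives before initialization or after disposal.
class FakeTransformWorker {
  private planReady = false;
  private lookups = new Set<string>();

  lookupCount(): number {
    return this.lookups.size;
  }

  handle(msg: WorkerRequest): string {
    switch (msg.type) {
      case "TRANSFORM_INIT":
        this.planReady = true;
        return "plan created";
      case "TRANSFORM_REGISTER_LOOKUP":
        this.lookups.add(msg.tableId);
        return `lookup ${msg.tableId} registered`;
      case "TRANSFORM_RUN_CHUNK_ARROW":
        if (!this.planReady) throw new Error("TRANSFORM_INIT must be sent first");
        return `processed ${msg.ipcBytes.byteLength} bytes`;
      case "TRANSFORM_DISPOSE":
        this.planReady = false;
        this.lookups.clear();
        return "disposed";
    }
  }
}
```

Modelling the requests as a discriminated union means a handler's `switch` is checked for exhaustiveness by the compiler, which is one reasonable way to keep both sides of a worker protocol in sync.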

Hook Events

| Event | Description |
|---|---|
| TRANSFORM_HOOK_START | Transform beginning |
| TRANSFORM_HOOK_PROGRESS | Progress update with stage and percent |
| TRANSFORM_HOOK_STEP_START | Individual operation starting |
| TRANSFORM_HOOK_STEP_COMPLETE | Individual operation completed |
| TRANSFORM_HOOK_WARNING | Non-fatal warning |
| TRANSFORM_HOOK_COMPLETE | Transform finished successfully |
| TRANSFORM_HOOK_ERROR | Fatal error occurred |
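
Each hook event lines up by name with one callback in TransformDataHooks. A router along the following lines could dispatch worker events to user-supplied hooks. The event and callback names are taken from this page, but the one-to-one mapping, the `routeHookEvent` function, and the untyped payload are assumptions for illustration.

```typescript
type HookEvent =
  | "TRANSFORM_HOOK_START"
  | "TRANSFORM_HOOK_PROGRESS"
  | "TRANSFORM_HOOK_STEP_START"
  | "TRANSFORM_HOOK_STEP_COMPLETE"
  | "TRANSFORM_HOOK_WARNING"
  | "TRANSFORM_HOOK_COMPLETE"
  | "TRANSFORM_HOOK_ERROR";

// Assumed mapping from worker event names to hook callback names.
const HOOK_FOR_EVENT: Record<HookEvent, string> = {
  TRANSFORM_HOOK_START: "onStart",
  TRANSFORM_HOOK_PROGRESS: "onProgress",
  TRANSFORM_HOOK_STEP_START: "onStepStart",
  TRANSFORM_HOOK_STEP_COMPLETE: "onStepComplete",
  TRANSFORM_HOOK_WARNING: "onWarning",
  TRANSFORM_HOOK_COMPLETE: "onComplete",
  TRANSFORM_HOOK_ERROR: "onError",
};

type HookMap = Partial<Record<string, (payload: unknown) => void>>;

// Calls the matching hook if the caller supplied one; returns whether it ran.
function routeHookEvent(event: HookEvent, payload: unknown, hooks: HookMap): boolean {
  const handler = hooks[HOOK_FOR_EVENT[event]];
  if (!handler) return false; // hook not supplied, so the event is dropped
  handler(payload);
  return true;
}
```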

Tier Restrictions

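| Feature | Free | Pro | Scale | Enterprise |
|---|---|---|---|---|
| Basic operations | Yes | Yes | Yes | Yes |
| Derive expressions | Yes | Yes | Yes | Yes |
| Conditionals | No | Yes | Yes | Yes |
| Row filtering | No | Yes | Yes | Yes |
| Lookup tables | No | No | Yes | Yes |
| Max operations | 10 | 20 | 50 | Unlimited |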
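
The tier table can be read as a pre-flight check on a transform's operation list. The sketch below transcribes the table into data and reports violations. The numbers and feature gates come from the table; the operation kind names, the treatment of cast and rename as "basic operations", and the `tierViolations` function are assumptions, and the module's actual tier gate may work differently.

```typescript
type Tier = "free" | "pro" | "scale" | "enterprise";
type OperationKind = "cast" | "rename" | "derive" | "filter" | "lookup" | "conditional";

// Values transcribed from the tier table above.
const MAX_OPERATIONS: Record<Tier, number> = {
  free: 10,
  pro: 20,
  scale: 50,
  enterprise: Infinity,
};
const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, scale: 2, enterprise: 3 };

// Lowest tier rank at which a gated operation becomes available.
const MIN_RANK: Partial<Record<OperationKind, number>> = {
  conditional: 1,
  filter: 1,
  lookup: 2,
};

// Returns a list of human-readable violations; empty means the plan is allowed.
function tierViolations(tier: Tier, operations: OperationKind[]): string[] {
  const violations: string[] = [];
  if (operations.length > MAX_OPERATIONS[tier]) {
    violations.push(`too many operations: ${operations.length} > ${MAX_OPERATIONS[tier]}`);
  }
  Array.from(new Set(operations)).forEach((kind) => {
    if (TIER_RANK[tier] < (MIN_RANK[kind] ?? 0)) {
      violations.push(`${kind} is not available on the ${tier} tier`);
    }
  });
  return violations;
}
```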

Constraints

Row Count Invariants

  • Non-filter transforms must preserve row count exactly
  • Filter transforms may reduce row count (tracked via filteredCount)
  • Violations trigger a TRANSFORM ROW COUNT MISMATCH error
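
Expressed as arithmetic, the invariants say that every input row is either emitted or counted as filtered. The check below is a sketch using the field names from the onComplete hook payload; it assumes that `outputRows + filteredCount` must equal `totalRows`, which the bullets imply but do not state outright, and `checkRowCounts` is a hypothetical helper.

```typescript
interface RowCounts {
  totalRows: number;
  outputRows: number;
  filteredCount: number;
}

// Returns null when the counts are consistent, otherwise a description
// of the mismatch.
function checkRowCounts(counts: RowCounts, hasFilter: boolean): string | null {
  const { totalRows, outputRows, filteredCount } = counts;
  if (!hasFilter && outputRows !== totalRows) {
    return `non-filter transform changed row count: ${totalRows} -> ${outputRows}`;
  }
  if (hasFilter && outputRows + filteredCount !== totalRows) {
    return `rows unaccounted for: ${outputRows} + ${filteredCount} != ${totalRows}`;
  }
  return null;
}
```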

Limits

| Limit | Value |
|---|---|
| Max operations per transform | 65,535 |
| Max string length | 65,535 bytes |
| Max expression depth | 100 |
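
Two of these limits are easy to check before submitting a transform. The sketch below measures the nesting depth of an expression tree and the UTF-8 byte length of a string. The node shape, the convention that a leaf has depth 1, and the assumption that string length is measured in UTF-8 bytes are all illustrative; the engine's own accounting may differ.

```typescript
// Limits transcribed from the table above.
const MAX_EXPRESSION_DEPTH = 100;
const MAX_STRING_BYTES = 65535;

// Illustrative node shape: any expression with zero or more child expressions.
interface ExprNode {
  kind: string;
  args?: ExprNode[];
}

// Depth of the deepest path through the tree; a leaf counts as depth 1.
function expressionDepth(node: ExprNode): number {
  const childDepths = (node.args ?? []).map(expressionDepth);
  return 1 + Math.max(0, ...childDepths);
}

// Length in bytes once encoded as UTF-8, which can exceed string.length.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

const product: ExprNode = {
  kind: "mul",
  args: [{ kind: "col" }, { kind: "col" }],
};
const withinDepth = expressionDepth(product) <= MAX_EXPRESSION_DEPTH;
const withinBytes = utf8ByteLength("café") <= MAX_STRING_BYTES;
```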

Fail-Closed Behavior

Transform failures throw TransformCompileError or TransformFailedError and halt the pipeline. No partial results are produced.


Failure Modes

| Failure | Behavior |
|---|---|
| Invalid expression | TransformCompileError thrown |
| Type coercion failure | Row error recorded |
| Division by zero | Row error recorded |
| Date parse failure | Row error recorded |
| Tier violation | Error thrown before execution |

Implementation Status

| Feature | Status |
|---|---|
| WASM transform engine | Implemented |
| Arrow IPC protocol | Implemented |
| Transform hooks | Implemented |
| Hero layer API | Implemented |
| React component | Implemented |
| Worker pooling | Implemented |
| Before/after preview | Implemented |