
Query document

Query documents are flat JSON or TOML profiles. The wire format accepts exactly one top-level operation key per document. Nested "operation": { … } and "output": { … } objects are rejected at parse time.

TOML is converted to a JSON value tree, then the same wire parser runs — both formats are first-class.

Required field

| Field | JSON | TOML |
| --- | --- | --- |
| dataset | "dataset": "temperature" | dataset = "temperature" |

The value must be non-empty and must name a dataset in the .tet catalog.

Operation keys (pick one)

Use any single key from the operations reference.

| Meaning | JSON | TOML |
| --- | --- | --- |
| Scalar mean | "mean": [] | mean = [] |
| Mean along axis 0 | "mean": 0 | mean = 0 |
| Partial over axes 0 and 1 | "sum": [0, 1] | sum = [0, 1] |
| Mean along named axis | "mean": "time" | mean = "time" |
| 95th percentile on axis 0 | "quantile": { "q": 0.95, "axis": 0 } | [quantile] / q = 0.95 / axis = 0 |
| Histogram | "histogram": { "bins": 10, "axis": 0 } | [histogram] / bins = 10 / axis = 0 |
| Transform | "transform": { "method": "zscore" } | [transform] / method = "zscore" |
| Spill export | "spill": "slice.bin" | spill = "slice.bin" |

Parametric ops (quantile, histogram, transform, null_count, …) take their parameters as an inline JSON object, or as a TOML inline table or subtable.
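The one-operation-key rule can be sketched as a small check over an already decoded document. The OPERATION_KEYS set and the find_operation helper are hypothetical: the set only covers reduction and transform keys named on this page, the authoritative list is in the operations reference, and the spill-only export form is not modeled here.

```python
# Hypothetical subset of operation keys, taken from this page only;
# the authoritative list lives in the operations reference.
OPERATION_KEYS = {"mean", "sum", "quantile", "histogram", "transform", "null_count"}


def find_operation(doc: dict) -> str:
    """Return the single top-level operation key of a flat query document."""
    # Nested wrappers are rejected outright rather than unpacked.
    if "operation" in doc or "output" in doc:
        raise ValueError('nested "operation" / "output" objects are rejected')
    found = sorted(key for key in doc if key in OPERATION_KEYS)
    if len(found) != 1:
        raise ValueError(f"expected exactly one operation key, found {found}")
    return found[0]


print(find_operation({"dataset": "temperature", "quantile": {"q": 0.95, "axis": 0}}))
```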

Selection

An optional half-open box, [start, stop), per axis. If selection is omitted, the query covers the full dataset shape.

json
{
  "dataset": "temperature",
  "selection": [
    { "start": 0, "stop": 100, "step": 2 },
    { "start": 0, "stop": 3 }
  ],
  "mean": []
}
toml
dataset = "temperature"
mean = []

[[selection]]
start = 0
stop = 100
step = 2

[[selection]]
start = 0
stop = 3

| Field | Notes |
| --- | --- |
| start | Inclusive lower bound (default 0) |
| stop | Exclusive upper bound (default axis length) |
| step | Stride; must be non-zero (default 1) |
| start_label / stop_label | Resolve to indices via footer coords |
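To make the half-open bounds and defaults concrete, here is a minimal sketch that maps one per-axis entry to the indices it covers. The resolve_axis helper and the dataset shape are hypothetical; label-based bounds and negative strides are not modeled (see Selections & metadata for the actual striding behavior).

```python
def resolve_axis(sel: dict, axis_len: int) -> range:
    """Turn one per-axis selection entry into the indices it covers."""
    start = sel.get("start", 0)        # inclusive lower bound, default 0
    stop = sel.get("stop", axis_len)   # exclusive upper bound, default axis length
    step = sel.get("step", 1)          # stride, default 1
    if step == 0:
        raise ValueError("step must be non-zero")
    return range(start, stop, step)


shape = (200, 3)  # hypothetical dataset shape
selection = [{"start": 0, "stop": 100, "step": 2}, {"start": 0, "stop": 3}]
indices = [resolve_axis(sel, n) for sel, n in zip(selection, shape)]

print([len(r) for r in indices])  # [50, 3]
```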

Shorthand box form:

json
{
  "dataset": "temperature",
  "selection": { "start": [0, 0], "stop": [2, 3] }
}
toml
dataset = "temperature"

[selection]
start = [0, 0]
stop = [2, 3]
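One way to read the box form is as a column-wise spelling of the per-axis list. The sketch below expands it under that assumption, with every axis taking the default step of 1; expand_box is a hypothetical helper, and the equivalence itself is an assumption rather than something this page states.

```python
def expand_box(box: dict) -> list[dict]:
    """Expand {"start": [...], "stop": [...]} into one entry per axis."""
    starts, stops = box["start"], box["stop"]
    if len(starts) != len(stops):
        raise ValueError("start and stop must have the same rank")
    # Assumption: the box form implies the default step of 1 on every axis.
    return [{"start": a, "stop": b} for a, b in zip(starts, stops)]


print(expand_box({"start": [0, 0], "stop": [2, 3]}))
```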

See Selections & metadata for dimension names, coordinate labels, and striding behavior.

Execution hints

The optional "execution" object controls memory and I/O:

json
{
  "dataset": "temperature",
  "mean": [],
  "execution": {
    "memory_budget_percent": 40,
    "fold_parallel": false,
    "device": "auto"
  }
}
toml
dataset = "temperature"
mean = []

[execution]
memory_budget_percent = 40
fold_parallel = false
device = "auto"

| Field | Effect |
| --- | --- |
| memory_budget_bytes | Hard cap on anonymous RAM for dense paths (highest precedence) |
| memory_budget_percent | Percent of host RAM (100 = 100%; stored as basis points internally) |
| fold_parallel | false forces sequential chunk visits; does not enable linear scan |
| device | cpu, auto, metal, cuda, cuda:N, rocm, cuda:multi, … (needs -x) |

Precedence for memory budget: query memory_budget_bytes → per-file TIDX header → query memory_budget_percent → TIDX percent → 256 MiB fallback.
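The precedence chain can be sketched as a first-match lookup. The resolve_budget function and its parameter names are hypothetical; the sketch assumes the per-file TIDX header carries a byte budget, and it converts percentages through basis points only to mirror the note in the table above.

```python
from typing import Optional

FALLBACK_BYTES = 256 * 1024 * 1024  # 256 MiB fallback


def percent_to_bytes(percent: float, host_ram_bytes: int) -> int:
    """Convert a percentage to bytes via basis points (1% = 100 bp)."""
    basis_points = round(percent * 100)
    return host_ram_bytes * basis_points // 10_000


def resolve_budget(
    query_bytes: Optional[int],
    tidx_bytes: Optional[int],      # assumption: header stores a byte budget
    query_percent: Optional[float],
    tidx_percent: Optional[float],
    host_ram_bytes: int,
) -> int:
    """Return the first budget that is set, in documented precedence order."""
    if query_bytes is not None:
        return query_bytes
    if tidx_bytes is not None:
        return tidx_bytes
    if query_percent is not None:
        return percent_to_bytes(query_percent, host_ram_bytes)
    if tidx_percent is not None:
        return percent_to_bytes(tidx_percent, host_ram_bytes)
    return FALLBACK_BYTES


host = 16 * 1024**3  # hypothetical 16 GiB host
print(resolve_budget(None, None, 40, None, host))
```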

CLI --device overrides execution.device. CLI -t supplies the .tet path — there is no .tet path field inside the query document.

Transform output routing

When the operation is "transform", the optional "write" object controls where pass-2 output goes:

json
{
  "dataset": "temperature",
  "transform": { "method": "zscore", "axis": 0 },
  "write": { "target": "sidecar", "timestamp": false }
}
toml
dataset = "temperature"

[transform]
method = "zscore"
axis = 0

[write]
target = "sidecar"
timestamp = false

| Target | Behavior |
| --- | --- |
| switch (default) | RAM when selection fits budget; spill otherwise |
| ram | Dense buffer in memory |
| spill | Write to caller path (write.path or top-level spill semantics) |
| sidecar | Publish a one-chunk .tet beside the source file |

Sidecar auto filename: {stem}.{method}[.{timestamp}].tet. Catalog dataset name: {source}-{method} (e.g. temperature-zscore).
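The naming patterns above can be sketched as plain string assembly. Both helpers are hypothetical, the sketch assumes {stem} is the source .tet filename without its extension, and the timestamp is passed in as an opaque string because its format is not specified here.

```python
from pathlib import Path
from typing import Optional


def sidecar_filename(source_tet: str, method: str, timestamp: Optional[str] = None) -> str:
    """Build {stem}.{method}[.{timestamp}].tet for a source .tet path."""
    parts = [Path(source_tet).stem, method]  # assumption: stem of the source file
    if timestamp:
        parts.append(timestamp)              # omitted when "timestamp": false
    return ".".join(parts) + ".tet"


def sidecar_dataset(source_dataset: str, method: str) -> str:
    """Build the catalog dataset name {source}-{method}."""
    return f"{source_dataset}-{method}"


print(sidecar_filename("data.tet", "zscore"))
print(sidecar_dataset("temperature", "zscore"))
```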

write is mutually exclusive with top-level "spill".

Spill export (no operation)

Export the full logical selection without a reduction key:

json
{
  "dataset": "temperature",
  "spill": "slice.bin"
}
toml
dataset = "temperature"
spill = "slice.bin"

Relative paths resolve against the .tet file's directory, not the shell cwd. Paths must lie under the spill allowlist (see Execution strategies).
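A minimal sketch of that resolution rule follows. The resolve_spill_path helper is hypothetical, and modeling the allowlist as a list of directory roots is an assumption; the actual allowlist format is described in Execution strategies.

```python
from pathlib import Path


def resolve_spill_path(spill: str, tet_path: str, allowlist: list[str]) -> Path:
    """Resolve a spill path against the .tet directory and check the allowlist."""
    base = Path(tet_path).resolve().parent   # the .tet file's directory, not the cwd
    candidate = (base / spill).resolve()     # an absolute spill path replaces base
    for root in allowlist:
        # Assumption: the allowlist is a set of directory roots.
        if candidate.is_relative_to(Path(root).resolve()):
            return candidate
    raise PermissionError(f"{candidate} is outside the spill allowlist")


print(resolve_spill_path("slice.bin", "/data/run/data.tet", ["/data"]))
```

Resolving before the allowlist check means a path such as ../../etc/x is judged by where it actually lands, not by how it is spelled.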

Input format detection

| Source | Detection |
| --- | --- |
| File .json / .toml | Extension |
| Inline or stdin | Leading { → JSON; otherwise TOML |
| - or omitted QUERY | Read stdin |

bash
# TOML file
tet query mean.toml -t data.tet -x -q

# Inline JSON
tet query '{"dataset":"temperature","mean":[]}' -t data.tet -x -q

# Inline TOML (no leading brace)
tet query 'dataset = "temperature"
mean = []' -t data.tet -x -q

# Stdin TOML
echo 'dataset = "temperature"
mean = []' | tet query - -t data.tet -x -q

Validation limits

Untrusted input is bounded:

  • Payload size ≤ 1 MiB
  • JSON nesting depth ≤ 64
  • Unknown top-level fields rejected (deny_unknown_fields)
  • Selection rank and the number of operation axes ≤ 8 (MAX_NDIM)
  • Axis tokens: non-negative decimal indices or dimension names — not arbitrary expressions
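A few of these bounds can be sketched as checks applied before a query is accepted. Only MAX_NDIM is a name from this page; the other constant names, the check_limits helper, and the way nesting depth is counted are assumptions. A production parser would enforce the depth limit while parsing rather than afterwards.

```python
import json

MAX_PAYLOAD_BYTES = 1 << 20  # 1 MiB (hypothetical constant name)
MAX_DEPTH = 64               # hypothetical constant name
MAX_NDIM = 8


def depth(value) -> int:
    """Count container nesting; scalars contribute 0 (an assumed convention)."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def check_limits(payload: str) -> dict:
    """Decode a JSON payload and enforce size, depth, and rank bounds."""
    if len(payload.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds 1 MiB")
    doc = json.loads(payload)
    if depth(doc) > MAX_DEPTH:
        raise ValueError("nesting depth exceeds 64")
    sel = doc.get("selection", [])
    # Rank is the entry count (per-axis form) or the start length (box form).
    rank = len(sel.get("start", [])) if isinstance(sel, dict) else len(sel)
    if rank > MAX_NDIM:
        raise ValueError("selection rank exceeds MAX_NDIM")
    return doc


print(check_limits('{"dataset": "temperature", "mean": []}'))
```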

Treat echoed dataset names and axis labels as untrusted display data in logs and UIs.

TOML patterns for nested wire shapes

| Wire shape | TOML |
| --- | --- |
| Scalar op | mean = [] |
| Single axis index | mean = 0 |
| Multiple axes | sum = [0, 1] |
| Named axis | mean = "time" |
| Parametric op | [quantile] subtable with fields |
| Per-axis selection | [[selection]] array-of-tables |
| Box selection | [selection] with start = […], stop = […] |
| Execution block | [execution] subtable |

Not supported in either format: nested "operation" / "output" objects.

Latka Industries