Query cookbook

Copy-paste query patterns for common tasks. Examples assume a dataset named temperature in data.tet unless the query names a different dataset (such as a or variables).

Query documents are flat: put the operation key at the top level ("mean": [] in JSON, mean = [] in TOML) instead of nesting it inside an "operation" object. Every example below shows JSON and TOML; both compile to the same wire document.

See the Query engine overview for architecture and limits.

Scalar reductions

Mean:

json
{ "dataset": "temperature", "mean": [] }
toml
dataset = "temperature"
mean = []
bash
tet query mean.toml -t data.tet -x -q
# dataset=temperature status=ok op=mean mean=3.5

For sum, min, max, and count, swap the operation key:

json
{ "dataset": "temperature", "sum": [] }
toml
dataset = "temperature"
sum = []

| Op | JSON | TOML |
| --- | --- | --- |
| min | "min": [] | min = [] |
| max | "max": [] | max = [] |
| count | "count": [] | count = [] |

Variance and standard deviation:

json
{ "dataset": "a", "var": [] }
toml
dataset = "a"
var = []
json
{ "dataset": "a", "std": [] }
toml
dataset = "a"
std = []

Product and norms:

json
{ "dataset": "a", "product": [] }
{ "dataset": "a", "norm_l1": [] }
{ "dataset": "a", "norm_l2": [] }
toml
dataset = "a"
product = []

# Separate files, or swap the key:
# norm_l1 = []
# norm_l2 = []

NaN-skipping mean/std:

json
{ "dataset": "temperature", "nan_mean": [] }
toml
dataset = "temperature"
nan_mean = []
json
{ "dataset": "temperature", "nan_std": [] }
toml
dataset = "temperature"
nan_std = []
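
As a rough illustration of what "NaN-skipping" means, the Python sketch below drops NaN values before averaging. It assumes the conventional definition; how tet treats an all-NaN selection, and whether its plain mean propagates NaN the way Python arithmetic does, are assumptions here rather than documented behavior.

```python
import math

def nan_mean(values):
    """Mean of the values that are not NaN; NaN if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan

data = [1.0, float("nan"), 3.0, 5.0]

print(sum(data) / len(data))  # nan, because NaN propagates through the sum
print(nan_mean(data))         # 3.0
```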

Partial-axis reductions

Sum along axis 0:

json
{ "dataset": "a", "sum": 0 }
toml
dataset = "a"
sum = 0

Reduce over multiple axes:

json
{ "dataset": "a", "sum": [0, 1] }
toml
dataset = "a"
sum = [0, 1]
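
The two reductions above can be pictured with a small 2×3 array. This Python sketch assumes the usual array convention in which axis 0 is the outermost dimension; the helper names are illustrative and not part of tet.

```python
# A 2x3 dataset as nested lists: axis 0 indexes rows, axis 1 indexes columns.
a = [
    [1, 2, 3],
    [4, 5, 6],
]

def sum_axis0(rows):
    """Collapse axis 0: add the rows element-wise, one value per column."""
    return [sum(column) for column in zip(*rows)]

def sum_axes01(rows):
    """Collapse axes 0 and 1: a single scalar for a 2-D input."""
    return sum(sum(row) for row in rows)

print(sum_axis0(a))   # [5, 7, 9]
print(sum_axes01(a))  # 21
```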

Mean along a named axis (requires footer dim_names):

json
{ "dataset": "temperature", "mean": "time" }
toml
dataset = "temperature"
mean = "time"
bash
tet info data.tet --metadata   # check dim_names

Selections and previews

Full 2×3 slice preview:

json
{
  "dataset": "temperature",
  "selection": { "start": [0, 0], "stop": [2, 3] }
}
toml
dataset = "temperature"

[selection]
start = [0, 0]
stop = [2, 3]
bash
tet query slice.toml -t data.tet -x --format table --preview 6

2×2 sub-slice:

json
{
  "dataset": "temperature",
  "selection": { "start": [0, 0], "stop": [2, 2] }
}
toml
dataset = "temperature"

[selection]
start = [0, 0]
stop = [2, 2]

Strided selection:

json
{
  "dataset": "temperature",
  "selection": [
    { "start": 0, "stop": 100, "step": 2 },
    { "start": 0, "stop": 48 }
  ],
  "mean": []
}
toml
dataset = "temperature"
mean = []

[[selection]]
start = 0
stop = 100
step = 2

[[selection]]
start = 0
stop = 48
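
To see how many elements the strided selection covers, the Python sketch below expands each per-axis entry into indices. It treats stop as exclusive, which matches the 2×3 preview above (stop = [2, 3] yields 2×3 values), and assumes step defaults to 1 when omitted. The helper is hypothetical and not part of tet.

```python
# Per-axis ranges from the strided example.
selection = [
    {"start": 0, "stop": 100, "step": 2},
    {"start": 0, "stop": 48},
]

def selected_indices(axis_range):
    """Indices covered by one axis entry; stop is exclusive, step defaults to 1 (assumed)."""
    return range(axis_range["start"], axis_range["stop"], axis_range.get("step", 1))

shape = [len(selected_indices(r)) for r in selection]
first_rows = list(selected_indices(selection[0]))[:4]

print(shape)       # [50, 48]
print(first_rows)  # [0, 2, 4, 6]
```

Under these assumptions the mean is taken over 50 × 48 = 2400 elements.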

Slice by coordinate label (requires footer coords):

json
{
  "dataset": "temperature",
  "selection": [
    { "start_label": "2024-01-01", "stop_label": "2024-01-03" },
    { "start": 0, "stop": 2 }
  ],
  "mean": []
}
toml
dataset = "temperature"
mean = []

[[selection]]
start_label = "2024-01-01"
stop_label = "2024-01-03"

[[selection]]
start = 0
stop = 2

QC counts

Scalar QC (swap the key for each op):

json
{ "dataset": "temperature", "nan_count": [] }
toml
dataset = "temperature"
nan_count = []

The same shape works for null_count, inf_count, any_inf, any_nan, and all_finite: set the key to [] ("any_nan": [] in JSON, any_nan = [] in TOML).

Null count with explicit fill value:

json
{
  "dataset": "temperature",
  "null_count": { "fill": -999, "axis": 0 }
}
toml
dataset = "temperature"

[null_count]
fill = -999
axis = 0

Partial-axis QC:

json
{ "dataset": "temperature", "nan_count": 0 }
toml
dataset = "temperature"
nan_count = 0

Index operations

json
{ "dataset": "temperature", "arg_min": [] }
toml
dataset = "temperature"
arg_min = []
json
{ "dataset": "temperature", "arg_max": 0 }
toml
dataset = "temperature"
arg_max = 0

Quantiles and histograms

Median:

json
{ "dataset": "a", "median": [] }
toml
dataset = "a"
median = []
json
{ "dataset": "a", "median": 0 }
toml
dataset = "a"
median = 0

Quantile on axis 0:

json
{ "dataset": "a", "quantile": { "q": 0.95, "axis": 0 } }
toml
dataset = "a"

[quantile]
q = 0.95
axis = 0

Histogram with auto edges from data min/max:

json
{ "dataset": "a", "histogram": { "bins": 10, "axis": 0 } }
toml
dataset = "a"

[histogram]
bins = 10
axis = 0

Histogram with fixed range:

json
{ "dataset": "a", "histogram": { "bins": 10, "min": 0, "max": 1 } }
toml
dataset = "a"

[histogram]
bins = 10
min = 0
max = 1
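
The difference between auto edges and a fixed range can be sketched in one dimension with plain Python. This sketch assumes equal-width bins, assigns the top edge to the last bin, and ignores values outside a fixed range; tet's actual edge handling is not documented here and may differ.

```python
def histogram(values, bins, lo=None, hi=None):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    lo = min(values) if lo is None else lo  # auto edge from the data minimum
    hi = max(values) if hi is None else hi  # auto edge from the data maximum
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue  # outside a fixed range: ignored in this sketch
        if width == 0:
            index = 0  # degenerate range: everything lands in the first bin
        else:
            # The top edge belongs to the last bin so the maximum is counted.
            index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

values = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]
print(histogram(values, bins=2))               # [3, 3]
print(histogram(values, bins=2, lo=0, hi=0.5)) # [2, 2]
```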

Covariance and correlation

For 2-D datasets where one axis is observations and the other is variables:

json
{ "dataset": "variables", "covariance": { "axis": 0 } }
toml
dataset = "variables"

[covariance]
axis = 0
json
{ "dataset": "variables", "correlation": 0 }
toml
dataset = "variables"
correlation = 0

Transforms

Z-score with sidecar output (stable filename):

json
{
  "dataset": "temperature",
  "transform": { "method": "zscore" },
  "write": { "target": "sidecar", "timestamp": false }
}
toml
dataset = "temperature"

[transform]
method = "zscore"

[write]
target = "sidecar"
timestamp = false
bash
tet query zscore_sidecar.toml -t data.tet -x -q
# Publishes temperature.zscore.tet beside data.tet
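
For reference, a z-score shifts values to zero mean and scales them to unit standard deviation. The Python sketch below uses the population standard deviation (divide by n); whether tet divides by n or n − 1, and how it handles a constant input (zero standard deviation), are assumptions not confirmed by this page.

```python
import math

def zscore(values):
    """Shift to zero mean and scale by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]  # raises ZeroDivisionError if std == 0

z = zscore([2.0, 4.0, 6.0])
print(z)  # symmetric around 0.0
```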

Other methods: minmax, l1, l2, center, scale, log1p, sqrt, softmax.

Transform along an axis:

json
{
  "dataset": "temperature",
  "transform": { "method": "softmax", "axis": 0 },
  "write": { "target": "ram" }
}
toml
dataset = "temperature"

[transform]
method = "softmax"
axis = 0

[write]
target = "ram"

Write to a spill file when the selection is large:

json
{
  "dataset": "temperature",
  "transform": { "method": "zscore" },
  "write": { "target": "spill", "path": "normalized.bin" }
}
toml
dataset = "temperature"

[transform]
method = "zscore"

[write]
target = "spill"
path = "normalized.bin"

Spill export (full tensor slice)

Export a logical selection to a binary file without a reduction:

json
{
  "dataset": "temperature",
  "selection": { "start": [0, 0], "stop": [100, 48] },
  "spill": "slice.bin"
}
toml
dataset = "temperature"
spill = "slice.bin"

[selection]
start = [0, 0]
stop = [100, 48]

A relative path resolves beside data.tet. Use --spill-allow to allow extra roots.

Memory and execution hints

Raise the RAM budget for tier-C ops on large selections:

json
{
  "dataset": "temperature",
  "median": [],
  "execution": { "memory_budget_percent": 50 }
}
toml
dataset = "temperature"
median = []

[execution]
memory_budget_percent = 50

Force sequential chunk I/O:

json
{
  "dataset": "temperature",
  "mean": [],
  "execution": { "fold_parallel": false }
}
toml
dataset = "temperature"
mean = []

[execution]
fold_parallel = false

Experimental GPU (requires built features):

json
{
  "dataset": "temperature",
  "mean": [],
  "execution": { "device": "auto" }
}
toml
dataset = "temperature"
mean = []

[execution]
device = "auto"
bash
tet query gpu.toml -t data.tet -x --device auto -q
# --device CLI flag overrides execution.device when both are set

Output format cheat sheet

Works with either .json or .toml query files:

bash
tet query q.toml -t data.tet              # plan only, full JSON
tet query q.json -t data.tet -x -q       # execute, one-line output
tet query q.toml -t data.tet -x --format stats
tet query q.toml -t data.tet -x --format table --preview 6
tet query q.toml -t data.tet --format plan

Re-run from history

bash
tet qhist list --dataset temperature
tet qhist run 1    # re-run newest matching query

Fixture queries

The tetration repo ships paired .json / .toml fixtures in fixtures/queries/:

| Fixture | Notes |
| --- | --- |
| mean_temperature | Scalar mean |
| mean_strided_temperature | Mean + strided selection |
| slice_full_temperature | Full 2×3 preview |
| slice_2x2_temperature | 2×2 sub-slice |
| sum_a / sum_axis0_a | Multi-chunk u8 dataset |
| var_a | Scalar variance |
| quantile_axis0_a | Quantile on axis 0 |

Latka Industries