NumPy interchange
tet-py materializes dense selections and transforms through the same three sinks as the tetration query engine: ram, spill, and sidecar. Decode stays in Rust; Python receives numpy.ndarray (copy path in 0.1.x).
To get a small sample while running a reduction, without full materialization, pass preview=N to mean or query_execute. QueryResult.preview then returns a 1-D array capped at N elements (CLI: tet query -x --preview N).
Requires NumPy 2.x (installed with tet-py).
Read — ram
Materialize a full logical selection into RAM:
import tet
with tet.open("data.tet") as f:
    arr = f.read_numpy("temperature")
    arr = f.dataset("temperature").to_numpy(f)
    # sub-selection (preferred for large tensors)
    arr = f.read_numpy("a", selection=tet.selection_slices(tet.axis_slice(0, 100)))

Integer dtypes (i32, u8, …) work where the engine materializes them. f16 / u32 / u64 export is not yet available (tetration#20).
No preflight on read_numpy
read_numpy decodes the full logical selection with no memory-budget check. For huge tensors, slice the selection or use read_spill.
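Because there is no built-in check yet, a caller can estimate the dense size before calling read_numpy. The sketch below uses only NumPy and the standard library; `dense_nbytes` and `fits_budget` are hypothetical helpers (not part of tet-py), and it assumes the caller already knows the selection's shape and dtype.

```python
import math

import numpy as np


def dense_nbytes(shape, dtype):
    """Bytes needed to hold a dense array of this shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize


def fits_budget(shape, dtype, budget_bytes):
    """True if the dense array would fit within budget_bytes."""
    return dense_nbytes(shape, dtype) <= budget_bytes


# Hypothetical tensor: 10_000 x 10_000 float32 values.
shape, dtype = (10_000, 10_000), "float32"
needed = dense_nbytes(shape, dtype)
ok = fits_budget(shape, dtype, budget_bytes=256 * 1024**2)
print(needed, ok)
```

If the estimate exceeds what the process can afford, slicing the selection or switching to read_spill are the options described above.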
Transform — ram / spill / sidecar
# ram — fails if logical selection exceeds memory_budget_bytes
arr = f.transform.to_numpy.zscore("a")
# spill — dense row-major LE bytes on disk
s = f.transform.to_spill.softmax("a", path="a_softmax.bin")
arr = s.to_numpy()
# sidecar — derived one-chunk .tet published beside the source
side = f.transform.to_sidecar.center("a", path="a.center.tet")
arr = side.to_numpy(f)

Omit path on spill/sidecar for engine temp files under the platform cache allowlist. Relative paths resolve beside the source .tet.
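Since a spill is described as dense row-major little-endian bytes, such a file could in principle be mapped with plain NumPy when the shape and dtype are known from elsewhere. The sketch below writes its own stand-in file rather than a real spill, and it assumes the file has no header or footer, which this page does not confirm. Calling `.to_numpy()` on the returned handle remains the documented path.

```python
import os
import tempfile

import numpy as np

# Stand-in for a spill file: raw little-endian float32 values, row-major.
# Assumption: no header or footer; the real layout may differ.
expected = np.arange(12, dtype="<f4").reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), "a_softmax.bin")
expected.tofile(path)  # tofile writes C-order bytes with no metadata

# Map the bytes lazily instead of copying the whole file into RAM.
view = np.memmap(path, dtype="<f4", mode="r", shape=(3, 4), order="C")

# Copy out a single row; only the pages that are touched get read.
row = np.array(view[1])
print(row)
```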
Transform methods: zscore, minmax, l1, l2, center, scale, log1p, sqrt, softmax.
to_numpy transforms support f32 / f64 only.
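For orientation, the conventional NumPy definitions behind several of these method names look like the sketch below. These are textbook formulas, not the engine's code: the choice of whole-array statistics, population standard deviation (ddof=0), and the absence of any epsilon handling are all assumptions, and `scale`, `l1`, `log1p`, and `sqrt` are left out.

```python
import numpy as np


def zscore(x):
    """(x - mean) / std over the whole array (population std, ddof=0)."""
    return (x - x.mean()) / x.std()


def minmax(x):
    """Rescale to [0, 1] using the global min and max."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)


def center(x):
    """Subtract the global mean."""
    return x - x.mean()


def l2(x):
    """Divide by the Euclidean norm of all elements."""
    return x / np.sqrt(np.sum(x * x))


def softmax(x):
    """Numerically stable softmax over all elements."""
    e = np.exp(x - x.max())
    return e / e.sum()


a = np.array([1.0, 2.0, 3.0, 4.0])
for fn in (zscore, minmax, center, l2, softmax):
    print(fn.__name__, fn(a))
```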
Sidecar (transform → .tet)
A sidecar is a derived, one-chunk .tet file written next to the source file. The engine runs pass-1 fold stats, materializes the transform, and publishes a sealed .tet (dataset name is typically {source}-{method}, e.g. a-zscore).
result = f.transform.to_sidecar.zscore("a", path="large.a.zscore.tet")
assert result.memory_strategy == "transform_sidecar"
# Load as ndarray
arr = result.to_numpy(f)
# Or reopen the sidecar as a normal TetFile (mmap, query, read_numpy)
derived = result.open(f)
list(derived) # e.g. ['a-zscore']
arr = derived.read_numpy("a-zscore")

CLI equivalent:
{
  "dataset": "a",
  "transform": { "method": "zscore" },
  "write": { "target": "sidecar", "timestamp": false }
}

Sidecar vs spill: spill is opaque row-major bytes (.bin); sidecar is a full .tet you can mmap, re-query, and pass to other tools.
Sidecar vs ram: sidecar avoids holding the full array in Python RAM and produces a persistent file.
Sidecar vs TetWriter: both produce a .tet on disk. Sidecar is the query engine writing a transform result with preset parameters (path beside source, dataset {source}-{method}, inherited metadata, transform history). TetWriter / write_dataset is authoring — you supply the NumPy array, name, chunks, and attrs with no source file or transform. Under the hood tetration today uses separate catalog write paths; unification is tracked in Linear THI-61.
There is no read_sidecar top-level op — only transforms can target the sidecar sink. To export a raw selection without a transform, use read_spill or slice + read_numpy.
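The two-phase flow described for sidecars (fold statistics first, then materialize the transform) can be illustrated in plain NumPy. The sketch below streams over row blocks to accumulate count, sum, and sum of squares, then applies zscore block by block. Chunking along the first axis, the accumulator choice, and the `iter_chunks` helper are illustrative assumptions, not the engine's implementation.

```python
import numpy as np

ROWS = 128  # hypothetical chunk height


def iter_chunks(arr, rows):
    """Yield consecutive row blocks; stands in for reading stored chunks."""
    for start in range(0, arr.shape[0], rows):
        yield arr[start:start + rows]


def fold_stats(chunks):
    """Pass 1: accumulate count, sum, and sum of squares in float64."""
    n, s, ss = 0, 0.0, 0.0
    for c in chunks:
        c64 = c.astype(np.float64)
        n += c64.size
        s += float(c64.sum())
        ss += float(np.square(c64).sum())
    mean = s / n
    var = max(ss / n - mean * mean, 0.0)  # population variance
    return mean, var ** 0.5


rng = np.random.default_rng(0)
data = rng.normal(5.0, 2.0, size=(1000, 8)).astype(np.float32)

mean, std = fold_stats(iter_chunks(data, ROWS))

# Pass 2: transform each chunk and write it into the output buffer.
out = np.empty_like(data)
for i, c in enumerate(iter_chunks(data, ROWS)):
    out[i * ROWS:(i + 1) * ROWS] = (c - mean) / std

print(out.mean(), out.std())
```

The sum-of-squares accumulator is the simplest to read; a numerically safer update such as Welford's algorithm would be a reasonable substitute for data with a large mean relative to its spread.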
Read — spill
Top-level selection export (wire mmap_spill), not a transform:
s = f.read_spill("a", path="a_full.bin")
arr = s.to_numpy()

Memory budget
The query engine resolves memory_budget_bytes the same way as tet query -x:
- Query execution.memory_budget_bytes
- Per-file chunk-index header
- Query execution.memory_budget_percent
- Per-file header percent (0 → default 25% of host RAM)
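Read as a precedence order, the list above can be sketched as a small resolver. This is a hypothetical illustration only: it assumes the first configured source wins, that an unset value is represented as None or 0, and that percentages apply to total host RAM. `resolve_budget` and its parameter names are not part of tet-py.

```python
def resolve_budget(query_bytes=None, header_bytes=None,
                   query_percent=None, header_percent=0,
                   host_ram_bytes=16 * 1024**3):
    """Return the budget in bytes from the first source that is set."""
    if query_bytes:
        return query_bytes
    if header_bytes:
        return header_bytes
    if query_percent:
        return host_ram_bytes * query_percent // 100
    # A header percent of 0 falls back to 25% of host RAM.
    percent = header_percent or 25
    return host_ram_bytes * percent // 100


# Nothing configured on a 16 GiB host: 25% of RAM.
print(resolve_budget())
# An explicit byte budget in the query overrides everything else.
print(resolve_budget(query_bytes=512 * 1024**2, query_percent=50))
```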
transform.to_numpy (write: ram)
If the logical selection exceeds the budget, the engine returns TetError — use to_spill instead:
arr = f.transform.to_numpy.zscore("a") # in-RAM
s = f.transform.to_spill.zscore("a", path="a_zscore.bin") # no RAM cap on output size
arr = s.to_numpy()

Inspect the resolved budget via f.execute(doc, raw=True) → execution.memory_budget_bytes.
Comparison
| API | Over budget |
|---|---|
| mean, sum, streaming folds | Chunk streaming; no full dense buffer |
| median, quantile, histogram | Temp spill materialize or refuse |
| transform.to_numpy | Refuse; use to_spill |
| read_numpy | No preflight; slice or use read_spill |
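The split in the table follows from which reductions can be folded chunk by chunk. The NumPy-only sketch below shows a mean computed from a running sum and count, and contrasts it with the median, where combining per-chunk results does not in general give the right answer. The chunking via `np.array_split` is a stand-in for stored chunks, not how the engine iterates.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(size=10_000)
chunks = np.array_split(data, 16)  # stand-in for stored chunks

# Streaming fold: only a running sum and count are kept in memory.
total, count = 0.0, 0
for c in chunks:
    total += float(c.sum())
    count += c.size
streamed_mean = total / count

# Order statistics do not fold this way: the median of chunk medians
# is generally not the median of the full data.
median_of_medians = float(np.median([np.median(c) for c in chunks]))
true_median = float(np.median(data))

print(streamed_mean, float(data.mean()))
print(median_of_medians, true_median)
```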
Write
Row-major float32 / float64 NumPy arrays via TetWriterSession:
import numpy as np
import tet
arr = np.arange(6, dtype=np.float32).reshape(2, 3)
# one-shot
tet.write_dataset("out.tet", "temperature", arr, chunk_shape=(2, 3))
# session — multiple datasets, history, footer metadata
w = tet.TetWriter.create("out.tet")
w.push_history_event("write", "notebook")
w.write_dataset(
    "temperature",
    arr,
    chunk_shape=(2, 3),
    attrs={"units": "K"},
    dim_names=("row", "col"),
    coords={"row": ("r0", "r1")},
)
w.commit()
# append to existing file
w = tet.TetWriter.open_append("out.tet")
w.write_dataset("humidity", arr)
w.commit()

On commit, footer tool is set to tet-py. Roundtrip: write_dataset → read_numpy → reductions.
Integer write dtypes, in addition to the current f32/f64, are planned (tet-py#8).
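Until integer writes are available, arrays may need to be converted before they are passed to write_dataset. The sketch below is one way to do that with NumPy alone. `prepare_for_write` is a hypothetical helper, and promoting other dtypes to float64 is an assumption that suits small integers but loses precision for 64-bit integers above 2**53.

```python
import numpy as np


def prepare_for_write(arr):
    """Return a C-contiguous float32/float64 array, converting if needed."""
    arr = np.asarray(arr)
    if arr.dtype not in (np.float32, np.float64):
        # Assumption: promoting to float64 is acceptable for this data.
        arr = arr.astype(np.float64)
    # Copies only when the input is not already row-major.
    return np.ascontiguousarray(arr)


ints = np.arange(6, dtype=np.int32).reshape(2, 3)
transposed = np.arange(6, dtype=np.float32).reshape(2, 3).T  # not C-contiguous

a = prepare_for_write(ints)
b = prepare_for_write(transposed)
print(a.dtype, a.flags["C_CONTIGUOUS"], b.dtype, b.flags["C_CONTIGUOUS"])
```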
Sink summary
| Sink | Read | Transform | Write |
|---|---|---|---|
| ram | read_numpy, Dataset.to_numpy() | transform.to_numpy.* | — |
| spill | read_spill → .to_numpy() | transform.to_spill.* → .to_numpy() | — |
| sidecar | Reopen via SidecarTransformResult.open() or tet.open(path) | transform.to_sidecar.* → .to_numpy(f) | — (use TetWriter for author-written files) |
Planned
| Topic | Tracking |
|---|---|
| Zero-copy mmap → NumPy | tet-py#11 |
| read_numpy preflight | tet-py#9 |
| Integer write_dataset | tet-py#8 |
| tet.convert (h5py, zarr, …) | tet-py#10 |
Use tet convert today for HDF5 / NetCDF / Zarr import when native libs are available.