Catalog & datasets

When dataset_count > 0, bytes starting at offset 32 contain the dataset directory.

Directory header

| Offset | Size | Field |
| --- | --- | --- |
| 32 | 8 | dataset_blob_len: length of concatenated dataset records |
| 40 | * | dataset_records: exactly dataset_count records in order |
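
As a rough illustration of the header layout, the sketch below locates the record blob inside a file buffer. The function name `read_directory_header` is hypothetical. It assumes `dataset_blob_len` is a little-endian u64, matching the 8-byte size above and the LE convention of the record fields; this section does not state the byte order of that field explicitly.

```python
import struct

DIRECTORY_OFFSET = 32  # the dataset directory starts at byte 32


def read_directory_header(buf: bytes) -> tuple[int, int]:
    """Return (start, end) byte offsets of the concatenated dataset records."""
    # Assumption: dataset_blob_len is a little-endian u64.
    (blob_len,) = struct.unpack_from("<Q", buf, DIRECTORY_OFFSET)
    start = DIRECTORY_OFFSET + 8  # records begin at offset 40
    end = start + blob_len
    if end > len(buf):
        raise ValueError("dataset_blob_len runs past the end of the buffer")
    return start, end
```

A caller would then slice `buf[start:end]` and hand that blob to a record parser.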

Dataset record

Each record is variable length:

| Field | Type | Notes |
| --- | --- | --- |
| name_len | u32 LE | Byte length of UTF-8 name |
| dtype | u32 LE | Wire tag 1–10 (see Overview) |
| ndim | u32 LE | Rank in 1 … 8 |
| reserved | u32 LE | Write 0 |
| name | [u8] | UTF-8 of length name_len |
| padding | 0–7 B | Zero bytes so record length to shape[0] is 8-aligned |
| shape | ndim × u64 LE | Global array shape |
| chunk_shape | ndim × u64 LE | Chunk size along each axis (must tile shape) |
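
To make the padding arithmetic concrete, here is a minimal writer-side sketch. `encode_dataset_record` is a hypothetical helper, not part of any published API. It assumes the 8-byte alignment is measured from the start of the record, so the padding brings the 16-byte fixed header plus the name up to a multiple of 8.

```python
import struct


def encode_dataset_record(name: str, dtype: int,
                          shape: list[int], chunk_shape: list[int]) -> bytes:
    """Serialize one dataset record following the field table above."""
    name_bytes = name.encode("utf-8")
    ndim = len(shape)
    if not 1 <= ndim <= 8:
        raise ValueError("ndim must be in 1..8")
    if len(chunk_shape) != ndim:
        raise ValueError("chunk_shape must have ndim entries")
    if not 1 <= dtype <= 10:
        raise ValueError("dtype wire tag must be in 1..10")
    # Fixed header: name_len, dtype, ndim, reserved (written as 0), all u32 LE.
    header = struct.pack("<4I", len(name_bytes), dtype, ndim, 0)
    # Assumption: alignment is relative to the start of this record.
    pad = -(len(header) + len(name_bytes)) % 8
    dims = struct.pack(f"<{2 * ndim}Q", *shape, *chunk_shape)
    return header + name_bytes + b"\x00" * pad + dims
```

For the name `temperature` (11 bytes), the header and name occupy 27 bytes, so 5 zero bytes follow and `shape[0]` begins at byte 32 of the record.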

Records are concatenated in catalog order. The dataset_id in the chunk index is the 0-based index into this list.
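
The reader side can be sketched the same way. The names `DatasetRecord` and `parse_dataset_records` are illustrative only, and the padding is again assumed to be relative to the start of each record. The position of a record in the returned list is its `dataset_id`.

```python
import struct
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    name: str
    dtype: int
    shape: tuple[int, ...]
    chunk_shape: tuple[int, ...]


def parse_dataset_records(blob: bytes, dataset_count: int) -> list[DatasetRecord]:
    """Parse dataset_count records; list index equals dataset_id."""
    records = []
    pos = 0
    for _ in range(dataset_count):
        start = pos
        name_len, dtype, ndim, _reserved = struct.unpack_from("<4I", blob, pos)
        if not 1 <= ndim <= 8:
            raise ValueError(f"ndim {ndim} out of range 1..8")
        pos += 16
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        # Skip 0-7 padding bytes (assumed relative to the record start).
        pos += -(pos - start) % 8
        dims = struct.unpack_from(f"<{2 * ndim}Q", blob, pos)
        pos += 16 * ndim
        records.append(DatasetRecord(name, dtype, dims[:ndim], dims[ndim:]))
    return records
```

A stricter reader might also check that `pos` equals `dataset_blob_len` after the loop.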

Axis metadata

v1 dataset records carry shape and chunk_shape only. Richer metadata lives in the optional THST footer under metadata.datasets[<name>]:

```json
{
  "datasets": {
    "temperature": {
      "dim_names": ["time", "station"],
      "coords": {
        "time": { "labels": ["2024-01-01", "2024-01-02"] },
        "station": { "labels": ["A", "B"] }
      },
      "attrs": { "units": "K", "long_name": "surface temperature" }
    }
  }
}
```
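
Because the footer is separate from the binary record, a reader may want to cross-check the two. The sketch below compares footer metadata against a dataset's shape, using the counts given under "Dimension names vs coordinate labels" (ndim dimension names, shape[d] labels per axis). `check_axis_metadata` is a hypothetical helper, and treating `coords` entries as optional per dimension is an assumption.

```python
import json

FOOTER_METADATA = json.loads("""
{
  "datasets": {
    "temperature": {
      "dim_names": ["time", "station"],
      "coords": {
        "time": {"labels": ["2024-01-01", "2024-01-02"]},
        "station": {"labels": ["A", "B"]}
      },
      "attrs": {"units": "K", "long_name": "surface temperature"}
    }
  }
}
""")


def check_axis_metadata(meta: dict, name: str, shape: tuple[int, ...]) -> list[str]:
    """Return a list of mismatches between footer metadata and the record shape."""
    problems = []
    entry = meta["datasets"][name]
    dim_names = entry.get("dim_names", [])
    if len(dim_names) != len(shape):
        problems.append(f"expected {len(shape)} dim_names, found {len(dim_names)}")
    for axis, dim in enumerate(dim_names[:len(shape)]):
        # Assumption: a dimension may lack coordinate labels entirely.
        labels = entry.get("coords", {}).get(dim, {}).get("labels")
        if labels is not None and len(labels) != shape[axis]:
            problems.append(
                f"axis {axis} ({dim}): {len(labels)} labels for extent {shape[axis]}"
            )
    return problems
```

An empty list means the footer is consistent with the shape stored in the dataset record.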

Dimension names vs coordinate labels

| Layer | What it names | Count | Example |
| --- | --- | --- | --- |
| Dimension names | Each axis | ndim strings | time, lat, lon |
| Coordinate labels | Each position along one axis | shape[d] values | timestamps, station codes |
  • Dimension name — enables "mean": "time" in query JSON/TOML instead of "mean": 0
  • Coordinate label — enables slice/filter by value (start_label / stop_label in selections)
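
The two lookups can be illustrated as follows. Both function names are hypothetical. Treating `stop_label` as inclusive is an assumption borrowed from pandas-style label slicing, and the actual query engine may behave differently.

```python
def resolve_axis(dim_names: list[str], axis) -> int:
    """Map a dimension name such as "time" to its axis index; pass integers through."""
    if isinstance(axis, int):
        return axis
    return dim_names.index(axis)  # raises ValueError for an unknown name


def label_range(labels: list[str], start_label: str, stop_label: str) -> tuple[int, int]:
    """Translate a label selection into a half-open positional range."""
    start = labels.index(start_label)
    stop = labels.index(stop_label)
    # Assumption: stop_label is inclusive, so the positional end is stop + 1.
    return start, stop + 1
```

With the footer example above, `resolve_axis(["time", "station"], "station")` gives axis 1, and selecting `"2024-01-01"` through `"2024-01-02"` covers both positions along `time`.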

Analogues: NetCDF dimension name vs coordinate variable; pandas Index.name vs Index values; xarray dims vs coords.

Inspecting the catalog

```bash
tet info file.tet                  # dataset table (default)
tet info file.tet --metadata       # footer dim_names / coord previews
tet info file.tet --json           # full dump
```

In Rust, use read_tet_summary_v1 on a mmap'd byte slice. See Open & inspect.

Latka Industries