Models, cache, GGUF, report

steerkit.models.load(model_id, device=None, dtype=None, *, encoder=None)

Load a model into a ModelHandle.

encoder=None (default) auto-detects: model ids containing 'bert', 'roberta', 'deberta', 'electra', or 'albert' are loaded via HookedEncoder; everything else goes through HookedTransformer. Pass encoder=True / encoder=False to override.
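
The auto-detection rule above can be sketched as a plain substring check. This is only an illustration, not the library's implementation: the helper name `looks_like_encoder` is hypothetical, and lowercasing the id before matching is an assumption.

```python
from typing import Optional

# Substrings named in the description above.
ENCODER_MARKERS = ("bert", "roberta", "deberta", "electra", "albert")


def looks_like_encoder(model_id: str, encoder: Optional[bool] = None) -> bool:
    """Hypothetical helper: an explicit flag wins, otherwise match on the id."""
    if encoder is not None:
        return encoder
    lowered = model_id.lower()  # assumption: matching is case-insensitive
    return any(marker in lowered for marker in ENCODER_MARKERS)
```

Under this rule any id containing "bert" (for example a DistilBERT checkpoint) would also take the encoder path, which is where the explicit `encoder=False` override becomes useful.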

Device selection when device is None: MPS on Apple Silicon, otherwise CUDA if available, otherwise CPU.
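
A minimal sketch of that fallback order, written as a pure function so it runs without PyTorch installed. The name `pick_device` is hypothetical; in a real setup the two flags would typically come from `torch.backends.mps.is_available()` and `torch.cuda.is_available()`.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Hypothetical helper mirroring the documented order: MPS, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"
```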

steerkit.models.ModelHandle

Thin wrapper over a TransformerLens HookedTransformer / HookedEncoder carrying tokenizer + config conveniences. Both backends share the run_with_cache and hooks(...) interfaces this library uses.

is_encoder property

True if backed by HookedEncoder (BERT-style; bidirectional, no autoregressive generate). For these models, use pooling="mean" and the encoder-side prediction APIs (probe.predict_at_mask).

format_chat(prompt, response=None)

Format the prompt (and optional response) as a token tensor on the model's device.

Uses the tokenizer's chat template when available; otherwise concatenates plain text. Returns a 2-D tensor of token ids ([1, seq_len]).

Note: some tokenizers (Qwen2/Qwen3) return a BatchEncoding from apply_chat_template even with return_tensors="pt"; we unwrap to input_ids.
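
The unwrapping described in the note might look roughly like the following. The helper name `unwrap_input_ids` is hypothetical; the sketch relies only on the fact that a `BatchEncoding` behaves like a mapping with an `"input_ids"` key, whereas a plain tensor does not.

```python
from collections.abc import Mapping


def unwrap_input_ids(encoded):
    """Return the token ids whether `encoded` is a mapping or already a tensor."""
    if isinstance(encoded, Mapping):
        # Mapping-like result (e.g. a BatchEncoding): pull out the ids.
        return encoded["input_ids"]
    # Otherwise assume the tokenizer already returned the ids themselves.
    return encoded
```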

steerkit.cache.save_activations_zarr(activations, path, *, metadata)

Atomically write a {layer: [n_pairs, 2, d_model]} dict to a Zarr v3 store.

The activations are stacked along a new third axis so the on-disk array has shape [n_pairs, 2, n_sites, d_model]; the layer indices are saved as the layer_indices attribute (in the same order as the third axis).
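
To make the on-disk layout concrete, here is a small NumPy sketch of the stacking step only (no Zarr I/O). The sizes are made up, and sorting the layer keys to fix the axis order is an assumption.

```python
import numpy as np

n_pairs, d_model = 4, 8

# One [n_pairs, 2, d_model] array per layer, as in the input dict.
activations = {
    7: np.ones((n_pairs, 2, d_model)),
    3: np.zeros((n_pairs, 2, d_model)),
}

# Assumption: layers are ordered by sorting the keys.
layer_indices = sorted(activations)

# Stack along a new third axis -> [n_pairs, 2, n_sites, d_model].
stacked = np.stack([activations[layer] for layer in layer_indices], axis=2)
```

Here `stacked[:, :, i, :]` holds the activations for `layer_indices[i]`, which is why the order of the saved attribute has to match the third axis.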

steerkit.cache.load_activations_zarr(path)

Load a Zarr cache produced by save_activations_zarr.

Returns (activations dict keyed by layer index, metadata dict).
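
Going from a stacked `[n_pairs, 2, n_sites, d_model]` array back to a per-layer dict amounts to slicing the third axis and pairing each slice with its layer index. A sketch with a hypothetical helper name, leaving out the actual Zarr read:

```python
import numpy as np


def unstack_by_layer(stacked: np.ndarray, layer_indices) -> dict:
    """Hypothetical helper: split axis 2 into {layer: [n_pairs, 2, d_model]}."""
    return {
        int(layer): stacked[:, :, i, :]
        for i, layer in enumerate(layer_indices)
    }
```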

steerkit.cache.hash_pairs(pairs)

Deterministic 16-hex-char hash of a list of contrast pairs (order matters).
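
One way to get a hash with these properties is to serialize the pairs in order and truncate a cryptographic digest. The sketch below is not necessarily what steerkit does: SHA-256, JSON serialization, and representing pairs as `(positive, negative)` string tuples are all assumptions.

```python
import hashlib
import json


def hash_pairs_sketch(pairs) -> str:
    """Illustrative only: order-sensitive, deterministic, 16 hex characters."""
    # JSON keeps list order, so reordering the pairs changes the payload.
    payload = json.dumps([list(p) for p in pairs], ensure_ascii=False,
                         separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

A hash like this can serve as a cache key: the same pairs in the same order map to the same store, while any reordering is treated as a different dataset.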

steerkit.gguf_export.export_probe_to_gguf(probe, path, *, method=None, scale=1.0)

Write a single-layer Probe as a GGUF control vector. Returns the output path.

method selects which probe-family direction to export (defaults to probe.default_method). scale is multiplied into the direction before writing. This is useful when the magnitude saved in the GGUF file should differ from the unit-normalized direction steerkit stores internally (e.g. to bake in the calibrated auto_alpha).
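
The effect of scale can be checked with plain arithmetic: multiplying a unit-norm direction by scale yields a vector whose norm equals scale. The numbers below are made up for illustration and nothing here touches the GGUF format itself.

```python
import math


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


raw = [3.0, 4.0]
unit = [x / l2_norm(raw) for x in raw]      # unit-normalized direction
scale = 2.5                                 # e.g. a calibrated steering strength
exported = [scale * x for x in unit]        # what would be written out
```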

steerkit.gguf_export.export_composite_to_gguf(composite, path, *, method=None, scale=1.0)

Write a CompositeProbe as a GGUF control vector with one entry per layer.

Probes targeting the same layer are summed into a single direction first (consistent with CompositeProbe.steer's same-hook folding). Each constituent probe's weight is multiplied into its direction; scale is applied uniformly on top of that.
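
The folding rule can be sketched as a weighted sum per layer followed by a uniform scale. The `(layer, weight, direction)` tuple layout and the helper name are hypothetical stand-ins for whatever CompositeProbe stores internally.

```python
def fold_by_layer(entries, scale=1.0):
    """Sum weight * direction per layer, then apply `scale` to every layer."""
    folded = {}
    for layer, weight, direction in entries:
        acc = folded.setdefault(layer, [0.0] * len(direction))
        for i, x in enumerate(direction):
            acc[i] += weight * x
    return {layer: [scale * x for x in vec] for layer, vec in folded.items()}


entries = [
    (5, 1.0, [1.0, 0.0]),   # two probes on layer 5 are folded together
    (5, 0.5, [0.0, 2.0]),
    (9, 2.0, [1.0, 1.0]),
]
folded = fold_by_layer(entries, scale=2.0)
```

With these inputs the result has one direction per layer, matching the "one entry per layer" layout described above.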

steerkit.report.render_probe_report(probe, *, model=None, activations=None, per_layer=None, title=None)

Build the HTML for a single-probe report. Returns the raw HTML string.

All inputs except probe are optional; the relevant plot is included only when its inputs are supplied.

* per_layer: full per-layer fits → layer-selection curve
* activations: a [n_pairs, 2, d_model] tensor (typically activations[probe.layer]) → PCA projection
* model: a loaded ModelHandle → logit-lens
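
The input-to-plot mapping above amounts to a set of independent "include if supplied" checks. A toy sketch with a hypothetical function name, returning section labels rather than HTML:

```python
def report_sections(per_layer=None, activations=None, model=None):
    """Hypothetical helper: list which plots a report would contain."""
    sections = []
    if per_layer is not None:
        sections.append("layer-selection curve")
    if activations is not None:
        sections.append("PCA projection")
    if model is not None:
        sections.append("logit-lens")
    return sections
```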

steerkit.report.render_group_report(fit, *, model=None, activations_by_concept=None, title=None)

Build the HTML for a GroupFit. Includes per-concept layer curves and the cross-concept similarity heatmap; if a model is provided, also includes the logit-lens for the best probe of each concept.