Models, cache, GGUF, report
steerkit.models.load(model_id, device=None, dtype=None, *, encoder=None)
Load a model into a ModelHandle.
encoder=None (default) auto-detects: model ids containing 'bert',
'roberta', 'deberta', 'electra', or 'albert' are loaded via
HookedEncoder; everything else goes through HookedTransformer.
Pass encoder=True / encoder=False to override.
Device selection when device=None: MPS on Apple Silicon, otherwise CUDA if available, else CPU.
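The encoder auto-detection rule described above can be sketched in a few lines. This is an illustration only, not steerkit's implementation: `guess_is_encoder` and `ENCODER_MARKERS` are hypothetical names, and case-insensitive matching is an assumption the documentation does not confirm.

```python
# Substrings the documentation lists as triggering the HookedEncoder path.
ENCODER_MARKERS = ("bert", "roberta", "deberta", "electra", "albert")


def guess_is_encoder(model_id, encoder=None):
    """Hypothetical helper mirroring the documented auto-detection rule."""
    # An explicit encoder=True / encoder=False always overrides detection.
    if encoder is not None:
        return bool(encoder)
    # Assumption: matching ignores case; the docs do not say either way.
    lowered = model_id.lower()
    return any(marker in lowered for marker in ENCODER_MARKERS)


print(guess_is_encoder("bert-base-uncased"))
print(guess_is_encoder("gpt2"))
print(guess_is_encoder("gpt2", encoder=True))
```

Because the rule is a substring match, a decoder model whose id happens to contain one of these markers would be misclassified; passing encoder=False explicitly avoids that.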
steerkit.models.ModelHandle
Thin wrapper over a TransformerLens HookedTransformer / HookedEncoder
carrying tokenizer + config conveniences. Both backends share the
run_with_cache and hooks(...) interfaces this library uses.
is_encoder
property
True if backed by HookedEncoder (BERT-style; bidirectional, no
autoregressive generate). For these models, use pooling="mean" and the
encoder-side prediction APIs (probe.predict_at_mask).
format_chat(prompt, response=None)
Format the prompt (and optional response) as a token tensor on the model's device.
Uses the tokenizer's chat template when available; otherwise concatenates plain text.
Returns a 2-D tensor of token ids ([1, seq_len]).
Note: some tokenizers (Qwen2/Qwen3) return a BatchEncoding from
apply_chat_template even with return_tensors="pt"; we unwrap to input_ids.
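The unwrapping mentioned in the note can be pictured as follows. This is a minimal sketch, not the library's code: `unwrap_input_ids` is a hypothetical helper, and a plain dict and nested list stand in for a BatchEncoding and a tensor so the example runs without any tokenizer installed.

```python
def unwrap_input_ids(encoded):
    """Return the token ids whether `encoded` is dict-like or already the ids."""
    # Dict-like results (such as a BatchEncoding) expose keys();
    # tensors and plain lists do not, so they pass through unchanged.
    if hasattr(encoded, "keys") and "input_ids" in encoded:
        return encoded["input_ids"]
    return encoded


# Stand-in for a tokenizer that returns a mapping with several fields.
wrapped = {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}
# Stand-in for a tokenizer that returns the ids directly.
bare = [[1, 2, 3]]

print(unwrap_input_ids(wrapped))
print(unwrap_input_ids(bare))
```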
steerkit.cache.save_activations_zarr(activations, path, *, metadata)
Atomically write a {layer: [n_pairs, 2, d_model]} dict to a Zarr v3 store.
The activations are stacked along a new third axis so the on-disk array has
shape [n_pairs, 2, n_sites, d_model]; the layer indices are saved as the
layer_indices attribute (in the same order as the third axis).
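The shape change described above can be checked with NumPy alone. The sketch below only illustrates the stacking and the reverse lookup through layer_indices; it does not touch Zarr, and the layer numbers, sizes, and sorted ordering are assumptions chosen for the example.

```python
import numpy as np

n_pairs, d_model = 4, 8

# Per-layer activations, each of shape [n_pairs, 2, d_model].
activations = {
    layer: np.random.default_rng(layer).normal(size=(n_pairs, 2, d_model))
    for layer in (3, 7, 11)
}

# Assumption: layers are stored in sorted order; the docs only promise that
# layer_indices follows the same order as the stacked axis.
layer_indices = sorted(activations)

# Stack along a new axis at position 2 -> [n_pairs, 2, n_sites, d_model].
stacked = np.stack([activations[layer] for layer in layer_indices], axis=2)
print(stacked.shape)

# Reverse step: slice the site axis and key each slice by its layer index.
restored = {layer: stacked[:, :, i, :] for i, layer in enumerate(layer_indices)}
print(np.array_equal(restored[7], activations[7]))
```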
steerkit.cache.load_activations_zarr(path)
Load a Zarr cache produced by save_activations_zarr.
Returns (activations dict keyed by layer index, metadata dict).
steerkit.cache.hash_pairs(pairs)
Deterministic 16-hex-char hash of a list of contrast pairs (order matters).
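One way to obtain an order-sensitive, deterministic 16-hex-character hash is to serialize the pairs canonically and truncate a SHA-256 digest. The sketch below is an assumption about how such a function could work, not a description of steerkit's actual algorithm: `hash_pairs_sketch` is a hypothetical name, and pairs are assumed to be (positive, negative) string tuples.

```python
import hashlib
import json


def hash_pairs_sketch(pairs):
    """Hypothetical stand-in: 16 hex chars derived from an ordered pair list."""
    # Serialize as a JSON list of lists; list order is preserved, so
    # reordering the pairs changes the payload and therefore the hash.
    payload = json.dumps(
        [list(pair) for pair in pairs],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


pairs = [("I love this", "I hate this"), ("Great day", "Awful day")]
print(hash_pairs_sketch(pairs))
print(hash_pairs_sketch(pairs) == hash_pairs_sketch(list(reversed(pairs))))
```

A hash like this is convenient as a cache key: the same pairs in the same order map to the same key, and any change to the dataset produces a different one.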
steerkit.gguf_export.export_probe_to_gguf(probe, path, *, method=None, scale=1.0)
Write a single-layer Probe as a GGUF control vector. Returns the output path.
method selects which probe-family direction to export (defaults to
probe.default_method). scale is multiplied into the direction before
writing, which is useful when the magnitude saved in the GGUF file should
differ from the unit-normalized direction steerkit stores internally (e.g. to
bake in the calibrated auto_alpha).
steerkit.gguf_export.export_composite_to_gguf(composite, path, *, method=None, scale=1.0)
Write a CompositeProbe as a GGUF control vector with one entry per layer.
Probes targeting the same layer are summed into a single direction first
(consistent with CompositeProbe.steer's same-hook folding). Each
constituent probe's weight is multiplied into its direction; scale
is applied uniformly on top of that.
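The folding arithmetic described above (weight per probe, sum per layer, then a uniform scale) can be sketched with plain lists. This is an illustration of the arithmetic only: `fold_by_layer` and the (layer, weight, direction) tuples are hypothetical, and no GGUF file is written.

```python
def fold_by_layer(entries, scale=1.0):
    """Sum weighted directions per layer, then apply a uniform scale.

    entries: iterable of (layer, weight, direction), where direction is a
    list of floats. This tuple layout is a stand-in for a composite's probes.
    """
    folded = {}
    for layer, weight, direction in entries:
        acc = folded.setdefault(layer, [0.0] * len(direction))
        for i, value in enumerate(direction):
            # Each probe's weight is multiplied into its own direction.
            acc[i] += weight * value
    # The export-level scale is applied once, on top of the weighted sum.
    return {layer: [scale * v for v in vec] for layer, vec in folded.items()}


entries = [
    (5, 1.0, [1.0, 0.0]),  # two probes target layer 5 and are summed
    (5, 0.5, [0.0, 2.0]),
    (9, 2.0, [1.0, 1.0]),  # a single probe on layer 9
]
print(fold_by_layer(entries, scale=2.0))
# {5: [2.0, 2.0], 9: [4.0, 4.0]}
```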
steerkit.report.render_probe_report(probe, *, model=None, activations=None, per_layer=None, title=None)
Build the HTML for a single-probe report. Returns the raw HTML string.
All inputs except probe are optional; the relevant plot is included only
when its inputs are supplied.
* per_layer: full per-layer fits → layer-selection curve
* activations: a [n_pairs, 2, d_model] tensor (typically activations[probe.layer]) → PCA projection
* model: a loaded ModelHandle → logit-lens
steerkit.report.render_group_report(fit, *, model=None, activations_by_concept=None, title=None)
Build the HTML for a GroupFit. Includes per-concept layer curves and the
cross-concept similarity heatmap; a logit-lens for the best probe of each
concept is added if a model is provided.