Visualization
steerkit.viz.plot_layer_selection(probes, *, by_classifier='auc_test_logistic', by_steering='steering_effect', x_axis='layer', title=None)
Dual-curve plot of probe-classifier metric and (if available) LLM-judge steering effect as a function of layer depth.
The classifier curve is always drawn (left y-axis); the steering curve is drawn on the
right y-axis only if at least one probe has the by_steering metric attached.
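The dual-axis layout described above can be sketched with plain matplotlib. This is an illustration of the idea, not steerkit's implementation: the per-layer values are synthetic, and the names auc_by_layer and effect_by_layer are hypothetical stand-ins for metrics read off the probes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt

# Synthetic per-layer metrics (illustrative values only).
auc_by_layer = {0: 0.61, 4: 0.78, 8: 0.91, 12: 0.88}
effect_by_layer = {4: 0.20, 8: 0.55, 12: 0.40}  # may be empty

fig, ax_left = plt.subplots()
layers = sorted(auc_by_layer)
ax_left.plot(layers, [auc_by_layer[l] for l in layers], marker="o", color="tab:blue")
ax_left.set_xlabel("layer")
ax_left.set_ylabel("classifier metric")

# Draw the second curve on a twin y-axis only when steering data exists.
ax_right = None
if effect_by_layer:
    ax_right = ax_left.twinx()
    s_layers = sorted(effect_by_layer)
    ax_right.plot(s_layers, [effect_by_layer[l] for l in s_layers],
                  marker="s", color="tab:red")
    ax_right.set_ylabel("steering effect")
```

With an empty effect_by_layer the figure keeps a single axis, which mirrors the "only if at least one probe has the metric" behavior.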
steerkit.viz.plot_activation_projection(activations, *, method='pca', title=None, pos_label='concept', neg_label='neutral')
2D projection of a [n_pairs, 2, d_model] activations tensor, colored by class.
The second tensor dimension indexes the contrast pair: index 0 is the positive (concept-bearing) response and index 1 is the negative (neutral) response. Only PCA is supported for now; UMAP can be added as an optional extra later.
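As a rough sketch of the projection step, the following reduces a synthetic [n_pairs, 2, d_model] array to 2D with PCA computed through NumPy's SVD. The data, the class shift, and all variable names are assumptions made for illustration; the library's own PCA routine may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, d_model = 50, 16

# Synthetic activations: index 0 = positive (concept), index 1 = negative (neutral).
activations = rng.normal(size=(n_pairs, 2, d_model))
activations[:, 0, :] += 2.0  # shift the positive class so the classes separate

# Flatten pairs into rows: pair0-pos, pair0-neg, pair1-pos, pair1-neg, ...
X = activations.reshape(n_pairs * 2, d_model)
labels = np.tile([0, 1], n_pairs)  # 0 = positive, 1 = negative

# PCA via SVD of the mean-centered data; keep the top two components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T  # shape (2 * n_pairs, 2)
```

The rows of coords can then be scattered with one color per value of labels.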
steerkit.viz.plot_alpha_curve(ratios, *, ratio_max=1.5, chosen_alpha=None, title=None)
Plot α vs perplexity ratio from calibrate_alpha's output.
A horizontal line at ratio_max shows the coherence ceiling; the chosen α
(if provided) is annotated with a vertical marker. The intent is to make the
auto-α decision transparent: which α values stayed under the ceiling, and
which one was picked.
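The ceiling logic can be illustrated in a few lines. This sketch assumes the calibration output is a mapping from α to perplexity ratio and that the selection rule is "largest α at or under the ceiling"; neither assumption is confirmed by this page, and calibrate_alpha may use a different format or rule.

```python
# Hypothetical calibration output: alpha -> perplexity ratio
# (steered perplexity divided by unsteered perplexity).
ratios = {0.5: 1.02, 1.0: 1.10, 2.0: 1.31, 4.0: 1.62, 8.0: 2.40}
ratio_max = 1.5

# Alphas whose ratio stays at or under the ceiling, in ascending order.
admissible = [a for a, r in sorted(ratios.items()) if r <= ratio_max]

# Assumed selection rule: the strongest admissible alpha, or None if none qualify.
chosen_alpha = max(admissible) if admissible else None

print(admissible)    # [0.5, 1.0, 2.0]
print(chosen_alpha)  # 2.0
```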
steerkit.viz.plot_logit_lens(probe, model, *, top_k=20, method=None, title=None)
Push the steering direction through the model's unembedding to get vocab logits, and render the top-K tokens as a horizontal bar chart.
A high-quality steering direction for "joy" should produce top tokens like "happy", "joyful", "delighted"; if the top tokens look unrelated, the probe is likely broken — this plot is the cheapest interpretability sanity check.
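The projection through the unembedding can be sketched with a toy vocabulary. Everything here is made up for illustration (the six-token vocab, the matrix W_U, and the direction vector), and the sketch skips any final normalization layer that a real model may apply before unembedding.

```python
import numpy as np

# Toy vocabulary and unembedding matrix W_U of shape (vocab_size, d_model).
vocab = ["happy", "joyful", "table", "sad", "delighted", "the"]
W_U = np.array([
    [ 0.9, 0.1, 0.0],
    [ 0.8, 0.2, 0.1],
    [ 0.0, 0.9, 0.1],
    [-0.9, 0.1, 0.0],
    [ 0.7, 0.0, 0.2],
    [ 0.0, 0.1, 0.9],
])

direction = np.array([1.0, 0.0, 0.0])  # toy steering direction in residual space

logits = W_U @ direction                    # one logit per vocabulary token
top_k = 3
top_idx = np.argsort(logits)[::-1][:top_k]  # indices of the largest logits
top_tokens = [(vocab[i], float(logits[i])) for i in top_idx]
print([tok for tok, _ in top_tokens])       # ['happy', 'joyful', 'delighted']
```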
steerkit.viz.plot_similarity_heatmap(source, *, method=None, title=None, cmap='RdBu_r')
Cosine-similarity heatmap between class direction vectors.
Accepts either:
- a MultinomialProbe whose weights rows are per-class directions, or
- a dict[name -> Probe] whose entries each carry a binary direction (typically GroupFit.best).
A diagonal of 1.0 is expected; off-diagonals at ~0 indicate orthogonal concepts; off-diagonals near ±1 indicate redundancy (e.g., joy ≈ −sadness).
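A minimal sketch of the similarity matrix behind such a heatmap, using toy direction vectors chosen so that "sadness" is roughly the negative of "joy" and "anger" is orthogonal to "joy". The vectors and names are illustrative assumptions, not output from any probe.

```python
import numpy as np

# Toy direction vectors, one per concept.
directions = {
    "joy":     np.array([1.0, 0.0, 0.0]),
    "sadness": np.array([-1.0, 0.05, 0.0]),
    "anger":   np.array([0.0, 1.0, 0.0]),
}
names = list(directions)
W = np.stack([directions[n] for n in names])

# Normalize rows to unit length, then take all pairwise dot products.
W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
sim = W_unit @ W_unit.T  # shape (n_classes, n_classes), values in [-1, 1]
```

Here sim has ones on the diagonal, a joy/sadness entry close to -1 (redundant up to sign), and a joy/anger entry of 0 (orthogonal).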
steerkit.viz.plot_cross_model_overlay(probes_per_model, *, by='auc_test_logistic', title=None, mark_best=True)
Overlay layer-selection curves from multiple models on a normalized-depth x-axis.
Useful for comparing where the same concept is most cleanly classified across
models — a methodology-comparison plot, not a steering-vector-transfer claim.
Each entry of probes_per_model is model_label -> dict[int, Probe] (e.g. the
per-layer fits returned by Probe.fit_all). Curves are aligned via
Probe.normalized_depth so models with different layer counts can be compared
visually.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| probes_per_model | dict[str, dict[int, Probe]] | Mapping from model label to per-layer | required |
| by | str | Metric to plot, for example | 'auc_test_logistic' |
| title | str \| None | Optional figure title. | None |
| mark_best | bool | If True, mark each model's best layer with a larger hollow dot. | True |
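To illustrate how curves from models of different depth can share one x-axis, the sketch below maps layer indices to [0, 1] and finds each model's best layer. The formula layer / (n_layers - 1) is an assumed convention, not necessarily what Probe.normalized_depth computes, and the model labels and metric values are hypothetical.

```python
def normalized_depth(layer: int, n_layers: int) -> float:
    """Map a layer index to [0, 1] (assumed convention, for illustration)."""
    if n_layers < 2:
        return 0.0
    return layer / (n_layers - 1)

# Hypothetical per-layer metric values for two models of different depth.
curves = {
    "small-12L": {0: 0.60, 5: 0.85, 11: 0.80},
    "large-32L": {0: 0.58, 16: 0.90, 31: 0.83},
}
n_layers = {"small-12L": 12, "large-32L": 32}

aligned = {}
best = {}
for label, per_layer in curves.items():
    pts = [(normalized_depth(l, n_layers[label]), v)
           for l, v in sorted(per_layer.items())]
    aligned[label] = pts                        # (depth, metric) points to plot
    best[label] = max(pts, key=lambda p: p[1])  # point to highlight as "best"
```

Both curves now end at depth 1.0, so their peaks can be compared by relative position rather than raw layer index.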
steerkit.viz.plot_token_scores(scores, *, title=None, figsize=None, color_pos='tab:red', color_neg='tab:blue', mark_response_start=True)
Render per-token probe scores as a horizontal bar chart.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scores | TokenScores | a | required |
| title | str \| None | optional figure title; defaults to one mentioning the layer + method. | None |
| figsize | tuple[float, float] \| None | matplotlib figure size; defaults scale with the number of tokens. | None |
| color_pos | str | bar color for positive scores (concept-active positions). | 'tab:red' |
| color_neg | str | bar color for negative scores. | 'tab:blue' |
| mark_response_start | bool | when True and | True |
Returns the Figure without calling plt.show() or plt.close(); the caller decides whether to display, save, or close it.
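The return-the-figure pattern can be sketched with plain matplotlib. The helper sketch_token_bars below is hypothetical and only imitates the sign-colored horizontal bars; it does not use TokenScores or reproduce the library's layout, and the tokens and scores are invented.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt

def sketch_token_bars(tokens, scores, color_pos="tab:red", color_neg="tab:blue"):
    """Horizontal bars, one per token, colored by score sign (illustrative only)."""
    fig, ax = plt.subplots(figsize=(6, 0.4 * len(tokens) + 1))
    colors = [color_pos if s >= 0 else color_neg for s in scores]
    positions = list(range(len(tokens)))
    ax.barh(positions, scores, color=colors)
    ax.set_yticks(positions)
    ax.set_yticklabels(tokens)
    ax.invert_yaxis()  # first token at the top
    ax.axvline(0.0, color="black", linewidth=0.8)
    return fig         # no plt.show() / plt.close() here

fig = sketch_token_bars(["I", "feel", "great", "today"], [-0.1, 0.2, 1.3, 0.4])

# The caller decides what happens next: here, save to memory and close.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Leaving display and cleanup to the caller keeps the function usable in notebooks, scripts, and batch jobs alike.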