Visualization

`steerkit.viz.plot_layer_selection(probes, *, by_classifier='auc_test_logistic', by_steering='steering_effect', x_axis='layer', title=None)`

Dual-curve plot of probe-classifier metric and (if available) LLM-judge steering effect as a function of layer depth.

The classifier curve is always drawn (left y-axis); the steering curve is drawn on the right y-axis only if at least one probe has the by_steering metric attached.
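The conditional second axis can be pictured with a minimal sketch. It assumes, purely for illustration, that each probe's metrics are available as a plain dict keyed by metric name; `layer_curves` and the sample numbers are hypothetical and not part of steerkit.

```python
from __future__ import annotations


def layer_curves(
    metrics_by_layer: dict[int, dict[str, float]],
    by_classifier: str = "auc_test_logistic",
    by_steering: str = "steering_effect",
):
    """Return (classifier_curve, steering_curve_or_None), both sorted by layer.

    Hypothetical helper: the real function reads metrics from Probe objects.
    """
    layers = sorted(metrics_by_layer)
    # The classifier metric is required on every layer, so this curve always exists.
    classifier = [(layer, metrics_by_layer[layer][by_classifier]) for layer in layers]
    # The steering metric is optional; keep only the layers that carry it.
    steering = [
        (layer, metrics_by_layer[layer][by_steering])
        for layer in layers
        if by_steering in metrics_by_layer[layer]
    ]
    # An empty steering curve means the right-hand axis would not be drawn.
    return classifier, (steering or None)


metrics = {
    8: {"auc_test_logistic": 0.81},
    12: {"auc_test_logistic": 0.93, "steering_effect": 0.40},
    16: {"auc_test_logistic": 0.90, "steering_effect": 0.55},
}
classifier_curve, steering_curve = layer_curves(metrics)
```

Here layer 8 has no steering metric, so it appears only on the classifier curve; if no layer had one, `steering_curve` would be `None`.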

`steerkit.viz.plot_activation_projection(activations, *, method='pca', title=None, pos_label='concept', neg_label='neutral')`

2D projection of a [n_pairs, 2, d_model] activations tensor, colored by class.

The second tensor dimension (size 2) indexes the contrast pair: index 0 is the positive (concept-bearing) response and index 1 is the negative (neutral) response. Only PCA is supported for now; UMAP may be added later as an optional extra.
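As a rough illustration of what a PCA projection of such a tensor involves, the sketch below flattens the pairs into rows, derives class labels from the pair index, and projects onto the top two principal components with NumPy. `project_pairs_pca` is a hypothetical helper and the data is synthetic; the library's own implementation may differ.

```python
import numpy as np


def project_pairs_pca(activations: np.ndarray):
    """Flatten [n_pairs, 2, d_model] into rows and project onto the top 2 PCs."""
    n_pairs, pair_size, d_model = activations.shape
    if pair_size != 2:
        raise ValueError("second dimension must hold the (positive, negative) pair")
    rows = activations.reshape(n_pairs * 2, d_model)
    # After the reshape, rows alternate positive, negative, positive, ...
    labels = np.tile(np.array([1, 0]), n_pairs)  # 1 = positive, 0 = negative
    centered = rows - rows.mean(axis=0, keepdims=True)
    # The right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    return coords, labels


rng = np.random.default_rng(0)
acts = rng.normal(size=(50, 2, 16))
acts[:, 0, 0] += 5.0  # shift the positive class along one dimension
coords, labels = project_pairs_pca(acts)
```

Scattering `coords` colored by `labels` gives the kind of two-class picture the plot describes.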

`steerkit.viz.plot_alpha_curve(ratios, *, ratio_max=1.5, chosen_alpha=None, title=None)`

Plot the perplexity ratio against α, using the output of `calibrate_alpha`.

A horizontal line at `ratio_max` shows the coherence ceiling; the chosen α (if provided) is annotated with a vertical marker. The intent is to make the auto-α decision transparent: which α values stayed under the ceiling, and which one was picked.
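The ceiling logic can be sketched in a few lines. This assumes `ratios` is a mapping from α to perplexity ratio and that the largest α at or under the ceiling is the one picked; both are illustrative assumptions, since the actual output shape and selection rule of `calibrate_alpha` are not described here. `alphas_under_ceiling` is a hypothetical helper.

```python
from __future__ import annotations


def alphas_under_ceiling(ratios: dict[float, float], ratio_max: float = 1.5) -> list[float]:
    """Return the α values whose perplexity ratio stays at or below the ceiling."""
    return sorted(alpha for alpha, ratio in ratios.items() if ratio <= ratio_max)


# Synthetic calibration results: α -> perplexity ratio relative to the unsteered model.
ratios = {1.0: 1.02, 2.0: 1.10, 4.0: 1.35, 8.0: 1.90, 16.0: 4.20}

admissible = alphas_under_ceiling(ratios)
# Assumed rule: take the strongest steering that still stays under the ceiling.
chosen_alpha = max(admissible) if admissible else None
```

With these numbers, α = 8 and α = 16 exceed the ceiling of 1.5, so they would sit above the horizontal line in the plot.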

`steerkit.viz.plot_logit_lens(probe, model, *, top_k=20, method=None, title=None)`

Push the steering direction through the model's unembedding to get vocab logits, and render the top-K tokens as a horizontal bar chart.

A high-quality steering direction for "joy" should produce top tokens like "happy", "joyful", "delighted". If the top tokens look unrelated, the probe is likely broken, which makes this plot the cheapest interpretability sanity check.
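The underlying computation is a single matrix-vector product followed by a top-K selection. The sketch below uses a toy five-token vocabulary and a made-up unembedding matrix of shape `[vocab_size, d_model]`; `top_tokens` is a hypothetical helper, and it skips any final normalization layer a real model might apply before unembedding.

```python
import numpy as np


def top_tokens(direction, unembedding, vocab, top_k=3):
    """Return the top_k (token, logit) pairs for a direction, highest logit first."""
    logits = unembedding @ direction  # one logit per vocabulary entry
    order = np.argsort(logits)[::-1][:top_k]
    return [(vocab[i], float(logits[i])) for i in order]


vocab = ["happy", "joyful", "table", "sad", "the"]
# Toy unembedding matrix: one row per token, d_model = 2.
W_U = np.array([
    [0.9, 0.1],
    [0.8, 0.0],
    [0.0, 0.7],
    [-0.9, 0.1],
    [0.1, 0.1],
])
direction = np.array([1.0, 0.0])  # stand-in for a "joy" steering direction
top = top_tokens(direction, W_U, vocab, top_k=2)
```

In this toy setup the two highest-scoring tokens are the joy-related ones, which is the pattern a healthy probe should show.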

`steerkit.viz.plot_similarity_heatmap(source, *, method=None, title=None, cmap='RdBu_r')`

Cosine-similarity heatmap between class direction vectors.

Accepts either:

- a `MultinomialProbe` whose `weights` rows are per-class directions, or
- a `dict[name -> Probe]` whose entries each carry a binary direction (typically `GroupFit.best`).

A diagonal of 1.0 is expected; off-diagonals at ~0 indicate orthogonal concepts; off-diagonals near ±1 indicate redundancy (e.g., joy ≈ −sadness).
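The matrix behind such a heatmap can be computed by normalizing each direction and taking pairwise dot products. The sketch below is a minimal NumPy version with synthetic directions; `cosine_similarity_matrix` and the concept names are illustrative only.

```python
import numpy as np


def cosine_similarity_matrix(directions: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a [n_classes, d_model] array."""
    norms = np.linalg.norm(directions, axis=1, keepdims=True)
    unit = directions / norms  # scale every row to unit length
    return unit @ unit.T


names = ["joy", "sadness", "formality"]
directions = np.array([
    [1.0, 0.0, 0.0],   # joy
    [-2.0, 0.0, 0.0],  # sadness: opposite of joy, different magnitude
    [0.0, 3.0, 0.0],   # formality: orthogonal to both
])
sim = cosine_similarity_matrix(directions)
```

Because cosine similarity ignores vector length, the joy and sadness rows score -1 despite their different norms, while formality scores 0 against both.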

`steerkit.viz.plot_cross_model_overlay(probes_per_model, *, by='auc_test_logistic', title=None, mark_best=True)`

Overlay layer-selection curves from multiple models on a normalized-depth x-axis.

Useful for comparing where the same concept is most cleanly classified across models — a methodology-comparison plot, not a steering-vector-transfer claim. Each entry of probes_per_model is model_label -> dict[int, Probe] (e.g. the per-layer fits returned by Probe.fit_all). Curves are aligned via Probe.normalized_depth so models with different layer counts can be compared visually.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `probes_per_model` | `dict[str, dict[int, Probe]]` | Mapping from model label to per-layer `Probe.fit_all` results. | required |
| `by` | `str` | Metric to plot, for example `auc_test_logistic` or `cohens_d_logistic`. | `'auc_test_logistic'` |
| `title` | `str \| None` | Optional figure title. | `None` |
| `mark_best` | `bool` | If True, mark each model's best layer with a larger hollow dot. | `True` |
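To see why a normalized-depth axis makes models of different sizes comparable, consider the sketch below. It assumes normalized depth is `layer / n_layers`, which is only a guess at what `Probe.normalized_depth` computes; `normalize_curve` and the metric values are hypothetical.

```python
from __future__ import annotations


def normalize_curve(metric_by_layer: dict[int, float], n_layers: int) -> list[tuple[float, float]]:
    """Map {layer: metric} to [(depth in [0, 1], metric)], sorted by depth.

    Assumption: depth = layer / n_layers. The library's definition may differ.
    """
    return [(layer / n_layers, metric_by_layer[layer]) for layer in sorted(metric_by_layer)]


# Two synthetic models with different layer counts.
small = normalize_curve({3: 0.70, 6: 0.90, 9: 0.80}, n_layers=12)
large = normalize_curve({8: 0.72, 16: 0.94, 24: 0.85}, n_layers=32)

# The point that mark_best would highlight on each curve.
best_small = max(small, key=lambda point: point[1])
best_large = max(large, key=lambda point: point[1])
```

Layer 6 of 12 and layer 16 of 32 land at the same x position, so the two peaks line up on the overlay even though the raw layer indices differ.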

`steerkit.viz.plot_token_scores(scores, *, title=None, figsize=None, color_pos='tab:red', color_neg='tab:blue', mark_response_start=True)`

Render per-token probe scores as a horizontal bar chart.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `scores` | `TokenScores` | A `TokenScores` object returned by `Probe.score_tokens(...)`. | required |
| `title` | `str \| None` | Optional figure title; defaults to one that mentions the layer and method. | `None` |
| `figsize` | `tuple[float, float] \| None` | Matplotlib figure size; the default scales with the number of tokens. | `None` |
| `color_pos` | `str` | Bar color for positive scores (concept-active positions). | `'tab:red'` |
| `color_neg` | `str` | Bar color for negative scores. | `'tab:blue'` |
| `mark_response_start` | `bool` | When True and `scores.response_start > 0`, draws a horizontal divider between the prompt and response tokens. | `True` |

Returns the `Figure` without calling `plt.show()` or `plt.close()`; the caller decides when to display or close it.
