Data primitives
steerkit.data.ContrastPair
dataclass
A single contrast pair: shared user prompt with a concept-bearing and a neutral assistant response.
Activations are extracted at the last token of the chat-templated full text (prompt + each response), which captures the model's internal state at the moment it has just produced (or "decided") the corresponding response style.
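As a minimal sketch of the last-token rule, the snippet below uses a stand-in dataclass and a toy whitespace tokenizer. The field names (`prompt`, `positive`, `neutral`), the `render_chat` template, and `last_token_index` are assumptions made for illustration; they are not steerkit's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairSketch:
    """Stand-in for a contrast pair; field names are assumed, not steerkit's."""
    prompt: str
    positive: str  # concept-bearing assistant response
    neutral: str   # neutral assistant response


def render_chat(prompt: str, response: str) -> str:
    # Hypothetical chat template; a real model ships its own.
    return f"<user> {prompt} <assistant> {response}"


def last_token_index(text: str) -> int:
    # Toy whitespace tokenizer, used only to make the position concrete.
    return len(text.split()) - 1


pair = PairSketch(
    prompt="How was your day?",
    positive="Wonderful, I am thrilled!",
    neutral="It was fine.",
)

# One full text per response; activations would be read at each last position.
positions = {
    name: last_token_index(render_chat(pair.prompt, response))
    for name, response in (("positive", pair.positive), ("neutral", pair.neutral))
}
```

With a real model, the same index would select one hidden-state vector per response from the forward pass over the full text.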
steerkit.concepts.Concept
dataclass
A single named direction. Describes one axis of behavior the user wants to probe and steer.
Carries optional contrast pairs once the dataset has been generated or loaded.
steerkit.concepts.ConceptGroup
dataclass
A group of concepts that share a neutral reference and a relationship type.
The shared neutral makes the resulting steering vectors live in a common coordinate frame so they can be linearly combined (e.g. 0.7 * joy + 0.3 * surprise).
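The sketch below illustrates why a shared neutral matters, assuming each steering vector is a difference of mean activations (concept minus neutral). That construction and the helper names `mean_vector`, `steering_vector`, and `combine` are assumptions for illustration; the source does not state how steerkit computes its vectors.

```python
from statistics import fmean


def mean_vector(rows):
    # Column-wise mean of equally sized activation vectors.
    return [fmean(col) for col in zip(*rows)]


def steering_vector(concept_acts, neutral_acts):
    # Assumed construction: concept mean minus the shared neutral mean.
    c, n = mean_vector(concept_acts), mean_vector(neutral_acts)
    return [ci - ni for ci, ni in zip(c, n)]


def combine(weighted):
    # Weighted sum of (weight, vector) pairs that share one origin.
    dim = len(weighted[0][1])
    return [sum(w * v[i] for w, v in weighted) for i in range(dim)]


# Toy 2-d activations; every concept is measured against the same neutral.
neutral = [[0.0, 0.0], [2.0, 2.0]]
joy = [[3.0, 1.0], [3.0, 1.0]]
surprise = [[1.0, 5.0], [1.0, 5.0]]

v_joy = steering_vector(joy, neutral)
v_surprise = steering_vector(surprise, neutral)
blend = combine([(0.7, v_joy), (0.3, v_surprise)])
```

Because both vectors are offsets from the same neutral mean, the weighted sum is itself an offset from that neutral, which is what makes the blend meaningful.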
relationship:
- "mutually_exclusive": only one concept applies at a time (e.g. emotion classes). Probing supports a multinomial probe across the group in addition to per-concept binary probes.
- "multi_label": concepts can co-occur; only per-concept binary probes are valid.
- "axes": concepts are orthogonal axes (e.g. emotion vs. formality); typically each axis is its own ConceptGroup, and cross-group composition happens at steer time.
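The difference between a multinomial probe and per-concept binary probes can be sketched with their usual readouts: a softmax over the group versus an independent sigmoid per concept. This assumes linear probes that emit one logit per concept; the logits below are made up, and steerkit's actual probe implementation is not shown in the source.

```python
import math


def softmax(logits):
    # Multinomial readout: scores compete and sum to 1 (mutually exclusive).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Binary readout: each concept is scored independently (multi-label).
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical per-concept logits for one activation vector.
logits = {"joy": 2.0, "surprise": 1.0, "anger": -1.0}

multinomial = dict(zip(logits, softmax(list(logits.values()))))
binary = {name: sigmoid(x) for name, x in logits.items()}
```

The multinomial scores always sum to 1, so they only make sense when exactly one concept applies. The binary scores carry no such constraint, which is why they remain valid when concepts co-occur.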
generate_pairs(teacher, *, seed_prompts=None, max_pairs_per_concept=None, temperature=0.7, max_tokens=512)
Generate contrast pairs for every concept in this group, attaching them in-place.
teacher can be a TeacherModel instance or a spec string like 'anthropic:claude-opus-4-7'.
Returns a dict mapping concept.name -> GenerationStats.
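The example spec string reads as `provider:model`. Assuming that layout holds in general (the source only shows the one example), a spec could be split as in the sketch below. `split_teacher_spec` is a hypothetical helper, not part of steerkit.

```python
def split_teacher_spec(spec: str) -> tuple[str, str]:
    """Split 'provider:model' on the first colon (assumed layout)."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model


provider, model = split_teacher_spec("anthropic:claude-opus-4-7")
```

Splitting on the first colon only keeps any later colons inside the model name intact.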
steerkit.concepts.singleton_group(concept, *, neutral_reference, relationship='mutually_exclusive', group_name=None)
Wrap a single Concept in a ConceptGroup. Convenience for binary-axis use cases.
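A rough sketch of what such a wrapper might do, using stand-in dataclasses rather than steerkit's own. The field names, the string type of `neutral_reference`, and the fallback of the group name to the concept name are all assumptions; only the keyword signature mirrors the one documented above.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptSketch:
    """Stand-in for Concept; fields are assumed."""
    name: str
    description: str = ""


@dataclass
class GroupSketch:
    """Stand-in for ConceptGroup; fields are assumed."""
    name: str
    neutral_reference: str
    relationship: str
    concepts: list = field(default_factory=list)


def singleton_group_sketch(concept, *, neutral_reference,
                           relationship="mutually_exclusive", group_name=None):
    # Assumption: an unnamed group borrows the concept's name.
    return GroupSketch(
        name=group_name or concept.name,
        neutral_reference=neutral_reference,
        relationship=relationship,
        concepts=[concept],
    )


formality = ConceptSketch("formal", "formal register")
group = singleton_group_sketch(formality, neutral_reference="a plain, neutral reply")
```

The result is a one-concept group, so code written against groups can handle a single binary axis without a special case.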