Data primitives

steerkit.data.ContrastPair dataclass

A single contrast pair: shared user prompt with a concept-bearing and a neutral assistant response.

Activations are extracted at the last token of the full chat-templated text (prompt plus response), once for each response. That position captures the model's internal state just after it has produced (or "decided" on) the corresponding response style.
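The last-token selection described above can be sketched in plain Python. This is a minimal illustration, not steerkit's implementation: the helper `last_token_activation`, the per-token `hidden_states` layout, and the `attention_mask` convention (1 for real tokens, 0 for padding) are all assumptions introduced here.

```python
from typing import List, Sequence


def last_token_activation(
    hidden_states: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Return the hidden state at the last non-padding position.

    hidden_states: one vector per token position (hypothetical layout).
    attention_mask: 1 for real tokens, 0 for padding (assumed convention).
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have equal length")
    real_positions = [i for i, m in enumerate(attention_mask) if m == 1]
    if not real_positions:
        raise ValueError("attention_mask contains no real tokens")
    # The largest real index is correct for left- and right-padded inputs alike.
    return list(hidden_states[real_positions[-1]])


# Toy example: four positions, the last one is padding.
states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
mask = [1, 1, 1, 0]
picked = last_token_activation(states, mask)
```

For a contrast pair, the same selection would run twice, once on the prompt plus the concept-bearing response and once on the prompt plus the neutral response.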

steerkit.concepts.Concept dataclass

A single named direction. Describes one axis of behavior the user wants to probe and steer.

Contrast pairs are optional; they are attached once the dataset has been generated or loaded.

steerkit.concepts.ConceptGroup dataclass

A group of concepts that share a neutral reference and a relationship type.

The shared neutral makes the resulting steering vectors live in a common coordinate frame so they can be linearly combined (e.g. 0.7 * joy + 0.3 * surprise).
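The linear combination mentioned above can be illustrated with a small sketch. It assumes, for illustration only, that each steering vector is the difference between a concept's mean activation and the shared neutral's mean activation; the source does not state how steerkit derives its vectors, and `difference_vector` and `combine` are hypothetical helpers, not part of the library.

```python
from typing import Dict, List


def difference_vector(concept_mean: List[float], neutral_mean: List[float]) -> List[float]:
    # Assumption: a steering vector is the concept mean minus the shared neutral mean.
    return [c - n for c, n in zip(concept_mean, neutral_mean)]


def combine(vectors: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of named vectors that share one neutral reference."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for name, weight in weights.items():
        vec = vectors[name]
        if len(vec) != dim:
            raise ValueError(f"dimension mismatch for {name!r}")
        out = [o + weight * v for o, v in zip(out, vec)]
    return out


# Toy 2-dimensional activations; both concepts subtract the same neutral.
neutral = [1.0, 1.0]
vectors = {
    "joy": difference_vector([3.0, 1.0], neutral),
    "surprise": difference_vector([1.0, 5.0], neutral),
}
blend = combine(vectors, {"joy": 0.7, "surprise": 0.3})
```

Because both vectors are measured from the same origin, the weights act on directions in one frame; vectors built against different neutrals would each carry a different offset into the sum.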

relationship:

- "mutually_exclusive": only one concept applies at a time (e.g. emotion classes). Probing supports a multinomial probe across the group in addition to per-concept binary probes.
- "multi_label": concepts can co-occur; only per-concept binary probes are valid.
- "axes": concepts are orthogonal axes (e.g. emotion vs. formality); typically each axis is its own ConceptGroup, and cross-group composition happens at steer-time.
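The difference between the first two relationship types shows up in how probe targets can be encoded. The sketch below is an illustration under assumed names (`binary_targets`, `multinomial_target`), not steerkit's probing code: per-concept binary targets are always well defined, while a single multinomial class index exists only when exactly one concept applies.

```python
from typing import List, Sequence


def binary_targets(active: Sequence[str], concepts: Sequence[str]) -> List[int]:
    """Multi-hot targets: one independent 0/1 label per concept."""
    unknown = set(active) - set(concepts)
    if unknown:
        raise ValueError(f"unknown concepts: {sorted(unknown)}")
    return [1 if name in active else 0 for name in concepts]


def multinomial_target(active: Sequence[str], concepts: Sequence[str]) -> int:
    """Single class index; only defined when exactly one concept applies."""
    if len(active) != 1:
        raise ValueError("a multinomial target needs exactly one active concept")
    return list(concepts).index(active[0])


emotions = ["joy", "anger", "surprise"]
multi_hot = binary_targets(["joy", "surprise"], emotions)
class_index = multinomial_target(["anger"], emotions)
```

An example with two active concepts has a valid multi-hot encoding but no class index, which is why only binary probes make sense for a "multi_label" group.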

generate_pairs(teacher, *, seed_prompts=None, max_pairs_per_concept=None, temperature=0.7, max_tokens=512)

Generate contrast pairs for every concept in this group, attaching them in-place.

teacher can be a TeacherModel instance or a spec string like 'anthropic:claude-opus-4-7'. Returns a dict mapping concept.name -> GenerationStats.
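The spec string in the example above appears to follow a provider:model shape. The sketch below shows one way such a string could be split; `TeacherSpec` and `parse_teacher_spec` are hypothetical names, and the format itself is inferred from the single example rather than documented, so steerkit's own parsing may differ.

```python
from typing import NamedTuple


class TeacherSpec(NamedTuple):
    provider: str
    model: str


def parse_teacher_spec(spec: str) -> TeacherSpec:
    """Split a 'provider:model' string on the first colon (assumed format)."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return TeacherSpec(provider, model)


spec = parse_teacher_spec("anthropic:claude-opus-4-7")
```

Splitting on the first colon only keeps any later colons inside the model name intact.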

steerkit.concepts.singleton_group(concept, *, neutral_reference, relationship='mutually_exclusive', group_name=None)

Wrap a single Concept in a ConceptGroup. Convenience for binary-axis use cases.

steerkit.data.load_pairs_jsonl(path)

steerkit.data.save_pairs_jsonl(pairs, path)
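Neither function is described in the source, so the following is only a sketch of what a JSONL round trip for contrast pairs might look like, with one JSON object per line. `PairRecord`, its field names, `write_jsonl`, and `read_jsonl` are hypothetical stand-ins; the actual ContrastPair fields and on-disk schema may differ.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass
class PairRecord:
    # Hypothetical field names; steerkit's ContrastPair may use others.
    prompt: str
    concept_response: str
    neutral_response: str


def write_jsonl(pairs: Iterable[PairRecord], path: Path) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(asdict(pair), ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> List[PairRecord]:
    """Read records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [PairRecord(**json.loads(line)) for line in f if line.strip()]


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "pairs.jsonl"
    original = [PairRecord("How was your day?", "It was wonderful!", "It was fine.")]
    write_jsonl(original, target)
    restored = read_jsonl(target)
```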