Interventions
steerkit.intervention.apply_addition(activation, direction, alpha)
act ← act + α·v. The standard CAA / repeng-style steering.
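As a rough illustration of the update rule above, here is a minimal NumPy sketch. It is a standalone re-implementation of the formula, not steerkit's code; the function name `add_direction` and the example values are hypothetical. Following the formula as written, `v` is used without normalization, so the size of the push scales with both `alpha` and the norm of `v`.

```python
import numpy as np

def add_direction(act, v, alpha):
    """Sketch of act <- act + alpha * v (v is used as given, not normalized)."""
    return act + alpha * v

act = np.array([1.0, 2.0, 3.0])   # toy activation vector
v = np.array([0.0, 1.0, 0.0])     # toy steering direction
steered = add_direction(act, v, alpha=4.0)
```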
steerkit.intervention.apply_projection(activation, direction)
act ← act − (act·v̂)v̂. Removes any component along direction.
Useful when you want to neutralize the concept rather than push toward it (e.g., "make the model produce neither overtly refusing nor overtly compliant responses, just whatever else is in there").
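The projection formula can be sketched in NumPy as follows. This is an illustrative re-implementation under the assumption that activations have shape `(..., d)` and that the direction is normalized to a unit vector before use; `project_out` is a hypothetical name, not part of steerkit.

```python
import numpy as np

def project_out(act, v):
    """Sketch of act <- act - (act . v_hat) v_hat over the last axis."""
    v_hat = v / np.linalg.norm(v)
    # Component of each activation along v_hat, kept as a trailing axis
    # so it broadcasts against v_hat.
    proj = np.sum(act * v_hat, axis=-1, keepdims=True)
    return act - proj * v_hat

acts = np.array([[3.0, 4.0],
                 [1.0, 0.0]])      # two toy activations
v = np.array([2.0, 0.0])           # direction; its length does not matter here
cleaned = project_out(acts, v)
```

After the call, every row of `cleaned` has zero component along `v`, while the orthogonal part of each row is unchanged.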
steerkit.intervention.apply_clamp(activation, direction, target)
act ← act + (target − act·v̂)v̂. Forces the projection onto direction to
equal target exactly, while leaving the orthogonal component untouched.
More predictable than addition because the result does not depend on whatever
signal was already present along direction: a given target produces a fixed
concept strength rather than an additive perturbation.
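To make the contrast with addition concrete, the sketch below applies both rules to two toy activations that start with different components along the direction. It is a standalone NumPy illustration of the formulas, not steerkit's implementation; `clamp_component` and the sample values are hypothetical.

```python
import numpy as np

def clamp_component(act, v, target):
    """Sketch of act <- act + (target - act . v_hat) v_hat over the last axis."""
    v_hat = v / np.linalg.norm(v)
    proj = np.sum(act * v_hat, axis=-1, keepdims=True)
    return act + (target - proj) * v_hat

acts = np.array([[-2.0, 1.0],
                 [ 5.0, 1.0]])     # different starting projections: -2 and 5
v = np.array([1.0, 0.0])

clamped = clamp_component(acts, v, target=3.0)
added = acts + 3.0 * v             # plain addition with alpha = 3

clamped_proj = clamped @ v         # same projection for both rows
added_proj = added @ v             # still differs by the original gap
```

Under clamping both rows end up with projection 3, whereas under addition the projections remain 7 apart, exactly as they started.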
steerkit.intervention.apply_multiplicative(activation, direction, gamma)
act ← act + (γ−1)(act·v̂)v̂. Scales the existing component along direction
by γ (γ=1 is no-op, γ=0 is equivalent to projection, γ>1 amplifies, γ<0 reverses).
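The special cases listed above can be checked with a short NumPy sketch. As with the other examples, this re-implements the formula for illustration only; `scale_component` is a hypothetical name and the vectors are made up.

```python
import numpy as np

def scale_component(act, v, gamma):
    """Sketch of act <- act + (gamma - 1)(act . v_hat) v_hat over the last axis."""
    v_hat = v / np.linalg.norm(v)
    proj = np.sum(act * v_hat, axis=-1, keepdims=True)
    return act + (gamma - 1.0) * proj * v_hat

act = np.array([3.0, 4.0])         # component 3 along v, 4 orthogonal to it
v = np.array([2.0, 0.0])

# gamma = 1: unchanged; 0: component removed; 2: doubled; -1: sign flipped.
results = {g: scale_component(act, v, g) for g in (1.0, 0.0, 2.0, -1.0)}
```

In every case the orthogonal component (4 in this example) is untouched; only the part along the direction is rescaled.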
steerkit.intervention.make_hook(op, direction, *, alpha=None, target=None, gamma=None)
Return a TL-compatible forward hook closure that applies the given op.
Required parameter per op:
addition → alpha
projection → (none)
clamp → target
multiplicative → gamma
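One way such a hook factory could be structured is sketched below. This is not steerkit's source: `make_hook_sketch` is a hypothetical stand-in, it operates on NumPy arrays rather than model tensors, and raising `ValueError` for a missing parameter is an assumption about reasonable behavior rather than something the documentation states. The closure takes `(activation, hook)`, which assumes "TL" refers to TransformerLens, whose hook functions are called with the activation and the hook point.

```python
import numpy as np

# Required keyword per op, mirroring the mapping listed above.
REQUIRED = {
    "addition": "alpha",
    "projection": None,
    "clamp": "target",
    "multiplicative": "gamma",
}

def make_hook_sketch(op, direction, *, alpha=None, target=None, gamma=None):
    """Hypothetical factory: returns a closure applying one intervention."""
    if op not in REQUIRED:
        raise ValueError(f"unknown op: {op!r}")
    params = {"alpha": alpha, "target": target, "gamma": gamma}
    needed = REQUIRED[op]
    if needed is not None and params[needed] is None:
        raise ValueError(f"op {op!r} requires {needed}")  # assumed behavior

    v = np.asarray(direction, dtype=float)
    v_hat = v / np.linalg.norm(v)

    def hook_fn(activation, hook=None):
        # Component along v_hat, kept as a trailing axis for broadcasting.
        proj = np.sum(activation * v_hat, axis=-1, keepdims=True)
        if op == "addition":
            return activation + alpha * v
        if op == "projection":
            return activation - proj * v_hat
        if op == "clamp":
            return activation + (target - proj) * v_hat
        return activation + (gamma - 1.0) * proj * v_hat  # multiplicative

    return hook_fn

clamp_hook = make_hook_sketch("clamp", [1.0, 0.0], target=2.0)
out = clamp_hook(np.array([[5.0, 1.0]]), None)
```

Validating the parameter when the hook is built, rather than when it first runs, surfaces a misconfigured op before any forward pass is started.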