Interventions

steerkit.intervention.apply_addition(activation, direction, alpha)

act ← act + α·v. The standard CAA / repeng-style steering.
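The update can be sketched in NumPy as follows. This only illustrates the arithmetic and is not steerkit's implementation; the name `addition_sketch` and the use of NumPy arrays in place of model tensors are assumptions made for the example.

```python
import numpy as np

def addition_sketch(activation, direction, alpha):
    """Hypothetical stand-in for apply_addition: act + alpha * v."""
    # direction broadcasts over any leading (batch, position) axes.
    return activation + alpha * direction

act = np.array([[1.0, 2.0, 3.0],
                [0.0, -1.0, 0.5]])   # two token positions, width 3
v = np.array([0.0, 1.0, 0.0])        # steering direction, used as given

steered = addition_sketch(act, v, alpha=4.0)
```

As the formula is written, v is not normalized, so the norm of direction scales the shift together with alpha.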

steerkit.intervention.apply_projection(activation, direction)

act ← act − (act·v̂)v̂, where v̂ = v/‖v‖ is the unit vector along direction. Removes any component along direction.

Useful when you want to neutralize the concept rather than push toward it (e.g., "make the model produce neither overtly refusing nor overtly compliant responses, just whatever else is in there").
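A minimal NumPy sketch of the projection step, assuming the direction is normalized internally (the name `projection_sketch` is hypothetical and this is not steerkit's code). It also checks that nothing is left along the direction afterwards.

```python
import numpy as np

def projection_sketch(activation, direction):
    """Hypothetical stand-in for apply_projection: act - (act . v_hat) v_hat."""
    v_hat = direction / np.linalg.norm(direction)
    # Coefficient along v_hat per position; keepdims so it broadcasts back.
    coeff = np.sum(activation * v_hat, axis=-1, keepdims=True)
    return activation - coeff * v_hat

act = np.array([[1.0, 2.0, 3.0],
                [0.0, -1.0, 0.5]])
v = np.array([0.0, 2.0, 0.0])   # need not be unit length

ablated = projection_sketch(act, v)
# Component remaining along the direction after the intervention.
residual = ablated @ (v / np.linalg.norm(v))
```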

steerkit.intervention.apply_clamp(activation, direction, target)

act ← act + (target − act·v̂)v̂. Forces the projection onto direction to equal target exactly, while leaving the orthogonal component untouched.

More predictable than addition because the result does not depend on whatever signal was already present along direction: a positive target produces a fixed concept strength rather than an additive perturbation.
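The invariance can be seen in a small NumPy sketch. The function name `clamp_sketch` and the example vectors are assumptions for illustration, not part of steerkit. Two activations with very different starting components along the direction end up with the same projection.

```python
import numpy as np

def clamp_sketch(activation, direction, target):
    """Hypothetical stand-in for apply_clamp: act + (target - act . v_hat) v_hat."""
    v_hat = direction / np.linalg.norm(direction)
    coeff = np.sum(activation * v_hat, axis=-1, keepdims=True)
    return activation + (target - coeff) * v_hat

v = np.array([0.0, 1.0, 0.0])
weak = np.array([1.0, 0.2, 3.0])     # little of the concept to begin with
strong = np.array([1.0, 9.0, 3.0])   # a lot of the concept to begin with

clamped_weak = clamp_sketch(weak, v, target=5.0)
clamped_strong = clamp_sketch(strong, v, target=5.0)

# For contrast: addition with alpha=5 leaves the two inputs far apart.
added_weak = weak + 5.0 * v
added_strong = strong + 5.0 * v
```

Both clamped results have projection 5 along v and keep their orthogonal components (1 and 3) unchanged, whereas the added results have projections 5.2 and 14.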

steerkit.intervention.apply_multiplicative(activation, direction, gamma)

act ← act + (γ−1)(act·v̂)v̂. Scales the existing component along direction by γ (γ=1 is no-op, γ=0 is equivalent to projection, γ>1 amplifies, γ<0 reverses).
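The four regimes of γ can be checked with a short NumPy sketch (hypothetical name `multiplicative_sketch`; an illustration of the formula, not steerkit's implementation).

```python
import numpy as np

def multiplicative_sketch(activation, direction, gamma):
    """Hypothetical stand-in for apply_multiplicative."""
    v_hat = direction / np.linalg.norm(direction)
    coeff = np.sum(activation * v_hat, axis=-1, keepdims=True)
    # Adding (gamma - 1) * coeff turns the existing coefficient into gamma * coeff.
    return activation + (gamma - 1.0) * coeff * v_hat

act = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.0])

unchanged = multiplicative_sketch(act, v, gamma=1.0)   # no-op
removed = multiplicative_sketch(act, v, gamma=0.0)     # same as projection
doubled = multiplicative_sketch(act, v, gamma=2.0)     # amplified
flipped = multiplicative_sketch(act, v, gamma=-1.0)    # reversed
```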

steerkit.intervention.make_hook(op, direction, *, alpha=None, target=None, gamma=None)

Return a TL-compatible (TransformerLens) forward hook closure that applies the given op, using the keyword argument that op requires.

Required parameter per op

addition → alpha

projection → (none)

clamp → target

multiplicative → gamma
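How such a closure might be put together is sketched below. Everything here is an assumption for illustration: the name `make_hook_sketch`, the `REQUIRED` table, and the choice to raise `ValueError` on a missing parameter are hypothetical, and NumPy arrays stand in for the tensors a real hook would receive. The `(activation, hook)` argument order follows the convention TransformerLens uses for hook functions, assuming that is what "TL" refers to.

```python
import numpy as np

# Which keyword each op needs, mirroring the op-to-parameter mapping above.
REQUIRED = {"addition": "alpha", "projection": None,
            "clamp": "target", "multiplicative": "gamma"}

def make_hook_sketch(op, direction, *, alpha=None, target=None, gamma=None):
    """Hypothetical illustration; not steerkit's make_hook."""
    if op not in REQUIRED:
        raise ValueError(f"unknown op: {op!r}")
    params = {"alpha": alpha, "target": target, "gamma": gamma}
    needed = REQUIRED[op]
    if needed is not None and params[needed] is None:
        raise ValueError(f"op {op!r} requires {needed}")

    v_hat = direction / np.linalg.norm(direction)

    def hook_fn(activation, hook=None):
        # Coefficient of each activation vector along the unit direction.
        coeff = np.sum(activation * v_hat, axis=-1, keepdims=True)
        if op == "addition":
            return activation + alpha * direction
        if op == "projection":
            return activation - coeff * v_hat
        if op == "clamp":
            return activation + (target - coeff) * v_hat
        return activation + (gamma - 1.0) * coeff * v_hat  # multiplicative

    return hook_fn

hook = make_hook_sketch("clamp", np.array([0.0, 1.0, 0.0]), target=5.0)
out = hook(np.array([[1.0, 2.0, 3.0]]))
```

Validating the parameter when the hook is built, rather than when it first fires, surfaces a misconfigured op before any forward pass runs.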