Abject Tangential Pruning (ATP) rethinks model compression. Instead of ranking parameters by static salience scores, ATP works laterally: it removes or mutes dimensions, tokens, and paths that are only tangentially useful within the current context. This makes pruning dynamic and adaptive, so smaller models emerge not by exact replication of the original but by context-pruned distillation. ATP is particularly effective in conjunction with KV cache reductions and self-distilled optimizations, forming a layered pruning strategy rather than a single static salience map.
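The source does not specify ATP's exact pruning criterion, so the sketch below is only one plausible reading: hidden dimensions whose aggregate activation over the current context is small are treated as "tangential" and muted rather than deleted, so the mask can be recomputed when the context changes. The function names `tangential_mask` and `contextual_prune`, and the `keep_ratio` parameter, are hypothetical illustrations, not part of any published ATP implementation.

```python
import torch

def tangential_mask(activations: torch.Tensor, keep_ratio: float = 0.7) -> torch.Tensor:
    """Build a per-dimension mask from activations seen in the current context.

    activations: (seq_len, hidden_dim) hidden states for the current prompt.
    Dimensions whose aggregate contribution over this context is small are
    treated as "tangential" and muted; the rest pass through unchanged.
    """
    # Aggregate each dimension's contribution over the context window.
    contribution = activations.abs().mean(dim=0)           # (hidden_dim,)
    k = max(1, int(keep_ratio * contribution.numel()))
    threshold = torch.topk(contribution, k).values.min()
    return (contribution >= threshold).float()             # 1.0 = keep, 0.0 = mute

def contextual_prune(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mute tangential dimensions instead of deleting them, so the pruning
    adapts when the context shifts."""
    return hidden * mask

# Usage: recompute the mask whenever the context changes, then apply it
# to subsequent hidden states during inference.
ctx_hidden = torch.randn(128, 768)          # hidden states for the current context
mask = tangential_mask(ctx_hidden, keep_ratio=0.7)
next_hidden = torch.randn(1, 768)           # hidden state for the next token
pruned = contextual_prune(next_hidden, mask)
```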
The goal is an agile model that retains latent adaptability: rather than keeping only the dimensions with the strongest gradient signal, ATP favors subtler, more broadly distributed contributions, which preserves flexibility at inference time.
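As one way the "conjunction with KV cache reductions" mentioned above could look, the sketch below evicts cached tokens that recent queries attend to only tangentially, in the spirit of attention-score-based cache eviction schemes. The function `evict_tangential_tokens` and its `keep_ratio` parameter are assumptions for illustration, not a described part of ATP.

```python
import torch

def evict_tangential_tokens(keys: torch.Tensor,
                            values: torch.Tensor,
                            attn_weights: torch.Tensor,
                            keep_ratio: float = 0.5):
    """Drop cached tokens that recent queries attend to only tangentially.

    keys, values: (seq_len, head_dim) cached entries for one attention head.
    attn_weights: (num_recent_queries, seq_len) attention each recent query
                  placed on the cached tokens.
    """
    # A token's relevance is the total attention it has received recently.
    relevance = attn_weights.sum(dim=0)                     # (seq_len,)
    k = max(1, int(keep_ratio * relevance.numel()))
    keep_idx = torch.topk(relevance, k).indices.sort().values
    return keys[keep_idx], values[keep_idx]

# Usage: shrink the cache after each chunk of decoding.
seq_len, head_dim = 256, 64
keys, values = torch.randn(seq_len, head_dim), torch.randn(seq_len, head_dim)
attn = torch.softmax(torch.randn(8, seq_len), dim=-1)       # last 8 queries
keys, values = evict_tangential_tokens(keys, values, attn, keep_ratio=0.5)
```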