Modifying Embeddings During Inference for Context Resolution
Abstract
This paper proposes a method for context resolution in transformer models: modifying token embeddings at runtime. Unlike standard inference, where embeddings are fixed, the method adjusts embeddings based on prompt directives or token statistics. The model's weights do not change; its interpretive frame shifts because the input space is reshaped. The aim is deeper, more adaptive control over context during generation.
1. Definition
An embedding is a vector representation of a token, fixed once the model is trained. During inference, embeddings are normally looked up from this static table and passed on unchanged. We propose transforming them on the fly, driven by live prompt cues or token-level statistics, so that the model can adapt a token's meaning dynamically.
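A minimal sketch of the contrast, assuming a PyTorch-style embedding table; the offset vector standing in for a "prompt cue" and its scale are illustrative placeholders, not part of the proposal's specification.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)   # frozen lookup table at inference
token_ids = torch.tensor([[5, 42, 7]])      # toy batch of token ids

# Standard inference: embeddings come straight from the static table.
E_static = embed(token_ids)                 # shape (1, 3, 64)

# Proposed: transform the vectors on the fly. Here the "prompt cue" is a
# hypothetical offset vector; any learned or rule-based map could stand in.
directive_offset = 0.01 * torch.randn(d_model)
E_dynamic = E_static + directive_offset     # same shape, shifted interpretation
```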
2. Mechanism
Embedding modification can be driven by prompt directives or by token-level statistics, and realized as either a rule-based or a learned transform.
This is not fine-tuning. It is live remapping of the embedding space before tokens enter the transformer layers.
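One way to realize this remapping, assuming a PyTorch model, is a forward hook on the embedding module: the hook rewrites the embedding activations before they reach the transformer layers, while every weight stays fixed. The smoothing modifier below is only a placeholder for a statistics-driven rule.

```python
import torch
import torch.nn as nn

def make_remap_hook(modifier):
    """Wrap a modifier (B, T, D) -> (B, T, D) as a forward hook."""
    def hook(module, inputs, output):
        return modifier(output)   # the returned tensor replaces the embeddings
    return hook

d_model = 64
embed = nn.Embedding(1000, d_model)
block = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)

# Placeholder rule-based modifier: pull each token slightly toward the
# sequence mean, a crude stand-in for statistics-driven remapping.
def modifier(E):
    return 0.9 * E + 0.1 * E.mean(dim=1, keepdim=True)

handle = embed.register_forward_hook(make_remap_hook(modifier))

tokens = torch.randint(0, 1000, (1, 8))
with torch.no_grad():
    E_prime = embed(tokens)       # already remapped by the hook
    out = block(E_prime)          # transformer layers see the remapped space

handle.remove()                   # detach to restore standard static behavior
```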
3. Motivation
4. Examples
5. Technical Sketch
Embeddings E are passed through a modifier M:
E' = M(E, directive, stats)
where M is a learned or rule-based transform that reshapes the embedding space locally, conditioned on the current directive and token statistics.
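A sketch of one possible M, again assuming PyTorch; the gate-and-shift form, the dimensions, and the name EmbeddingModifier are illustrative choices rather than a prescribed design.

```python
import torch
import torch.nn as nn

class EmbeddingModifier(nn.Module):
    """E' = M(E, directive, stats): a small conditional gate-and-shift."""
    def __init__(self, d_model, d_directive, d_stats):
        super().__init__()
        d_ctx = d_directive + d_stats
        self.gate = nn.Sequential(nn.Linear(d_ctx, d_model), nn.Sigmoid())
        self.shift = nn.Linear(d_ctx, d_model)

    def forward(self, E, directive, stats):
        # E:         (batch, seq, d_model)  token embeddings
        # directive: (batch, d_directive)   encoded prompt directive
        # stats:     (batch, seq, d_stats)  per-token statistics
        ctx = torch.cat(
            [directive.unsqueeze(1).expand(-1, E.size(1), -1), stats], dim=-1
        )
        return self.gate(ctx) * E + self.shift(ctx)   # local reshaping of the space

# Toy usage with arbitrary shapes.
M = EmbeddingModifier(d_model=64, d_directive=16, d_stats=4)
E = torch.randn(2, 10, 64)
directive = torch.randn(2, 16)
stats = torch.randn(2, 10, 4)
E_prime = M(E, directive, stats)   # same shape as E, locally remapped
```

A purely rule-based M would replace the two learned layers with fixed functions of the same signature.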
6. Benefits
7. Risks
8. Future Directions
9. Synthesis
Embedding modification at runtime is a new form of context resolution. It shifts the question from "what is attended to" to "what does this token mean right now." This adds depth, control, and adaptability, reshaping the transformer's interpretive core without changing its architecture or weights.