Memory & Summarization¶
A 7B model typically has an ~8K-token context window. A game can last 50+ turns, generating ~25K tokens of raw conversation. The memory and summarization systems keep context within budget without losing important story details.
Character Memory¶
Each character has a memory file at memory/<name>.yaml containing:
- Summary — 3-5 sentences capturing the character's current understanding of the story
- Key facts — max 10 short entries tracking specific knowledge, relationships, and intentions
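A memory file might look like the following. This is an illustrative sketch; the field names and story content are assumptions, and the actual schema lives in the Data Model page.

```yaml
# memory/maya.yaml (illustrative example, not the canonical schema)
summary: >
  Maya knows the lighthouse keeper vanished three nights ago. She suspects
  the new harbormaster is hiding something, and she has agreed to help the
  player search the cliffs at dawn.
key_facts:
  - The keeper's logbook is missing its final page.
  - The harbormaster lied about his whereabouts on the night of the disappearance.
  - Maya trusts the player but fears being seen with them near the docks.
```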
A dedicated memory agent updates each character's memory after every turn. The agent sees the character's current memory plus the current turn's events, and outputs a YAML diff: facts to add, remove, or update, plus a new summary.
Critical rule: memories are never cross-contaminated. Each memory agent sees only what that character witnessed. If Maya was not present for a conversation, her memory agent never sees it.
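A minimal sketch of how the engine might apply such a diff. The diff keys (`add`, `remove`, `summary`) and the helper name are assumptions for illustration, not the project's actual schema:

```python
# Sketch: apply a memory agent's YAML diff to a character's memory dict.
# The diff keys ("add", "remove", "summary") are illustrative assumptions.
MAX_FACTS = 10  # cap from the spec: at most 10 key facts per character


def apply_memory_diff(memory: dict, diff: dict) -> dict:
    facts = list(memory.get("key_facts", []))
    # Remove stale facts first so additions can replace them.
    facts = [f for f in facts if f not in diff.get("remove", [])]
    # Append new facts, skipping duplicates.
    for fact in diff.get("add", []):
        if fact not in facts:
            facts.append(fact)
    # Enforce the cap by dropping the oldest facts.
    facts = facts[-MAX_FACTS:]
    return {
        "summary": diff.get("summary", memory.get("summary", "")),
        "key_facts": facts,
    }
```

Applying the diff as pure data (rather than re-prompting for the full memory) keeps each update cheap and makes the engine's write to `memory/<name>.yaml` deterministic.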
Memory Update Flow¶
sequenceDiagram
participant E as Engine
participant MA as Memory Agent
participant F as memory/maya.yaml
E->>F: Read current memory
E->>MA: Current memory + this turn's events
MA-->>E: YAML: add/remove/update facts + new summary
E->>F: Write updated memory
Memory agents for all characters run in parallel via `asyncio.gather`. Each agent instance is independent — no shared state, no cross-character reads.
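The parallel fan-out via `asyncio.gather` can be sketched as follows. The `run_memory_agent` coroutine and its arguments are hypothetical stand-ins for the real LLM call:

```python
import asyncio


async def run_memory_agent(name: str, memory: dict, turn_events: list) -> dict:
    """Hypothetical per-character memory agent call.

    In the real engine this would prompt the model with the character's
    current memory plus only the events that character witnessed, and
    return the parsed YAML diff.
    """
    await asyncio.sleep(0)  # stand-in for the LLM round trip
    return {"character": name, "add": [], "remove": [],
            "summary": memory.get("summary", "")}


async def update_all_memories(memories: dict, witnessed: dict) -> list:
    # One independent agent task per character; no shared state between
    # them, and each sees only its own character's witnessed events.
    tasks = [
        run_memory_agent(name, mem, witnessed.get(name, []))
        for name, mem in memories.items()
    ]
    return await asyncio.gather(*tasks)
```

Because each task receives only that character's memory and witnessed events, the no-cross-contamination rule is enforced structurally, not by prompt instructions alone.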
Rolling Summary¶
When unsummarized conversation exceeds ~1500 tokens, the rolling summarizer triggers:
- The oldest turns (all except the last 4) plus the existing summary are sent to the summarizer agent.
- The agent produces an updated summary of under 5 sentences.
- The engine records `last_summarized_turn` in game state.
- On the next turn, the narrator receives this summary as "Story so far" at the top of its context.
This is incremental merge, not full re-summarization. The agent folds new events into the existing summary rather than re-reading the entire history. This keeps each summarization call cheap and bounded.
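The trigger and incremental merge described above might look like this. The threshold wiring and the `summarize` callable are illustrative assumptions:

```python
TOKEN_BUDGET = 1500   # unsummarized tokens before the summarizer fires
KEEP_RECENT = 4       # the most recent turns are never summarized away


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # cheap heuristic; no tiktoken dependency


def maybe_roll_summary(state: dict, turns: list, summarize) -> dict:
    """Fold old turns into the rolling summary when over budget.

    `summarize(existing_summary, old_turns)` stands in for the
    summarizer-agent call; it merges new events into the existing
    summary rather than re-reading the entire history.
    """
    start = state.get("last_summarized_turn", 0)
    unsummarized = turns[start:]
    pending = unsummarized[:-KEEP_RECENT]  # everything but the last 4 turns
    if not pending:
        return state
    if sum(estimate_tokens(t) for t in unsummarized) <= TOKEN_BUDGET:
        return state
    state["summary"] = summarize(state.get("summary", ""), pending)
    state["last_summarized_turn"] = start + len(pending)
    return state
```

Because the summarizer only ever sees the existing summary plus the pending turns, each call stays bounded no matter how long the game runs.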
Chapter Summaries¶
When the game state agent determines a chapter is complete:
- A chapter summary agent creates a 2-3 sentence recap of the chapter.
- The recap is stored in `summaries.yaml`.
- On subsequent turns, the narrator receives all chapter summaries as "COMPLETED CHAPTERS" in its context.
Chapter summaries are permanent — they are never re-summarized or trimmed.
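The shape of `summaries.yaml` might look like this. The structure and story content are illustrative assumptions; the canonical schema is in the Data Model page.

```yaml
# summaries.yaml (illustrative shape, not the canonical schema)
chapters:
  - chapter: 1
    recap: The player arrived in town and learned the lighthouse keeper had vanished.
  - chapter: 2
    recap: Maya and the player recovered the torn logbook page from the cliffs.
```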
Context Budgeting¶
The narrator has the largest context window to fill. Here is how the ~2000 token budget is typically allocated:
graph LR
subgraph "Narrator Context (~2000 tokens)"
A["System prompt\n(~300 tokens)"]
B["Rolling summary\n(~100 tokens)"]
C["Chapter summaries\n(~50 tokens)"]
D["Current chapter\n(~100 tokens)"]
E["Recent conversation\n(~800 tokens)"]
F["Player input\n(~50 tokens)"]
G["Headroom for output\n(~600 tokens)"]
end
The pattern every context builder follows:
- Build the system prompt with all required static data (world, chapter, characters).
- Add the user message with conversation history and player input.
- Estimate total tokens using the `len(text) // 4` heuristic (no tiktoken dependency).
- If over budget, trim conversation history from the oldest entries first.
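The steps above can be sketched as a small builder. The function name and prompt-assembly details are illustrative; only the `len(text) // 4` heuristic and oldest-first trimming come from the document:

```python
BUDGET = 2000  # narrator context budget in tokens


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # the document's heuristic; no tiktoken dependency


def build_context(system_prompt: str, history: list, player_input: str,
                  budget: int = BUDGET):
    """Return the system prompt and a history trimmed to fit the budget.

    Static data (system prompt) and the player input are fixed costs;
    conversation history is trimmed from the oldest entries first.
    """
    fixed = estimate_tokens(system_prompt) + estimate_tokens(player_input)
    trimmed = list(history)
    while trimmed and fixed + sum(estimate_tokens(t) for t in trimmed) > budget:
        trimmed.pop(0)  # drop the oldest turn first
    return system_prompt, trimmed
```

Trimming oldest-first is safe precisely because the rolling summary has already captured those turns: dropping them loses raw text, not story state.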
Typical Turn 30¶
A concrete example of what each agent sees at turn 30 of a game:
| Agent | Context Contents | Approx. Tokens |
|---|---|---|
| Narrator | Rolling summary (turns 1-26 in ~5 sentences) + chapter summaries + turns 27-29 + current input | ~1400 |
| Character | Character memory (summary + facts) + turns 28-29 + current turn (narrator output + prior responses) | ~1000 |
| Memory | Character's current memory + current turn events | ~600 |
| Game State | Chapter beats with hit/unhit status + current turn events | ~500 |
The rolling summary compresses turns 1-26 into roughly 5 sentences. The narrator sees that summary plus only the 3 most recent turns. Each character sees even less — just their own memory plus the last 2 turns and the current one.
The result: a 50-turn game uses roughly the same context as a 5-turn game. The memory and summarization systems act as a compression layer that keeps the context window stable regardless of game length.
See Also¶
- Agents — agent details, prompt design rules, structured output pipeline
- Data Model — memory file format and game state schema
- Requirements — design rationale behind these decisions