# Debugging
The turn debugger lets you step through agent calls one at a time, inspect prompts and responses, edit prompts and replay without restarting, and capture test fixtures. It wraps agents directly, calling the same `run_narrator()`, `run_character()`, and related functions that `run_turn()` uses, so there is no separate "debugger mode" that could mask bugs.
## Quick Start
Launch the debugger through `scripts/debug_turn.py`, pointing it at a save. It loads the game state from the specified save, then pauses before each agent call so you can inspect and interact.
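A minimal invocation sketch, assuming the entry point takes the save path as a flag (the flag names here are assumptions; run the script with `--help` to confirm the actual interface):

```shell
# Hypothetical flags -- adjust to the actual CLI of scripts/debug_turn.py
python scripts/debug_turn.py --save saves/my_broken_turn/
```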
## Interactive Commands
| Command | Key | Description |
|---|---|---|
| Step | s | Execute current agent call, pause at next |
| Replay | r | Re-run current agent (same inputs, new LLM call) |
| Edit | e | Reload prompts from disk and replay |
| Inspect | i | Show prompt, response, parsed output, stats, or everything (subcommands `prompt`, `response`, `parsed`, `stats`, `all`) |
| Skip | k | Skip current agent, move to next |
| Continue | c | Run all remaining agents without pausing |
| Fixture | p | Capture prompt/response as test fixture |
| Compare | pa | Compare current response against previous replay |
| Quit | q | Abort and exit |
## Workflow: Fixing a Prompt
```mermaid
flowchart TD
    A["Run debugger with\nproblem input"] --> B["Step to\nfailing agent"]
    B --> C["Inspect prompt\n(i then prompt)"]
    C --> D["Inspect response\n(i then response)"]
    D --> E["Edit prompts.py\nin your editor"]
    E --> F["Press 'e' to\nreload + replay"]
    F --> G{"Output\ncorrect?"}
    G -->|No| E
    G -->|Yes| H["Compare (pa)\nagainst old output"]
    H --> I["Capture fixture (p)\nfor regression test"]
```

## How Edit+Replay Works
The debugger uses `importlib.reload()` to hot-reload prompt changes. It must reload both `prompts.py` and `context.py`, because `context.py` imports constants at import time.

The sequence when you press `e`:
1. User edits `src/theact/agents/prompts.py` in their editor
2. Debugger reloads the prompts module
3. Debugger reloads the context module (picks up new constants)
4. Agent call replays with the updated prompt
This gives a tight edit-test loop without restarting the process or re-running earlier agents.
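The reason both modules must be reloaded can be reproduced with two throwaway modules. This is a toy sketch, not the project's code: `prompts_demo` and `context_demo` stand in for `prompts.py` and `context.py`, with the constant copied at import time just as the document describes.

```python
import importlib
import pathlib
import sys
import tempfile

# Toy reproduction: context_demo copies a constant out of prompts_demo at
# import time, mirroring how context.py imports constants from prompts.py.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "prompts_demo.py").write_text("SYSTEM_PROMPT = 'v1'\n")
(tmp / "context_demo.py").write_text(
    "from prompts_demo import SYSTEM_PROMPT\n"
    "def build_context():\n"
    "    return SYSTEM_PROMPT\n"
)
sys.path.insert(0, str(tmp))

import context_demo
import prompts_demo

assert context_demo.build_context() == "v1"

# Simulate the user editing the prompt on disk, then hot-reload.
(tmp / "prompts_demo.py").write_text("SYSTEM_PROMPT = 'v2 (edited)'\n")
importlib.invalidate_caches()

importlib.reload(prompts_demo)
print(context_demo.build_context())  # still 'v1': the constant was copied at import time

importlib.reload(context_demo)       # re-runs the import, picking up the new value
print(context_demo.build_context())  # 'v2 (edited)'
```

Reloading only `prompts_demo` leaves the stale copy inside `context_demo`, which is exactly why the debugger reloads the context module second.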
## Fixture Capture
Pressing `p` calls `capture_fixture()`, which saves the full `AgentResult` (messages, raw response, parsed data, tokens) as YAML in `tests/fixtures/`.

These fixtures feed into `test_prompt_regression.py` — debug a problem, capture the failing case, fix the prompt, and the captured fixture becomes a regression test.
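A captured fixture might look roughly like the fragment below. This layout is illustrative only; the real keys and values are whatever `AgentResult` serializes.

```yaml
# Hypothetical fixture shape -- field names assumed, not taken from the project.
agent: narrator
messages:
  - role: system
    content: "You are the narrator..."
  - role: user
    content: "The party enters the vault."
raw_response: |
  events:
    - type: narration
      text: "The vault door groans open."
parsed:
  events:
    - type: narration
      text: "The vault door groans open."
tokens:
  prompt: 1843
  completion: 92
```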
## Replay Mode
Walk through historical turns from a save that was run with `debug=True` (uses the diagnostics filesystem):
| Key | Action |
|---|---|
| Enter | Next turn |
| p | Previous turn |
| j N | Jump to turn N |
| d A B | Diff turn A against turn B |
| q | Quit |
## Design
The debugger wraps agents, not the engine. It requires a real game save — loading game state, characters, and conversation history from a save directory. There is no mock mode. This means the debugger exercises the exact same code paths as production, so any fix validated here works in the real game loop.
## Troubleshooting
| Symptom | What to Check | Tool |
|---|---|---|
| Empty narrator response | System prompt too long | Context profiler |
| Narrator outputs prose instead of YAML | YAML hint missing or weak | Inspect prompt (i then prompt) |
| Character breaks voice | Personality too vague | Check character YAML file |
| Memory agent hallucinates facts | Turn events include wrong characters | Inspect memory prompt |
| Game state never completes chapter | Completion condition too strict | Check chapter YAML file |
| Model echoes prompt back | Context overflow | Context profiler |
## Key Files
| File | Contents |
|---|---|
| `src/theact/debugger/debugger.py` | `TurnDebugger` class |
| `src/theact/debugger/types.py` | `AgentResult`, `DebugStep`, `DebugSession` |
| `scripts/debug_turn.py` | CLI entry point |
## See Also
- Observability — the logging infrastructure the debugger builds on
- Prompt Engineering — the broader iteration workflow