Prompt Engineering¶
When the model misbehaves during playtesting, the fix is almost always a prompt change. This guide covers the iteration loop, common fixes, A/B testing, and model quirks.
Where Prompts Live¶
All prompt templates live in a single file: src/theact/agents/prompts.py. This is by design: a prompt fix should be a one-line edit, not a multi-file refactor. The constants are imported by context assembly (src/theact/engine/context.py).
The Iteration Loop¶
flowchart TD
A["Run playtest\n(5-10 turns)"] --> B["Read report"]
B --> C["Identify failing agent"]
C --> D["Debug with turn debugger\n(inspect prompt + response)"]
D --> E["Edit prompts.py"]
E --> F["Re-run playtest"]
F --> G{"Stable?"}
G -->|No| C
G -->|Yes| H["Run golden scenarios\n(regression check)"]
H --> I{"All pass?"}
I -->|No| C
I -->|Yes| J["Done"]
Rules for Small Models¶
Follow the five prompt design rules defined in Agents — Prompt Design: one task per call, ~300 token system prompts, imperative mood, concrete examples, and constraints stated as rules. Read that section before editing prompts.
Common Fixes¶
Model outputs prose instead of YAML¶
- Ensure the YAML example in the system prompt uses realistic content, not placeholders.
- Check the system prompt is not too long --- the model may be losing the format instruction.
Character breaks voice¶
- Add specific speech patterns to the personality field.
- Add anti-patterns ("Never uses slang").
Narrator skips beats¶
- Simplify beat text ("Player finds supplies" not elaborate descriptions).
- Check beat phrasing describes natural events the narrator can weave in.
Memory agent hallucinates facts¶
- Add a concrete negative example to the prompt.
- Verify turn entries only include events the character witnessed.
Game state never completes chapter¶
- Usually a game file issue: loosen the completion field.
- Check the completion condition matches the beat set.
Turn Debugger¶
The turn debugger is the fastest way to iterate: step to the failing agent, inspect the prompt and response, edit, and reload. See Debugging for full details.
A/B Testing¶
Compare prompt variants statistically to remove guesswork.
Quick Start¶
cp src/theact/agents/prompts.py src/theact/agents/prompts_v2.py
# Edit prompts_v2.py
uv run python scripts/ab_test.py \
--game lost-island --turns 10 \
--variant-a current --variant-b src/theact/agents/prompts_v2.py \
--runs 3
How It Works¶
For each run, the script loads the variant, monkey-patches the prompts, runs the playtest, and restores the originals. It uses random.seed(run_number) so the AI player produces the same inputs across variants.
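The load, patch, run, restore cycle can be sketched roughly as follows. This is an illustration under assumptions, not the actual ab_test.py code: `patched_prompts`, `run_variant`, `fake_playtest`, and the `NARRATOR_SYSTEM` constant are hypothetical names, and the prompts module is stood in for by a plain namespace object.

```python
import random
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def patched_prompts(prompts_module, overrides):
    """Temporarily replace prompt constants, restoring the originals on exit."""
    originals = {name: getattr(prompts_module, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(prompts_module, name, value)
        yield prompts_module
    finally:
        # Restore even if the playtest raises, so later runs start from the baseline.
        for name, value in originals.items():
            setattr(prompts_module, name, value)


def run_variant(prompts_module, overrides, run_number, playtest):
    """Run one playtest with the variant's prompts and a per-run seed."""
    random.seed(run_number)  # same run number -> same simulated player inputs
    with patched_prompts(prompts_module, overrides):
        return playtest(prompts_module)


# Stand-in for the real prompts module (hypothetical constant name).
prompts = SimpleNamespace(NARRATOR_SYSTEM="baseline narrator prompt")


def fake_playtest(module):
    # Pretend the AI player picks an action at random; report what was observed.
    return module.NARRATOR_SYSTEM, random.randint(0, 999)


result_a = run_variant(prompts, {}, run_number=1, playtest=fake_playtest)
result_b = run_variant(
    prompts, {"NARRATOR_SYSTEM": "variant narrator prompt"}, 1, fake_playtest
)
```

Both results carry the same random player choice because the seed depends only on the run number, so any difference between them comes from the prompt. One general caveat of monkey-patching: names bound with `from module import NAME` keep pointing at the original object, so a patch has to target wherever the constant is actually read.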
Metrics Compared¶
| Metric | Description |
|---|---|
| YAML parse success | % of calls producing valid YAML |
| Character response rate | % of turns with at least one character response |
| Avg turn seconds | Average wall-clock time per turn |
| Thinking tokens (total) | Total reasoning tokens across all calls |
| Prompt tokens | Total prompt tokens consumed |
| Content tokens | Total response content tokens |
| Beats hit | Total story beats triggered |
| Quality composite | Weighted overall score |
Interpreting Results¶
A better prompt shows higher parse success and response rate, fewer thinking tokens, and a similar or higher quality composite. Reliability comes first: a variant that reduces thinking tokens but drops parse success is worse.
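The reliability-first rule can be written down as a small decision function. This is a sketch under assumptions: `VariantSummary`, `prefer_b`, and the 0.01 tolerance are hypothetical and not part of the A/B script, and the quality composite is assumed to be on a 0 to 1 scale.

```python
from dataclasses import dataclass


@dataclass
class VariantSummary:
    parse_success: float      # fraction of calls producing valid YAML (0-1)
    response_rate: float      # fraction of turns with a character response (0-1)
    thinking_tokens: int      # total reasoning tokens across all calls
    quality_composite: float  # weighted overall score (assumed 0-1)


def prefer_b(a: VariantSummary, b: VariantSummary, tolerance: float = 0.01) -> bool:
    """Return True if variant B looks better than A, checking reliability first."""
    # A drop in reliability beyond the tolerance disqualifies B outright,
    # no matter how many thinking tokens it saves.
    if b.parse_success < a.parse_success - tolerance:
        return False
    if b.response_rate < a.response_rate - tolerance:
        return False
    # The composite should be similar or higher.
    if b.quality_composite < a.quality_composite - tolerance:
        return False
    # With reliability and quality held, B must improve at least one metric.
    return (
        b.parse_success > a.parse_success
        or b.response_rate > a.response_rate
        or b.thinking_tokens < a.thinking_tokens
        or b.quality_composite > a.quality_composite
    )


baseline = VariantSummary(0.95, 0.90, 12000, 0.80)
cheaper_but_flaky = VariantSummary(0.85, 0.90, 8000, 0.82)
cheaper_and_stable = VariantSummary(0.95, 0.91, 9000, 0.80)
```

Here `prefer_b(baseline, cheaper_but_flaky)` rejects the variant despite its token savings, while `prefer_b(baseline, cheaper_and_stable)` accepts it.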
Tips¶
- Start with --turns 3 --runs 2 for fast iteration.
- Change one thing at a time.
- Use --variant-a current consistently as the baseline.
- Check the per-run breakdown; averages hide variance.
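To illustrate why the per-run breakdown matters, the sketch below compares two sets of made-up parse success rates that share the same average. The numbers and the `summarize` helper are hypothetical and not output from the A/B script.

```python
from statistics import mean, stdev

# Hypothetical per-run YAML parse success rates for two variants (3 runs each).
runs_a = [0.90, 0.91, 0.89]
runs_b = [1.00, 1.00, 0.70]


def summarize(values):
    """Report the average alongside the spread and the worst run."""
    return {"mean": mean(values), "stdev": stdev(values), "worst": min(values)}


summary_a = summarize(runs_a)
summary_b = summarize(runs_b)
```

Both variants average about 0.90, but variant B has a far larger spread and one run at 0.70, which the average alone would hide. Note that `stdev` needs at least two runs.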
Model Quirks¶
Known 7B model behaviors are documented in model-quirks.yaml. Check before debugging --- your issue may have a known workaround.
See Also¶
- Debugging for the turn debugger
- Playtesting for playtest reports
- Agents for agent details
- Observability for error taxonomy