Inscryption Agent
Personal project: an agent that plays the deckbuilder Inscryption. In progress.
Why this project
I wanted a constrained, well-defined environment to think about agent design from first principles. Inscryption is an excellent test bed:
- Partial information. You see your hand and the board; you don't see your draw order or the opponent's plan
- Long-horizon decisions. Today's draft choices constrain three battles ahead; greedy short-term picks lose
- Discrete, learnable rules. Cards have small rule surfaces; combinations are the depth
- Visible feedback. Win/loss is unambiguous; sub-rewards (HP lost, cards traded) are computable
- Asymmetric meta-game. The player picks a deck and an ascension level; balancing risk against reward at the meta-layer is a separate decision policy
Most "agent on a game" projects target Atari, Chess, or Go — solved problems with clean observation spaces. I wanted something messier and more like the real world: a structured but partially-observed environment with multiple decision policies layered on top of one another.
Where the project is
Currently mid-build. Honest snapshot:
- ✅ Game state encoder (board, hand, deck-counts, modifiers)
- ✅ Move enumerator (legal action space per turn)
- ✅ Tactical battle policy v0 (one-step lookahead with a hand-tuned eval function; see the sketch after this list)
- ⚠️ Deck draft policy — first version works on Act 1; weak on later acts
- ⚠️ Ascension-vs-rest decision policy — not yet differentiated from greedy
- ⏳ Self-play harness for play-trace generation
- ⏳ Eval function tuning from play traces (the self-improvement loop)
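For concreteness, here is the rough shape of the finished pieces. This is a minimal sketch, not the actual implementation: every name in it (`GameState`, `enumerate_moves`, `evaluate`, `simulate`) is illustrative, and the real encoder tracks far more state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GameState:
    """Everything the agent can legitimately see, and nothing it can't."""
    board: tuple       # (our_lanes, their_lanes), card ids or None per lane
    hand: tuple        # card ids currently in hand
    deck_counts: dict  # remaining copies per card id: contents known, order hidden
    modifiers: tuple   # active items, sigil effects, scale state, etc.

def enumerate_moves(state: GameState) -> list:
    """Legal action space for this turn: card plays (with their sacrifice
    sets), item uses, and ending the turn. Stubbed here."""
    return [("end_turn",)]  # real version returns every legal (move, args) pair

def evaluate(state: GameState) -> float:
    """Hand-tuned linear eval over cheap board features. Stubbed here;
    this is the function the self-improvement loop later refines."""
    return 0.0

def tactical_policy_v0(state: GameState, simulate) -> tuple:
    """One-step lookahead: simulate each legal move, keep the best eval."""
    return max(enumerate_moves(state),
               key=lambda move: evaluate(simulate(state, move)))
```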
The next milestone I'm pushing toward in the few days I have available: a self-play harness that produces traces, plus a minimal pipeline that uses those traces to refine the eval function used by the tactical policy. The full system is overkill for an MVP, but the shape of self-improvement, even at a small scale, is what makes the project interesting to me.
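The harness itself doesn't need to be clever. A sketch of the shape I have in mind, assuming a seeded game wrapper with `observe`/`apply`/`over`/`won` methods and a policy that returns its own eval score alongside the move; the trace schema here is an assumption, not a final format.

```python
import json

def run_selfplay(n_games, new_game, policy, out_path="traces.jsonl"):
    """Play n games end-to-end with the current eval function, writing one
    JSON line per game: per-turn features, the chosen move, the eval score
    at the time, and the final win/loss."""
    with open(out_path, "w") as f:
        for seed in range(n_games):
            game = new_game(seed)            # assumed: a fresh, seeded run
            turns = []
            while not game.over():
                state = game.observe()       # assumed observation API
                move, score = policy(state)  # policy reports its own eval
                turns.append({"features": state.features(),
                              "move": repr(move),
                              "eval": score})
                game.apply(move)
            record = {"seed": seed, "won": game.won(), "turns": turns}
            f.write(json.dumps(record) + "\n")
```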
Design choices worth documenting
LLM where it earns its keep, not as a default
I deliberately don't call an LLM on every turn. Most decisions in Inscryption are search problems with cheap, well-defined eval — using a frontier model for them would be slow, expensive, and high-variance. The LLM gets called only at:
- Deck draft — where natural-language reasoning over card combinations beats a hand-coded heuristic
- Meta-game (ascension vs. rest) — where the trade-off is fuzzy and benefits from contextual reasoning
- Trace summarisation — converting play traces into structured "what went wrong" notes that feed the eval-function refinement loop
Everything else is search + heuristic. The LLM is a tool the agent reaches for, not the substrate it lives on.
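In code, that policy is just a small router. Hypothetical sketch: the decision kinds and `call_llm` are placeholders for whatever client and prompt plumbing actually gets used.

```python
from enum import Enum, auto

class Decision(Enum):
    BATTLE_MOVE = auto()    # every turn: search + heuristic, never the LLM
    DECK_DRAFT = auto()     # occasional: reasoning over card combinations
    META_CHOICE = auto()    # ascension vs. rest: fuzzy, contextual trade-off
    TRACE_SUMMARY = auto()  # post-game: traces -> "what went wrong" notes

LLM_DECISIONS = {Decision.DECK_DRAFT, Decision.META_CHOICE,
                 Decision.TRACE_SUMMARY}

def decide(kind: Decision, context, search_policy, call_llm):
    """Route each decision: cheap, well-defined ones go to search; fuzzy,
    infrequent ones go to the model."""
    if kind in LLM_DECISIONS:
        return call_llm(kind, context)
    return search_policy(context)
```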
Self-improvement loop, modest version
The proper version is reinforcement-learning-flavoured: collect traces, score them, update the policy. The MVP version is a much smaller loop:
- Run N games end-to-end with the current eval function
- Run a simple regression over the traces: in which board states did the eventual outcome diverge from what the eval function predicted?
- Adjust eval function weights toward the predictive signal
- Repeat
It's not deep RL. It's a self-improving heuristic. That's the point of the project — not to build a state-of-the-art agent, but to think hands-on about what the smallest interesting feedback loop looks like.
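A minimal sketch of that loop, assuming the linear eval and the `traces.jsonl` format sketched above, with numpy least squares standing in for "simple regression".

```python
import json
import numpy as np

def refine_weights(trace_path: str, old_weights: np.ndarray,
                   lr: float = 0.1) -> np.ndarray:
    """Label every recorded board state with the game's final outcome, fit a
    linear map from features to outcome, and nudge the eval weights toward it."""
    X, y = [], []
    with open(trace_path) as f:
        for line in f:
            game = json.loads(line)
            outcome = 1.0 if game["won"] else -1.0
            for turn in game["turns"]:
                X.append(turn["features"])  # feature vector, aligned with weights
                y.append(outcome)           # every state labeled by final result
    X, y = np.asarray(X, float), np.asarray(y, float)
    fitted, *_ = np.linalg.lstsq(X, y, rcond=None)    # which features predicted the outcome?
    return old_weights + lr * (fitted - old_weights)  # step toward the predictive signal
```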
Honest about the gap between "demo" and "good"
The current tactical battle policy is one-step lookahead with a hand-tuned eval. It will lose Act 2 ascension runs that any decent human player would win. Acknowledging this on the project page rather than papering over it is the same honest-instrumentation discipline as in Smart Skills: a tool that hides its own confidence is worse than one that surfaces it.
What I'm hoping to take from this
1. A proper test bed for thinking about agent self-improvement. Real production agents don't have the luxury of clean rewards. Inscryption gives me one.
2. A grounded sense of the LLM-vs-search trade-off. Most agent projects over-use the LLM. Building one where I have to choose, turn by turn, sharpens the intuition.
3. A repeatable shape for "minimum viable self-improvement loop." If I can find the smallest version that actually compounds, I can apply that pattern elsewhere — including back at work.
Project log. Code at github.com/4abandoment/inscryption-kcm-ai. Updated as I push.