Inscryption Agent
Personal project: an agent that plays the deckbuilder Inscryption. In progress.
Why this project
I wanted a constrained, well-defined environment to think about agent design from first principles. Inscryption is an excellent test bed:
- Partial information. You see your hand and the board; you don't see your draw order or the opponent's plan
- Long-horizon decisions. Today's draft choices constrain three battles ahead; greedy short-term picks lose
- Discrete, learnable rules. Cards have small rule surfaces; combinations are the depth
- Visible feedback. Win/loss is unambiguous; sub-rewards (HP lost, cards traded) are computable
- Asymmetric meta-game. The player picks a deck and an ascension level; balancing risk against reward at the meta-layer is a separate decision policy
Most "agent on a game" projects target Atari, Chess, or Go — solved problems with clean observation spaces. I wanted something messier and more like the real world: a structured but partially-observed environment with multiple decision policies layered on top of one another.
Where the project is
Currently mid-build. Honest snapshot:
- ✅ Game state encoder (board, hand, deck-counts, modifiers)
- ✅ Move enumerator (legal action space per turn)
- ✅ Tactical battle policy v0 (one-step lookahead with a hand-tuned eval function; see the sketch after this list)
- ⚠️ Deck draft policy — first version works on Act 1; weak on later acts
- ⚠️ Ascension-vs-rest decision policy — not yet differentiated from greedy
- ⏳ Self-play harness for play-trace generation
- ⏳ Eval function tuning from play traces (the self-improvement loop)
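For concreteness, here is the rough shape of the finished pieces. This is a minimal sketch, not the actual implementation: every name in it (`GameState`, `enumerate_moves`, `evaluate`, `simulate`) is illustrative, and the real encoder tracks far more state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GameState:
    """Everything the agent can legitimately see, and nothing it can't."""
    board: tuple       # (our_lanes, their_lanes), card ids or None per lane
    hand: tuple        # card ids currently in hand
    deck_counts: dict  # remaining copies per card id: contents known, order hidden
    modifiers: tuple   # active items, sigil effects, scale state, etc.

def enumerate_moves(state: GameState) -> list:
    """Legal action space for this turn: card plays (with their sacrifice
    sets), item uses, and ending the turn. Stubbed here."""
    return [("end_turn",)]  # real version returns every legal (move, args) pair

def evaluate(state: GameState) -> float:
    """Hand-tuned linear eval over cheap board features. Stubbed here;
    this is the function the self-improvement loop later refines."""
    return 0.0

def tactical_policy_v0(state: GameState, simulate) -> tuple:
    """One-step lookahead: simulate each legal move, keep the best eval."""
    return max(enumerate_moves(state),
               key=lambda move: evaluate(simulate(state, move)))
```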
The next milestone I'm pushing toward in the few days I have available: a self-play harness that produces traces, plus a minimal pipeline that uses those traces to refine the eval function used by the tactical policy. The full system is overkill for an MVP, but the shape of self-improvement, even at a small scale, is what makes the project interesting to me.
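The harness itself doesn't need to be clever. A sketch of the shape I have in mind, assuming a seeded game wrapper with `observe`/`apply`/`over`/`won` methods and a policy that returns its own eval score alongside the move; the trace schema here is an assumption, not a final format.

```python
import json

def run_selfplay(n_games, new_game, policy, out_path="traces.jsonl"):
    """Play n games end-to-end with the current eval function, writing one
    JSON line per game: per-turn features, the chosen move, the eval score
    at the time, and the final win/loss."""
    with open(out_path, "w") as f:
        for seed in range(n_games):
            game = new_game(seed)            # assumed: a fresh, seeded run
            turns = []
            while not game.over():
                state = game.observe()       # assumed observation API
                move, score = policy(state)  # policy reports its own eval
                turns.append({"features": state.features(),
                              "move": repr(move),
                              "eval": score})
                game.apply(move)
            record = {"seed": seed, "won": game.won(), "turns": turns}
            f.write(json.dumps(record) + "\n")
```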
Design choices worth documenting
LLM where it earns its keep, not as a default
I deliberately don't call an LLM on every turn. Most decisions in Inscryption are search problems with cheap, well-defined eval — using a frontier model for them would be slow, expensive, and high-variance. The LLM gets called only at:
- Deck draft — where natural-language reasoning over card combinations beats a hand-coded heuristic
- Meta-game (ascension vs. rest) — where the trade-off is fuzzy and benefits from contextual reasoning
- Trace summarisation — converting play traces into structured "what went wrong" notes that feed the eval-function refinement loop
Everything else is search + heuristic. The LLM is a tool the agent reaches for, not the substrate it lives on.
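In code, that policy is just a small router. Hypothetical sketch: the decision kinds and `call_llm` are placeholders for whatever client and prompt plumbing actually gets used.

```python
from enum import Enum, auto

class Decision(Enum):
    BATTLE_MOVE = auto()    # every turn: search + heuristic, never the LLM
    DECK_DRAFT = auto()     # occasional: reasoning over card combinations
    META_CHOICE = auto()    # ascension vs. rest: fuzzy, contextual trade-off
    TRACE_SUMMARY = auto()  # post-game: traces -> "what went wrong" notes

LLM_DECISIONS = {Decision.DECK_DRAFT, Decision.META_CHOICE,
                 Decision.TRACE_SUMMARY}

def decide(kind: Decision, context, search_policy, call_llm):
    """Route each decision: cheap, well-defined ones go to search; fuzzy,
    infrequent ones go to the model."""
    if kind in LLM_DECISIONS:
        return call_llm(kind, context)
    return search_policy(context)
```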
Self-improvement loop, modest version
The proper version is reinforcement-learning-flavoured: collect traces, score them, update the policy. The MVP version is a much smaller loop:
- Run N games end-to-end with the current eval function
- Run a simple regression over the traces: in which board states did the eventual outcome diverge from what the eval function predicted?
- Adjust eval function weights toward the predictive signal
- Repeat
It's not deep RL. It's a self-improving heuristic. That's the point of the project — not to build a state-of-the-art agent, but to think hands-on about what the smallest interesting feedback loop looks like.
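A minimal sketch of that loop, assuming the linear eval and the `traces.jsonl` format sketched above, with numpy least squares standing in for "simple regression".

```python
import json
import numpy as np

def refine_weights(trace_path: str, old_weights: np.ndarray,
                   lr: float = 0.1) -> np.ndarray:
    """Label every recorded board state with the game's final outcome, fit a
    linear map from features to outcome, and nudge the eval weights toward it."""
    X, y = [], []
    with open(trace_path) as f:
        for line in f:
            game = json.loads(line)
            outcome = 1.0 if game["won"] else -1.0
            for turn in game["turns"]:
                X.append(turn["features"])  # feature vector, aligned with weights
                y.append(outcome)           # every state labeled by final result
    X, y = np.asarray(X, float), np.asarray(y, float)
    fitted, *_ = np.linalg.lstsq(X, y, rcond=None)    # which features predicted the outcome?
    return old_weights + lr * (fitted - old_weights)  # step toward the predictive signal
```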
Honest about the gap between "demo" and "good"
The current tactical battle policy is one-step lookahead with a hand-tuned eval. It will lose Act 2 ascension runs that any decent human player would win. Acknowledging this on the project page rather than papering over it is the same honest-instrumentation discipline as in Smart Skills: a tool that hides its own confidence is worse than one that surfaces it.
What I'm hoping to take from this
1. A proper test bed for thinking about agent self-improvement. Real production agents don't have the luxury of clean rewards. Inscryption gives me one.
2. A grounded sense of the LLM-vs-search trade-off. Most agent projects over-use the LLM. Building one where I have to choose, turn by turn, sharpens the intuition.
3. A repeatable shape for "minimum viable self-improvement loop." If I can find the smallest version that actually compounds, I can apply that pattern elsewhere — including back at work.
Project log. Code at github.com/4abandoment/inscryption-kcm-ai. Updated as I push.