Internal Slack Helper

A Slack-native AI helper that answers colleague questions from authoritative internal context, captures every interaction as evidence, and is tuned to a company-specific domain through continuously iterated tools, MCPs and skills.


The thesis

Inside any AI-native company, three patterns emerge fast:

  1. Generic chat models can't safely answer internal questions — they don't have the data
  2. Each team has its own private corpus (warehouse, docs, ticketing, code) that nobody routes through
  3. The most valuable internal assistant is one that learns from how colleagues actually use it — what they ask, what they accept, what they ignore — and adapts its tools, skills and routing accordingly

The Slack helper is built around that third pattern. It's not just a chat interface to internal data. It's an instrumented surface where every interaction becomes evidence, and that evidence feeds back into how the helper is configured.

The same thesis as the Smart Skills platform: the runtime that consumes a tool can produce evidence about it. Hooks first. Configuration follows.


What it does

A Slack app — running a local model — that any colleague can interact with from a DM or a channel mention. Behind it:

  • MCP connections to internal data sources: the warehouse, internal documentation, the ticketing system, code repositories. Each MCP exposes a small, scoped surface: not a full dump, just the operations that map to tasks colleagues actually ask for.
  • A skills library layered on top of the MCPs. Skills are reusable, opinionated workflows like "summarise yesterday's customer feedback," "find the most-cited internal doc on X," "draft a release note from the linked PR." Skills compose MCP operations into the answers people actually want.
  • A backend that surfaces every prompt, tool call, response, and outcome. Colleagues can react to responses (👍/👎/comment); reactions are logged alongside the trace.
  • A continuous iteration loop: the team behind the helper reviews logs, identifies gaps, and edits tools, MCPs and skills. The configuration is the product.
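To make the skills-on-MCPs shape concrete, here is a minimal sketch of what registering one skill could look like. Every name here (`Skill`, `register`, `warehouse_call`, the operation string) is an illustrative assumption, not the actual implementation, and the MCP client is stubbed out:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str
    description: str           # the router matches prompts against this
    run: Callable[[str], str]  # prompt in, answer out

SKILLS: dict[str, Skill] = {}

def register(skill: Skill) -> None:
    SKILLS[skill.name] = skill

# Stub standing in for a real MCP client; a real one would speak the
# MCP protocol to the warehouse server's scoped read surface.
def warehouse_call(operation: str) -> list[str]:
    return ["feedback row 1", "feedback row 2"]

def summarise_feedback(prompt: str) -> str:
    # A skill composes one or more MCP operations into the answer
    # the colleague actually wanted.
    rows = warehouse_call("feedback.yesterday")
    return f"{len(rows)} feedback items from yesterday."

register(Skill(
    name="summarise-feedback",
    description="Summarise yesterday's customer feedback",
    run=summarise_feedback,
))
```

The point of the shape is that skills, not the model, own the opinionated part: which operations fire, and what the answer looks like.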

Why local

A local model means the prompts and the internal data they touch never leave the company perimeter. For a financial-services company that's table stakes; it also means the helper can be tuned aggressively to internal style without prompt-injection or vendor-drift risk.

The trade-off is a capability ceiling. A small local model can't do what frontier models can. The architecture compensates by leaning on tools and skills rather than raw model capability. The model's job is routing, paraphrasing, and conversational glue. The skills do the work.

This is a deliberate inversion of the default LLM-app pattern. Most apps ship a powerful model and let it freestyle. This one ships a modest model with a sharp toolkit.


Architecture

┌──────────────────┐    ┌───────────────────────┐
│ Colleague in     │    │ Slack app (Bolt)      │
│ Slack DM or      │───▶│ - mention/DM router   │
│ channel mention  │    │ - reaction listener   │
└──────────────────┘    └───────────┬───────────┘
                                    │
                                    ▼
                         ┌────────────────────────┐
                         │ Local model runtime    │
                         │ + skill loader         │
                         └────────┬───────────────┘
                                  │
            ┌─────────────────────┼─────────────────────┐
            ▼                     ▼                     ▼
   ┌────────────────┐   ┌──────────────────┐   ┌────────────────┐
   │ Warehouse MCP  │   │ Docs/Wiki MCP    │   │ Ticketing MCP  │
   │ (read-scoped)  │   │ (read-scoped)    │   │ (read-scoped)  │
   └────────────────┘   └──────────────────┘   └────────────────┘
            │                     │                     │
            └─────────────────────┼─────────────────────┘
                                  ▼
                         ┌──────────────────────┐
                         │ Trace + feedback log │
                         │ (every prompt, tool  │
                         │ call, reaction,      │
                         │ outcome)             │
                         └──────────────────────┘
                                  │
                                  ▼
                         ┌──────────────────────┐
                         │ Iteration surface    │
                         │ (gaps → tool / skill │
                         │ / MCP edits)         │
                         └──────────────────────┘

The bottom two boxes are the loop that makes the helper improve: traces feed the iteration surface, and edits flow back into tools, skills and MCPs.


The iteration loop in practice

Every Slack interaction produces a trace:

  • The prompt
  • Which skill (if any) routed it
  • Which MCP operations fired and what they returned
  • The model's response
  • The colleague's reaction (👍, 👎, follow-up, silence)

Logs are reviewed on a cadence. Common patterns:

  • Wrong tool fired → tighten the skill's routing logic, or split into two skills
  • Right tool, bad output → adjust the MCP's response shape (less noise, better fields)
  • No tool fired, model freestyled → identify the missing skill, draft it, register it
  • Successful pattern emerging in raw prompts → promote to a named skill so the next colleague gets it for free
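The first and third patterns can be surfaced mechanically from the trace log. A sketch, assuming traces are plain dicts with `skill` and `reaction` fields (an assumption carried over from the trace shape, not the real query layer):

```python
from collections import Counter

def gap_report(traces: list[dict]) -> dict:
    """Two cheap signals from a batch of traces: which skills collect
    thumbs-downs, and how often the model answered with no skill at
    all. Each freestyled answer is a candidate for a new skill."""
    downs = Counter(
        t["skill"] for t in traces
        if t.get("reaction") == "thumbsdown" and t["skill"]
    )
    freestyled = sum(1 for t in traces if t["skill"] is None)
    return {"thumbsdown_by_skill": dict(downs), "freestyled": freestyled}
```

Reviews on a cadence then start from a ranked list rather than raw logs.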

This is the same evidence-backed curator workflow as the Smart Skills platform, applied to a different surface. The skill catalog there. The Slack helper here. Same shape.


What's interesting about it (research perspective)

The helper is also an instrument for understanding how my colleagues think about their own work. The traces are a longitudinal record of:

  • What questions people actually ask when they think no one is watching
  • Which questions cluster by team, by season, by product launch
  • Where the company's implicit knowledge graph differs from its documented one — a docs MCP that gets queried for things the docs don't cover is a research signal
  • Which skills get adopted virally vs. which need promotion
  • Which colleagues are power users vs. light users — and what predicts adoption
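The docs-gap signal above lends itself to a mechanical check: prompts where a docs operation fired and came back empty. A sketch over the same assumed trace shape (the `op` and `rows` field names are illustrative):

```python
def docs_gap_prompts(traces: list[dict]) -> list[str]:
    """Prompts where a docs operation fired and returned nothing:
    a proxy for topics colleagues expect to be documented but aren't."""
    gaps = []
    for t in traces:
        for call in t.get("mcp_calls", []):
            if call.get("op", "").startswith("docs.") and call.get("rows", 0) == 0:
                gaps.append(t["prompt"])
                break  # one hit per trace is enough
    return gaps
```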

Most user research projects stop at "what do users want." This one is set up to also answer "what do they do, repeatedly, and what does that reveal about the system they're working inside?"


What's deliberately out of scope

  • Write actions. Today every MCP is read-scoped. The cost of a bad write into the warehouse or ticketing system is much higher than the cost of a missing answer. Write is Phase 2.
  • Multi-turn agentic loops. The helper does single-step routing today. It doesn't plan over multiple tool calls. That's deliberate — it constrains failure modes.
  • Cross-colleague memory. Each colleague's traces are scoped to them. No "what's everyone been asking about" surface inside the helper itself; that view lives in the iteration backend.
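The read-only boundary in the first bullet is cheap to enforce mechanically, before any call reaches an MCP. A sketch with a hypothetical operation allowlist (the operation names are illustrative):

```python
# Hypothetical allowlist: names mirror the read-scoped MCP surfaces.
READ_OPS = {"feedback.yesterday", "docs.search", "tickets.lookup"}

def guarded_call(operation: str, call):
    """Refuse anything outside the read allowlist before it reaches
    an MCP. Write actions are Phase 2; today the guard just raises."""
    if operation not in READ_OPS:
        raise PermissionError(f"not read-scoped: {operation}")
    return call(operation)
```

Keeping the guard outside the model means a prompt can never talk its way into a write: the boundary is code, not instruction.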

What I'm taking from this

1. Instrumentation collapses the research/product gap. When every interaction is captured as evidence with the same shape, "research" stops being a separate motion and starts being a continuous read of a live system. The team that runs the helper is the team that runs the research.

2. Configuration is the product. The model is small. The MCPs are small. The skills are small. What makes the helper useful is the combination, tuned to a domain, iterated on. This is the part that doesn't transfer between companies — and the part that compounds.

3. Read-scoped tools are an underrated unlock. Most "AI assistant" projects fail at the trust boundary. Going read-only by default lets the helper do useful work in places organisations would never let a write-capable agent touch.


Built and iterated at Cleo. Architecture-level writeup; specifics omitted.