Squire: A Sandbox Coding Agent You Can Hand the Keys To

June 1, 2026 • 14 min read • Arkadij Kummer


A lot of people who work with coding agents describe feeling slightly less sharp than they used to. They spend the day watching sessions scroll past, skimming what the model proposes, and hitting enter. Each behaviour looks responsible on its own: watching feels like supervision, skimming feels like review, keeping the session open feels like control. But stacked together they produce someone who is neither really thinking about the problem nor really letting go of it. One hand stays on the wheel out of habit, while no real thinking is happening.

This is the worst of both worlds. The agent never gets full autonomy, so the human stays a bottleneck, and the human is not doing deep work, so the supervision is theatre. We call it the pseudo-micromanagement trap, and it has quietly become a common tax, because background coding agents are now normal infrastructure. Ramp reported its internal agent writes roughly 30% of the pull requests merged to its main repositories (probably more by now). The tools are everywhere, and the trap rides along with them.

Two questions follow: how to work with these agents without lobotomising yourself, and how to run one safely enough that leaving it unattended is rational. Squire, the sandbox coding agent we built at Bollwerk, is our answer to both. The post is in two parts: the argument first, then a walkthrough of how Squire is built, detailed enough to use as a reference for your own. If you only want the architecture, skip to Part II.

Overview

Full offload or full engagement
Squire in practice
How we got here
The security spine: a sandbox that cannot write
The replay loop, step by step
Defense in depth
What we took from Ramp and Symphony
Where we sit

Part I: The case for full offload

Full offload or full engagement

The trap has a mechanism underneath the feel. Eric Ma calls it a throughput problem in his essay on calibration: AI tools accelerate feedback loops past the rate human cognition can absorb, and when loop cadence exceeds brain throughput the queue overflows. Partial parallelism shreds throughput, because the switching cost is paid on every loop and the loops never close. Half-watching two agents is slower than fully handing off one and fully thinking about another.

So the rule we argue for is binary on purpose. For any piece of work you are in one of two regimes, and the unproductive middle is the thing to avoid.

Figure: Where to spend your finite attention (two regimes, no middle)

Full offload (hand off): where the agent matches or exceeds you. Boilerplate, refactors, well-scoped feature work, glue code, migrations: anything where you are not the deepest expert in the room. Fire the task and review the PR. Do not pseudo-supervise. Examples: refactors, scoped features, migrations.

Full engagement (go deep): where your judgment still surpasses the model. Architecture, hard trade-offs, ambiguous problems, anything with real domain depth. This is where your finite cognitive budget belongs: one problem at a time. Examples: architecture, hard trade-offs, domain depth.

The trap lives in the gap between these two columns: work that is offloaded but still half-watched, or engaged with but not really thought about.

The decision rule: for each piece of work, pick a regime and commit to it. The pseudo-micromanagement trap is the unproductive middle between them.

Both regimes preserve throughput, because in both the loop you are running actually closes. The engagement column is where humans still belong: architecture, hard trade-offs, ambiguous problems, the domain depth the model lacks. Spend the attention you save on the offload column there.

One deliberate exception sits outside the throughput logic. Sometimes the goal is to learn the thing rather than ship it efficiently. When you want a skill you do not yet have, building it alongside the agent is one of the better ways to acquire it: you stay in the loop to use the agent as a tutor, reading what it does, asking why, trying the next step yourself. The test is whether you can name what you get from staying in. "I am learning how this works" is a real answer; "I feel like I should keep an eye on it" is the trap.

All of this assumes the offload column is safe to use, and it only is if letting the agent run unattended cannot cause real damage. If an unsupervised agent can push to main, force-push over a colleague's branch, or be talked by a poisoned README into something destructive, then firing it off and reviewing later is negligence, and keeping a hand on the wheel is correct. So you earn the right to fully offload through architecture: make the dangerous actions structurally impossible, and the hand comes off the wheel safely. That is the property Squire is built around. Before the how, here is what working with it looks like.

Squire in practice

Squire is reachable from two surfaces, GitHub and Slack, and runs in two modes, non-interactive and interactive. The two are independent: you can fire a non-interactive run or open an interactive session from either surface.

Non-interactive is the common path. On GitHub you comment "/squire-ship add a rate limiter to the webhook endpoint, 30 req/min per workspace" on an issue, and the next thing you see is a pull request, opened by Squire, with the work done and a real description. Comment on an existing pull request and Squire checks out that branch, appends its commits, and updates the same PR, so asking for a change stays where the review already is. When a run turns up something out of scope, it can file a GitHub issue rather than dropping the observation.

Figure: a GitHub issue where a slash command dispatches Squire, followed by a pull request opened by Squire with the completed work. The next thing you see after firing the task is something to review.

Interactive mode is for when you want to steer. Squire hands you a link, gated by OAuth so only you can open it, into a chat UI where you drive the work turn by turn. That is the engagement regime from earlier in action: deliberate, loop-closing work rather than the half-watched middle, and it is also how you learn an unfamiliar part of a codebase alongside the agent.

Slack is both a way in and the way Squire reports back. Mention @Squire with a task and it runs the same way, and when a run opens a PR or files an issue it posts the link into the thread that triggered it, so the outcome lands where you asked. The same mention also handles work that never becomes a PR: "where does this feature live?", "is this shipped yet?", or "summarise this" with a link. Squire reads the relevant repository or fetches the page and replies in the thread.

Figure: a Slack thread where a user pastes a link and mentions @Squire to summarise it, and Squire replies in-thread with a concise summary. The request and the summary stay attached to the conversation where the link was shared.

There is no bespoke command for the question-and-answer cases. A mention with a research-shaped prompt routes to a read-only repository sandbox, or to a no-repo sandbox with a web-fetch tool, and posts its answer to Slack instead of opening a PR. The same machinery that ships PRs answers questions and digests links; the only side effect of those cases is a Slack message.
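A minimal sketch of that routing decision, assuming the classifier returns an intent label and an optional repository. The names Classification, SandboxPlan, and plan_for_mention, and the intent labels, are hypothetical and chosen for illustration; they are not Squire's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    """Hypothetical classifier output for one @Squire mention."""
    intent: str            # assumed labels: "ship", "question", "summarise_link"
    repo: Optional[str]    # e.g. "org/service", or None when no repo applies


@dataclass(frozen=True)
class SandboxPlan:
    """Which kind of sandbox to start and where the result is posted."""
    repo: Optional[str]
    repo_access: str       # "read" or "none"; there is no "write" option
    web_fetch: bool
    output: str            # "pull_request" or "slack_message"


def plan_for_mention(c: Classification) -> SandboxPlan:
    if c.intent == "ship":
        if c.repo is None:
            raise ValueError("a code task needs a target repository")
        # Code work still runs read-only; the PR is opened outside the sandbox.
        return SandboxPlan(c.repo, "read", False, "pull_request")
    if c.repo is not None:
        # Research about a repository: read-only sandbox, answer goes to Slack.
        return SandboxPlan(c.repo, "read", False, "slack_message")
    # No repository involved: vanilla sandbox with a web-fetch tool only.
    return SandboxPlan(None, "none", True, "slack_message")
```

The point of the shape is that no branch of the function can produce a plan with write access: the difference between shipping a PR and answering a question is only the output channel.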

What ties the surfaces together is that none of them needs your laptop. The work lives in a sandbox you reach from an issue comment, a PR comment, or a Slack message, so you can kick off a fix from your phone and read the PR later, wherever you are. The aim is to provide more surfaces for real work, and more places to start it from.

The shape is consistent across surfaces. You note the work where you noticed it, and the next thing you see is a result to review: a PR, a summary, an answer, instead of a session to babysit.

Part II: How Squire works

You have seen what Squire does. The rest of the post is how it does it, written to be reusable: if you are assembling your own background agent, this is the design we would hand you. The organising idea carries over from Part I. Make the dangerous things structurally impossible, then let everything else hang off that.

How we got here

The clearest published reference for this class of system is Ramp's, in "Why We Built Our Own Background Agent." Their agent, Inspect, runs each session in a Modal sandbox with a full dev environment wired into the tools an engineer uses, and Ramp published a spec with an open invitation to copy it. We took the Modal-sandbox half and skipped the parts we did not need yet: multiplayer, a hosted VS Code, a Chrome extension.

Our MVP was deliberately small. A GitHub webhook fires a slash command, an orchestrator spins up a Modal sandbox, the sandbox clones the repo and runs a coding agent, and the result comes back as a pull request. One constraint was fixed from the first design note: no GitHub write credentials ever enter the sandbox. Everything dangerous happens outside it, and that rule shaped every decision after. From there the system grew in legible steps:

The rebrand. The plain /agent command became Squire, the agent that does the prep work so the engineer can show up to the joust. Two modes split out: /squire-ship for fire-and-forget headless runs, and /squire for an interactive session you can steer.
Per-repo governance. A .squire/workflow.md file carries each repo's setup commands, verification hooks, and timeouts. It is read before a sandbox is created, so a parse error fails fast with a comment and burns no compute.
The interactive rewrite. The first interactive mode drove a terminal UI over ttyd and rendered badly, with flicker and line-by-line redraws. We replaced the whole path with a streaming-JSON chat UI fed by the agent CLI's event stream.
Slack. Slash commands, plus an @Squire mention that routes through a fast classifier to decide which repo to target and whether to run headless or interactive.
A no-repo research mode. A vanilla sandbox with no repository, no GitHub credentials, and Slack as its only output. This is what answers questions and summarises links.
An egress-proxy hardening pass. Outbound traffic from a sandbox now flows through an allowlisting proxy, closing the easy accidental-exfiltration paths.

None of these relaxed the founding constraint. Every "could the agent just do X?" resolved the same way: the orchestrator does X, outside the sandbox, with a credential the agent never sees. The system today is a single app with a small number of parts:

An orchestrator that receives GitHub webhooks and Slack events, checks authorisation, gathers context (issue, thread, repo guidance), dispatches sessions, and holds the GitHub write credentials for the replay step.
Ephemeral Modal sandboxes, one per session, started from periodically refreshed per-repo snapshots so a run begins from a recent checkout rather than a cold clone.
An agent process inside each sandbox, given task context up front and a read-only repo token, producing local commits for code work or messages for research and Slack output.
The verify-and-replay path in the orchestrator, the deterministic bridge from local commits to a pull request.
An egress proxy that confines outbound traffic to an allowlist.
Trigger plumbing: GitHub and Slack slash commands, plus the @Squire mention router.

A bit more on the Modal layer, since it carries most of the infrastructure. The whole system is one Modal app: a long-lived FastAPI orchestrator, the per-session sandboxes it spawns, the egress proxy, and the snapshot job all live in the same deployment and ship with a single command. Keeping them together means one place for secrets and networking instead of credentials stitched across clouds, and the orchestrator's cold start of a couple of seconds is invisible to a webhook, which simply retries if the first call is slow.

Each session runs in its own Modal Sandbox: an ephemeral container created when a run starts and torn down when it ends. The image is built ahead of time and carries everything a run needs, with Python, Node, the coding-agent CLI, git, and a secret scanner already in place. Nothing is installed at request time, so a session is ready to work the moment it boots.

Fast starts come from snapshots. A scheduled job keeps a warm per-repo image with dependencies installed and the repository already cloned, so a run resumes from a small git fetch instead of a cold clone of a large codebase. The result is a startup measured in seconds rather than minutes, which disappears into the gap between firing a command and looking back for the result.

Coordination uses Modal's own primitives. Dicts hold session state and de-duplicate webhook retries, so a redelivered event does not kick off a second run. For interactive sessions, the agent CLI runs in streaming-JSON mode and a per-session Queue carries its events out to the browser, which is how the chat UI stays live without the sandbox ever talking to the client directly. Secrets follow the same discipline as the rest of the design: minted per run, scoped to what that run needs, and short-lived, so nothing long-lived is baked into an image.
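The de-duplication idea can be sketched with any dict-like store. In the version below a plain Python dict stands in for the shared store so the example runs locally; the function name claim_event and the event and session identifiers are assumptions for illustration.

```python
from typing import MutableMapping


def claim_event(seen: MutableMapping[str, str], event_id: str, session_id: str) -> bool:
    """Return True if the caller should start a run for event_id.

    `seen` stands in for a shared dict-like store. A plain dict is fine for a
    single process; a shared store would need an atomic put-if-absent so two
    concurrent deliveries cannot both claim the same event.
    """
    if event_id in seen:
        return False              # retry or redelivery: a run already exists
    seen[event_id] = session_id
    return True


seen_events: dict = {}
first = claim_event(seen_events, "evt-1", "session-a")    # first delivery
second = claim_event(seen_events, "evt-1", "session-b")   # redelivered event
```

Here the first call claims the event and the second is ignored, so the redelivery never reaches sandbox creation.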

The security spine: a sandbox that cannot write

Two properties carry most of Squire's security. This section covers the first: no write-capable GitHub token ever enters the sandbox, which bounds what the agent can change. The egress proxy is the second, covered under defense in depth, and it bounds what can leave the box. The agent can read the repository it works on and write freely to its own local working tree, but it has no credential that can mutate anything in the GitHub organisation. It cannot push, open a PR, comment, or touch another repo.

This narrows what a compromised agent can do to one thing: produce commits in its own working tree. It cannot push them, open a pull request, or touch main, because it holds no write credential. The orchestrator does the writing, and it only ever creates the run's own branch and opens a pull request against the protected base. So a prompt injection that makes the agent write bad code does not bypass anything. The replay loop carries those commits faithfully, but they land in a pull request and meet the same review gate as any human's, and main is reached only when a person merges it. The calm answer to "what happens if the agent is fully compromised?" is that the attacker gets a single-repo, read-only, short-lived token, a throwaway container, and a pull request somebody still has to approve.

Everything that writes to GitHub lives in the orchestrator, outside the sandbox. The agent produces commits locally; the orchestrator turns them into a pull request through a deterministic replay loop.

Figure: Trigger to pull request (one write boundary)

Step 1 (orchestrator): Trigger: GitHub comment or Slack @mention
Step 2 (orchestrator): Auth check and gather context
Inside the sandbox (read-only, no GitHub write token):
Step 3 (sandbox): Read-only clone of the repository
Step 4 (sandbox): Agent runs, commits to its local working tree
Step 5 (sandbox): Export a deterministic git bundle
Step 6 (orchestrator): Verify the bundle: caps, branch shape, workflow-file block
Step 7 (orchestrator, writes to GitHub): Replay the commits via the GitHub Git Data API
Step 8 (orchestrator): Open or update the pull request

Sandbox steps run with a read-only, single-repo token. Only the orchestrator holds GitHub write credentials, and only at the replay step.

The pipeline from trigger to pull request. The sandbox never holds a GitHub write token; the orchestrator does, and only at the replay step.

The replay loop, step by step

When the agent finishes, its commits exist only inside the sandbox. Getting them onto GitHub without ever handing the sandbox a push token works like this:

The agent commits locally. Standard git, normal commits, on a branch with a fixed, validated prefix. No network writes are involved or possible.
The sandbox exports a deterministic bundle. Everything since the pristine base the session started from is packed into a git bundle with delta compression disabled, so the same commits always produce the same bytes. That determinism is what makes the next step auditable: the same input always yields the same, inspectable output.
The orchestrator verifies the bundle. Outside the sandbox, the bundle is parsed and checked before anything touches GitHub. The branch name must match the expected shape, the parent chain must walk cleanly back to the known base commit, and there are caps on commit count, total size, and per-blob size. A bundle that touches CI workflow files is rejected, because the GitHub App has no permission to write them; an agent cannot rewrite the pipeline that will judge its own code.
The orchestrator replays the commits via the Git Data API. It walks the verified commits oldest first and reconstructs them through GitHub's low-level Git Data API: upload blobs, assemble trees, create commit objects, move the branch ref. This is the only component with GitHub write credentials, it is small and deterministic, and it takes no instructions from the model.
It opens or updates the pull request. The open is idempotent. An existing PR for the branch is amended; otherwise a new one is created. Re-running a task never spawns duplicate PRs.
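The verification step can be sketched as a pure function over already-parsed commits. Parsing the bundle itself is left out, and the branch prefix, the numeric caps, and the names Commit, BundleRejected, and verify_bundle are illustrative assumptions, not Squire's real values.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# The prefix and every limit below are assumptions for illustration.
BRANCH_RE = re.compile(r"squire/[a-z0-9][a-z0-9._-]{0,80}")
MAX_COMMITS = 50
MAX_TOTAL_BYTES = 20 * 1024 * 1024
MAX_BLOB_BYTES = 2 * 1024 * 1024
BLOCKED_PREFIX = ".github/workflows/"


@dataclass(frozen=True)
class Commit:
    sha: str
    parent: str               # this sketch handles single-parent history only
    files: Dict[str, int]     # changed path -> blob size in bytes


class BundleRejected(Exception):
    """Raised when a bundle fails any check; nothing is written to GitHub."""


def verify_bundle(branch: str, base_sha: str, commits: List[Commit]) -> None:
    """Raise BundleRejected unless commits (oldest first) pass every check."""
    if not BRANCH_RE.fullmatch(branch):
        raise BundleRejected(f"unexpected branch name: {branch!r}")
    if not commits or len(commits) > MAX_COMMITS:
        raise BundleRejected("commit count outside the allowed range")
    expected_parent = base_sha
    total = 0
    for c in commits:
        # Each commit must sit directly on the previous one, starting at base.
        if c.parent != expected_parent:
            raise BundleRejected(f"{c.sha} does not chain back to the base")
        for path, size in c.files.items():
            if path.startswith(BLOCKED_PREFIX):
                raise BundleRejected(f"{c.sha} touches workflow file {path}")
            if size > MAX_BLOB_BYTES:
                raise BundleRejected(f"{path} exceeds the per-blob cap")
            total += size
        expected_parent = c.sha
    if total > MAX_TOTAL_BYTES:
        raise BundleRejected("bundle exceeds the total size cap")
```

Because the function only raises or returns, the caller can treat "no exception" as the single precondition for the replay step.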

The split is the whole game. The unpredictable, model-driven part of the system runs where it can do no external harm. The part that can mutate your repository is a deterministic, reviewable function of verified inputs. The trust boundary sits exactly between them.
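To make the write side concrete, here is a minimal sketch of a replay against GitHub's Git Data endpoints (create blob, create tree, create commit, create ref). The HTTP call is injected as a post function so the example runs against a fake; LocalCommit and replay are hypothetical names, and the sketch ignores deletions, file modes other than regular files, author identity, and updating an existing branch.

```python
import base64
from dataclasses import dataclass
from typing import Callable, Dict, List

# post(path, payload) -> parsed JSON response. Injected so the sketch can run
# against a fake; a real caller would wrap an authenticated HTTP client.
Post = Callable[[str, dict], dict]


@dataclass(frozen=True)
class LocalCommit:
    message: str
    files: Dict[str, bytes]   # path -> new file content (no deletions here)


def replay(post: Post, repo: str, branch: str, base_sha: str,
           base_tree: str, commits: List[LocalCommit]) -> str:
    """Recreate verified commits (oldest first) and return the new head SHA."""
    parent, tree = base_sha, base_tree
    for c in commits:
        entries = []
        for path, content in sorted(c.files.items()):
            blob = post(f"/repos/{repo}/git/blobs", {
                "content": base64.b64encode(content).decode("ascii"),
                "encoding": "base64",
            })
            entries.append({"path": path, "mode": "100644",
                            "type": "blob", "sha": blob["sha"]})
        # Layer the changed files on top of the previous tree.
        tree = post(f"/repos/{repo}/git/trees",
                    {"base_tree": tree, "tree": entries})["sha"]
        parent = post(f"/repos/{repo}/git/commits",
                      {"message": c.message, "tree": tree,
                       "parents": [parent]})["sha"]
    # Creating the ref covers a brand-new branch only; moving an existing
    # branch would be a PATCH on its ref instead.
    post(f"/repos/{repo}/git/refs",
         {"ref": f"refs/heads/{branch}", "sha": parent})
    return parent
```

Nothing in this function reads model output directly: its inputs are the verified commits and the known base, which is what keeps the write path reviewable.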

Defense in depth

The zero-write property is the spine, and a few more layers sit around it.

Scoped-down, read-only tokens. The token the agent does get is minted per session, scoped to a single repository, limited to read permissions, and expires in about an hour. It is enough to run read-only inspections (view the PR, read a diff, look at an issue) and nothing more. Even though the GitHub App may be installed across the whole org, the agent never sees more than the one repo it was dispatched against.
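A minimal sketch of the request that would mint such a token, using GitHub's installation access token endpoint, which accepts a repository list and a permission map. The function name and the exact permission set are assumptions; the post only states that the token is single-repo and read-only.

```python
from typing import Dict, Tuple


def scoped_token_request(installation_id: int, repo_name: str) -> Tuple[str, Dict]:
    """Build the path and JSON body for a single-repo, read-only token.

    The request itself must be authenticated as the GitHub App; that step is
    outside this sketch.
    """
    path = f"/app/installations/{installation_id}/access_tokens"
    body = {
        "repositories": [repo_name],   # repository name without the owner
        "permissions": {               # assumed set; every entry is read-only
            "contents": "read",
            "pull_requests": "read",
            "issues": "read",
        },
    }
    return path, body


path, body = scoped_token_request(12345, "service")
```

Narrowing both the repository list and the permission map at mint time means the scope is a property of the token itself, not of how carefully the sandbox uses it.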

Hooks run with no secrets. Repos can define setup and verification hooks in .squire/workflow.md, but those hooks execute with no credentials mounted at all. The reasoning is that repository content is poisonable: a malicious change to a checked-in script must not be able to read a token and phone it home. So hooks are held to the agent's threat model. They can do what the agent can do, and no more.
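One way to get that property is to build the hook's environment from an allowlist instead of inheriting the parent's. The sketch below assumes a POSIX-like sandbox; the allowlisted variable names, the default timeout, and run_hook itself are illustrative.

```python
import os
import subprocess
import sys
from typing import List, Mapping

# Variables a hook may inherit. The allowlist itself is an assumption.
SAFE_ENV_KEYS = ("PATH", "HOME", "LANG")


def run_hook(argv: List[str], parent_env: Mapping[str, str], cwd: str,
             timeout_s: float = 300.0) -> subprocess.CompletedProcess:
    """Run a repo-defined hook with an allowlisted environment only."""
    # Start from nothing and copy in the few safe variables, so a token that
    # happens to be in the parent environment is never passed down.
    env = {k: parent_env[k] for k in SAFE_ENV_KEYS if k in parent_env}
    return subprocess.run(argv, env=env, cwd=cwd, timeout=timeout_s,
                          capture_output=True, text=True, check=False)


# Demonstration: the parent holds a (fake) token, the hook cannot see it.
leaky_parent = dict(os.environ, GITHUB_TOKEN="not-a-real-token")
result = run_hook(
    [sys.executable, "-c", "import os; print('GITHUB_TOKEN' in os.environ)"],
    leaky_parent,
    cwd=".",
)
```

An allowlist fails closed: a new secret added to the parent environment later stays invisible to hooks unless someone deliberately adds it.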

The egress proxy is a polite wall, and we say so. Outbound traffic from a sandbox flows through an allowlisting proxy that permits only the handful of hosts a session legitimately needs: the model API, the package registries, the web-fetch provider, GitHub for reads. It reliably stops the easy, accidental paths: a confused agent running curl at some exfiltration endpoint, or a compromised package's post-install script calling home. We are deliberately honest that it is not a wall against a determined, tailored attacker; a sufficiently creative process could still find a covert channel. The proxy is there to stop accidents and confused tools. Containment is the credential model's job: the sandbox can read one repository and write nothing, so a leak is bounded to that repository's source, while pushing into the org or reaching another repo stays impossible.
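The allowlist decision at the heart of such a proxy can be sketched as an exact hostname match. The host list below is illustrative only, since the real list is not given here, and is_allowed is a hypothetical name.

```python
from typing import FrozenSet

# Illustrative allowlist; these hosts are assumptions, not Squire's config.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({
    "api.anthropic.com",
    "pypi.org",
    "files.pythonhosted.org",
    "registry.npmjs.org",
    "github.com",
    "api.github.com",
})


def is_allowed(host: str, allowed: FrozenSet[str] = ALLOWED_HOSTS) -> bool:
    """Exact match on a normalised hostname; no wildcard or suffix rules."""
    normalised = host.strip().rstrip(".").lower()
    return normalised in allowed
```

Exact matching means a lookalike such as api.github.com.evil.example is refused. It does nothing about what is sent to an allowed host, which is why containment stays the credential model's job.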

What we took from Ramp and Symphony

Two published systems shaped Squire. Here is what we borrowed from each and where we diverged, especially on security, where the differences are the point.

From Ramp's Inspect we took the architectural backbone, and they deserve the credit: Modal sandboxes with per-repo image snapshots; a GitHub App minting a fresh installation token per clone, with git author identity set at commit time; a Slack repo-routing classifier with an explicit "unknown" fallback so the bot can ask when it is unsure; and a webhook to track PR state. Where we diverged: we run Claude Code rather than OpenCode (a billing and terms consideration for us specifically); we skipped Cloudflare Durable Objects and multiplayer, coordinating session state with Modal's primitives instead; and, the substantive one, we drew the containment boundary in a different place.

Some of that divergence is about audience. Ramp built Inspect partly to bring people who do not usually write code into the loop, which is why multiplayer, a hosted IDE, and a Chrome extension are central for them. Squire points the other way: it is for people who already write code, and the goal is to make their dispatch cheaper and to open more channels to do it from, so we invested in trigger surfaces and containment rather than collaborative editing.

Ramp's post is explicit about its model. In their words, their setup is "to have the sandbox push the changes ... and then send an event to the API with the branch name," and the API then "use[s] the user's GitHub token to call GitHub's pull request API." Moving the PR-open to the user's token is, they say, to avoid letting "any user ... approve their own changes." That is sound approval hygiene, but notice what it implies: in their documented architecture the sandbox holds a push-capable token and pushes the branch directly. Only the final pull-request call is moved out. We took the containment angle further: no write-capable token enters our sandbox, and all writes (the branch itself, not just the PR open) happen outside it via replay. The practical difference is where the guarantee lives. Ramp's containment rests on GitHub branch protection, a per-repo policy that has to be set and kept correct; ours rests on the absence of any push credential, so a forgotten or misconfigured branch-protection rule cannot expose main through the agent, because there was never a credential to misuse. The two designs aim at different objectives, and both are reasonable: their split optimises for review integrity, ours for agent containment.

OpenAI's Symphony (its published SPEC.md) was the conceptual model for several Squire features: the repo-owned workflow file, lifecycle hooks, and the operator state dashboard all trace to it. It is also the cleanest contrast we can draw, because the difference is documented on both sides. Symphony's threat model trusts repository content; ours treats it as poisonable. Symphony's hooks run with whatever authentication the orchestrator has; ours run with none, by design. Same surface feature, repo-controlled hooks, opposite trust assumption underneath. Most of our hardening decisions are the result of taking a pattern we liked and re-deriving it under a hostile-repo assumption.

Figure: Who holds write credentials in the sandbox? (3 published systems)

Ramp · Inspect (sandbox can push): Sandbox holds a push token and pushes the branch. The PR-open is moved to the user's token for approval hygiene.

OpenAI · Symphony (trusted hooks): Trusts repo content; hooks run with the orchestrator's auth. Strong on ergonomics, permissive on trust.

Bollwerk · Squire (zero write in sandbox): No write token in the sandbox at all. All writes happen outside it via deterministic replay; hooks run with no secrets.

All three are reasonable designs for different objectives: review integrity, ergonomics, and containment respectively. The contrast here is documented on each side.

Three published background-agent designs, compared on the one axis that matters most for full offload: what the sandbox can write to.

Where we sit

Squire began as a convenience, a way to fire a task and get a PR back. What it turned into is an argument. We build the containment in so deliberately because of the behaviour change from Part I: the containment is what makes full offload rational. You can only take your hand off the wheel if the car cannot drive off a cliff. Once the dangerous actions are structurally impossible, fully offloading the work the agent is good at is simply the obviously correct allocation of a scarce resource: your own attention.

That is the loop we are trying to close. Cage the agent well enough that you can stop watching it, and spend the attention you get back on the problems where you are still the best thinker available.

References

Eric J. Ma, "Calibration Is Synchronizing Feedback Loops With Neural Throughput"
Ramp, "Why We Built Our Own Background Agent" (Inspect)
OpenAI, Symphony