Squire: A Sandbox Coding Agent You Can Hand the Keys To
A lot of people who work with coding agents describe feeling slightly less sharp than they used to. They spend the day watching sessions scroll past, skimming what the model proposes, and hitting enter. Each behaviour looks responsible on its own: watching feels like supervision, skimming feels like review, keeping the session open feels like control. But stacked together they produce someone who is neither really thinking about the problem nor really letting go of it. One hand stays on the wheel out of habit, while no real thinking is happening.
This is the worst of both worlds. The agent never gets full autonomy, so the human stays a bottleneck, and the human is not doing deep work, so the supervision is theatre. We call it the pseudo-micromanagement trap, and it has quietly become a common tax, because background coding agents are now normal infrastructure. Ramp reported its internal agent writes roughly 30% of the pull requests merged to its main repositories (probably more by now). The tools are everywhere, and the trap rides along with them.
Two questions follow: how to work with these agents without lobotomising yourself, and how to run one safely enough that leaving it unattended is rational. Squire, the sandbox coding agent we built at Bollwerk, is our answer to both. The post is in two parts: the argument first, then a walkthrough of how Squire is built, detailed enough to use as a reference for your own. If you only want the architecture, skip to Part II.
Overview
- Full offload or full engagement
- Squire in practice
- How we got here
- The security spine: a sandbox that cannot write
- The replay loop, step by step
- Defense in depth
- What we took from Ramp and Symphony
- Where we sit
Part I: The case for full offload
Full offload or full engagement
The trap has a mechanism underneath the feel. Eric Ma calls it a throughput problem in his essay on calibration: AI tools accelerate feedback loops past the rate human cognition can absorb, and when loop cadence exceeds brain throughput the queue overflows. Partial parallelism shreds throughput, because the switching cost is paid on every loop and the loops never close. Half-watching two agents is slower than fully handing off one and fully thinking about another.
So the rule we argue for is binary on purpose. For any piece of work you are in one of two regimes, and the unproductive middle is the thing to avoid.
- Full offload. Boilerplate, refactors, well-scoped feature work, glue code, migrations: anything where you are not the deepest expert in the room. Fire the task and review the PR. Do not pseudo-supervise.
- Full engagement. Architecture, hard trade-offs, ambiguous problems, anything with real domain depth. This is where your finite cognitive budget belongs: one problem at a time.
Both regimes preserve throughput, because in both the loop you are running actually closes. Engagement is where humans still belong, since it draws on the domain depth the model lacks, so spend the attention you save by offloading there.
One deliberate exception sits outside the throughput logic. Sometimes the goal is to learn the thing rather than ship it efficiently. When you want a skill you do not yet have, building it alongside the agent is one of the better ways to acquire it: you stay in the loop to use the agent as a tutor, reading what it does, asking why, trying the next step yourself. The test is whether you can name what you get from staying in. "I am learning how this works" is a real answer; "I feel like I should keep an eye on it" is the trap.
All of this assumes the offload column is safe to use, and it only is if letting the agent run unattended cannot cause real damage. If an unsupervised agent can push to main, force-push over a colleague's branch, or be talked by a poisoned README into something destructive, then firing it off and reviewing later is negligence, and keeping a hand on the wheel is correct. So you earn the right to fully offload through architecture: make the dangerous actions structurally impossible, and the hand comes off the wheel safely. That is the property Squire is built around. Before the how, here is what working with it looks like.
Squire in practice
Squire is reachable from two surfaces, GitHub and Slack, and runs in two modes, non-interactive and interactive. The two are independent: you can fire a non-interactive run or open an interactive session from either surface.
Non-interactive is the common path. On GitHub you comment "/squire-ship add a rate limiter to the webhook endpoint, 30 req/min per workspace" on an issue, and the next thing you see is a pull request, opened by Squire, with the work done and a real description. Comment on an existing pull request and Squire checks out that branch, appends its commits, and updates the same PR, so asking for a change stays where the review already is. When a run turns up something out of scope, it can file a GitHub issue rather than dropping the observation.
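As an illustration of the trigger, here is a minimal sketch of how a comment body might be split into a run mode and a task. It assumes the command has to be the first token of the comment; the function name `parse_squire_command` and the mode labels are hypothetical, not Squire's actual code.

```python
import re
from typing import Optional, Tuple

# Hypothetical mapping from the two commands named in this post to a run mode.
COMMANDS = {"/squire-ship": "headless", "/squire": "interactive"}

# The command must open the comment and be followed by whitespace or the end of
# the comment, so "/squirely" or "/squire-shipping" are not treated as commands.
_COMMAND_RE = re.compile(r"^\s*(/squire(?:-ship)?)(?:\s+(.*))?$", re.DOTALL)


def parse_squire_command(comment_body: str) -> Optional[Tuple[str, str]]:
    """Return (mode, task) if the comment starts with a Squire command, else None."""
    match = _COMMAND_RE.match(comment_body)
    if match is None:
        return None
    command, task = match.group(1), (match.group(2) or "").strip()
    return COMMANDS[command], task
```

Anything that does not parse is simply ignored, so ordinary review comments never start a run.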

Interactive mode is for when you want to steer. Squire hands you a link, gated by OAuth so only you can open it, into a chat UI where you drive the work turn by turn. That is the engagement regime from earlier in action: deliberate, loop-closing work rather than the half-watched middle, and it is also how you learn an unfamiliar part of a codebase alongside the agent.
Slack is both a way in and the way Squire reports back. Mention @Squire with a task and it runs the same way, and when a run opens a PR or files an issue it posts the link into the thread that triggered it, so the outcome lands where you asked. The same mention also handles work that never becomes a PR: "where does this feature live?", "is this shipped yet?", or "summarise this" with a link. Squire reads the relevant repository or fetches the page and replies in the thread.

There is no bespoke command for the question-and-answer cases. A mention with a research-shaped prompt routes to a read-only repository sandbox, or to a no-repo sandbox with a web-fetch tool, and posts its answer to Slack instead of opening a PR. The same machinery that ships PRs answers questions and digests links; the only side effect of those cases is a Slack message.
What ties the surfaces together is that none of them needs your laptop. The work lives in a sandbox you reach from an issue comment, a PR comment, or a Slack message, so you can kick off a fix from your phone and read the PR later, wherever you are. The aim is more surfaces for real work, and more places to start it from.
The shape is consistent across surfaces. You note the work where you noticed it, and the next thing you see is a result to review: a PR, a summary, an answer, instead of a session to babysit.
Part II: How Squire works
You have seen what Squire does. The rest of the post is how it does it, written to be reusable: if you are assembling your own background agent, this is the design we would hand you. The organising idea carries over from Part I. Make the dangerous things structurally impossible, then let everything else hang off that.
How we got here
The clearest published reference for this class of system is Ramp's, in "Why We Built Our Own Background Agent." Their agent, Inspect, runs each session in a Modal sandbox with a full dev environment wired into the tools an engineer uses, and Ramp published a spec with an open invitation to copy it. We took the Modal-sandbox half and skipped the parts we did not need yet: multiplayer, a hosted VS Code, a Chrome extension.
Our MVP was deliberately small. A GitHub webhook fires a slash command, an orchestrator spins up a Modal sandbox, the sandbox clones the repo and runs a coding agent, and the result comes back as a pull request. One constraint was fixed from the first design note: no GitHub write credentials ever enter the sandbox. Everything dangerous happens outside it, and that rule shaped every decision after. From there the system grew in legible steps:
- The rebrand. The plain /agent command became Squire, the agent that does the prep work so the engineer can show up to the joust. Two modes split out: /squire-ship for fire-and-forget headless runs, and /squire for an interactive session you can steer.
- Per-repo governance. A .squire/workflow.md file carries each repo's setup commands, verification hooks, and timeouts. It is read before a sandbox is created, so a parse error fails fast with a comment and burns no compute.
- The interactive rewrite. The first interactive mode drove a terminal UI over ttyd and rendered badly, with flicker and line-by-line redraws. We replaced the whole path with a streaming-JSON chat UI fed by the agent CLI's event stream.
- Slack. Slash commands, plus an @Squire mention that routes through a fast classifier to decide which repo to target and whether to run headless or interactive.
- A no-repo research mode. A vanilla sandbox with no repository, no GitHub credentials, and Slack as its only output. This is what answers questions and summarises links.
- An egress-proxy hardening pass. Outbound traffic from a sandbox now flows through an allowlisting proxy, closing the easy accidental-exfiltration paths.
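The fail-fast ordering from the per-repo governance step is easy to get wrong, so here is a sketch of it. It assumes the workflow file has already been parsed into a dictionary; the keys `setup`, `verify` and `timeout_minutes`, the 1 to 120 minute range, and the helpers `validate_workflow` and `dispatch` are all hypothetical. The post does not specify the file's schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

ALLOWED_KEYS = {"setup", "verify", "timeout_minutes"}  # hypothetical schema


class WorkflowError(ValueError):
    """Raised when the settings from .squire/workflow.md are invalid."""


@dataclass
class WorkflowConfig:
    setup: List[str] = field(default_factory=list)
    verify: List[str] = field(default_factory=list)
    timeout_minutes: int = 30


def validate_workflow(raw: Dict[str, Any]) -> WorkflowConfig:
    """Validate already-parsed workflow settings; raise WorkflowError on any problem."""
    unknown = set(raw) - ALLOWED_KEYS
    if unknown:
        raise WorkflowError(f"unknown keys: {sorted(unknown)}")
    for key in ("setup", "verify"):
        commands = raw.get(key, [])
        if not isinstance(commands, list) or not all(isinstance(c, str) for c in commands):
            raise WorkflowError(f"{key} must be a list of shell commands")
    timeout = raw.get("timeout_minutes", 30)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 1 <= timeout <= 120:
        raise WorkflowError("timeout_minutes must be an integer from 1 to 120")
    return WorkflowConfig(list(raw.get("setup", [])), list(raw.get("verify", [])), timeout)


def dispatch(raw: Dict[str, Any],
             create_sandbox: Callable[[WorkflowConfig], str],
             post_comment: Callable[[str], None]) -> Optional[str]:
    """Validate first; only a valid config ever reaches sandbox creation."""
    try:
        config = validate_workflow(raw)
    except WorkflowError as err:
        post_comment(f"Could not use .squire/workflow.md: {err}")
        return None  # no sandbox created, so no compute spent
    return create_sandbox(config)
```

The design choice is that validation is a pure function run in the orchestrator, so a broken config costs one comment and no sandbox.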
None of these relaxed the founding constraint. Every "could the agent just do X?" resolved the same way: the orchestrator does X, outside the sandbox, with a credential the agent never sees. The system today is a single app with a small number of parts:
- An orchestrator that receives GitHub webhooks and Slack events, checks authorisation, gathers context (issue, thread, repo guidance), dispatches sessions, and holds the GitHub write credentials for the replay step.
- Ephemeral Modal sandboxes, one per session, started from periodically refreshed per-repo snapshots so a run begins from a recent checkout rather than a cold clone.
- An agent process inside each sandbox, given task context up front and a read-only repo token, producing local commits for code work or messages for research and Slack output.
- The verify-and-replay path in the orchestrator, the deterministic bridge from local commits to a pull request.
- An egress proxy that confines outbound traffic to an allowlist.
- Trigger plumbing: GitHub and Slack slash commands, plus the @Squire mention router.
A bit more on the Modal layer, since it carries most of the infrastructure. The whole system is one Modal app: a long-lived FastAPI orchestrator, the per-session sandboxes it spawns, the egress proxy, and the snapshot job all live in the same deployment and ship with a single command. Keeping them together means one place for secrets and networking instead of credentials stitched across clouds, and the orchestrator's cold start of a couple of seconds is invisible to a webhook, which simply retries if the first call is slow.
Each session runs in its own Modal Sandbox: an ephemeral container created when a run starts and torn down when it ends. The image is built ahead of time and carries everything a run needs, with Python, Node, the coding-agent CLI, git, and a secret scanner already in place. Nothing is installed at request time, so a session is ready to work the moment it boots.
Fast starts come from snapshots. A scheduled job keeps a warm per-repo image with dependencies installed and the repository already cloned, so a run resumes from a small git fetch instead of a cold clone of a large codebase. The result is a startup measured in seconds rather than minutes, which disappears into the gap between firing a command and looking back for the result.
Coordination uses Modal's own primitives. Dicts hold session state and de-duplicate webhook retries, so a redelivered event does not kick off a second run. For interactive sessions, the agent CLI runs in streaming-JSON mode and a per-session Queue carries its events out to the browser, which is how the chat UI stays live without the sandbox ever talking to the client directly. Secrets follow the same discipline as the rest of the design: minted per run, scoped to what that run needs, and short-lived, so nothing long-lived is baked into an image.
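The retry de-duplication described above reduces to an atomic "claim this event ID once" operation. The sketch below uses an in-process set and a lock as a stand-in for the shared store; the class `SeenEvents` and helper `handle_delivery` are hypothetical. It assumes each delivery carries a stable identifier (for example GitHub's `X-GitHub-Delivery` header or a Slack event's `event_id`) and that a redelivery reuses the original identifier.

```python
import threading
from typing import Callable, Set


class SeenEvents:
    """Local stand-in for the shared store that remembers delivered event IDs.

    A real deployment keeps this in shared state; the property that matters is
    an atomic check-and-set, modelled here with a lock.
    """

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._lock = threading.Lock()

    def claim(self, event_id: str) -> bool:
        """Return True the first time an event ID is seen, False on any repeat."""
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
            return True


def handle_delivery(store: SeenEvents, event_id: str, start_run: Callable[[], str]) -> str:
    """Start a run for a new event; acknowledge retries without starting another."""
    if not store.claim(event_id):
        return "duplicate"
    return start_run()
```

Whatever shared store backs this in production, the check and the write must happen as one step, otherwise two concurrent retries can both pass the check.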
The security spine: a sandbox that cannot write
Two properties carry most of Squire's security. This section covers the first: no write-capable GitHub token ever enters the sandbox, which bounds what the agent can change. The egress proxy is the second, covered under defense in depth, and it bounds what can leave the box. The agent can read the repository it works on and write freely to its own local working tree, but it has no credential that can mutate anything in the GitHub organisation. It cannot push, open a PR, comment, or touch another repo.
This narrows what a compromised agent can do to one thing: produce commits in its own working tree. It cannot push them, open a pull request, or touch main, because it holds no write credential. The orchestrator does the writing, and it only ever creates the run's own branch and opens a pull request against the protected base. So a prompt injection that makes the agent write bad code does not bypass anything. The replay loop carries those commits faithfully, but they land in a pull request and meet the same review gate as any human's, and main is reached only when a person merges it. The calm answer to "what happens if the agent is fully compromised?" is that the attacker gets a single-repo, read-only, short-lived token, a throwaway container, and a pull request somebody still has to approve.
Everything that writes to GitHub lives in the orchestrator, outside the sandbox. The agent produces commits locally; the orchestrator turns them into a pull request through a deterministic replay loop.
The replay loop, step by step
When the agent finishes, its commits exist only inside the sandbox. Getting them onto GitHub without ever handing the sandbox a push token works like this:
- The agent commits locally. Standard git, normal commits, on a branch with a fixed, validated prefix. No network writes are involved or possible.
- The sandbox exports a deterministic bundle. Everything since the pristine base the session started from is packed into a git bundle with delta compression disabled, so the same commits always produce the same bytes. That determinism is what makes the next step auditable: the same input always yields the same, inspectable output.
- The orchestrator verifies the bundle. Outside the sandbox, the bundle is parsed and checked before anything touches GitHub. The branch name must match the expected shape, the parent chain must walk cleanly back to the known base commit, and there are caps on commit count, total size, and per-blob size. A bundle that touches CI workflow files is rejected, because the GitHub App has no permission to write them; an agent cannot rewrite the pipeline that will judge its own code.
- The orchestrator replays the commits via the Git Data API. It walks the verified commits oldest first and reconstructs them through GitHub's low-level Git Data API: upload blobs, assemble trees, create commit objects, move the branch ref. This is the only component with GitHub write credentials, it is small and deterministic, and it takes no instructions from the model.
- It opens or updates the pull request. The open is idempotent. An existing PR for the branch is amended; otherwise a new one is created. Re-running a task never spawns duplicate PRs.
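The verification step is the part most worth copying, so here is a hedged sketch of its checks. It assumes the bundle has already been unpacked into simple commit records (that parsing is not shown). The `squire/` branch prefix, every numeric cap, the single-parent rule, and the names `Commit`, `BundleRejected` and `verify_bundle` are hypothetical; the post names the checks but not their values.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical policy values; the post names the checks but not the numbers.
BRANCH_RE = re.compile(r"squire/[a-z0-9][a-z0-9-]{0,62}")
MAX_COMMITS = 50
MAX_TOTAL_BYTES = 20 * 1024 * 1024
MAX_BLOB_BYTES = 2 * 1024 * 1024
FORBIDDEN_PREFIXES = (".github/workflows/",)


@dataclass(frozen=True)
class Commit:
    sha: str
    parents: List[str]
    changes: Dict[str, Optional[int]]  # path -> new blob size in bytes, None if deleted


class BundleRejected(Exception):
    pass


def verify_bundle(branch: str, base_sha: str, tip_sha: str,
                  commits: List[Commit]) -> List[Commit]:
    """Check a parsed bundle and return its commits oldest first, or raise."""
    if not BRANCH_RE.fullmatch(branch):
        raise BundleRejected(f"unexpected branch name {branch!r}")
    if not 0 < len(commits) <= MAX_COMMITS:
        raise BundleRejected(f"commit count {len(commits)} outside 1..{MAX_COMMITS}")

    # Walk tip -> base. Every commit needs exactly one parent (no merges here),
    # and the walk must end at the known base without skipping any bundled commit.
    by_sha = {c.sha: c for c in commits}
    chain: List[Commit] = []
    current = tip_sha
    while current != base_sha:
        commit = by_sha.get(current)
        if commit is None or len(chain) >= len(commits):
            raise BundleRejected("parent chain does not reach the base commit")
        if len(commit.parents) != 1:
            raise BundleRejected(f"commit {commit.sha} is not a single-parent commit")
        chain.append(commit)
        current = commit.parents[0]
    if len(chain) != len(commits):
        raise BundleRejected("bundle contains commits outside the chain")

    total = 0
    for commit in chain:
        for path, size in commit.changes.items():
            if path.startswith(FORBIDDEN_PREFIXES):
                raise BundleRejected(f"{path} is a CI workflow file")
            if size is not None:
                if size > MAX_BLOB_BYTES:
                    raise BundleRejected(f"{path} exceeds the per-blob cap")
                total += size
    if total > MAX_TOTAL_BYTES:
        raise BundleRejected("bundle exceeds the total size cap")
    return list(reversed(chain))
```

Rejecting stray commits that are not on the tip-to-base chain matters as much as walking the chain. Otherwise a bundle could smuggle objects past the checks.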
The split is the whole game. The unpredictable, model-driven part of the system runs where it can do no external harm. The part that can mutate your repository is a deterministic, reviewable function of verified inputs. The trust boundary sits exactly between them.
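The replay and pull-request steps from the list above can be sketched against GitHub's REST endpoints for the Git Data API (`git/blobs`, `git/trees`, `git/commits`, `git/refs`) and the pulls API. This is a sketch under stated assumptions, not Squire's code. The `api` callable is a hypothetical injected transport so the logic runs offline. Files are assumed to be regular, non-executable blobs. When updating an existing PR branch, `base_sha` is assumed to be that branch's current head. Author identity and error handling are omitted.

```python
import base64
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
# In production this is an authenticated client held only by the orchestrator.
Api = Callable[[str, str, Optional[Dict[str, Any]]], Any]


@dataclass(frozen=True)
class VerifiedCommit:
    message: str
    files: Dict[str, Optional[bytes]]  # path -> new content, None to delete the path


def replay(api: Api, repo: str, branch: str, base_sha: str,
           commits: List[VerifiedCommit], branch_exists: bool) -> str:
    """Recreate verified commits on GitHub, oldest first, and return the new tip SHA."""
    parent = base_sha
    tree_sha = api("GET", f"/repos/{repo}/git/commits/{base_sha}", None)["tree"]["sha"]
    for commit in commits:
        entries = []
        for path, content in sorted(commit.files.items()):
            sha = None  # a null sha on top of base_tree removes the path
            if content is not None:
                blob = api("POST", f"/repos/{repo}/git/blobs", {
                    "content": base64.b64encode(content).decode("ascii"),
                    "encoding": "base64",
                })
                sha = blob["sha"]
            # Assumes regular, non-executable files (mode 100644).
            entries.append({"path": path, "mode": "100644", "type": "blob", "sha": sha})
        tree_sha = api("POST", f"/repos/{repo}/git/trees",
                       {"base_tree": tree_sha, "tree": entries})["sha"]
        parent = api("POST", f"/repos/{repo}/git/commits",
                     {"message": commit.message, "tree": tree_sha, "parents": [parent]})["sha"]
    if branch_exists:
        # force=False makes GitHub reject any update that is not a fast-forward.
        api("PATCH", f"/repos/{repo}/git/refs/heads/{branch}", {"sha": parent, "force": False})
    else:
        api("POST", f"/repos/{repo}/git/refs", {"ref": f"refs/heads/{branch}", "sha": parent})
    return parent


def open_or_update_pr(api: Api, repo: str, branch: str, base: str,
                      title: str, body: str) -> int:
    """Idempotent PR open: amend the open PR for this branch if one exists."""
    owner = repo.split("/")[0]
    existing = api("GET", f"/repos/{repo}/pulls?head={owner}:{branch}&state=open", None)
    if existing:
        number = existing[0]["number"]
        api("PATCH", f"/repos/{repo}/pulls/{number}", {"body": body})
        return number
    created = api("POST", f"/repos/{repo}/pulls",
                  {"title": title, "head": branch, "base": base, "body": body})
    return created["number"]
```

Nothing in either function reads model output other than the verified commit contents and messages, which is what keeps the write path deterministic.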
Defense in depth
The zero-write property is the spine, and a few more layers sit around it.
Scope-down, read-only tokens. The token the agent does get is minted per session, scoped to a single repository, limited to read permissions, and expires in about an hour. It is enough to run read-only inspections (view the PR, read a diff, look at an issue) and nothing more. Even though the GitHub App may be installed across the whole org, the agent never sees more than the one repo it was dispatched against.
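GitHub's "create an installation access token" endpoint (`POST /app/installations/{installation_id}/access_tokens`) accepts optional `repositories` and `permissions` fields that narrow a token below what the installation grants. GitHub expires such tokens after one hour. The sketch below only builds the request; sending it requires the App's JWT, which is not shown. The helper name and the exact permission set are assumptions.

```python
from typing import Any, Dict, Tuple


def read_only_token_request(installation_id: int, repo_name: str) -> Tuple[str, Dict[str, Any]]:
    """Build the POST path and JSON body for a one-repo, read-only installation token.

    The request itself must be sent with the GitHub App's JWT (not shown here).
    """
    path = f"/app/installations/{installation_id}/access_tokens"
    body = {
        # Narrow the token to the single repo this session was dispatched against,
        # even if the App is installed across the whole organisation.
        "repositories": [repo_name],
        # Read-only: enough to clone, view a PR and its diff, and read issues.
        "permissions": {
            "contents": "read",
            "pull_requests": "read",
            "issues": "read",
        },
    }
    return path, body
```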
Hooks run with no secrets. Repos can define setup and verification hooks in .squire/workflow.md, but those hooks execute with no credentials mounted at all. The reasoning is that repository content is poisonable: a malicious change to a checked-in script must not be able to read a token and phone it home. So hooks are held to the agent's threat model. They can do what the agent can do, and no more.
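One concrete piece of "hooks run with no secrets" is never letting a hook inherit the orchestrator's environment. The sketch below passes an explicit environment to `subprocess.run`, which replaces the parent environment rather than extending it. `HOOK_ENV` and `run_hook` are hypothetical names. This only covers environment variables; the stronger property in the design is that no credential is mounted anywhere the hook could read it.

```python
import subprocess
from typing import Dict, List

# Hypothetical minimal environment: nothing inherited, nothing credential-shaped.
HOOK_ENV: Dict[str, str] = {
    "PATH": "/usr/local/bin:/usr/bin:/bin",
    "HOME": "/tmp/hook-home",
    "LANG": "C.UTF-8",
}


def run_hook(command: List[str], cwd: str, timeout_s: int = 600) -> "subprocess.CompletedProcess[str]":
    """Run a repo-defined hook with an explicit environment instead of the parent's."""
    return subprocess.run(
        command,
        cwd=cwd,
        env=dict(HOOK_ENV),   # replaces, rather than extends, os.environ
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,          # the caller decides what a failing hook means
    )
```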
The egress proxy is a polite wall, and we say so. Outbound traffic from a sandbox flows through an allowlisting proxy that permits only the handful of hosts a session legitimately needs: the model API, the package registries, the web-fetch provider, GitHub for reads. It reliably stops the easy, accidental paths: a confused agent running curl at some exfiltration endpoint, or a compromised package's post-install script calling home. We are deliberately honest that it is not a wall against a determined, tailored attacker; a sufficiently creative process could still find a covert channel. The proxy is there to stop accidents and confused tools. Containment is the credential model's job: the sandbox can read one repository and write nothing, so a leak is bounded to that repository's source, while pushing into the org or reaching another repo stays impossible.
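The allowlist decision at the proxy can be as small as an exact host match on each HTTPS `CONNECT` target. The sketch below is hypothetical throughout. The hostnames are assumed examples for the categories named above, and the web-fetch provider is left out because the post does not name it. Exact matching avoids suffix tricks such as `github.com.evil.example`. Allowing GitHub at all is only safe because the sandbox holds no write credential.

```python
from typing import FrozenSet

# Hypothetical allowlist for the categories named above.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({
    "api.anthropic.com",                                          # model API (assumed)
    "pypi.org", "files.pythonhosted.org", "registry.npmjs.org",   # package registries
    "github.com", "api.github.com", "codeload.github.com",        # GitHub, for reads
})


def is_allowed_host(host: str) -> bool:
    """Exact match only, so 'github.com.evil.example' does not slip through."""
    return host.strip().rstrip(".").lower() in ALLOWED_HOSTS


def allow_connect(target: str) -> bool:
    """Decide an HTTPS CONNECT request whose target is 'host:port'."""
    host, sep, port = target.rpartition(":")
    return bool(sep) and port == "443" and is_allowed_host(host)
```

As the paragraph above says, this stops confused tools and careless scripts, not a determined attacker with a covert channel.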
What we took from Ramp and Symphony
Two published systems shaped Squire. Here is what we borrowed from each and where we diverged, especially on security, where the differences are the point.
From Ramp's Inspect we took the architectural backbone, and they deserve the credit: Modal sandboxes with per-repo image snapshots; a GitHub App minting a fresh installation token per clone, with git author identity set at commit time; a Slack repo-routing classifier with an explicit "unknown" fallback so the bot can ask when it is unsure; and a webhook to track PR state. Where we diverged: we run Claude Code rather than OpenCode (a billing and terms consideration for us specifically); we skipped Cloudflare Durable Objects and multiplayer, coordinating session state with Modal's primitives instead; and, the substantive one, we drew the containment boundary in a different place.
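The explicit "unknown" fallback is worth sketching because it is where a routing classifier fails safely. The sketch below handles only the classifier's answer, not the model call. It assumes a hypothetical JSON shape with `repo` and `mode` fields and hypothetical mode names. Anything malformed, or a repo outside the known list, turns into a question in the thread rather than a guess.

```python
import json
from dataclasses import dataclass
from typing import Optional, Sequence

VALID_MODES = {"ship", "interactive", "research"}  # hypothetical mode names


@dataclass(frozen=True)
class Route:
    repo: Optional[str]     # None: no-repo research sandbox, or not yet known
    mode: str
    ask_user: bool = False  # True: reply in-thread asking which repo is meant


def route_mention(classifier_output: str, known_repos: Sequence[str]) -> Route:
    """Turn the classifier's JSON answer into a dispatch decision.

    Malformed output, an unknown mode, or a repo outside the known list all fall
    back to asking in the thread rather than guessing.
    """
    try:
        answer = json.loads(classifier_output)
        repo, mode = answer["repo"], answer["mode"]
    except (ValueError, KeyError, TypeError):
        return Route(repo=None, mode="research", ask_user=True)
    if not isinstance(mode, str) or mode not in VALID_MODES:
        return Route(repo=None, mode="research", ask_user=True)
    if repo is None and mode == "research":
        return Route(repo=None, mode="research")  # no-repo sandbox with web fetch
    if repo == "unknown" or repo not in known_repos:
        return Route(repo=None, mode=mode, ask_user=True)
    return Route(repo=repo, mode=mode)
```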
Some of that divergence is about audience. Ramp built Inspect partly to bring people who do not usually write code into the loop, which is why multiplayer, a hosted IDE, and a Chrome extension are central for them. Squire points the other way: it is for people who already write code, and the goal is to make their dispatch cheaper and to open more channels to do it from, so we invested in trigger surfaces and containment rather than collaborative editing.
Ramp's post is explicit about its model. In their words, their setup is "to have the sandbox push the changes ... and then send an event to the API with the branch name," and the API then "use[s] the user's GitHub token to call GitHub's pull request API." Moving the PR-open to the user's token is, they say, to avoid letting "any user ... approve their own changes." That is sound approval hygiene, but notice what it implies: in their documented architecture the sandbox holds a push-capable token and pushes the branch directly. Only the final pull-request call is moved out. We took the containment angle further: no write-capable token enters our sandbox, and all writes (the branch itself, not just the PR open) happen outside it via replay. The practical difference is where the guarantee lives. Ramp's containment rests on GitHub branch protection, a per-repo policy that has to be set and kept correct; ours rests on the absence of any push credential, so a forgotten or misconfigured branch-protection rule cannot expose main through the agent, because there was never a credential to misuse. The two designs aim at different objectives, and both are reasonable: their split optimises for review integrity, ours for agent containment.
OpenAI's Symphony (its published SPEC.md) was the conceptual model for several Squire features: the repo-owned workflow file, lifecycle hooks, and the operator state dashboard all trace to it. It is also the cleanest contrast we can draw, because the difference is documented on both sides. Symphony's threat model trusts repository content; ours treats it as poisonable. Symphony's hooks run with whatever authentication the orchestrator has; ours run with none, by design. Same surface feature, repo-controlled hooks, opposite trust assumption underneath. Most of our hardening decisions are the result of taking a pattern we liked and re-deriving it under a hostile-repo assumption.
- Ramp Inspect: the sandbox holds a push token and pushes the branch. The PR-open is moved to the user's token for approval hygiene.
- OpenAI Symphony: trusts repo content, and hooks run with the orchestrator's auth. Strong on ergonomics, permissive on trust.
- Squire: no write token in the sandbox at all. All writes happen outside it via deterministic replay, and hooks run with no secrets.
Where we sit
Squire began as a convenience, a way to fire a task and get a PR back. What it turned into is an argument. We build the containment in so deliberately because of the behaviour change from Part I: the containment is what makes full offload rational. You can only take your hand off the wheel if the car cannot drive off a cliff. Once the dangerous actions are structurally impossible, fully offloading the work the agent is good at is simply the obviously correct allocation of a scarce resource: your own attention.
That is the loop we are trying to close. Cage the agent well enough that you can stop watching it, and spend the attention you get back on the problems where you are still the best thinker available.
References
- Eric J. Ma, "Calibration Is Synchronizing Feedback Loops With Neural Throughput"
- Ramp, "Why We Built Our Own Background Agent" (Inspect)
- OpenAI, Symphony