Key takeaways
- Start from the boundary, not the tool. A coding agent crosses five trust boundaries every session — prompt to plan, plan to shell, shell to network, agent to repo, repo to production. Pick which boundary holds the human-in-the-loop and design the rest around that single decision.
- The 2025/2026 wave of agent risk is dominated by prompt injection from indirect inputs — a fetched web page, an issue body, a README, a stray Markdown comment in a dependency. Treat any text the agent reads from outside your repo as adversarial by default.
- For small teams the highest-ROI move is a strict allow-list of shell commands and network destinations. A 20-line allow-list beats a 200-page policy document every time, because an allow-list enforced outside the model is a control the agent itself cannot ignore.
- Never let a coding agent run with both write access to your repo and arbitrary network egress at the same time. If you need both, split them across two agents and a review gate; that single rule eliminates the majority of credible exfiltration paths.
- Auto-merge is the danger surface most teams underprice. Require a human reviewer for any PR that touches dependencies, CI workflows, IAM, infra-as-code, or anything under .github/ — regardless of how small the diff looks.
- The most important secret-hygiene move in 2026 is not rotating keys faster; it is making sure the agent never sees them in the first place. Scoped tokens, ephemeral OIDC-issued credentials, and a separate sandbox identity are worth more than any redaction filter.
Why this decision matters more than it looks
The coding-agent category crossed an important line in 2025. Agents are no longer just autocomplete in a sidebar: they run shells, fetch URLs, edit files, and open pull requests on their own. The OWASP GenAI Security Project responded by publishing the Top 10 for Agentic Applications 2026, a peer-reviewed list of the failure modes that show up once a model can “plan, persist, and delegate across tools and systems.” The list reads less like classical AppSec and more like a checklist for an over-eager intern who has root, internet, and your AWS keys at the same time.
The hard part for a small team is that none of the major coding agents — Claude Code, GitHub Copilot coding agent, OpenAI Agents SDK builds, Cursor, Codex CLI — are insecure out of the box. They are over-permissioned out of the box. The defaults assume a single trusted developer on a clean machine. The risks live in the spread between that assumption and your actual setup: a junior dev, a shared dev box, a forked PR from outside the org, an MCP server someone installed in five seconds.
The five trust boundaries a coding agent always crosses
Strip the marketing and a coding agent always crosses the same five boundaries. Naming them out loud is the difference between a policy you can apply on Monday morning and a policy that lives only in a Notion doc.
- Prompt → Plan. The agent reads input — your prompt, an issue, a fetched page, a tool output, a README — and turns it into a plan. Anything the agent reads from outside your repo is untrusted; treat it the same way you would treat a user-submitted form field.
- Plan → Shell. The agent runs commands. This is the boundary where allow-lists and sandboxing pay for themselves — a single curl | bash or rm -rf can finish what a misread prompt started.
- Shell → Network. The agent calls APIs, fetches URLs, and talks to MCP servers. This is the exfiltration surface; egress that is not on an allow-list is the single most reliable signal of compromise.
- Agent → Repo. The agent edits files, stages commits, and opens PRs. The danger is not the edit, it is the quiet edit — a swapped dependency version, a new GitHub Actions step, a tweaked Dockerfile — that slips past a tired reviewer.
- Repo → Production. CI runs the agent’s code with real secrets. This is the boundary that decides whether a bad PR is a code-review event or a paging event.
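The Plan → Shell and Shell → Network boundaries above can be sketched as two fail-closed checks. This is a minimal Python sketch, not any agent's built-in mechanism: the `ALLOWED_COMMANDS` and `ALLOWED_HOSTS` sets and both function names are hypothetical, and the checks only help if they run in a wrapper or sandbox the agent cannot edit.

```python
import shlex
from urllib.parse import urlsplit

# Hypothetical allow-lists; replace with what your project actually needs.
ALLOWED_COMMANDS = {"git", "ls", "cat", "pytest", "npm"}
ALLOWED_HOSTS = {"api.github.com", "pypi.org", "registry.npmjs.org"}

# Shell metacharacters that chain, pipe, redirect, or substitute commands.
SHELL_OPERATORS = ("|", "&", ";", ">", "<", "`", "$(", "\n")


def is_command_allowed(command: str) -> bool:
    """Allow one plain command whose executable is on the allow-list."""
    if any(op in command for op in SHELL_OPERATORS):
        return False  # refuse pipelines and chains outright (fail closed)
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and similar parse errors
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS


def is_destination_allowed(url: str) -> bool:
    """Allow only HTTPS requests to an exact host on the allow-list."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

The command check is deliberately coarse. It rejects anything containing a shell operator, even inside quotes, and it allows every subcommand of an allowed executable, so `git push --force` still needs its own deny rule. The host check compares the parsed hostname exactly, which means a URL such as `https://api.github.com@evil.example/` is rejected.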
The 2026 risk matrix mapped to actual controls
Each entry gives the boundary, the 2026 risk (OWASP Agentic Top 10 plus practice), and a concrete control a small team can apply this week.
- Prompt → Plan. Risk: indirect prompt injection from fetched web pages, issue bodies, READMEs, and tool outputs (the “trusted-looking text” class). Control: disable autonomous WebFetch in Claude Code or restrict it to an allow-list of domains; in OpenAI Agents SDK, wire an input guardrail that flags inputs containing instruction-like text; never paste raw external text into a system prompt without quoting and labelling it as untrusted.
- Plan → Shell. Risk: excessive agency. The agent has shell, root, and your dotfiles by default; one bad plan can rewrite history or delete data. Control: run the agent in a container or VM with no host filesystem mount beyond the project directory; use Claude Code’s deny rules (Bash(rm *), Bash(git push --force*), Bash(curl *|*sh*)); in OpenAI Agents SDK, wrap tools with a tool guardrail that pattern-matches the command before execution.
- Shell → Network. Risk: tool poisoning and data exfiltration via attacker-controlled APIs or malicious MCP servers. Control: strict outbound allow-list at the OS or container level; review every MCP server before installing it (treat it like a Chrome extension that ships with your laptop password); never run unsigned community MCP servers from an environment that holds real secrets.
- Agent → Repo. Risk: silent dependency or workflow tampering, auto-approved PRs, backdoored helper code. Control: require human review on any PR touching package.json, requirements.txt, lockfiles, .github/, IAM, or infra-as-code; turn off auto-merge for AI-authored PRs; sign your commits and reject unsigned agent commits.
- Repo → Production. Risk: an agent-authored CI step exfiltrates GITHUB_TOKEN, deploys with the wrong env, or bypasses required reviewers. Control: use environment protection rules with required reviewers; scope GITHUB_TOKEN to least privilege; issue ephemeral OIDC credentials for cloud deploys; never store long-lived production keys in dev secrets.
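The Claude Code controls named above could be written down as a project settings file. The sketch below builds that file's contents in Python and prints the JSON. It assumes the `permissions.deny` and `permissions.allow` keys of a `.claude/settings.json` file and the `WebFetch(domain:...)` rule form, so check both against the current Claude Code permissions documentation before relying on them. The allowed domain is a placeholder.

```python
import json

# Deny rules are the three patterns from the text above.
# The allowed WebFetch domain is a hypothetical example.
settings = {
    "permissions": {
        "deny": [
            "Bash(rm *)",
            "Bash(git push --force*)",
            "Bash(curl *|*sh*)",
        ],
        "allow": [
            "WebFetch(domain:docs.python.org)",
        ],
    }
}


def render_settings(data: dict) -> str:
    """Serialize the settings as pretty-printed JSON with a trailing newline."""
    return json.dumps(data, indent=2) + "\n"


if __name__ == "__main__":
    # Review the output, then save it as the project's settings file.
    print(render_settings(settings), end="")
```

Pattern-based deny rules are a convenience layer rather than a sandbox. They complement the container and outbound allow-list controls and do not replace them.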
Two non-obvious entries deserve flagging. First, the MCP-server install path is the most under-rated supply-chain risk of 2026 — a server installed in five seconds can read every prompt, every tool output, and every file the agent touches. Treat the MCP install list the same way you treat your npm dependency list, with the same review and the same lockfile mindset. Second, “auto-approve PRs” is not just a GitHub setting — many teams have effectively auto-approved PRs by routing them to a tired on-call rotation. A reviewer who clicks “Approve” in three seconds is functionally an automation.
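One way to give the MCP install list a lockfile mindset is to fingerprint each reviewed server entry and fail a check when anything unreviewed appears. This is a sketch under assumptions: it expects a config dict with an `mcpServers` mapping of server name to launch config, and the server names, package names, and the `pins` structure are all hypothetical.

```python
import hashlib
import json


def fingerprint(server_config: dict) -> str:
    """Stable SHA-256 over one server's launch config (command, args, env)."""
    canonical = json.dumps(server_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unpinned_servers(mcp_config: dict, pins: dict[str, str]) -> list[str]:
    """Return configured servers that are missing from the pin list or whose
    launch config no longer matches the fingerprint recorded at review time."""
    servers = mcp_config.get("mcpServers", {})
    return sorted(
        name for name, cfg in servers.items()
        if pins.get(name) != fingerprint(cfg)
    )


# Usage: "docs" was reviewed and pinned; "mystery" was added later without review.
reviewed = {"command": "npx", "args": ["-y", "example-mcp-server@1.2.3"]}
pins = {"docs": fingerprint(reviewed)}
config = {
    "mcpServers": {
        "docs": reviewed,
        "mystery": {"command": "npx", "args": ["-y", "mystery-mcp"]},
    }
}
print(unpinned_servers(config, pins))  # ['mystery']
```

The fingerprint covers only the launch configuration, not the package contents, so it is worth pinning an exact version in the arguments as well.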
A one-page checklist that fits the way small teams actually work
Each entry gives your situation, what to apply first, and the reason.
- You let Claude Code or Codex CLI run shell on your laptop. Apply first: add deny rules for destructive Bash patterns; move risky work to a container with no host secrets. Reason: Claude Code evaluates rules in deny → ask → allow order, so deny rules for Bash(curl *|*sh*) and Bash(git push --force*) block two common incident patterns: the agent piping a script from the internet into a shell, and the agent force-pushing over history.
- You use the OpenAI Agents SDK to build internal agents. Apply first: attach an input guardrail to each agent and a tool guardrail to each tool; treat guardrails as layered defence. Reason: the SDK gives you input, output, and tool guardrails as first-class concepts; layered guardrails are the documented mitigation pattern for agentic risks, and input guardrails can run concurrently with the main agent, so their latency is mostly hidden.
- You use the GitHub Copilot coding agent for cloud-side PRs. Apply first: restrict use to users with repository write access; require human approval on workflow runs; turn on content exclusions. Reason: GitHub’s documentation states that GitHub Actions workflows triggered by a Copilot-raised PR do not run until a user with write access explicitly approves them; that gate is the single most important risk control on the cloud side.
- You let agents call external MCP servers. Apply first: pin the MCP server list; review each one; never connect MCP servers from an environment with real production secrets. Reason: MCP servers see every prompt and tool output by design. An unreviewed community server is an undocumented data egress channel that no firewall will catch.
- You ship agent-authored code to production. Apply first: required reviewer for .github/, dependencies, IAM, and infra-as-code; ephemeral OIDC creds for deploys. Reason: these four paths are how a clean-looking PR turns into a production incident; gating them protects you even when you misread the diff at 6pm on a Friday.
- You want a written policy to align the team on. Apply first: anchor to NIST AI RMF + OWASP Agentic Top 10; record control owners by name. Reason: NIST AI RMF gives you the Govern / Map / Measure / Manage frame, and the OWASP Agentic Top 10 gives you the specific failure modes to map to. Recorded owners are what turns a policy into a control.
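The required-reviewer rule for sensitive paths can also be backed by a small CI check that flags which changed files need a human. The sketch below is illustrative only: the `iam/`, `infra/`, and `*.tf` patterns assume a hypothetical repo layout, and in practice a CODEOWNERS file combined with branch protection is the usual place to enforce this.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Dependency manifests named in the text above, plus one common lockfile.
SENSITIVE_NAMES = {"package.json", "package-lock.json", "requirements.txt"}
# Hypothetical repo layout: adjust the IAM and infra-as-code patterns to yours.
SENSITIVE_PATTERNS = (".github/*", "iam/*", "infra/*", "*.tf")


def paths_needing_review(changed_paths: list[str]) -> list[str]:
    """Return the changed paths that must not merge without a human reviewer."""
    flagged = []
    for path in changed_paths:
        name = PurePosixPath(path).name
        if (
            name in SENSITIVE_NAMES
            or name.endswith(".lock")  # yarn.lock, poetry.lock, Cargo.lock, ...
            or any(fnmatchcase(path, pattern) for pattern in SENSITIVE_PATTERNS)
        ):
            flagged.append(path)
    return flagged


changed = ["src/app.py", ".github/workflows/ci.yml", "web/yarn.lock", "infra/main.tf"]
print(paths_needing_review(changed))
# ['.github/workflows/ci.yml', 'web/yarn.lock', 'infra/main.tf']
```

In `fnmatchcase`, `*` also matches `/`, so `.github/*` covers nested workflow files without a recursive pattern.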
Mistakes to skip on the way
- Trusting any text the agent reads. Prompt injection is now overwhelmingly “the agent fetched a page and the page told it what to do.” If your agent reads anything from outside your repo — a search result, an issue body, a doc page, a tool output — treat that text the same way you would treat a form field on a public site.
- Confusing “sandbox” with “container.” A container that mounts your home directory and shares your network namespace is a fancy chroot, not a sandbox. The agent can still read your dotfiles, your SSH keys, your local CLI sessions, and your AWS credentials. Either drop those mounts or accept that your laptop is in scope.
- Granting both write access and arbitrary egress to the same agent. This is the single combination that turns mistakes into incidents. Split them: one agent has write access but no internet, another has the internet but only read access to your code. Wire the handoff through a human review or a checked-in pipeline.
- Treating MCP servers as plugins. They are not plugins; they are a new dependency surface with no lockfile by default. Pin them, review them, and keep a separate list of which servers may run in a secrets-bearing environment.
- Auto-merging anything an agent touched. Even the best models silently drift on dependency versions, GitHub Actions steps, and shell scripts. Reserve auto-merge for boilerplate changes the agent did not author — not for the agent’s own PRs.
- Hoping a redaction filter will save you. It will not. The right move is upstream: the agent should never see the secret. Use scoped tokens, ephemeral OIDC, and a sandbox identity that has nothing worth stealing.
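Two of the mistakes above, pairing write access with arbitrary egress and letting the agent see real secrets, are simple enough to check mechanically before an agent is launched. The following is a minimal sketch: `AgentProfile` and its fields are hypothetical names for however your team describes an agent's permissions, not part of any agent product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical description of what one agent is allowed to do."""
    name: str
    repo_write: bool
    open_egress: bool               # True if outbound traffic is not allow-listed
    sees_long_lived_secrets: bool   # True if real keys are in its environment


def profile_violations(profile: AgentProfile) -> list[str]:
    """List the rules from this guide that the profile breaks."""
    problems = []
    if profile.repo_write and profile.open_egress:
        problems.append("write access combined with arbitrary egress")
    if profile.sees_long_lived_secrets:
        problems.append("long-lived secrets visible to the agent")
    return problems


# The split from the text: one agent writes, the other browses, neither does both.
coder = AgentProfile("coder", repo_write=True, open_egress=False,
                     sees_long_lived_secrets=False)
researcher = AgentProfile("researcher", repo_write=False, open_egress=True,
                          sees_long_lived_secrets=False)
risky = AgentProfile("do-everything", repo_write=True, open_egress=True,
                     sees_long_lived_secrets=True)

for profile in (coder, researcher, risky):
    print(profile.name, profile_violations(profile))
```

A check like this records intent only. The actual enforcement still has to come from the sandbox, the network allow-list, and scoped credentials.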
Sources
- OWASP Top 10 for Agentic Applications 2026 — OWASP GenAI Security Project — used as the spine of the risk matrix: peer-reviewed framing of agentic risk as “what happens when models can plan, persist, and delegate across tools and systems,” and the categories that map onto the five trust boundaries in this guide.
- Configure permissions — Claude Code documentation — used for the Claude Code rows: the deny → ask → allow evaluation order, the difference between a bare-tool deny rule (e.g. Bash) and a scoped pattern (e.g. Bash(rm *)), and the layered model that separates permissions from OS-level sandboxing.
- Guardrails — OpenAI Agents SDK documentation — used for the OpenAI Agents SDK row: input guardrails, output guardrails, and tool guardrails as first-class concepts; the “optimistic execution + concurrent guardrails” pattern; and the explicit recommendation to treat guardrails as a layered defence rather than a single check.
- Risks and mitigations for GitHub Copilot cloud agent — GitHub Docs — used for the Copilot coding-agent row: the requirement that PRs raised by the cloud agent only run GitHub Actions workflows after explicit approval from a user with repository write access, plus content-exclusion controls for sensitive files.
- NIST AI Risk Management Framework — used for the policy-anchoring recommendation: the Govern / Map / Measure / Manage frame that lets a small team turn the OWASP risk list into named owners and concrete controls rather than a one-off document.
How to use this guide
LumoMate turns complex technical topics into judgment you can act on. Read the key takeaways first, then follow the links in the Sources section and verify the details before you make a decision.
Editorial standards: this guide was researched from primary sources, drafted with AI assistance, and reviewed by a human editor for accuracy and clarity. We update it when the facts change.