When Every File is an Attack Vector: Why Agent Governance Must Live Outside the Runtime

The LiteLLM supply chain compromise on March 24 was a pip package that stole credentials, such as SSH keys, cloud provider credentials, Kubernetes configs, database passwords, and API keys. That is a serious incident, and the architectural lessons it exposed are significant. But the harder conversation is about what comes next.

AI agents in 2026 do not just call APIs. They read your filesystem. They process your documents, configs, and code. They maintain persistent memory across sessions. They spawn sub-agents. They act on your behalf at machine speed, often without a human reviewing each step.

The LiteLLM attack exploited the Python package supply chain. The next class of attacks will exploit something far larger: the entire input surface of the agent itself.

Your Filesystem Is the New Codebase

When a coding agent operates with filesystem access, every file it reads becomes a potential instruction. A poisoned PDF in a shared folder can carry hidden prompt injections. A modified .env file can redirect agent behavior. A malicious markdown document in a project directory can instruct an agent to exfiltrate data, modify code, or propagate itself to other files the agent touches.

This is not theoretical. Microsoft's security team has documented how indirect prompt injection works in practice: an attacker never talks to the agent directly. Instead, they poison the data sources the agent reads. When the agent retrieves that content through tool calls (emails, documents, support tickets, web pages, database entries), the malicious instructions ride along, invisible to human reviewers, fully legible to the model.

The structural problem is that LLMs process instructions and data as tokens in the same context window. The agent cannot reliably distinguish between a system prompt from its operator and an instruction embedded in a document it just ingested. As researchers in the "Agents of Chaos" study (MIT, Harvard, Stanford, CMU) put it: prompt injection is a structural feature, not a fixable bug.

In a vibe coding workflow, where developers describe intent in natural language and agents generate entire applications, this surface expands further. The agent reads project files, configs, dependencies, and documentation as context for its work. Every one of those files is now part of the instruction set. Every text file can carry a payload. The attack surface is not the chat interface. It is the entire data environment.

Vibe Coding Accelerates the Problem

The rise of vibe coding has made this structural vulnerability operationally urgent.

A CodeRabbit analysis of 470 open-source GitHub pull requests found that AI co-authored code contained 2.74x more security vulnerabilities than human-written code, along with 75% more misconfigurations. Veracode's GenAI Code Security Report found that 45% of AI-generated code introduces security vulnerabilities. These are not edge cases. They are base rates.

The Moltbook incident in February 2026 demonstrated the consequences. The AI-agent social network was built entirely through vibe coding, with no human-written code. Wiz discovered a misconfigured Supabase database exposing 1.5 million authentication tokens and 35,000 email addresses to the public internet. The root cause was not a sophisticated attack. It was the absence of a security review in a process optimized for speed.

There is a pattern here. Coding agents optimize for making code run, not making code safe. Research from Columbia University documented how agents routinely remove validation checks, relax database policies, and disable authentication flows to resolve runtime errors. The constraint causing the error is sometimes the security guard, and the agent removes it to satisfy the prompt.

When agents generate code at this velocity and with these failure modes, treating the agent's runtime as a trusted environment becomes untenable. The code it writes may be insecure. The files it reads may be poisoned. The dependencies it pulls may be compromised (as LiteLLM demonstrated). Every layer of the agent's input and output is potentially adversarial.

The Impersonation Problem

There is a dimension to this that goes beyond credential theft.

An AI agent with access to your filesystem, communication history, code repositories, and working documents has enough context to impersonate you. Not in the crude phishing sense of faking an email header, but in the deeper sense of replicating your communication patterns, decision-making style, and domain knowledge across every platform you use.

A compromised agent does not just steal your credentials. It becomes a distorted version of your digital identity, capable of acting on your accounts, writing in your voice, and making decisions that appear to be yours. The persistent memory that makes agents useful (remembering your preferences, your projects, your colleagues) is the same memory that makes compromise catastrophic. The agent knows enough to be you, and it operates faster than you can intervene.

This is why agent authorization cannot be implemented in the agent's runtime. If the runtime is compromised, every control embedded in it, including identity verification, access policies, and behavioral guardrails, is compromised with it.

Claws Need Shells

The security community is converging on a principle that infrastructure engineers have understood for decades: enforcement must be independent of the thing being enforced.

A firewall does not run inside the application it protects. A network policy does not depend on the container it restricts. In the same way, agent governance cannot live inside the agent runtime. The runtime is what gets compromised, whether through supply-chain attacks (LiteLLM), poisoned context (indirect prompt injection), or insecure generated code (vibe coding failures). Controls that depend on the integrity of that runtime are not controls. They are assumptions.

Defense in depth for AI agents requires multiple independent enforcement layers, each operating in its own trust boundary:

At the API layer: Rate limiting, authentication, and data loss prevention detect anomalous behavior at the network edge. An agent suddenly exfiltrating credential files to an unfamiliar domain triggers enforcement that the agent itself cannot disable.

At the AI interaction layer: Content safety, prompt defense, and PII filtering govern what goes into and comes out of the LLM. These controls run at the infrastructure layer, outside the application process, so they remain intact even when the runtime is compromised.

At the tool access layer: Task-scoped, tool-level authorization ensures that a compromised agent cannot pivot laterally to systems beyond its permitted scope. Authorization is cryptographically attested by an identity provider and enforced at the gateway, not granted by application code that the agent controls.

The critical property is independence. Each layer operates in its own trust boundary. Compromising the agent's runtime (through a poisoned file, a malicious dependency, or a prompt injection) does not disable enforcement at the other layers. This is what separates architecture from aspiration.

The Governance Gap Will Not Close Itself

The data on this is stark. According to Okta's AI at Work report, 91% of organizations are already deploying AI agents, yet only 10% have a well-developed strategy for managing non-human identities. The CrowdStrike 2026 Global Threat Report documented an 89% year-over-year increase in AI-enabled adversary attacks, with average breakout time falling to just 29 minutes and the fastest observed breakout occurring in 27 seconds. In one intrusion, data exfiltration began within four minutes of initial access.

A Dark Reading poll found that 48% of cybersecurity professionals now identify agentic AI as the top attack vector heading into 2026, outranking deepfakes, board-level cyber recognition, and passwordless adoption.

Application-level controls are necessary but insufficient. When the agent's runtime is the attack surface (not just the network or the endpoint, but the code, the context window, and the filesystem), governance must operate at a layer the agent cannot reach. That means infrastructure-layer enforcement: independent of the agent framework, below the application, governing every API call, LLM interaction, and tool invocation as it crosses the network.

Three Questions Every Team Should Ask

If a poisoned file entered your agent's context window, which controls would survive? If every guardrail runs inside the same runtime that processes the file, the answer may be none.
Can your agent disable its own governance? If access control, content safety, and tool authorization are implemented as libraries in the agent's process, a prompt injection or compromised dependency can override them. Infrastructure-layer enforcement operates in a separate trust boundary that the agent cannot modify.
How many independent enforcement layers exist between a compromised agent and your most sensitive systems? The LiteLLM breach showed what happens when the answer is zero. The vibe coding era, where agents generate code, read untrusted files, and spawn sub-agents at machine speed, makes this question existential.

The LiteLLM attack compromised a library. The next wave of attacks will compromise the files agents read, the code agents write, and the memory agents accumulate. The question is not whether your agent will encounter adversarial input. It is whether your architecture survives when it does.

Traefik Hub's Triple Gate Pattern (API Gateway, AI Gateway, MCP Gateway with TBAC) implements the independent, infrastructure-layer enforcement described in this post. For a technical walkthrough of how the composable safety pipeline works across these layers, see From Regex to GPU: Building a Multi-Vendor AI Safety Pipeline.