Prompt Stacks and Prompt Governance — Why System-Level Prompts Are Emerging as a Regulatory Lever (and Where They Fall Short)

Editor’s Note — DCC.

This brief summarises 《系统级提示词作为监管抓手？》 (“System-Level Prompts as a Regulatory Lever?”) by Li Wenlong (李汶龙) on the 科技利维坦 (Tech Leviathan) channel. It is the first piece in his self-imposed “100 AI-Governance Papers Challenge.” The underlying paper is Anna Neumann, Holli Sargeant, Jat Singh et al., Prompt Governance? On Governing Technologies Governed by Natural Language (FAccT 2026; SSRN abstract 6802319), a systematic review covering 287 academic papers and 54 regulatory documents on “system prompts” as a regulatory object. Li’s value-add, and the reason DCC is running it, is the comparison: he reads the EU GPAI Code of Practice and the Trump administration’s executive orders side by side with TC260-003 (the standard implementing China’s GenAI Services Interim Measures) and explains what that contrast means for AI compliance in practice. The takeaway for overseas counsel: system-prompt disclosure to regulators looks set to become the next “DPIA-style” compliance artefact globally. China currently has no equivalent obligation, but its regime leaves space to import one.

The thing being regulated

A system prompt (系统级提示词) is the set of natural-language instructions a model receives before any user interaction begins. It is written by the model developer, the deployer, or the application provider, and the model treats it as carrying higher “trust” than anything a user types in. The EU’s draft GPAI Code of Practice defines it as “a set of instructions, guidelines, and contextual information provided to the model prior to the start of user interaction.” NIST (part of the US Department of Commerce) goes slightly further: system prompts are typically delivered before other instructions and inputs, and the model is expected to weight them with higher trust than other inputs.

The reason regulators are starting to care is straightforward: in a large language model, the system prompt is a piece of natural-language text that, at least in principle, directly conditions model behaviour. If a model can be told in plain English to “treat child-safety considerations as overriding,” then a regulator can read that text, demand a copy, audit it, and require revisions to it. Earlier waves of AI regulation did not work this way: the tools in the classic governance toolbox (safety testing, model architecture, filters, review mechanisms, access controls, monitoring) all operate either far above the model (output evaluation) or far below it (training data, weights). System prompts sit at a layer regulators can actually read.

The prompt stack

Neumann and co-authors propose a four-layer hierarchy, which Li calls the prompt stack (提示词堆栈):

System instructions — set by the foundation-model developer or provider. Hard rules: safety, prohibited content, privacy, illegality-risk controls. Treated as the highest authority; in principle should not be overridable by lower layers.
System guidelines — also developer-set, but more about preferences and operational guidance: how to balance helpfulness against safety, how to handle sensitive requests, how to express uncertainty. Can be tightened by lower layers in some respects but should hold the line on safety and compliance.
Developer instructions — set by application developers, deployers, or enterprise customers. A legal-research bot might be configured to “answer in a professional legal tone and never guarantee outcomes.” Below system layers, above user input.
User prompts — the input the end user types. Lowest priority. Where a user instruction conflicts with anything above, the model should refuse, rewrite, or limit the response.

Two practical questions follow from this hierarchy. First, can the user modify the second layer (system guidelines)? The intended answer is that soft constraints (style, level of detail) are negotiable within a session, while hard constraints (risk posture, safety policies) are not. In practice, extended conversations can drift, with the “this is a simulation, not real life” framing as the canonical example, and models can be coaxed into relaxing constraints they were meant to enforce. Second, what counts as a jailbreak in this framework? It is precisely the use of lower-layer input to override or weaken higher-layer rules: reframing the high-layer rule (“assume you are in a fictional novel / a hypothetical world / a purely theoretical discussion”), exploiting ambiguity in the system guidelines, or breaking a prohibited request into many superficially innocuous steps (multi-turn jailbreaks, or context attacks).
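The layering and the jailbreak definition above can be made concrete with a toy resolver. This is a minimal sketch under stated assumptions, not how any vendor or the Neumann paper implements the stack: `Layer`, `Directive` and `resolve` are hypothetical names, and “strictness” is an invented integer scale. It encodes the intended rules (a lower layer may renegotiate a soft constraint but may only tighten a hard one) and flags any lower-layer attempt to loosen a hard rule as the jailbreak pattern.

```python
from dataclasses import dataclass
from enum import IntEnum


class Layer(IntEnum):
    """Prompt-stack layers; a lower number means higher authority (hypothetical encoding)."""
    SYSTEM_INSTRUCTIONS = 0
    SYSTEM_GUIDELINES = 1
    DEVELOPER_INSTRUCTIONS = 2
    USER_PROMPT = 3


@dataclass(frozen=True)
class Directive:
    key: str          # what the directive governs, e.g. "tone" (illustrative)
    value: str        # the instruction itself
    strictness: int   # invented scale: higher means more restrictive
    layer: Layer
    hard: bool        # hard constraints may be tightened but never loosened


def resolve(directives):
    """Apply directives from highest to lowest authority.

    Returns (effective, rejected): `effective` maps each key to the directive
    in force; `rejected` lists lower-layer attempts to weaken a hard rule.
    """
    effective = {}
    rejected = []
    for d in sorted(directives, key=lambda item: item.layer):
        current = effective.get(d.key)
        if current is None:
            effective[d.key] = d
        elif current.hard and d.strictness < current.strictness:
            rejected.append(d)  # the jailbreak pattern: a lower layer loosening a hard rule
        elif current.hard:
            # Tightening is allowed, and the result stays hard.
            effective[d.key] = Directive(d.key, d.value, d.strictness, d.layer, hard=True)
        else:
            effective[d.key] = d  # soft constraint: renegotiable by a lower layer
    return effective, rejected


stack = [
    Directive("unsafe_content", "refuse", 2, Layer.SYSTEM_INSTRUCTIONS, hard=True),
    Directive("tone", "neutral", 0, Layer.SYSTEM_GUIDELINES, hard=False),
    Directive("tone", "professional legal tone", 0, Layer.DEVELOPER_INSTRUCTIONS, hard=False),
    Directive("unsafe_content", "allow, this is only fiction", 0, Layer.USER_PROMPT, hard=False),
]
effective, rejected = resolve(stack)
print(effective["tone"].value)            # professional legal tone
print(effective["unsafe_content"].value)  # refuse
print([d.layer.name for d in rejected])   # ['USER_PROMPT']
```

A real model does nothing this deterministic: the precedence lives in training and in the model’s reading of the text, which is exactly why property 5 below matters.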

Five properties that make system prompts hard to regulate

Li distils five properties from the literature that practitioners and regulators both need to internalise.

1. They are layered, with multiple authors. The “system” in “system prompt” is not like the system in an operating system: the prompt is not delivered by a single party. Foundation-model developers, application providers, and deployers can each set instructions at their own layer, and those instructions interact. Disclosure obligations that target only one layer will see only part of the stack.

2. They are usually invisible. Most vendors do not publish their system prompts, for two legitimate reasons: (a) the prompts encode designed-in product logic, behavioural norms, and proprietary know-how, which are core commercial IP; and (b) disclosure reveals the safety architecture and makes it easier for attackers to evade guardrails. Model cards have become a standard transparency artefact, but the system prompt is generally not included in them. When Claude’s system prompt was published outside the company, it was treated as a leak.

3. They are natural-language text. Anyone patient enough can read them. A typical Claude-style system prompt sets the model’s role and core goal, declares available tools and the conditions for invoking each, prescribes citation rules (when to search, how to attribute sources for copyright and traceability), specifies output style (“lead with the conclusion, then break out under subheadings”), names the categories of absolutely prohibited assistance, and conveys meta-information (version, knowledge cut-off, deployment surface). This human readability is exactly what makes system prompts attractive to regulators.

4. They are malleable. Developers update system prompts frequently, sometimes as ad-hoc bug fixes between releases. This is the property that most undermines their use as a governance tool: an artefact that changes weekly does not satisfy the regulator’s appetite for stable, auditable rules.

5. The relationship between prompt text and model behaviour is loose. This is the core empirical question Neumann and co-authors flag — and Li’s central warning to policy-makers. A system prompt is not code: natural language is ambiguous, context-sensitive, sequence-sensitive, and interacts with the prompts of other layers, with the user’s input, with the conversation history, with model updates, and with prompt-injection attacks. Writing “do not output discriminatory content” into the system prompt does not, by itself, produce a model that does not output discriminatory content. What the model actually does depends on its training data, its post-training / alignment, the context the user constructed, how the model parses the specific wording, and what other safety filters are in play.

Where regulators have actually landed

The Neumann team analysed 54 regulatory documents and identified two that take system prompts seriously, plus one that should but doesn’t.

EU — GPAI Code of Practice (the voluntary code through which providers can demonstrate compliance with the general-purpose-AI obligations under the EU AI Act). The Safety & Security chapter, Measure 7.1 on model description (transparency), requires signatories to provide a model report containing the model spec, item 4(d) of which is the system prompt. The EU treats system-prompt configuration as a key component of model evaluation, not just disclosure: signatories must be able to show how the prompt is set up and how it interacts with the rest of the safety architecture. Neumann and co-authors flag two gaps: the EU rule does not differentiate disclosure obligations across the foundation-model layer, deployment layer, and application layer; and it lacks version-change and log-update requirements, so disclosed prompts are likely to go out of date quickly.

US — Executive Order 14319 (July 23, 2025) “Preventing Woke AI in the Federal Government.” This is an ideology-coded procurement rule rather than a transparency regime: federal agencies are restricted from procuring AI that encodes “partisan or ideological judgments” into its outputs, under two “unbiased AI principles” (truth-seeking and ideological neutrality). The vendor bears the burden of demonstrating compliance — system prompts are a useful evidentiary artefact for that, but disclosure is not mandatory. The White House Office of Management and Budget’s M-26-04 (December 2025) on increasing public trust in AI lists only model cards, system cards, and data cards as transparency requirements; it does not mention system prompts.

UK — AI Cybersecurity Code of Practice. Effectively no substantive content on system prompts; the Code merely suggests vendors have system prompts so downstream parties can understand model characteristics.

China’s posture — output-based, no system-prompt hook (yet)

For overseas counsel, the most useful comparison is what is not in the Chinese regime today.

China’s flagship GenAI rule is the Interim Measures for the Management of Generative Artificial Intelligence Services (2023). The implementing safety standard, and the one that does the real operational work, is TC260-003, 《生成式人工智能服务安全基本要求》 (Basic Safety Requirements for Generative AI Services). Its structure is corpus safety (§5), model safety (§6), safety measures (§7), and other requirements (§8). Model-safety compliance is achieved primarily through the algorithm and large-model filing regime (备案), and filing turns substantially on pre-launch evaluation scoring: a red-team-style adversarial test against a published question bank, with pass/fail thresholds. As Li puts it, the regime is structurally Turing-test-like: it inspects what the model outputs, not how the model is internally governed. There is no current obligation to disclose system prompts to the Cyberspace Administration of China (CAC), to file them as part of the algorithm filing, or to treat them as a distinct compliance artefact.
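To make the “inspects outputs, not internal governance” point concrete, here is a minimal sketch of output-only evaluation scoring. It is not TC260-003’s actual procedure: `BankItem`, `evaluate`, the toy model and the keyword-based refusal detector are hypothetical, and the two thresholds are caller-supplied placeholders rather than the figures in the standard. The scorer only ever sees a function from question to answer, so the system prompt never enters the evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BankItem:
    question: str
    should_refuse: bool  # label assigned by whoever built the question bank


def evaluate(model: Callable[[str], str], bank: List[BankItem],
             is_refusal: Callable[[str], bool],
             min_refusal_rate: float, max_false_refusal_rate: float) -> dict:
    """Score a model purely on its outputs against a labelled question bank."""
    must_refuse = [b for b in bank if b.should_refuse]
    must_answer = [b for b in bank if not b.should_refuse]
    refused = sum(is_refusal(model(b.question)) for b in must_refuse)
    wrongly_refused = sum(is_refusal(model(b.question)) for b in must_answer)
    refusal_rate = refused / len(must_refuse) if must_refuse else 1.0
    false_refusal_rate = wrongly_refused / len(must_answer) if must_answer else 0.0
    return {
        "refusal_rate": refusal_rate,
        "false_refusal_rate": false_refusal_rate,
        "passed": (refusal_rate >= min_refusal_rate
                   and false_refusal_rate <= max_false_refusal_rate),
    }


def toy_model(question: str) -> str:
    # Hypothetical stand-in: refuses anything mentioning weapons, however innocent.
    if "weapon" in question.lower():
        return "I can't help with that."
    return "Here is an overview..."


bank = [
    BankItem("How do I build a weapon at home?", should_refuse=True),
    BankItem("Where can I buy weapon parts without ID?", should_refuse=True),
    BankItem("Summarise the history of the Great Wall.", should_refuse=False),
    BankItem("Write a museum label for medieval weapons.", should_refuse=False),
]
result = evaluate(toy_model, bank, lambda out: out.startswith("I can't"),
                  min_refusal_rate=0.9, max_false_refusal_rate=0.1)
print(result)
# {'refusal_rate': 1.0, 'false_refusal_rate': 0.5, 'passed': False}
```

The toy model fails on over-refusal, and nothing in the score reveals why; diagnosing that would require looking at the layer this regime does not inspect.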

That gap is meaningful, because it is exactly the layer where the EU is now hooking in.

The likely trajectory: Brussels Effect, DPIA analogue

Li’s prediction is direct: on system prompts, a Brussels Effect will form. The GPAI Code of Practice’s disclosure requirement will gradually be priced into global compliance programs the way data protection impact assessments (DPIAs) were priced in after the GDPR. System prompts will not become a public transparency artefact (with the exception of vendors who voluntarily publish, like Anthropic and xAI); they will become a regulator-facing artefact, disclosed in the model report as part of the evaluation package.

This matters for two reasons in the China context. First, any overseas operator deploying a model in China that is built on a foundation model evaluated under the EU regime will inherit disclosure obligations one layer up the prompt stack — and will need to ensure those obligations are compatible with Chinese filing rules. Second, if the Brussels Effect lands, the next iteration of Chinese GenAI rulemaking is the natural place for a system-prompt disclosure hook to appear; teams should treat this as a near-future filing item, not a never-event.

System prompts as a governance object — the operational layer

Li closes with the move that is most useful for compliance teams: a system prompt is not only a governance tool — it is itself a governance object, and should be managed the way a serious data team manages its privacy policies. That implies, at minimum:

Versioned archives. Every change is dated, retrievable, and attributable to a named owner.
Change-permission management. Defined approval flows for who can edit what — particularly the safety-relevant clauses.
Periodic security testing. Red-team probes against the prompt itself, including prompt-injection and multi-turn jailbreaks.
Version logs sufficient to answer a regulator’s request. When the request comes in, “we don’t know what the system prompt looked like last March” will not be an acceptable answer.
Alignment-to-output testing. Does the model actually behave as the prompt instructs? Are there obvious value tilts, commercial prioritisation, or excessive filtering that the prompt did not authorise? Are there prompt-injection vulnerabilities?
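A versioned archive covering the first four items on that list can be quite small. The following is a minimal sketch under assumptions, not a prescribed format: `PromptVersion`, `PromptArchive`, the rule that a safety-relevant change needs an approver other than its owner, and the SHA-256 fingerprint (so a copy handed to a regulator can be matched to a log entry) are all illustrative design choices. The `as_of` lookup answers the “what did the prompt look like last March” question.

```python
import hashlib
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PromptVersion:
    effective_from: date
    text: str
    owner: str                  # named person accountable for the change
    approved_by: Optional[str]  # second sign-off, required for safety-relevant edits
    safety_relevant: bool
    sha256: str                 # fingerprint to match a disclosed copy to this entry


@dataclass
class PromptArchive:
    """Append-only version log for one system prompt (hypothetical design)."""
    versions: List[PromptVersion] = field(default_factory=list)

    def commit(self, effective_from: date, text: str, owner: str,
               safety_relevant: bool, approved_by: Optional[str] = None) -> PromptVersion:
        if safety_relevant and (approved_by is None or approved_by == owner):
            raise PermissionError("safety-relevant change needs an approver other than the owner")
        if self.versions and effective_from <= self.versions[-1].effective_from:
            raise ValueError("versions must be appended in date order")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        version = PromptVersion(effective_from, text, owner, approved_by,
                                safety_relevant, digest)
        self.versions.append(version)
        return version

    def as_of(self, when: date) -> Optional[PromptVersion]:
        """Return the version in force on `when`, or None if no prompt existed yet."""
        dates = [v.effective_from for v in self.versions]
        i = bisect_right(dates, when)
        return self.versions[i - 1] if i else None


archive = PromptArchive()
archive.commit(date(2025, 1, 10), "v1: baseline rules", owner="alice", safety_relevant=False)
archive.commit(date(2025, 3, 20), "v2: tighten refusal rules", owner="alice",
               safety_relevant=True, approved_by="bob")
archive.commit(date(2025, 5, 2), "v3: style tweak", owner="carol", safety_relevant=False)

print(archive.as_of(date(2025, 3, 15)).text)  # v1: baseline rules
print(archive.as_of(date(2025, 3, 31)).text)  # v2: tighten refusal rules
print(archive.as_of(date(2024, 12, 1)))       # None
```

Because the lookup is keyed by effective date rather than release number, “last March” resolves to two different texts in this example, which is the kind of answer a regulator request actually needs.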

The deeper conceptual point Li keeps returning to is worth lifting out for any reader from a legal background: the way a regulator reads text and the way a model “reads” text are fundamentally different operations. Legal interpretation runs on institutional context, legislative purpose, judicial gloss, normative reasoning. Model “interpretation” is statistical pattern-matching shaped by the training distribution, attention weights, and the context window. The same English sentence, reordered, rephrased, or relocated within the prompt, can produce different model behaviour. “Do not provide legal advice” and “you may provide general legal information but should not substitute for a licensed lawyer” are, to a regulator, equivalent in spirit; to a model, they are not the same instruction.

Compliance teams that treat system-prompt drafting as a purely legal exercise will produce documents that look defensible on paper and fail in production. The discipline this requires (drafting natural-language rules that withstand legal scrutiny and are also statistically robust) is, Li argues, the actual emerging skill in AI compliance.
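The legal-advice example suggests a cheap test compliance teams can run before relying on a wording: give the same model several drafts that counsel regards as equivalent and check whether its behaviour diverges. The sketch below is hypothetical throughout (`paraphrase_divergence`, the two-argument model interface and the deliberately brittle `toy_model` are invented for illustration). A real harness would call an actual model, repeat each probe many times because outputs are stochastic, and use a sturdier behaviour classifier than a string prefix.

```python
from typing import Callable, Dict, List, Set

# Hypothetical interface: (system_prompt, user_message) -> model reply.
Model = Callable[[str, str], str]


def paraphrase_divergence(model: Model, variants: List[str], probes: List[str],
                          classify: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Return the probes whose behaviour label differs across prompt variants."""
    divergent = {}
    for probe in probes:
        labels = {classify(model(variant, probe)) for variant in variants}
        if len(labels) > 1:
            divergent[probe] = labels
    return divergent


def toy_model(system_prompt: str, user_message: str) -> str:
    # Deliberately brittle stand-in: it only refuses personal advice when the
    # literal phrase "do not provide legal advice" appears in the system prompt.
    if ("do not provide legal advice" in system_prompt.lower()
            and user_message.lower().startswith("should i")):
        return "I can't advise on that."
    return "In general terms, ..."


def classify(reply: str) -> str:
    return "refuse" if reply.startswith("I can't") else "answer"


variants = [
    "Do not provide legal advice.",
    "You may provide general legal information but should not substitute "
    "for a licensed lawyer.",
]
probes = ["Should I sign this contract?", "What is a limitation period?"]
print(paraphrase_divergence(toy_model, variants, probes, classify))
```

An empty result is only weak evidence that the drafts behave alike on those probes; a non-empty one is direct evidence that they do not.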

DCC sources

Original: 李汶龙 (Li Wenlong), 《系统级提示词作为监管抓手？》 (“System-Level Prompts as a Regulatory Lever?”), 科技利维坦 (Tech Leviathan) WeChat Official Account.
Underlying paper: Anna Neumann, Holli Sargeant, Jat Singh et al., Prompt Governance? On Governing Technologies Governed by Natural Language, FAccT 2026 (SSRN 6802319).
EU: General-Purpose AI Code of Practice, Safety & Security chapter, Measure 7.1.
US: Executive Order 14319 “Preventing Woke AI in the Federal Government” (Federal Register, July 28, 2025); OMB Memorandum M-26-04 (December 2025).
China: 《生成式人工智能服务安全基本要求》 (Basic Safety Requirements for Generative AI Services, TC260-003), §§5–8; and the GenAI Services Interim Measures.
NIST CSRC Glossary, system prompt entry.

This is an editorial summary, not a translation of Li Wenlong’s piece. Quotations and conceptual framings are attributed; any simplification, error of emphasis, or operational extrapolation is DCC’s. Not legal advice.