AI prompt classification taxonomy: building the label set your gateway enforces against
AI prompt classification is the labelling step that produces the inputs a policy engine evaluates. The label set has to cover four dimensions: data sensitivity (PII, PHI, PCI, IP, public), intent (query, generation, code execution, agent action), risk surface (egress, lateral, injection, disclosure), and regulatory scope (EU AI Act high-risk, HIPAA PHI, GDPR Article 22). The policy decision joins the four dimensions against the per-user role and the per-route rule. The taxonomy is the artefact the regulator inspects when the gateway answers an audit question about why a given prompt was redacted, blocked or allowed.

The OWASP Top 10 for LLM Applications (2023 edition) lists LLM01 prompt injection and LLM06 sensitive information disclosure among the top failure modes for production AI systems; the 2025 edition renumbers sensitive information disclosure to LLM02, and this article keeps the 2023 identifiers. Both modes share a precondition: the gateway has to know what kind of prompt it is looking at before it can decide what to do. That knowledge is the label, and the label is what a classification taxonomy produces. A taxonomy with four dimensions (data sensitivity, intent, risk surface, regulatory scope) gives the policy engine the inputs it joins against the per-user role and the per-route rule. The labels are deterministic at the gateway, recorded in the audit log and referenced by the policy version that governed the decision.
I want to walk through the four dimensions of the taxonomy, why each one is necessary on its own, what the joined policy decision looks like, how the labels map back to OWASP LLM01 and LLM06, and how the taxonomy is recorded so the audit trail is reviewable at year three.
The four dimensions and what they each measure
The taxonomy is the label set against which the policy engine evaluates each prompt. Each dimension answers a different question. Data sensitivity answers what the prompt contains. Intent answers what the prompt is asking the model to do. Risk surface answers where the failure would land. Regulatory scope answers which legal regime applies.
Each dimension produces a single value or a small set of values per prompt. The classifier evaluates each dimension independently. The combined output is the four-tuple (sensitivity, intent, risk_surface, regulatory_scope) that the policy engine consumes.
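A minimal sketch of the four-tuple as a data structure may make the shape concrete. The class and field names are assumptions for illustration, as are the spellings of labels the article does not write out in code form (`eu_ai_act_high_risk`, `lateral`, `injection`, `disclosure`, `pci`, `ip`, `public`). Sensitivity is modelled as a single value and the other three dimensions as small sets, which is one reading of "a single value or a small set of values".

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    IP = "ip"
    PII = "pii"
    PCI = "pci"
    PHI = "phi"


class Intent(Enum):
    QUERY = "query"
    GENERATION = "generation"
    CODE_EXEC = "code_exec"
    AGENT_ACTION = "agent_action"


class RiskSurface(Enum):
    EGRESS = "egress"
    LATERAL = "lateral"
    INJECTION = "injection"
    DISCLOSURE = "disclosure"


class RegulatoryScope(Enum):
    EU_AI_ACT_HIGH_RISK = "eu_ai_act_high_risk"
    HIPAA_PHI = "hipaa_phi"
    GDPR_ART22 = "gdpr_art22"


@dataclass(frozen=True)  # frozen: a recorded classification should not be mutated
class Classification:
    sensitivity: Sensitivity
    intent: frozenset[Intent]
    risk_surface: frozenset[RiskSurface]
    regulatory_scope: frozenset[RegulatoryScope]

    def as_record(self) -> dict:
        """Serialise to plain values with a stable ordering for the log."""
        return {
            "sensitivity": self.sensitivity.value,
            "intent": sorted(i.value for i in self.intent),
            "risk_surface": sorted(r.value for r in self.risk_surface),
            "regulatory_scope": sorted(s.value for s in self.regulatory_scope),
        }
```

Sorting the set-valued dimensions in `as_record` keeps the serialised form stable, so two identical classifications produce identical log entries.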
Data sensitivity: what the prompt contains
The first dimension classifies the prompt content against a sensitivity scale. The scale aligns to the regulated data categories the deployer has obligations against. PII covers data points that identify a natural person under GDPR Article 4(1). PHI covers individually identifiable health information, including the 18 identifiers enumerated in 45 CFR 164.514(b)(2)(i). PCI covers cardholder data under PCI DSS 4.0 Requirement 3. IP covers trade secrets under the EU Trade Secrets Directive (EU) 2016/943 and the US Defend Trade Secrets Act of 2016.
The classifier is a chain: a regex pass for structured patterns (credit card numbers validated with the Luhn check, Social Security Numbers, IBANs), a named-entity recognition pass for unstructured patterns (names, dates of birth, medical record numbers) and a context check on the surrounding tokens. The output is the highest-sensitivity label found in the prompt. The label is recorded with the policy version applied at the moment of evaluation.
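The regex stage of that chain can be sketched in a few lines. This is an illustration under stated assumptions, not the gateway's implementation: the patterns are deliberately simple, the named-entity and context passes are omitted, and the ordering in `SENSITIVITY_RANK` is an assumption because the article does not rank the labels against each other.

```python
import re

# Assumed ordering, lowest to highest sensitivity (not specified in the article).
SENSITIVITY_RANK = {"public": 0, "ip": 1, "pii": 2, "pci": 3, "phi": 4}

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
# US Social Security Number in the common 3-2-4 layout.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def regex_pass(prompt: str) -> set[str]:
    """Structured-pattern stage: emit every label whose pattern matches."""
    labels = set()
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(prompt)):
        labels.add("pci")
    if SSN_RE.search(prompt):
        labels.add("pii")
    return labels


def classify_sensitivity(prompt: str) -> str:
    """Return the highest-sensitivity label found, defaulting to public."""
    labels = regex_pass(prompt) | {"public"}
    return max(labels, key=SENSITIVITY_RANK.__getitem__)
```

The Luhn check matters because a bare digit-run pattern also matches order numbers and timestamps; requiring a valid checksum cuts those false positives before the label reaches the policy engine.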
Intent: what the prompt is asking the model to do
The second dimension is the action the prompt is requesting. A query asks the model to retrieve information. A generation asks the model to produce content. A code execution asks the model to produce code that the application then runs, which carries a different risk than code shown to a developer for review. An agent action asks the model to call a tool, write to a system or change external state.
The intent labels are not exclusive: a prompt that asks "write the SQL to extract all PHI for patient ID 12345 and execute it" carries generation, code_exec and agent_action together. The classifier emits the set. The policy engine evaluates the most restrictive label first. The regulator inspects the recorded intent set when the audit question asks why an agent action was allowed.
Risk surface: where a failure would land
The third dimension is the operational risk the prompt would expose. Egress covers data leaving the perimeter to an external LLM provider. Lateral covers a prompt that crosses a trust boundary inside the perimeter, such as a customer-service prompt that retrieves data from a finance-system tool. Injection covers the OWASP LLM01 case where the prompt contains an attempt to override the system instructions. Disclosure covers the OWASP LLM06 case where the prompt is structured to extract sensitive information the model is not supposed to return.
The classifier here is signature-based for the structural patterns and behavioural for the contextual ones. The injection detector looks for the prompt patterns enumerated in OWASP LLM01 guidance: instruction overrides, role-play prompts, encoded instructions, indirect injection through retrieved context. The disclosure detector looks for the extraction patterns enumerated in LLM06: training data extraction, embedded credential extraction, system prompt extraction.
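The signature-based half of the injection detector might look like the sketch below. The regular expressions are illustrative stand-ins written for this example, not patterns published by OWASP, and the signal names are assumptions. A production detector would need a far larger signature set plus the behavioural checks the paragraph above mentions; the role-play pattern in particular will flag many benign prompts.

```python
import base64
import binascii
import re

# Illustrative signature: an override verb, then a reference to earlier
# or system-level text, then the thing being overridden.
OVERRIDE_RE = re.compile(
    r"\b(ignore|disregard|forget)\b.{0,40}"
    r"\b(previous|prior|above|system)\b.{0,40}"
    r"\b(instructions?|prompts?|rules?)\b",
    re.IGNORECASE | re.DOTALL,
)
ROLE_PLAY_RE = re.compile(r"\b(you are now|pretend to be|act as)\b", re.IGNORECASE)
# Runs of 24 or more base64 alphabet characters, with optional padding.
BASE64_RE = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def decoded_candidates(text: str) -> list[str]:
    """Decode base64-looking runs that turn out to be valid UTF-8 text."""
    decoded = []
    for match in BASE64_RE.finditer(text):
        try:
            raw = base64.b64decode(match.group(), validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 or not text: ignore this run
    return decoded


def injection_signals(prompt: str, retrieved_context: str = "") -> set[str]:
    """Return the set of injection signals found in the prompt and its context."""
    signals = set()
    if OVERRIDE_RE.search(prompt):
        signals.add("instruction_override")
    if ROLE_PLAY_RE.search(prompt):
        signals.add("role_play")
    if any(OVERRIDE_RE.search(d) for d in decoded_candidates(prompt)):
        signals.add("encoded_instruction")
    if retrieved_context and OVERRIDE_RE.search(retrieved_context):
        signals.add("indirect_injection")
    return signals
```

Scanning retrieved context separately from the user prompt keeps the indirect-injection signal distinct in the log, so an auditor can tell whether the override came from the user or from a document the system fetched.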
Regulatory scope: which legal regime applies
The fourth dimension is the regulatory regime that governs the prompt. An EU resident calling a high-risk Annex III deployment triggers the EU AI Act high-risk scope. A US hospital prompt that touches PHI triggers HIPAA. A prompt that produces a solely automated decision with legal effect triggers GDPR Article 22. A financial-services prompt under DORA scope triggers the Article 19 ICT-incident reporting path on failure. A prompt inside a SOX 404 internal-control boundary triggers the SOX evidence-retention rules.
The regulatory scope is not classified from the prompt content directly. The scope is attached at the route level: the gateway knows which deployment the prompt is hitting and which regulatory regime applies to that deployment. The classifier reads the route metadata, attaches the scope and records it on the per-decision log. The scope is what the regulator inspects when the audit question asks under which legal basis the decision was made.
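Because the scope comes from route metadata, the lookup is a table read and involves no inference over the prompt. The sketch below assumes a hypothetical in-memory route table; the route paths and the `eu_ai_act_high_risk` label spelling are invented for illustration. An unregistered route raises an error so that it cannot silently proceed without a recorded scope, and a route that is registered with an empty set is treated as explicitly out of regulatory scope.

```python
# Hypothetical route table: each deployment route maps to the regimes that govern it.
ROUTE_SCOPES: dict[str, frozenset[str]] = {
    "/v1/clinical-notes": frozenset({"hipaa_phi"}),
    "/v1/credit-decisions": frozenset({"eu_ai_act_high_risk", "gdpr_art22"}),
    "/v1/public-faq": frozenset(),  # registered, with no regulated scope
}


def scope_for_route(route: str) -> frozenset[str]:
    """Return the regulatory scope registered for a route.

    Raises LookupError for unregistered routes so the caller fails closed.
    """
    try:
        return ROUTE_SCOPES[route]
    except KeyError:
        raise LookupError(f"no regulatory scope registered for route {route!r}") from None


def attach_scope(log_record: dict, route: str) -> dict:
    """Return a copy of the log record with the route and its scope attached."""
    return {
        **log_record,
        "route": route,
        "regulatory_scope": sorted(scope_for_route(route)),
    }
```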
How the policy decision joins the four dimensions
The policy engine consumes the four-tuple together with the per-user role and the per-route rule. The join produces a single decision: pass, redact or block. The decision is made in line at request time and recorded with the policy version. A prompt classified as (phi, code_exec, egress, hipaa_phi) from a role that is not authorised for PHI egress is blocked at the gateway with the HIPAA scope recorded as the basis. A prompt classified as (pii, query, egress, gdpr_art22) from a role authorised for the route has its PII tokens redacted before the request reaches the model.
The decision logic is documented as a rule set, the rule set is version-controlled, and the rule version is bound to the per-decision log at the moment of evaluation. The audit trail at year three lets the auditor reconstruct why the gateway returned the decision it did, against the policy version live at that moment.
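A toy version of the join shows how the four dimensions, the role and the route combine into one decision with the rule version bound to it. Everything here is a sketch under assumptions: the `Role` fields, the rule order, the reason strings and the `POLICY_VERSION` value are hypothetical, and a real rule set would be data-driven and not hard-coded.

```python
from dataclasses import dataclass

POLICY_VERSION = "policy-v1-example"  # hypothetical version identifier


@dataclass(frozen=True)
class Role:
    name: str
    routes: frozenset          # routes this role may call
    egress_allowed: frozenset  # sensitivity labels this role may send out, after redaction
    may_act: bool = False      # whether agent actions are permitted


def decide(sensitivity: str, intent: set, risk_surface: set,
           regulatory_scope: set, role: Role, route: str) -> dict:
    """Join the four-tuple against role and route; return pass, redact or block."""
    def result(decision: str, reason: str) -> dict:
        return {
            "decision": decision,
            "reason": reason,
            "basis": sorted(regulatory_scope),
            "policy_version": POLICY_VERSION,
        }

    if route not in role.routes:
        return result("block", "role_not_authorised_for_route")
    if risk_surface & {"injection", "disclosure"}:
        return result("block", "risk_surface")
    if "agent_action" in intent and not role.may_act:
        return result("block", "agent_action_not_authorised")
    if "egress" in risk_surface and sensitivity != "public":
        if sensitivity not in role.egress_allowed:
            return result("block", "sensitivity_egress_not_authorised")
        return result("redact", "sensitive_tokens_before_egress")
    return result("pass", "no_restricting_rule_matched")
```

With a role whose `egress_allowed` set contains only `pii`, the two examples from the text come out as described: the (phi, code_exec, egress, hipaa_phi) prompt is blocked with `hipaa_phi` as the basis, and the (pii, query, egress, gdpr_art22) prompt is redacted. Because the function is pure and the version travels with the result, replaying a logged four-tuple against the same rule version returns the same decision.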
DeepInspect
This is the prompt classification layer DeepInspect operates against every HTTP request. DeepInspect sits as a stateless proxy between authenticated users or agents and any LLM endpoint. The classifier evaluates the four dimensions in line, the four-tuple is committed to the per-decision log, and the policy engine joins the four-tuple against the verified identity, the role context and the per-route rule before the model receives the request.
The classification result is recorded with the policy version that governed the decision and a cryptographic signature that makes any post-hoc modification detectable. The recorded labels are reviewable at any point inside the retention window. A regulator inspecting a decision at year three sees the four-tuple, the role, the rule version and the decision outcome together, against the policy artefact that was live at the moment of the request.
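The tamper-evidence property can be illustrated with a keyed hash over a canonical serialisation of the log record. This sketch uses HMAC-SHA256 from the Python standard library as a stand-in; the article does not say which signature scheme the product uses, and an asymmetric signature would be the more likely choice where the verifier must not hold the signing key. The function names and record fields are assumptions.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Serialise with sorted keys and fixed separators so the bytes are stable."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with an HMAC-SHA256 tag attached."""
    tag = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "signature": tag}


def verify_record(signed: dict, key: bytes) -> bool:
    """Recompute the tag over everything except the signature and compare."""
    record = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected, signed.get("signature", ""))
```

Changing any field after signing, such as flipping a recorded `block` to `pass`, makes `verify_record` return `False`.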
Frequently asked questions
- Why four dimensions and not a single sensitivity label?
A single sensitivity label collapses the decision space. A prompt with PHI sent as a query from an authorised role for an internal retrieval task carries a different policy outcome than the same PHI sent as an agent action that writes to an external system. The intent and the risk surface change the answer. Four dimensions keep the decision space intact and let the policy engine evaluate each axis against the appropriate rule.
- How does the taxonomy map to OWASP LLM01 prompt injection?
LLM01 maps to the injection label inside the risk_surface dimension. The classifier evaluates the prompt for the injection patterns enumerated in the OWASP guidance: instruction overrides, role-play prompts, encoded instructions, indirect injection through retrieved context. A prompt that triggers the injection label is blocked at the gateway, and the policy version that governed the decision is recorded on the log. The OWASP reference is recorded in the rule set so the audit trail explains the basis.
- How does the taxonomy map to OWASP LLM06 sensitive information disclosure?
LLM06 is the disclosure label inside the risk_surface dimension. The classifier evaluates the prompt for extraction patterns: training-data extraction probes, embedded-credential extraction probes, system-prompt extraction probes. A prompt that triggers the disclosure label is blocked at the gateway. The behaviour is paired with the sensitivity dimension: a disclosure-labelled prompt against a route carrying PHI or PCI scope is treated as a higher-severity event than the same prompt against a public-data route.
- Who maintains the taxonomy over time?
The taxonomy is maintained as a versioned artefact alongside the policy rules. The data sensitivity dimension is updated when a new regulated category is added (the EU AI Act biometric category, for example). The intent dimension is updated when a new model capability is exposed (a new agent-action class, a new code-exec mode). The risk surface and regulatory scope dimensions are updated when OWASP releases a new top-10 revision or when a new regulation enters scope. The version of the taxonomy is bound to each per-decision log.
- How does the classification result interact with the policy version?
The four-tuple is the input. The policy version is the rule set that consumes the input. Both are recorded on the per-decision log together with the decision outcome. The auditor at year three reads the four-tuple, reads the policy version, replays the rule against the input and reproduces the decision. That reproducibility supports the record-keeping and traceability obligation in EU AI Act Article 12.
- What happens when the classifier is uncertain?
The default decision under uncertainty is deny. The classifier emits the four-tuple with a confidence score per dimension. Where a confidence score falls below the configured threshold, the policy engine treats that dimension as the most restrictive label available and applies the corresponding rule. The denial is recorded with the uncertainty as the basis. The fail-closed posture aligns with the zero-trust principle applied to LLM traffic and with the robustness and fail-safe measures the EU AI Act describes for high-risk systems in Article 15.
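The fail-closed substitution described in the last answer can be sketched as a small pre-processing step in front of the policy engine. The threshold value, the choice of most restrictive label per dimension and the function name are assumptions for illustration, and each dimension is reduced to a single label to keep the example short.

```python
# Assumed most restrictive label per dimension (not specified in the article).
MOST_RESTRICTIVE = {
    "sensitivity": "phi",
    "intent": "agent_action",
    "risk_surface": "injection",
    "regulatory_scope": "eu_ai_act_high_risk",
}
CONFIDENCE_THRESHOLD = 0.8  # hypothetical configured threshold


def harden(labels: dict, confidence: dict,
           threshold: float = CONFIDENCE_THRESHOLD) -> tuple[dict, list]:
    """Replace low-confidence labels with the most restrictive one.

    Returns the hardened labels and the sorted list of dimensions that were
    replaced, so the log can record uncertainty as the basis. A dimension
    with no confidence score is treated as confidence 0.0.
    """
    hardened, uncertain = {}, []
    for dimension, label in labels.items():
        if confidence.get(dimension, 0.0) < threshold:
            hardened[dimension] = MOST_RESTRICTIVE[dimension]
            uncertain.append(dimension)
        else:
            hardened[dimension] = label
    return hardened, sorted(uncertain)
```

Since the regulatory scope is read from route metadata and is not inferred from the prompt, its confidence would normally be 1.0; in practice the substitution applies to the three classified dimensions.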