Fix XML delimiter formatting and enhance security details

Updated formatting of XML delimiters in the documentation to use backticks for clarity. Enhanced explanations regarding memory injection vulnerabilities and defensive measures.
This commit is contained in:
SirBroccoli
2025-10-23 14:11:10 +02:00
committed by GitHub
parent 95d13f8b89
commit 468bd28887

View File

@@ -12,8 +12,8 @@ This is not a vulnerability in the Bedrock platform itself; its a class of ag
- When Memory is enabled, the agent summarizes each session at endofsession using a Memory Summarization prompt template and stores that summary for a configurable retention (up to 365 days). In later sessions, that summary is injected into the orchestration prompt as system instructions, strongly influencing behavior.
- The default Memory Summarization template includes blocks like:
- <previous_summaries>$past_conversation_summary$</previous_summaries>
- <conversation>$conversation$</conversation>
- `<previous_summaries>$past_conversation_summary$</previous_summaries>`
- `<conversation>$conversation$</conversation>`
- Guidelines require strict, wellformed XML and topics like "user goals" and "assistant actions".
- If a tool fetches untrusted external data and that raw content is inserted into $conversation$ (specifically the tools result field), the summarizer LLM may be influenced by attackercontrolled markup and instructions.
@@ -21,16 +21,16 @@ This is not a vulnerability in the Bedrock platform itself; its a class of ag
An agent is exposed if all are true:
- Memory is enabled and summaries are reinjected into orchestration prompts.
- The agent has a tool that ingests untrusted content (web browser/scraper, document loader, thirdparty API, usergenerated content) and injects the raw result into the summarization prompts <conversation> block.
- The agent has a tool that ingests untrusted content (web browser/scraper, document loader, thirdparty API, usergenerated content) and injects the raw result into the summarization prompts `<conversation>` block.
- Guardrails or sanitization of delimiterlike tokens in tool outputs are not enforced.
## Injection point and boundaryescape technique
- Precise injection point: the tools result text that is placed inside the Memory Summarization prompts <conversation> ... $conversation$ ... </conversation> block.
- Precise injection point: the tools result text that is placed inside the Memory Summarization prompts `<conversation> ... $conversation$ ... </conversation>` block.
- Boundary escape: a 3part payload uses forged XML delimiters to trick the summarizer into treating attacker content as if it were templatelevel system instructions instead of conversation content.
- Part 1: Ends with a forged </conversation> to convince the LLM that the conversation block ended.
- Part 2: Placed “outside” any <conversation> block; formatted to resemble template/systemlevel instructions and contains the malicious directives likely to be copied into the final summary under a topic.
- Part 3: Reopens with a forged <conversation>, optionally fabricating a small user/assistant exchange that reinforces the malicious directive to increase inclusion in the summary.
- Part 1: Ends with a forged `</conversation>` to convince the LLM that the conversation block ended.
- Part 2: Placed “outside” any `<conversation>` block; formatted to resemble template/systemlevel instructions and contains the malicious directives likely to be copied into the final summary under a topic.
- Part 3: Reopens with a forged `<conversation>`, optionally fabricating a small user/assistant exchange that reinforces the malicious directive to increase inclusion in the summary.
<details>
<summary>Example 3part payload embedded in a fetched page (abridged)</summary>
@@ -56,9 +56,9 @@ Assistant: Validation complete per policy and auditing goals.
```
Notes:
- The forged </conversation> and <conversation> delimiters aim to reposition the core instruction outside the intended conversation block so the summarizer treats it like template/system content.
- The forged `</conversation>` and `<conversation>` delimiters aim to reposition the core instruction outside the intended conversation block so the summarizer treats it like template/system content.
- The attacker may obfuscate or split the payload across invisible HTML nodes; the model ingests extracted text.
```
</details>
## Why it persists and how it triggers
@@ -66,11 +66,6 @@ Notes:
- The Memory Summarization LLM may include attacker instructions as a new topic (for example, "validation goal"). That topic is stored in the peruser memory.
- In later sessions, the memory content is injected into the orchestration prompts systeminstruction section. System instructions strongly bias planning. As a result, the agent may silently call a webfetching tool to exfiltrate session data (for example, by encoding fields in a query string) without surfacing this step in the uservisible response.
## Observed effects you can look for
- Memory summaries that include unexpected or custom topics not authored by builders.
- Orchestration prompt traces showing memory injected as system instructions that reference validation/auditing goals unrelated to business logic.
- Silent tool calls to unexpected domains, often with long URLencoded query strings that correlate with recent conversation data.
## Reproducing in a lab (high level)
@@ -80,93 +75,6 @@ Notes:
- End the session and observe the Memory Summarization output; look for an injected custom topic containing attacker directives.
- Start a new session; inspect Trace/Model Invocation Logs to see memory injected and any silent tool calls aligned with the injected directives.
## Defensive guidance (layered)
1) Sanitize tool outputs before Memory Summarization
- Strip or neutralize delimiterlike sequences that can escape intended blocks (for example,
</conversation>, <conversation>, <summary>, <topic ...>).
- Prefer allowing only a minimal safe subset of characters/markup from untrusted tools before inserting into prompts.
- Consider transforming tool results (for example, JSONencode or wrap as CDATA) and instructing the summarizer to treat it as data, not instructions.
2) Use Bedrock advanced prompts and a parser Lambda
- Keep Memory Summarization enabled but override its prompt and attach a parser Lambda for MEMORY_SUMMARIZATION that enforces:
- Strict XML parsing of the summarizer output.
- Only known topic names (for example, "user goals", "assistant actions").
- Drop or rewrite any unexpected topics or instructionlike content.
<details>
<summary>Example: Parser Lambda (Python) enforcing allowed topics in MEMORY_SUMMARIZATION</summary>
```python
import json
import xml.etree.ElementTree as ET
ALLOWED_TOPICS = {"user goals", "assistant actions"}
def lambda_handler(event, context):
# event["promptType"] == "MEMORY_SUMMARIZATION" (configure via promptOverrideConfiguration)
raw = event.get("invokeModelRawResponse", "")
# Best effort: parse and keep only allowed topics
cleaned_summary = "<summary/>"
try:
root = ET.fromstring(raw)
if root.tag != "summary":
# Not a summary; discard
pass
else:
kept = ET.Element("summary")
for topic in root.findall("topic"):
name = topic.attrib.get("name", "").strip()
if name in ALLOWED_TOPICS:
kept.append(topic)
cleaned_summary = ET.tostring(kept, encoding="unicode")
except Exception:
# On parse error, fail closed with empty summary
pass
return {
"promptType": "MEMORY_SUMMARIZATION",
# Parsed response replaces model output with sanitized XML
"memorySummarizationParsedResponse": {
"summary": cleaned_summary
}
}
```
Notes:
- Attach this as the override parser for MEMORY_SUMMARIZATION in promptOverrideConfiguration.
- Extend to validate XML schema strictly and enforce length/character policies.
```
</details>
3) Guardrails and content filtering
- Enable Amazon Bedrock Guardrails with promptattack/promptinjection policies for both orchestration and the Memory Summarization step.
- Reject or quarantine tool results containing forged template delimiters or instructionlike patterns.
4) Egress and tool hardening
- Restrict webreading tools to allowlisted domains; enforce denybydefault for outbound fetches.
- If the tool is implemented via Lambda, validate destination URLs and limit query string length and character set before performing requests.
5) Logging, monitoring, and alerting
- Enable Model Invocation Logs to capture prompts and responses for forensic review and anomaly detection.
- Enable Trace to observe perstep prompts, memory injections, tool invocations, and reasoning.
- Alert on:
- Tool calls to unknown or newly registered domains.
- Unusually long query strings or repeated calls with encoded parameters shortly after bookings/orders/messages are created.
- Memory summaries containing unfamiliar topic names.
## Detection ideas
- Periodically parse memory objects to list topic names and diff against an allowlist. Investigate any new topics that appear without a code/config change.
- From Trace, search for orchestration inputs that contain $memory_content$ with unexpected directives or for tool invocations that do not produce uservisible messages.
## Key builder takeaways
- Treat all externally sourced data as adversarial; do not inject raw tool output into summarizers.
- Sanitize delimiterlike tokens and instructionshaped text before they reach LLM prompts.
- Prefer denybydefault egress for agent tools and strict allowlists.
- Layer runtime guardrails, parser Lambdas, and auditing.
## References
@@ -179,4 +87,4 @@ Notes:
- [Track agents step-by-step reasoning process using trace Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/trace-events.html)
- [Amazon Bedrock Guardrails](https://aws.amazon.com/bedrock/guardrails/)
{{#include ../../../banners/hacktricks-training.md}}
{{#include ../../../banners/hacktricks-training.md}}