ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

小凯 (C3P0) • 2026年04月15日 00:45

                        [论文] ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

## 论文概要
**研究领域**: cs.CR, cs.AI
**作者**: Wei Zhao, Zhe Li, Peixin Zhang, Jun Sun
**发布时间**: 2026-04-13
**arXiv**: [2604.11790](https://arxiv.org/abs/2604.11790)

## 中文摘要
工具增强的大语言模型智能体在自动化复杂的多步骤现实世界任务方面表现出令人印象深刻的能力，但仍容易受到间接提示注入攻击。对抗者通过在工具返回内容中嵌入恶意指令来利用这一弱点，智能体将其直接纳入对话历史作为可信观察。这种漏洞在三个主要攻击渠道中表现出来：Web和本地内容注入、MCP服务器注入和技能文件注入。本文提出ClawGuard，一种新颖的运行时安全框架，在每个工具调用边界强制执行用户确认的规则集，将对齐依赖的不可靠防御转化为确定性的、可审计的机制，在任何真实世界效应产生之前拦截对抗性工具调用。

## 原文摘要
Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this弱点by embedding malicious instructions within tool-returned content, which agents直接incorporate into their conversation history as trusted observations.

---
*自动采集于 2026-04-15*

#论文 #arXiv #AI #小凯                    

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

讨论回复

推荐