<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>ReasoningBank: An "Experience Library" for Agents</title>
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@400;500;700&family=Roboto:wght@400;500;700&family=Roboto+Mono:wght@400;500&display=swap" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/github.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/python.min.js"></script>
<style>
:root {
--primary-color: #1a73e8;
--primary-light: #e8f0fe;
--primary-dark: #174ea6;
--secondary-color: #5f6368;
--background-color: #f8f9fa;
--card-color: #ffffff;
--text-color: #202124;
--text-secondary: #5f6368;
--accent-color: #4285f4;
--success-color: #34a853;
--warning-color: #fbbc04;
--error-color: #ea4335;
}
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Noto Sans SC', 'Roboto', sans-serif;
background-color: var(--background-color);
color: var(--text-color);
line-height: 1.6;
}
.poster-container {
width: 960px;
min-height: 1200px;
margin: 0 auto;
padding: 40px;
background: linear-gradient(135deg, #f5f7fa 0%, #e4e8f0 100%);
position: relative;
overflow: hidden;
}
.bg-pattern {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
background-image:
radial-gradient(circle at 10% 20%, rgba(26, 115, 232, 0.05) 0%, transparent 20%),
radial-gradient(circle at 90% 30%, rgba(66, 133, 244, 0.07) 0%, transparent 30%),
radial-gradient(circle at 50% 70%, rgba(26, 115, 232, 0.05) 0%, transparent 25%),
linear-gradient(45deg, rgba(26, 115, 232, 0.02) 0%, transparent 70%);
z-index: 0;
}
.bg-grid {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
background-size: 20px 20px;
background-image:
linear-gradient(to right, rgba(0, 0, 0, 0.03) 1px, transparent 1px),
linear-gradient(to bottom, rgba(0, 0, 0, 0.03) 1px, transparent 1px);
z-index: 0;
}
.content {
position: relative;
z-index: 1;
}
.header {
text-align: center;
margin-bottom: 40px;
padding: 20px;
background: rgba(255, 255, 255, 0.8);
border-radius: 16px;
box-shadow: 0 4px 20px rgba(0, 0, 0, 0.08);
backdrop-filter: blur(10px);
}
.title {
font-size: 48px;
font-weight: 700;
color: var(--primary-dark);
margin-bottom: 10px;
letter-spacing: -0.5px;
}
.subtitle {
font-size: 20px;
color: var(--text-secondary);
max-width: 80%;
margin: 0 auto;
}
.section {
margin-bottom: 30px;
background: var(--card-color);
border-radius: 16px;
padding: 25px;
box-shadow: 0 4px 20px rgba(0, 0, 0, 0.08);
}
.section-title {
font-size: 28px;
font-weight: 700;
color: var(--primary-color);
margin-bottom: 15px;
display: flex;
align-items: center;
}
.section-title .material-icons {
margin-right: 10px;
font-size: 28px;
}
.section-content {
font-size: 16px;
line-height: 1.7;
}
.highlight {
background: linear-gradient(transparent 60%, rgba(66, 133, 244, 0.2) 40%);
padding: 0 4px;
}
.card-container {
display: flex;
flex-wrap: wrap;
gap: 20px;
margin-top: 20px;
}
.card {
flex: 1;
min-width: 280px;
background: var(--background-color);
border-radius: 12px;
padding: 20px;
box-shadow: 0 2px 10px rgba(0, 0, 0, 0.05);
}
.card-title {
font-size: 18px;
font-weight: 700;
color: var(--primary-color);
margin-bottom: 10px;
display: flex;
align-items: center;
}
.card-title .material-icons {
margin-right: 8px;
font-size: 20px;
}
.card-content {
font-size: 15px;
}
.code-block {
background: #f6f8fa;
border-radius: 8px;
padding: 16px;
margin: 15px 0;
overflow-x: auto;
position: relative;
border: 1px solid #e1e4e8;
}
.code-label {
position: absolute;
top: 0;
right: 0;
background: var(--primary-color);
color: white;
font-size: 12px;
padding: 2px 8px;
border-radius: 0 0 0 8px;
font-family: 'Roboto Mono', monospace;
text-transform: uppercase;
}
.code-block pre {
margin: 0;
padding: 0;
font-family: 'Roboto Mono', monospace;
font-size: 14px;
line-height: 1.5;
white-space: pre;
overflow-x: auto;
}
.code-block code {
font-family: 'Roboto Mono', monospace;
font-size: 14px;
}
.diagram {
background: var(--background-color);
border-radius: 12px;
padding: 20px;
margin: 20px 0;
text-align: center;
}
.diagram-title {
font-size: 18px;
font-weight: 700;
color: var(--primary-color);
margin-bottom: 15px;
}
.diagram-content {
display: flex;
justify-content: center;
align-items: center;
flex-wrap: wrap;
gap: 15px;
}
.diagram-box {
background: white;
border: 2px solid var(--primary-color);
border-radius: 8px;
padding: 15px;
min-width: 120px;
text-align: center;
font-weight: 500;
}
.diagram-arrow {
color: var(--primary-color);
font-size: 24px;
}
.stats-container {
display: flex;
justify-content: space-around;
margin: 20px 0;
flex-wrap: wrap;
}
.stat-item {
text-align: center;
padding: 15px;
min-width: 150px;
}
.stat-value {
font-size: 36px;
font-weight: 700;
color: var(--primary-color);
}
.stat-label {
font-size: 14px;
color: var(--text-secondary);
}
.feature-list {
list-style: none;
margin: 15px 0;
}
.feature-item {
display: flex;
align-items: flex-start;
margin-bottom: 10px;
}
.feature-item .material-icons {
color: var(--success-color);
margin-right: 10px;
flex-shrink: 0;
}
.table-container {
overflow-x: auto;
margin: 20px 0;
}
table {
width: 100%;
border-collapse: collapse;
font-size: 14px;
}
th, td {
padding: 12px 15px;
text-align: left;
border-bottom: 1px solid #e0e0e0;
}
th {
background-color: var(--primary-light);
color: var(--primary-dark);
font-weight: 500;
}
tr:hover {
background-color: rgba(66, 133, 244, 0.05);
}
.footer {
text-align: center;
margin-top: 40px;
padding: 20px;
color: var(--text-secondary);
font-size: 14px;
}
</style>
</head>
<body>
<div class="poster-container">
<div class="bg-pattern"></div>
<div class="bg-grid"></div>
<div class="content">
<div class="header">
<h1 class="title">ReasoningBank: An "Experience Library" for Agents</h1>
<p class="subtitle">A novel memory framework from Google that lets AI agents learn from their own experience and keep evolving</p>
</div>
<div class="section">
<h2 class="section-title">
<i class="material-icons">error_outline</i>
Background
</h2>
<div class="section-content">
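<p>Today's AI agents (for example, LLM agents that browse the web or write code autonomously) can carry out complex tasks, but they share one key limitation: <span class="highlight">they cannot learn from past experience</span>. This leads to the following problems:</p>
<ul class="feature-list">
<li class="feature-item">
<i class="material-icons">arrow_right</i>
<span>What the agent learns on task A does not help it on task B</span>
</li>
<li class="feature-item">
<i class="material-icons">arrow_right</i>
<span>It often repeats the same mistakes</span>
</li>
<li class="feature-item">
<i class="material-icons">arrow_right</i>
<span>It approaches every problem as if seeing it for the first time</span>
</li>
</ul>
<p>As a result, the agent never really gets "smarter over time", which limits its performance on long-running streams of tasks. Conventional memory systems store either raw trajectories or only successful routines; such systems are brittle across environments and transfer poorly across domains.</p>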
</div>
</div>
<div class="section">
<h2 class="section-title">
<i class="material-icons">lightbulb</i>
Core Concepts of ReasoningBank
</h2>
<div class="section-content">
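<p>ReasoningBank is a <span class="highlight">"reasoning memory bank"</span>: it lets an AI agent, much like a person, summarize its experience and reflect on its failures so that it does better on later tasks. The main idea is:</p>
<ol class="feature-list">
<li class="feature-item">
<i class="material-icons">looks_one</i>
<span><strong>Record experience</strong>: on every task, the agent records its "thinking process" and its "action trajectory"</span>
</li>
<li class="feature-item">
<i class="material-icons">looks_two</i>
<span><strong>Self-evaluate</strong>: an LLM-based self-judging mechanism labels the attempt as a "success" or a "failure"</span>
</li>
<li class="feature-item">
<i class="material-icons">looks_3</i>
<span><strong>Distill memory</strong>: instead of storing the lengthy sequence of operations verbatim, the agent summarizes it into concise, transferable "reasoning strategies"</span>
</li>
<li class="feature-item">
<i class="material-icons">looks_4</i>
<span><strong>Retrieve and reuse memory</strong>: when a new task arrives, the agent retrieves relevant experience from ReasoningBank and uses it to guide the current decision</span>
</li>
</ol>
<p>This loop lets the agent "self-evolve" as experience accumulates, much as humans learn from experience.</p>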
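<p>Steps 2 and 3 above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the function build_distillation_prompt, and the toy keyword_judge (a stand-in for a real LLM judge) are all assumptions made for the example.</p>
<div class="code-block">
<div class="code-label">Python</div>
<pre><code>
```python
from typing import Callable

# Illustrative prompt templates; the wording is an assumption, not the paper's prompts.
SUCCESS_PROMPT = "The task succeeded. Summarize the strategies that made it work:\n{trajectory}"
FAILURE_PROMPT = "The task failed. Summarize what went wrong and how to avoid it:\n{trajectory}"


def build_distillation_prompt(trajectory: str, judge: Callable[[str], bool]) -> tuple[str, str]:
    """Label a trajectory with the judge, then pick the matching distillation prompt."""
    succeeded = judge(trajectory)
    label = "success" if succeeded else "failure"
    template = SUCCESS_PROMPT if succeeded else FAILURE_PROMPT
    return label, template.format(trajectory=trajectory)


def keyword_judge(trajectory: str) -> bool:
    # Toy stand-in for an LLM judge: an explicit marker counts as success.
    return "ORDER CONFIRMED" in trajectory


label, prompt = build_distillation_prompt(
    "clicked Buy; page shows ORDER CONFIRMED", keyword_judge
)
print(label)
print(prompt)
```
</code></pre>
</div>
<p>Keeping the judge as a plain callable means a real LLM call can replace the keyword check without touching the rest of the loop.</p>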
</div>
</div>
<div class="section">
<h2 class="section-title">
<i class="material-icons">architecture</i>
Architecture and Core Components
</h2>
<div class="section-content">
<p>ReasoningBank consists of the following key components:</p>
<div class="card-container">
<div class="card">
<h3 class="card-title">
<i class="material-icons">storage</i>
Memory Structure
</h3>
<div class="card-content">
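<p>Memory items are structured knowledge units designed and distilled from past experience. They abstract away low-level execution details while preserving transferable reasoning patterns and strategies. Each memory item has three parts:</p>
<ul class="feature-list">
<li class="feature-item">
<i class="material-icons">title</i>
<span><strong>Title</strong>: a concise identifier that summarizes the core strategy or reasoning pattern</span>
</li>
<li class="feature-item">
<i class="material-icons">description</i>
<span><strong>Description</strong>: a one-sentence summary of the memory item</span>
</li>
<li class="feature-item">
<i class="material-icons">article</i>
<span><strong>Content</strong>: the reasoning steps, decision rationale, or operational insights distilled from past experience</span>
</li>
</ul>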
<div class="code-block">
<div class="code-label">JSON</div>
<pre><code>{
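  "title": "Verify element identifiers",
  "description": "Verify that a page element exists and check its state before acting on it",
  "content": "Before clicking or otherwise interacting, use the developer tools to check that the element exists, is visible, and is interactive. If the element is missing, wait for the page to load or perform another action that makes it appear."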
}</code></pre>
</div>
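<p>In code, the three-field schema could be modeled as a small immutable record. The sketch below is one possible representation under that assumption; the class name MemoryItem and the to_prompt helper are hypothetical and do not come from the paper.</p>
<div class="code-block">
<div class="code-label">Python</div>
<pre><code>
```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MemoryItem:
    """One distilled memory: a title, a one-line description, and the content."""

    title: str
    description: str
    content: str

    def to_prompt(self) -> str:
        # Render the item as plain text that can be pasted into an agent prompt.
        return f"## {self.title}\n{self.description}\n{self.content}"


item = MemoryItem(
    title="Verify element identifiers",
    description="Check that a page element exists and is ready before acting on it.",
    content="Before clicking, confirm the element is present, visible, and interactive.",
)
print(json.dumps(asdict(item), indent=2))
print(item.to_prompt())
```
</code></pre>
</div>
<p>Freezing the dataclass keeps stored items from being modified by accident once they are in the bank.</p>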
</div>
</div>
<div class="card">
<h3 class="card-title">
<i class="material-icons">integration_instructions</i>
Integration with the Agent
</h3>
<div class="card-content">
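<p>An agent equipped with ReasoningBank can draw on a curated pool of transferable strategies to guide its decisions. This lets it recall useful insights, avoid pitfalls it has seen before, and adapt more robustly to unseen queries. Integration proceeds in three steps:</p>
<ul class="feature-list">
<li class="feature-item">
<i class="material-icons">search</i>
<span><strong>Memory retrieval</strong>: retrieve the memory items relevant to the current task from ReasoningBank</span>
</li>
<li class="feature-item">
<i class="material-icons">build</i>
<span><strong>Memory construction</strong>: build new memory items from the experience of the current task</span>
</li>
<li class="feature-item">
<i class="material-icons">merge_type</i>
<span><strong>Memory consolidation</strong>: merge the newly built memory items into ReasoningBank</span>
</li>
</ul>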
<div class="code-block">
<div class="code-label">Python</div>
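<pre><code>import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_memories(query, memory_bank, embed, k=1):
    """Return the k memory items most similar to the query.

    embed is a caller-supplied function that maps text to a vector.
    """
    # Embed the query as a vector
    query_embedding = embed(query)
    # Score every memory item by cosine similarity to the query
    scored = [
        (cosine_similarity(query_embedding, memory['embedding']), memory)
        for memory in memory_bank
    ]
    # Sort on the score alone so that ties never compare the dicts
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Return the k most similar memory items
    return [memory for _, memory in scored[:k]]</code></pre>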
</div>
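<p>The other two ends of the integration, feeding retrieved items to the agent and consolidating new ones, might look like the sketch below. It assumes consolidation is a plain append and that items are embedded from their title and description. toy_embed is a hypothetical letter-count stand-in for a real embedding model, and consolidate and inject_memories are illustrative names.</p>
<div class="code-block">
<div class="code-label">Python</div>
<pre><code>
```python
def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for an embedding model: counts of the letters a-z.
    counts = [0.0] * 26
    for ch in text.lower():
        if ch.isascii() and ch.isalpha():
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def consolidate(memory_bank: list[dict], new_items: list[dict], embed) -> list[dict]:
    """Append each new item to the bank, storing its embedding for later retrieval."""
    for item in new_items:
        text = f"{item['title']} {item['description']}"
        memory_bank.append({**item, "embedding": embed(text)})
    return memory_bank


def inject_memories(system_prompt: str, memories: list[dict]) -> str:
    """Append retrieved memory items to the agent's instructions."""
    if not memories:
        return system_prompt
    lines = [f"- {m['title']}: {m['content']}" for m in memories]
    return system_prompt + "\n\nRelevant past experience:\n" + "\n".join(lines)


bank = consolidate(
    [],
    [{
        "title": "Verify element identifiers",
        "description": "Check the element before acting on it.",
        "content": "Confirm the element is present, visible, and interactive.",
    }],
    toy_embed,
)
print(inject_memories("You are a web agent.", bank))
```
</code></pre>
</div>
<p>A more elaborate consolidation step, for example merging near-duplicate items, would slot into consolidate without changing its callers.</p>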
</div>
</div>
</div>
<div class="diagram">
<div class="diagram-title">ReasoningBank Workflow</div>
<div class="diagram-content">
<div class="diagram-box">New task</div>
<div class="diagram-arrow">→</div>
<div class="diagram-box">Memory retrieval</div>
<div class="diagram-arrow">→</div>
<div class="diagram-box">Task execution</div>
<div class="diagram-arrow">→</div>
<div class="diagram-box">Memory construction</div>
<div class="diagram-arrow">→</div>
<div class="diagram-box">Memory consolidation</div>
</div>
</div>
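<p>The workflow in the diagram can be read as a loop over a stream of tasks. The sketch below wires the stages together with toy stand-ins: run_with_memory and the four stub functions (retrieve, agent, judge, distill) are hypothetical, and a real system would back each one with an LLM or an embedding model.</p>
<div class="code-block">
<div class="code-label">Python</div>
<pre><code>
```python
def run_with_memory(tasks, agent, judge, distill, retrieve, bank=None):
    """Process tasks in order, letting each one read from and add to the bank."""
    bank = [] if bank is None else bank
    outcomes = []
    for task in tasks:
        hints = retrieve(task, bank)                      # memory retrieval
        trajectory = agent(task, hints)                   # task execution
        success = judge(task, trajectory)                 # self-evaluation
        bank.extend(distill(task, trajectory, success))   # construction + consolidation
        outcomes.append(success)
    return outcomes, bank


# Toy stand-ins so the loop can run end to end.
def retrieve(task, bank):
    words = set(task.split())
    return [m for m in bank if words.intersection(m["title"].split())]


def agent(task, hints):
    return f"{task} | hints used: {len(hints)}"


def judge(task, trajectory):
    return True


def distill(task, trajectory, success):
    return [{"title": task, "content": trajectory, "from_success": success}]


outcomes, bank = run_with_memory(
    ["search shoes", "search hats"], agent, judge, distill, retrieve
)
for memory in bank:
    print(memory["content"])
```
</code></pre>
</div>
<p>In this toy run the second task shares the word "search" with the first, so it retrieves the memory the first task left behind.</p>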
</div>
</div>
<div class="section">
<h2 class="section-title">
<i class="material-icons">auto_graph</i>
MaTTS: Memory-aware Test-Time Scaling
</h2>
<div class="section-content">
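<p>The authors further propose <span class="highlight">Memory-aware Test-Time Scaling (MaTTS)</span>, which lets the agent not only remember but also learn faster. MaTTS allocates more compute to each task to generate diverse trajectories, which in turn yields higher-quality memories.</p>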
<div class="card-container">
<div class="card">
<h3 class="card-title">
<i class="material-icons">view_comfy</i>
Parallel Scaling
</h3>
<div class="card-content">
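<p>In the parallel setting, the agent tries several different approaches to the same task at once, then compares and analyzes them to distill more robust reasoning patterns. By contrasting the trajectories, the agent can identify consistent reasoning patterns and filter out spurious solutions. Running multiple trials of a single query encourages diverse exploration, which makes memory curation more reliable.</p>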
<div class="code-block">
<div class="code-label">Python</div>
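<pre><code>def parallel_scaling(task, k=5):
    """Run the same task k times and distill memories by contrasting the attempts.

    execute_task and extract_memories_from_trajectories are placeholders
    for the agent rollout and the LLM-based extraction step.
    """
    # Generate k different trajectories, one per seed
    trajectories = [execute_task(task, seed=i) for i in range(k)]
    # Extract at most 5 refined memory items with a self-contrast prompt
    memories = extract_memories_from_trajectories(trajectories)
    return memories</code></pre>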
</div>
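<p>To make the contrast idea concrete, here is a deliberately simple set-based stand-in for the LLM self-contrast step. It treats a step as a consistent pattern when every successful attempt contains it, and as a pitfall when it shows up only in failed attempts. The function self_contrast and the sample attempts are invented for illustration; the actual method prompts an LLM rather than comparing sets.</p>
<div class="code-block">
<div class="code-label">Python</div>
<pre><code>
```python
def self_contrast(trajectories):
    """Split steps into consistent patterns and pitfalls.

    Each trajectory is a (steps, succeeded) pair. A step is a pattern when
    every successful attempt contains it, and a pitfall when it appears
    only in failed attempts.
    """
    wins = [set(steps) for steps, ok in trajectories if ok]
    losses = [set(steps) for steps, ok in trajectories if not ok]
    seen_in_wins = set().union(*wins)
    seen_in_losses = set().union(*losses)
    patterns = set.intersection(*wins) if wins else set()
    pitfalls = seen_in_losses - seen_in_wins
    return sorted(patterns), sorted(pitfalls)


attempts = [
    (["open cart", "apply filter", "check stock"], True),
    (["open cart", "check stock", "sort by price"], True),
    (["open cart", "click first result"], False),
]
patterns, pitfalls = self_contrast(attempts)
print("patterns:", patterns)
print("pitfalls:", pitfalls)
```
</code></pre>
</div>
<p>Steps seen in only one successful attempt, such as "apply filter", are left out of the patterns, which mirrors the idea of filtering solutions that do not hold up across trials.</p>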
</div>
</div>
<div class="card">
<h3 class="card-title">
<i class="material-icons">view_list</i>
Sequential Scaling
</h3>
<div class="card-content">
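<p>In sequential scaling, the agent repeatedly reflects on, revises, and refines its work within a single task, improving its reasoning step by step. After an initial attempt, it iteratively refines its reasoning within one trajectory, following the principle of self-refinement. The intermediate notes produced during self-refinement also serve as valuable memory signals, because they capture reasoning attempts, corrections, and insights that may not appear in the final solution.</p>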
<div class="code-block">
<div class="code-label">Python</div>
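<pre><code>def sequential_scaling(task, k=5):
    """Refine one trajectory k times and distill memories from the notes.

    execute_task, reflect_on_trajectory, refine_trajectory and
    extract_memories_from_notes are placeholders for LLM-backed steps.
    """
    # Initial trajectory
    trajectory = execute_task(task)
    # Refine k times, keeping every intermediate reflection
    intermediate_notes = []
    for _ in range(k):
        reflection = reflect_on_trajectory(trajectory)
        intermediate_notes.append(reflection)
        trajectory = refine_trajectory(trajectory, reflection)
    # Extract memories from the intermediate notes
    memories = extract_memories_from_notes(intermediate_notes)
    return memories</code></pre>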
</div>
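<p>A runnable toy version of the reflect-then-refine loop is shown below. It is a sketch under stated assumptions: the plan is a list of step names, reflect and refine are hypothetical rule-based stubs in place of LLM calls, and the early stop when nothing is left to fix is an addition of this example.</p>
<div class="code-block">
<div class="code-label">Python</div>
<pre><code>
```python
def refine_with_notes(initial, reflect, refine, k=3):
    """Run up to k reflect-then-refine rounds and keep every intermediate note."""
    current, notes = initial, []
    for round_index in range(k):
        note = reflect(current)
        if note is None:  # nothing left to fix, stop early
            break
        notes.append(f"round {round_index + 1}: {note}")
        current = refine(current, note)
    return current, notes


# Toy stand-ins for the LLM-backed reflection and refinement steps.
REQUIRED = ["login", "add to cart", "checkout"]


def reflect(plan):
    missing = [step for step in REQUIRED if step not in plan]
    return f"missing step '{missing[0]}'" if missing else None


def refine(plan, note):
    step = note.split("'")[1]
    return plan + [step]


final_plan, notes = refine_with_notes(["login"], reflect, refine, k=5)
print(final_plan)
for note in notes:
    print(note)
```
</code></pre>
</div>
<p>The notes record corrections that are no longer visible in the final plan, which is the kind of signal the text describes as useful for memory.</p>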
</div>
</div>
</div>
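<p>With MaTTS, the agent generates more contrastive data and extracts deeper reasoning regularities from both successes and failures, forming a positive feedback loop of experience, reflection, memory, and further improvement. Experiments show that MaTTS markedly improves memory quality; on tasks such as WebArena-Shopping with k=5, the success rate rises by 5.4 percentage points.</p>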
</div>
</div>
<div class="section">
<h2 class="section-title">
<i class="material-icons">analytics</i>
Experimental Results and Performance Evaluation
</h2>
<div class="section-content">
<p>Across several test environments (web interaction, software engineering, and others), ReasoningBank shows clear advantages:</p>
<div class="table-container">
<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Model</th>
<th>Metric</th>
<th>No memory</th>
<th>ReasoningBank</th>
<th>Gain</th>
<th>MaTTS (k=5, parallel)</th>
</tr>
</thead>
<tbody>
<tr>
<td>WebArena (overall)</td>
<td>Gemini-2.5-Flash</td>
<td>Success rate (%)</td>
<td>41.4</td>
<td>49.7</td>
<td>+8.3</td>
<td>55.1</td>
</tr>
<tr>
<td>WebArena (overall)</td>
<td>Gemini-2.5-Pro</td>
<td>Success rate (%)</td>
<td>46.7</td>
<td>53.9</td>
<td>+7.2</td>
<td>N/A</td>
</tr>
<tr>
<td>WebArena (overall)</td>
<td>Claude-3.7-Sonnet</td>
<td>Success rate (%)</td>
<td>50.1</td>
<td>54.7</td>
<td>+4.6</td>
<td>N/A</td>
</tr>
<tr>
<td>SWE-Bench-Verified</td>
<td>Gemini-2.5-Flash</td>
<td>Resolve rate (%)</td>
<td>34.2</td>
<td>38.8</td>
<td>+4.6</td>
<td>N/A</td>
</tr>
<tr>
<td>Mind2Web (cross-domain)</td>
<td>Gemini-2.5-Flash</td>
<td>Success rate (%)</td>
<td>1.0</td>
<td>1.6</td>
<td>+0.6</td>
<td>N/A</td>
</tr>
<tr>
<td>WebArena (successful tasks)</td>
<td>Gemini-2.5-Flash</td>
<td>Number of steps</td>
<td>6.8</td>
<td>4.7</td>
<td>-2.1 (26.9% relative reduction)</td>
<td>N/A</td>
</tr>
</tbody>
</table>
</div>
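<p>The Gain column reports differences in percentage points, which is not the same as a relative improvement. The short sketch below recomputes both quantities from the success-rate and resolve-rate rows of the table; the helper gains is a hypothetical name introduced for this example.</p>
<div class="code-block">
<div class="code-label">Python</div>
<pre><code>
```python
def gains(baseline: float, with_memory: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    difference = with_memory - baseline
    return round(difference, 1), round(100 * difference / baseline, 1)


# Success or resolve rates (%) copied from the table above.
rows = {
    "WebArena, Gemini-2.5-Flash": (41.4, 49.7),
    "WebArena, Gemini-2.5-Pro": (46.7, 53.9),
    "WebArena, Claude-3.7-Sonnet": (50.1, 54.7),
    "SWE-Bench-Verified, Gemini-2.5-Flash": (34.2, 38.8),
}
for name, (baseline, with_memory) in rows.items():
    points, relative = gains(baseline, with_memory)
    print(f"{name}: +{points} points, +{relative}% relative")
```
</code></pre>
</div>
<p>The same helper applies to the MaTTS column: going from 49.7 to 55.1 is a gain of 5.4 percentage points.</p>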
<div class="stats-container">
<div class="stat-item">
<div class="stat-value">+30% to +34%</div>
<div class="stat-label">Relative success-rate improvement</div>
</div>
<div class="stat-item">
<div class="stat-value">-16%</div>
<div class="stat-label">Fewer interaction steps</div>
</div>
</div>
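<p>Compared with other memory mechanisms, ReasoningBank is markedly more stable and more efficient, and it learns to improve from failure. Its advantage is largest in cross-task and cross-domain tests, which indicates that its memories are transferable and general. Ablation studies confirm that failure experiences contribute +3.2% to the success rate and that the best retrieval setting is k=1.</p>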
</div>
</div>
<div class="section">
<h2 class="section-title">
<i class="material-icons">trending_up</i>
Technical Significance and Outlook
</h2>
<div class="section-content">
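<p>ReasoningBank marks an important milestone in the development of AI agents:</p>
<ul class="feature-list">
<li class="feature-item">
<i class="material-icons">stars</i>
<span>It establishes a <span class="highlight">synergy between memory and test-time scaling</span>: high-quality memory steers scaling toward more promising paths, and the richer experience in turn forges stronger memory</span>
</li>
<li class="feature-item">
<i class="material-icons">stars</i>
<span>This positive feedback loop makes memory-driven experience scaling a new scaling dimension for agents</span>
</li>
<li class="feature-item">
<i class="material-icons">stars</i>
<span>It opens up new possibilities for continual learning and self-evolution in AI agents</span>
</li>
</ul>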
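<p>Over time, strategies evolve from procedural (for example, "click the next page") to reflective (for example, "verify element identifiers") and compositional (for example, "cross-reference the task against the view"). This progression indicates that self-evolution is taking place, and the efficiency gains occur mainly on successful trajectories.</p>
<p>Going forward, ReasoningBank may develop in the following directions:</p>
<ul class="feature-list">
<li class="feature-item">
<i class="material-icons">arrow_forward</i>
<span>More efficient algorithms for memory retrieval and consolidation</span>
</li>
<li class="feature-item">
<i class="material-icons">arrow_forward</i>
<span>Richer memory structures and representations</span>
</li>
<li class="feature-item">
<i class="material-icons">arrow_forward</i>
<span>Memory sharing and collaboration across agents</span>
</li>
</ul>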
<div class="card">
<h3 class="card-title">
<i class="material-icons">warning</i>
Limitations
</h3>
<div class="card-content">
<p>Although ReasoningBank shows great potential, it still has some limitations:</p>
<ul class="feature-list">
<li class="feature-item">
<i class="material-icons">error_outline</i>
<span>It relies on LLM self-judgment, which may amplify bias</span>
</li>
<li class="feature-item">
<i class="material-icons">error_outline</i>
<span>It uses a simple retrieval method and lacks advanced pruning mechanisms</span>
</li>
<li class="feature-item">
<i class="material-icons">error_outline</i>
<span>Its ability to adapt to entirely new scenarios is limited</span>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="footer">
<p>Compiled from the Google paper "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory"</p>
</div>
</div>
</div>
<script>
// Initialize code highlighting
document.addEventListener('DOMContentLoaded', function() {
hljs.highlightAll();
});
</script>
</body>
</html>