The Infinite Echo of Recursive Language Models: When AI Learns to "Flip Through the Book" Instead of "Memorizing by Rote"

✨步子哥 (steper) • January 7, 2026, 15:15

🌌 **From One Page to an Entire Library: The Long-Standing Problem of Long Context**

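Imagine holding a book tens of thousands of pages long while being able to keep only a few hundred pages in mind at a time. The further you read, the blurrier the earlier details become, until some are forgotten entirely. That is the awkward position large language models (LLMs) have been in for the past few years when handling very long inputs: the "context window" acts as a limited working memory, and once the input outgrows it the model suffers "context rot": information is lost, hallucinations multiply, and performance drops sharply.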

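In December 2025, researchers at MIT CSAIL released an arXiv preprint proposing a new inference paradigm: **Recursive Language Models (RLMs)**. Rather than trying to cram the whole "library" into the model's head, they treat the full text as an external environment and let the model, like a programmer, write code to "look things up": slice the text, search it, recursively call sub-tasks, and finally synthesize an answer.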

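> **What is context rot?**  
> In a conventional Transformer LLM, the compute cost of self-attention grows quadratically with sequence length, and the model's intermediate representations tend to lose information about early tokens. This degradation is called "context rot." It is not that the model has become less capable; the density of information it can use effectively falls sharply because of limits in the architecture and the hardware.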
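
To make the quadratic scaling concrete, here is a small back-of-the-envelope sketch. It only counts query-key score entries in full self-attention (one per pair of tokens) and ignores every other cost; the function name is made up, and the two lengths are simply the 262K limit and the 10M input scale quoted in this post.

```python
def attention_pairs(seq_len: int) -> int:
    """Number of query-key score entries in full self-attention."""
    return seq_len * seq_len


window = 262_000          # context limit quoted for GPT-5 in this post
long_input = 10_000_000   # input scale quoted for the RLM experiments

length_ratio = long_input / window
cost_ratio = attention_pairs(long_input) / attention_pairs(window)

print(f"input is {length_ratio:.1f}x longer")
print(f"attention score matrix is {cost_ratio:.0f}x larger")
```

Because the cost ratio is the square of the length ratio, an input roughly 38 times longer implies well over a thousand times more attention scores.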

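The change looks simple, but it amounts to a paradigm shift: from "memorizing the book" to "looking things up with tools." The paper reports that RLMs maintain high accuracy on inputs of 10M (ten million) tokens and beyond, at a cost comparable to, or even lower than, the baseline model. This may well become a mainstream direction for long-horizon agents in 2026.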

🔍 **Core Mechanism: Turning the Prompt into a Programmable Environment**

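The magic of an RLM happens inside a Python REPL (Read-Eval-Print Loop). The entire long prompt is loaded as a variable (usually named `context`); the model no longer "ingests" it directly but operates on it by generating code.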

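The root model (root LLM) first writes a snippet of code: it checks the variable's length, searches for keywords with regular expressions, slices out the relevant fragments, and then calls the special `llm_query` function to hand a fragment to a sub-model. The sub-model's result is written back into a REPL variable, and the root model goes on reading and integrating, possibly calling deeper sub-models recursively. Finally, it emits the answer through `FINAL()` or `FINAL_VAR()`.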
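
The kind of code a root model might write in this loop can be sketched as follows. This is an illustration under stated assumptions, not the paper's or the library's implementation: `llm_query` and `FINAL` are named in the text, but here they are local stand-ins (the stub returns the first line of the snippet it receives instead of calling a model), and the sample `context`, the keyword, and the 200-character window are invented.

```python
import re


def llm_query(prompt: str) -> str:
    """Stand-in for a sub-model call (assumption): returns the first line
    of the snippet that follows the '---' separator in the prompt."""
    snippet = prompt.split("\n---\n", 1)[1]
    return snippet.strip().splitlines()[0]


def FINAL(answer: str) -> str:
    """Stand-in for the final-answer hook: returns the answer unchanged."""
    return answer


# Invented long input: filler lines with one relevant line buried in the middle.
context = "\n".join(
    [f"filler line {i}" for i in range(5000)]
    + ["The launch code is 7421."]
    + [f"filler line {i}" for i in range(5000, 10000)]
)

print(len(context))  # step 1: inspect the size before reading anything

# Step 2: locate candidate regions with a regular expression.
hits = [m.start() for m in re.finditer(r"launch code", context)]

# Step 3: slice a small window around each hit and delegate it to a sub-model.
partials = []
for pos in hits:
    start = context.rfind("\n", 0, pos) + 1  # beginning of the matching line
    snippet = context[start:start + 200]
    partials.append(llm_query("What is the launch code?\n---\n" + snippet))

# Step 4: combine the partial results and emit the answer.
answer = FINAL("; ".join(partials))
print(answer)
```

The point of the pattern is that the root model only ever sees the length, the match positions, and the short sub-model replies, never the full text.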

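> **What is a REPL?**  
> A REPL is an interactive programming environment: you enter code, it executes immediately and returns the result, and variable state persists between inputs. An RLM treats the whole prompt as a "database" inside the REPL, and the model becomes a "programmer" who writes Python and can read, write, compute, and make recursive calls at any time.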
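
Persistent state is what lets the model work over several turns. Below is a minimal sketch of such a loop using Python's built-in `exec` with a shared namespace dictionary. The `run_cell` helper and the three hard-coded "turns" are hypothetical stand-ins for model-generated code, and a real system would execute them in a sandbox rather than with bare `exec`.

```python
import contextlib
import io


def run_cell(code: str, namespace: dict) -> str:
    """Execute one code cell in a shared namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


# The long prompt lives in the environment, not in the model's context window.
namespace = {"context": "alpha\nbeta\ngamma\nbeta again\n"}

# Three hard-coded "turns" standing in for code a model would generate.
turns = [
    "lines = context.splitlines()\nprint(len(lines))",
    "matches = [ln for ln in lines if 'beta' in ln]\nprint(matches)",
    "result = ' | '.join(matches)\nprint(result)",
]

outputs = [run_cell(code, namespace) for code in turns]
for out in outputs:
    print(out, end="")
```

Each turn reuses variables (`lines`, `matches`) defined by an earlier turn, which is the property the note above describes.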

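This structure naturally supports task decomposition, context filtering, and iterative verification. The experiments suggest that gains largely saturate after 3-4 levels of recursion, but that recursion is crucial for complex semantic aggregation tasks. The ablation without sub-calls drops sharply on OOLONG-Pairs (an F1 of 17.34 versus 58.00 with GPT-5 in the table below), evidence that recursion is a key driver of performance.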
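
One way to picture depth-limited recursion is a function that answers directly when the input is small enough and otherwise splits it, recurses on each half, and merges the results. This sketch is an illustration only: the splitting rule, `max_chars`, `max_depth`, and the word-counting `base_model` stub (used in place of a real model call) are all assumptions, not the paper's procedure.

```python
def base_model(query: str, text: str) -> int:
    """Stub for a direct model call: counts occurrences of `query` in `text`."""
    return text.split().count(query)


def rlm(query: str, text: str, depth: int = 0,
        max_depth: int = 3, max_chars: int = 40) -> int:
    """Answer directly if the text is short or the depth budget is spent;
    otherwise split on a word boundary, recurse, and sum the partial counts."""
    if len(text) <= max_chars or depth >= max_depth:
        return base_model(query, text)
    words = text.split()
    if len(words) < 2:
        return base_model(query, text)
    mid = len(words) // 2
    left, right = " ".join(words[:mid]), " ".join(words[mid:])
    return (rlm(query, left, depth + 1, max_depth, max_chars)
            + rlm(query, right, depth + 1, max_depth, max_chars))


document = " ".join(["apple", "pear", "apple", "plum"] * 25)  # 100 words
print(rlm("apple", document))
```

With a counting task the merged answer is the same at any depth; the depth cap only bounds how many sub-calls are spent getting there.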

📊 **The Experimental Arena: Decisive Wins on Four Benchmarks**

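The researchers ran a systematic evaluation on four long-context benchmarks, with inputs ranging from millions to over ten million tokens. The models include the closed-source GPT-5 and the open-source Qwen3-Coder-480B. The key numbers from the paper's Table 1 are given below (arranged as a Markdown table):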

| Method / Model                 | S-NIAH (%)    | BrowseComp+ (%) | OOLONG (%) | OOLONG-Pairs (F1) | Avg. cost ($) |
|--------------------------------|---------------|-----------------|------------|-------------------|---------------|
| Base GPT-5                     | Fails (>262K) | 0.00            | 12.50      | 0.00              | N/A           |
| Summary Agent (GPT-5)          | 85.00         | 45.67           | 34.00      | 28.50             | 8.98          |
| CodeAct + BM25 (GPT-5)         | 78.00         | 52.33           | 41.00      | 35.20             | 5.12          |
| RLM (GPT-5)                    | **92.00**     | **91.33**       | **56.50**  | **58.00**         | **0.99**      |
| RLM, no sub-calls (GPT-5)      | 88.00         | 78.00           | 45.00      | 17.34             | 0.75          |
| Base Qwen3-Coder-480B          | Fails         | 0.00            | 10.00      | 0.00              | N/A           |
| RLM (Qwen3-Coder-480B)         | **89.00**     | **85.67**       | **52.00**  | **54.50**         | **1.15**      |

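- **S-NIAH (Single Needle-in-a-Haystack)**: find one needle in a large haystack; the work required stays constant as the input grows, and RLM reaches 92.00% with GPT-5.
- **BrowseComp+ (1K)**: multi-hop question answering over 1,000 documents (6-11 million tokens) with very high information density; RLM reaches 91.33%.
- **OOLONG**: semantic transformation and aggregation with linear complexity.
- **OOLONG-Pairs**: pairwise aggregation with quadratic complexity; this is where recursion helps most.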
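
The three complexity classes named in this list translate into very different amounts of work as the input grows. As a rough illustration (not taken from the paper), the sketch below counts how many chunks or chunk pairs a task would have to touch if the input were split into `n_chunks` pieces; the function name, labels, and chunk counts are made up for the example.

```python
import math


def work_units(n_chunks: int) -> dict:
    """Rough count of units to examine for each task type, given n chunks."""
    return {
        "needle (constant)": 1,                     # one relevant chunk
        "aggregate (linear)": n_chunks,             # every chunk once
        "pairs (quadratic)": math.comb(n_chunks, 2),  # every unordered pair
    }


for n in (10, 100, 1000):
    print(n, work_units(n))
```

Going from 10 to 1,000 chunks multiplies the linear task's work by 100 but the pairwise task's work by roughly 10,000, which is why a pairwise benchmark stresses a method far more than a needle search does.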

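The paper's Figure 1 shows that as input length grows to 10M+ tokens, baseline performance collapses quickly while the RLM curve stays nearly flat. The cost analysis in Figure 3 adds that the median RLM cost is only $0.99, far below the other agent methods, and that the variance comes mainly from recursion depth on complex tasks.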

🛠️ **An Out-of-the-Box Open-Source Implementation: alexzhang13/rlm**

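The MIT team also released a full open-source library: https://github.com/alexzhang13/rlm. It is a plug-and-play inference framework that supports multiple backends, including OpenAI, Anthropic, Gemini, and local models, and it can run locally, in Docker, or in Modal or Prime Intellect sandboxes.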

Installation is straightforward:

```bash
uv pip install rlm
export OPENAI_API_KEY=sk-...
```

After that, a single call is enough:

```python
import rlm
response = rlm.completion(prompt=your_long_context, model="gpt-4.5")
```

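The library ships with a trajectory-log visualizer (built on Node.js + shadcn/ui): drag a log file into the browser to see the full tree of recursive calls, including the code written at each level, the sub-fragment each call received, and the result it returned. This transparency is valuable for debugging and research.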

🚀 **Prime Intellect's Extensions: A Paradigm Manifesto for 2026**

Prime Intellect regards RLMs as "the core paradigm of 2026." In their verifiers repository they implemented RLMEnv, which adds:

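- `llm_batch` for parallel sub-calls, which substantially raises throughput;
- tool access restricted to sub-models, with the root model doing only the planning, which improves safety;
- deep integration with the prime-rl reinforcement learning framework, supporting multi-turn iterative refinement;
- multimodal support and custom functions.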
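
The idea behind batching sub-calls can be sketched with a thread pool from the standard library. The name `llm_batch` comes from the list above, but this body is a hypothetical stand-in rather than Prime Intellect's implementation: `fake_llm_query` simulates a slow model call with `time.sleep`, and the worker count is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_llm_query(prompt: str) -> str:
    """Simulated sub-model call: sleeps briefly, then returns the prompt upper-cased."""
    time.sleep(0.05)
    return prompt.upper()


def llm_batch(prompts: list[str], max_workers: int = 8) -> list[str]:
    """Run sub-calls concurrently; results come back in the order of `prompts`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_llm_query, prompts))


prompts = [f"summarize chunk {i}" for i in range(8)]

start = time.perf_counter()
results = llm_batch(prompts)
elapsed = time.perf_counter() - start

print(results[0])
print(f"{len(results)} calls in {elapsed:.2f}s")
```

Calls that mostly wait on a remote model are I/O-bound, so threads are enough to overlap them, and `pool.map` preserves input order, which keeps later aggregation code simple.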

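They have deployed several RLM environments on their platform (such as deepdive-rlm) and report consistent gains on models such as GPT-5-mini. Planned work includes adjustable recursion depth, training dedicated to small models, and optimized asynchronous calls.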

🗣️ **Community Echoes: From Twitter to a Global Discussion**

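The RLM paper set off immediate discussion after its release. English-language Twitter called it "the most important agent architecture of 2025," while the Chinese-language community summed it up as "teaching large models to flip through the book instead of memorizing it." Japanese- and Portuguese-speaking communities are also actively discussing how recursion can raise accuracy and reduce hallucinations.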

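Some compare RLMs to Yann LeCun's "scalable revolution": instead of blindly enlarging the context window, trade computation for performance, an idea that echoes Rich Sutton's "The Bitter Lesson."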

🌅 **Epilogue: A Paradigm Shift in Progress**

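Recursive Language Models are not just an engineering trick; they rest on the insight that the prompt is an environment. The model changes from a passive recipient into an active explorer, and inference changes from a single pass into a programmable, recursive process. This addresses the current length bottleneck and also opens the door to long-horizon autonomous agents, very-large-scale document analysis, and continual-learning systems.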

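Secure sandboxes, asynchronous optimization, training dedicated to small models: all of these directions are already under way. Once AI learns to write code to manage its own "memory," we may be only one step away from truly unbounded context.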

-------
### References

1. Zhang, A. L., Kraska, T., & Khattab, O. (2025). Recursive Language Models. arXiv preprint arXiv:2512.24601. https://arxiv.org/abs/2512.24601

2. Prime Intellect. (2025). Recursive Language Models: the paradigm of 2026. https://www.primeintellect.ai/blog/rlm

3. Zhang, A. L. (2025). alexzhang13/rlm: General plug-and-play inference library for Recursive Language Models. GitHub repository. https://github.com/alexzhang13/rlm

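4. ChurkLi. (2025). MIT RLM颠覆长文本，让大模型“翻书”不“背书”！ [MIT RLM upends long-text processing: letting large models "flip through the book" instead of "memorizing" it!]. X Post. https://x.com/ChurkLi/status/2008859544798261554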

5. Singh, P. (2025). Why treating prompts as an Environment changes LLM Scaling (@MIT Paper). X Post. https://x.com/singhprateik/status/2008842007616373234

Replies

1 reply

✨步子哥 (steper) #1

01-08 00:02

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Recursive Language Models Poster</title>
    <style>
        @import url('https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;700;900&display=swap');
        
        :root {
            --primary-color: #002D62; /* MIT Navy Blue */
            --secondary-color: #A31905; /* MIT Red accent */
            --accent-color: #4DA6FF;
            --bg-color: #F5F7FA;
            --card-bg: #FFFFFF;
            --text-main: #1A202C;
            --text-light: #718096;
        }

        * {
            box-sizing: border-box;
            margin: 0;
            padding: 0;
        }

        body {
            font-family: 'Roboto', sans-serif;
            background-color: #E0E0E0;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
        }

        .poster-container {
            width: 2100px;
            min-height: 3000px;
            background-color: var(--bg-color);
            display: flex;
            flex-direction: column;
            overflow: hidden;
            box-shadow: 0 0 50px rgba(0,0,0,0.1);
            position: relative;
        }

        /* Header */
        .header {
            background: linear-gradient(135deg, var(--primary-color) 0%, #001F3F 100%);
            color: white;
            padding: 80px 100px;
            text-align: center;
            border-bottom: 10px solid var(--secondary-color);
        }

        .header h1 {
            font-size: 120px;
            font-weight: 900;
            line-height: 1.1;
            margin-bottom: 20px;
            letter-spacing: -2px;
            text-transform: uppercase;
        }

        .header h2 {
            font-size: 50px;
            font-weight: 400;
            opacity: 0.9;
            margin-bottom: 30px;
        }

        .header .authors {
            font-size: 32px;
            font-weight: 300;
            opacity: 0.8;
            border-top: 2px solid rgba(255,255,255,0.2);
            display: inline-block;
            padding-top: 20px;
        }

        /* Content Grid */
        .content {
            flex: 1;
            padding: 80px 100px;
            display: grid;
            grid-template-columns: 1fr 1fr;
            grid-template-rows: auto auto auto 1fr;
            gap: 60px;
        }

        /* Section Styling */
        .section-title {
            font-size: 48px;
            font-weight: 700;
            color: var(--primary-color);
            border-left: 15px solid var(--secondary-color);
            padding-left: 25px;
            margin-bottom: 30px;
            text-transform: uppercase;
            display: flex;
            align-items: center;
        }

        .card {
            background: var(--card-bg);
            border-radius: 20px;
            padding: 40px;
            box-shadow: 0 10px 30px rgba(0,0,0,0.05);
            display: flex;
            flex-direction: column;
        }

        /* Intro Section - Spans Full Width */
        .intro-section {
            grid-column: 1 / -1;
            background: linear-gradient(to right, #fff, #f0f4f8);
        }

        .intro-text {
            font-size: 36px;
            line-height: 1.5;
            color: var(--text-main);
        }

        .highlight {
            color: var(--secondary-color);
            font-weight: 700;
        }

        /* How it Works */
        .how-it-works {
            grid-column: 1 / 2;
        }

        .diagram-container {
            background: #F8FAFC;
            border: 2px dashed #CBD5E0;
            border-radius: 15px;
            padding: 30px;
            margin: 20px 0;
            display: flex;
            flex-direction: column;
            gap: 20px;
            align-items: center;
        }

        .step {
            display: flex;
            align-items: center;
            width: 100%;
            background: white;
            padding: 20px;
            border-radius: 10px;
            box-shadow: 0 4px 6px rgba(0,0,0,0.05);
        }

        .step-number {
            background: var(--primary-color);
            color: white;
            width: 60px;
            height: 60px;
            border-radius: 50%;
            display: flex;
            justify-content: center;
            align-items: center;
            font-size: 30px;
            font-weight: bold;
            margin-right: 20px;
            flex-shrink: 0;
        }

        .step-content {
            font-size: 28px;
        }

        /* Key Benefits */
        .benefits-section {
            grid-column: 2 / 3;
        }

        .benefit-grid {
            display: grid;
            grid-template-columns: 1fr 1fr;
            gap: 30px;
        }

        .benefit-item {
            background: #EDF2F7;
            padding: 30px;
            border-radius: 15px;
            text-align: center;
        }

        .benefit-icon {
            font-size: 60px;
            margin-bottom: 15px;
            color: var(--primary-color);
        }

        .benefit-title {
            font-size: 28px;
            font-weight: 700;
            margin-bottom: 10px;
            color: var(--primary-color);
        }

        .benefit-desc {
            font-size: 24px;
            color: var(--text-light);
        }

        /* Performance Table - Spans Full Width */
        .performance-section {
            grid-column: 1 / -1;
        }

        table {
            width: 100%;
            border-collapse: collapse;
            margin-top: 20px;
            font-size: 28px;
        }

        th {
            background-color: var(--primary-color);
            color: white;
            padding: 25px;
            text-align: left;
            font-weight: 500;
        }

        td {
            padding: 25px;
            border-bottom: 2px solid #E2E8F0;
            color: var(--text-main);
        }

        tr:nth-child(even) {
            background-color: #F8FAFC;
        }

        .best-score {
            color: var(--secondary-color);
            font-weight: 900;
        }

        /* Applications */
        .app-section {
            grid-column: 1 / -1;
            background: var(--primary-color);
            color: white;
        }

        .app-section .section-title {
            color: white;
            border-left-color: white;
        }

        .app-content {
            display: flex;
            justify-content: space-between;
            gap: 40px;
        }

        .app-box {
            flex: 1;
            background: rgba(255,255,255,0.1);
            padding: 30px;
            border-radius: 15px;
            border: 1px solid rgba(255,255,255,0.2);
        }

        .app-title {
            font-size: 32px;
            font-weight: 700;
            margin-bottom: 15px;
            color: var(--accent-color);
        }

        .app-text {
            font-size: 26px;
            line-height: 1.4;
        }

        /* Footer */
        .footer {
            background: #1A202C;
            color: #A0AEC0;
            padding: 40px 100px;
            text-align: center;
            font-size: 24px;
            display: flex;
            justify-content: space-between;
            align-items: center;
        }
        
        .code-block {
            font-family: 'Courier New', monospace;
            background: #2D3748;
            color: #48BB78;
            padding: 10px 15px;
            border-radius: 5px;
            font-size: 24px;
            display: inline-block;
            margin: 0 5px;
        }

    </style>
</head>
<body>
    <div class="poster-container">
        <!-- Header -->
        <header class="header">
            <h1>Recursive Language Models</h1>
            <h2>Scaling AI Beyond Context Windows</h2>
            <div class="authors">
                Alex L. Zhang, Tim Kraska, Omar Khattab (MIT CSAIL) • 2025
            </div>
        </header>

        <!-- Main Content -->
        <div class="content">
            
            <!-- Introduction -->
            <div class="card intro-section">
                <div class="section-title">The Paradigm Shift</div>
                <p class="intro-text">
                    Traditional LLMs struggle with <strong>Context Rot</strong> — performance degradation when inputs exceed standard context windows (e.g., GPT-5's 262K tokens). 
                    <br><br>
                    <strong>Recursive Language Models (RLMs)</strong> propose a new inference paradigm. Instead of cramming prompts into a context window, RLMs treat the input as a <span class="highlight">programmable environment</span> (Python REPL). This enables LLMs to interact with massive inputs symbolically via code, handling up to <span class="highlight">10 Million tokens</span> while improving accuracy and reducing costs.
                </p>
            </div>

            <!-- How It Works -->
            <div class="card how-it-works">
                <div class="section-title">How RLMs Work</div>
                <p style="font-size: 28px; margin-bottom: 20px;">The REPL-based recursive workflow:</p>
                
                <div class="diagram-container">
                    <div class="step">
                        <div class="step-number">1</div>
                        <div class="step-content">
                            <strong>Store Context:</strong> Full input loaded as a variable (e.g., <span class="code-block">context</span>) in Python REPL.
                        </div>
                    </div>
                    <div class="step">
                        <div class="step-number">2</div>
                        <div class="step-content">
                            <strong>Generate Code:</strong> Root LLM writes code to slice, filter, or search the context.
                        </div>
                    </div>
                    <div class="step">
                        <div class="step-number">3</div>
                        <div class="step-content">
                            <strong>Recursive Calls:</strong> Sub-LLMs invoked on relevant subsets (<span class="code-block">llm_query</span>).
                        </div>
                    </div>
                    <div class="step">
                        <div class="step-number">4</div>
                        <div class="step-content">
                            <strong>Aggregate:</strong> Results collected and final answer output via <span class="code-block">FINAL()</span>.
                        </div>
                    </div>
                </div>
            </div>

            <!-- Key Benefits -->
            <div class="card benefits-section">
                <div class="section-title">Key Benefits</div>
                <div class="benefit-grid">
                    <div class="benefit-item">
                        <div class="benefit-icon">📏</div>
                        <div class="benefit-title">Scalability</div>
                        <div class="benefit-desc">Handles inputs 100x larger than standard limits (10M+ tokens).</div>
                    </div>
                    <div class="benefit-item">
                        <div class="benefit-icon">🎯</div>
                        <div class="benefit-title">Accuracy</div>
                        <div class="benefit-desc">Double-digit improvements on long-context benchmarks.</div>
                    </div>
                    <div class="benefit-item">
                        <div class="benefit-icon">💰</div>
                        <div class="benefit-title">Cost Efficiency</div>
                        <div class="benefit-desc">Median costs equal to or lower than base models.</div>
                    </div>
                    <div class="benefit-item">
                        <div class="benefit-icon">✅</div>
                        <div class="benefit-title">Reduced Hallucination</div>
                        <div class="benefit-desc">Self-verification via code execution and iterative refinement.</div>
                    </div>
                </div>
            </div>

            <!-- Performance Comparison -->
            <div class="card performance-section">
                <div class="section-title">Performance Benchmarks</div>
                <p style="font-size: 28px; margin-bottom: 20px;">Comparison on GPT-5 across long-context tasks. RLMs significantly outperform baselines.</p>
                
                <table>
                    <thead>
                        <tr>
                            <th>Method / Model</th>
                            <th>S-NIAH (%)</th>
                            <th>BrowseComp+ (%)</th>
                            <th>OOLONG (%)</th>
                            <th>Avg. Cost ($)</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td>Base GPT-5</td>
                            <td>Fails (>262K)</td>
                            <td>0.00</td>
                            <td>12.50</td>
                            <td>N/A</td>
                        </tr>
                        <tr>
                            <td>Summary Agent</td>
                            <td>85.00</td>
                            <td>45.67</td>
                            <td>34.00</td>
                            <td>8.98</td>
                        </tr>
                        <tr>
                            <td>CodeAct + BM25</td>
                            <td>78.00</td>
                            <td>52.33</td>
                            <td>41.00</td>
                            <td>5.12</td>
                        </tr>
                        <tr style="background-color: #E6FFFA; border: 2px solid var(--secondary-color);">
                            <td style="font-weight: bold; color: var(--primary-color);">RLM (GPT-5)</td>
                            <td class="best-score">92.00</td>
                            <td class="best-score">91.33</td>
                            <td class="best-score">56.50</td>
                            <td class="best-score">0.99</td>
                        </tr>
                    </tbody>
                </table>
            </div>

            <!-- Applications & Future -->
            <div class="card app-section">
                <div class="section-title">Applications & Future Outlook</div>
                <div class="app-content">
                    <div class="app-box">
                        <div class="app-title">🚀 Long-Horizon Agents</div>
                        <div class="app-text">
                            Enables autonomous agents to operate over massive document sets and codebases without memory loss.
                        </div>
                    </div>
                    <div class="app-box">
                        <div class="app-title">📚 Document Analysis</div>
                        <div class="app-text">
                            Well suited to semantic aggregation, multi-hop QA, and deep search in legal, medical, and financial texts.
                        </div>
                    </div>
                    <div class="app-box">
                        <div class="app-title">🔓 Open Source Ecosystem</div>
                        <div class="app-text">
                            Available on GitHub (<code>alexzhang13/rlm</code>). Integrated with Prime Intellect for parallelization and RL training.
                        </div>
                    </div>
                </div>
            </div>

        </div>

        <!-- Footer -->
        <footer class="footer">
            <div>arXiv:2512.24601 • MIT CSAIL • github.com/alexzhang13/rlm</div>
            <div>Designed for AI Researchers & Engineers</div>
        </footer>
    </div>
</body>
</html>                                    
