
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

✨步子哥 (steper) · December 11, 2025, 08:10
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models</title> <link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;700&family=Roboto+Slab:wght@400;500;700&display=swap" rel="stylesheet"> <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet"> <style> * { margin: 0; padding: 0; box-sizing: border-box; } body { font-family: 'Roboto', sans-serif; background-color: #f8f9fa; color: #1a237e; line-height: 1.6; } .poster-container { width: 920px; min-height: 960px; margin: 0 auto; background: linear-gradient(135deg, #e8eaf6 0%, #c5cae9 100%); padding: 40px; position: relative; overflow: hidden; } .poster-container::before { content: ""; position: absolute; top: -150px; right: -150px; width: 400px; height: 400px; border-radius: 50%; background: linear-gradient(45deg, rgba(63, 81, 181, 0.1), rgba(103, 58, 183, 0.1)); z-index: 0; } .poster-container::after { content: ""; position: absolute; bottom: -100px; left: -100px; width: 300px; height: 300px; border-radius: 50%; background: linear-gradient(45deg, rgba(63, 81, 181, 0.1), rgba(103, 58, 183, 0.1)); z-index: 0; } .header { text-align: center; margin-bottom: 30px; position: relative; z-index: 1; } .title { font-family: 'Roboto Slab', serif; font-size: 42px; font-weight: 700; color: #303f9f; margin-bottom: 10px; line-height: 1.2; } .subtitle { font-size: 22px; font-weight: 500; color: #5c6bc0; margin-bottom: 20px; } .section { background-color: rgba(255, 255, 255, 0.85); border-radius: 12px; padding: 20px; margin-bottom: 25px; box-shadow: 0 4px 12px rgba(0, 0, 0, 0.08); position: relative; z-index: 1; } .section-title { font-family: 'Roboto Slab', serif; font-size: 28px; font-weight: 700; color: #3949ab; margin-bottom: 15px; display: flex; align-items: center; } .section-title .material-icons { 
margin-right: 10px; color: #5c6bc0; } .section-content { font-size: 18px; } .highlight { background: linear-gradient(transparent 40%, rgba(124, 77, 255, 0.2) 40%, rgba(124, 77, 255, 0.2) 85%, transparent 85%); padding: 0 2px; } .problem-box { background-color: rgba(239, 83, 80, 0.1); border-left: 4px solid #ef5350; padding: 15px; margin: 15px 0; border-radius: 0 8px 8px 0; } .solution-box { background-color: rgba(76, 175, 80, 0.1); border-left: 4px solid #4caf50; padding: 15px; margin: 15px 0; border-radius: 0 8px 8px 0; } .architecture { display: flex; justify-content: space-between; margin: 20px 0; } .component { flex: 1; background-color: rgba(63, 81, 181, 0.08); border-radius: 8px; padding: 15px; margin: 0 5px; text-align: center; box-shadow: 0 2px 6px rgba(0, 0, 0, 0.05); } .component-title { font-weight: 700; color: #3949ab; margin-bottom: 10px; font-size: 20px; } .component-desc { font-size: 16px; } .arrow { display: flex; align-items: center; justify-content: center; color: #5c6bc0; } .results-container { display: flex; justify-content: space-between; margin: 20px 0; } .result-box { flex: 1; background-color: rgba(63, 81, 181, 0.08); border-radius: 8px; padding: 15px; margin: 0 5px; text-align: center; } .result-number { font-size: 36px; font-weight: 700; color: #3949ab; } .result-label { font-size: 16px; } .efficiency-table { width: 100%; border-collapse: collapse; margin: 15px 0; } .efficiency-table th, .efficiency-table td { padding: 10px; text-align: left; border-bottom: 1px solid #e0e0e0; } .efficiency-table th { background-color: rgba(63, 81, 181, 0.1); color: #3949ab; } .context-image { width: 100%; border-radius: 8px; margin: 15px 0; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); } .footer { text-align: center; margin-top: 30px; font-size: 14px; color: #5c6bc0; position: relative; z-index: 1; } .bullet-list { list-style-type: none; padding-left: 0; } .bullet-list li { position: relative; padding-left: 25px; margin-bottom: 8px; } .bullet-list li::before { 
content: "•"; position: absolute; left: 0; color: #5c6bc0; font-weight: bold; } </style> </head> <body> <div class="poster-container"> <div class="header"> <h1 class="title">Agentic Context Engineering</h1> <h2 class="subtitle">Evolving Contexts for Self-Improving Language Models</h2> </div> <div class="section"> <h3 class="section-title"> <i class="material-icons">info</i> Introduction </h3> <div class="section-content"> <p>Large language model (LLM) applications increasingly rely on <span class="highlight">context adaptation</span> rather than weight updates. Existing context-adaptation methods suffer from two critical limitations:</p> <div class="problem-box"> <p><strong>Brevity bias:</strong> Over-prioritizing concise summaries at the expense of detailed domain insights</p> <p><strong>Context collapse:</strong> Iterative rewriting of the whole context erodes details over time, leading to performance drops</p> </div> <p>ACE treats contexts as <span class="highlight">evolving playbooks</span> that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation.</p> </div> </div> <div class="section"> <h3 class="section-title"> <i class="material-icons">architecture</i> Three-Role Architecture </h3> <div class="section-content"> <div class="architecture"> <div class="component"> <div class="component-title">Generator</div> <div class="component-desc">Produces reasoning trajectories for new queries, surfacing effective strategies and pitfalls</div> </div> <div class="arrow"> <i class="material-icons">arrow_forward</i> </div> <div class="component"> <div class="component-title">Reflector</div> <div class="component-desc">Critiques generated traces, distilling insights from successes and errors</div> </div> <div class="arrow"> <i class="material-icons">arrow_forward</i> </div> <div class="component"> <div class="component-title">Curator</div> <div class="component-desc">Synthesizes insights into structured "delta entries" and integrates them into the existing context</div> </div> </div> </div> </div> <div 
class="section"> <h3 class="section-title"> <i class="material-icons">lightbulb</i> Key Innovations </h3> <div class="section-content"> <div class="solution-box"> <h4 style="color: #3949ab; margin-bottom: 10px;">Incremental Delta Updates</h4> <ul class="bullet-list"> <li>Contexts are represented as structured, itemized "bullets", each with metadata and content</li> <li>Small, localized edits preserve prior knowledge while accumulating new insights</li> <li>Deterministic merging, de-duplication, and pruning are handled by non-LLM logic</li> </ul> </div> <div class="solution-box"> <h4 style="color: #3949ab; margin-bottom: 10px;">Grow-and-Refine Mechanism</h4> <ul class="bullet-list"> <li>Balances context expansion with periodic refinement</li> <li>Maintains relevance and prevents unbounded growth</li> <li>Enables efficient, parallel merging, which is crucial for scalability</li> </ul> </div> </div> </div> <div class="section"> <h3 class="section-title"> <i class="material-icons">trending_up</i> Performance Results </h3> <div class="section-content"> <p>ACE consistently outperforms strong baselines across agent and domain-specific benchmarks:</p> <div class="results-container"> <div class="result-box"> <div class="result-number">+10.6%</div> <div class="result-label">Agent Tasks (AppWorld)</div> </div> <div class="result-box"> <div class="result-number">+8.6%</div> <div class="result-label">Financial Analysis (FiNER + XBRL)</div> </div> </div> <p>Matches the top-ranked production-level agent on the AppWorld leaderboard while using a smaller open-source model.</p> <img src="https://sfile.chatglm.cn/moeSlide/image/2d/2dc07c66.jpg" alt="Context-Quality Curve" class="context-image"> </div> </div> <div class="section"> <h3 class="section-title"> <i class="material-icons">speed</i> Efficiency Gains </h3> <div class="section-content"> <p>Compared with existing adaptation methods, ACE substantially reduces adaptation latency and cost:</p> <table class="efficiency-table"> <tr> <th>Metric</th> <th>Offline vs GEPA</th> <th>Online vs Dynamic 
Cheatsheet</th> </tr> <tr> <td>Adaptation Latency Reduction</td> <td>82.3%</td> <td>91.5%</td> </tr> <tr> <td>Rollouts (offline) / Token Cost (online) Reduction</td> <td>75.1%</td> <td>83.6%</td> </tr> </table> <p>ACE adapts effectively <span class="highlight">without labeled supervision</span> by leveraging natural execution feedback.</p> </div> </div> <div class="section"> <h3 class="section-title"> <i class="material-icons">insights</i> Implications </h3> <div class="section-content"> <ul class="bullet-list"> <li>Enables scalable, efficient, and self-improving LLM systems</li> <li>Provides interpretable contexts at lower overhead than fine-tuning</li> <li>Offers a flexible approach for online and continuous learning</li> <li>Particularly valuable for specialized domains and long-context applications</li> </ul> </div> </div> <div class="footer"> <p>arXiv:2510.04618 | Code available at github.com/ace-agent/ace</p> </div> </div> </body> </html>
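The poster describes incremental delta updates and the grow-and-refine mechanism. Both can be illustrated with a small Python sketch. This is not the authors' implementation; the linked repository has that.

Every name in the sketch is hypothetical:

- `Bullet`, `Delta`, `Playbook`, `adaptation_step`
- the `"add"`/`"tag"` delta operations
- the helpful/harmful counters, which are an assumed form of the bullet metadata the poster mentions

The Generator, Reflector, and Curator are passed in as plain callables that stand in for LLM calls. For de-duplication, the sketch uses exact matching on normalized text, where a real system would more likely use a semantic similarity check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Bullet:
    """One itemized playbook entry: metadata (id, section, counters) plus content."""
    bullet_id: str
    section: str
    content: str
    helpful: int = 0  # hypothetical counter: times judged useful
    harmful: int = 0  # hypothetical counter: times judged misleading


@dataclass
class Delta:
    """A hypothetical delta entry emitted by the Curator.

    op == "add": propose a new bullet (section, content).
    op == "tag": record feedback on an existing bullet (bullet_id, helpful).
    """
    op: str
    section: str = ""
    content: str = ""
    bullet_id: str = ""
    helpful: bool = True


def _normalize(text: str) -> str:
    # Stand-in for semantic de-duplication: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


class Playbook:
    def __init__(self, max_bullets: int = 50) -> None:
        self.bullets: Dict[str, Bullet] = {}
        self.max_bullets = max_bullets
        self._next_id = 0

    def apply_deltas(self, deltas: List[Delta]) -> None:
        """Grow: merge deltas by plain bookkeeping; bullet content is never rewritten."""
        for d in deltas:
            if d.op == "add":
                key = _normalize(d.content)
                if any(_normalize(b.content) == key for b in self.bullets.values()):
                    continue  # duplicate of an existing bullet: skip it
                self._next_id += 1
                bid = f"b{self._next_id:04d}"
                self.bullets[bid] = Bullet(bid, d.section, d.content)
            elif d.op == "tag" and d.bullet_id in self.bullets:
                bullet = self.bullets[d.bullet_id]
                if d.helpful:
                    bullet.helpful += 1
                else:
                    bullet.harmful += 1

    def refine(self) -> None:
        """Refine: prune net-harmful bullets, then keep the best-scored ones up to the cap."""
        for bid in [k for k, b in self.bullets.items() if b.harmful > b.helpful]:
            del self.bullets[bid]
        if len(self.bullets) > self.max_bullets:
            ranked = sorted(self.bullets.values(),
                            key=lambda b: b.helpful - b.harmful, reverse=True)
            self.bullets = {b.bullet_id: b for b in ranked[: self.max_bullets]}

    def render(self) -> str:
        """Serialize the playbook into text that would go into the Generator's prompt."""
        ordered = sorted(self.bullets.values(), key=lambda b: (b.section, b.bullet_id))
        return "\n".join(f"[{b.bullet_id}] ({b.section}) {b.content}" for b in ordered)


def adaptation_step(
    playbook: Playbook,
    query: str,
    generate: Callable[[str, str], str],
    reflect: Callable[[str], List[str]],
    curate: Callable[[List[str], str], List[Delta]],
) -> None:
    """One Generator -> Reflector -> Curator pass; the callables stand in for LLM calls."""
    trace = generate(playbook.render(), query)
    insights = reflect(trace)
    playbook.apply_deltas(curate(insights, playbook.render()))
    playbook.refine()


if __name__ == "__main__":
    pb = Playbook(max_bullets=2)
    pb.apply_deltas([
        Delta("add", section="apis", content="Check pagination before counting results."),
        Delta("add", section="apis", content="check  pagination before counting results."),
        Delta("add", section="finance", content="Verify XBRL tags match the reporting period."),
    ])
    pb.apply_deltas([Delta("tag", bullet_id="b0002", helpful=False)])
    pb.refine()
    print(pb.render())  # prints: [b0001] (apis) Check pagination before counting results.
```

Merging here is ordinary dictionary bookkeeping, so the same sequence of deltas always yields the same playbook. Existing bullet content is never regenerated wholesale, which is the property the poster credits with avoiding context collapse. The sketch refines after every step for simplicity. Refining only when the playbook approaches a size budget would be an equally reasonable choice under these assumptions.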
