<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>REFRAG: An Efficient Decoding Framework from Meta and the National University of Singapore</title>
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@400;500;700&display=swap" rel="stylesheet">
<style>
:root {
--primary: #0b57d0;
--primary-light: #d3e3fd;
--secondary: #ff6d00;
--secondary-light: #ffab91;
--background: #f8f9fa;
--card-bg: #ffffff;
--text-primary: #1f1f1f;
--text-secondary: #5f6368;
--border-radius: 16px;
--shadow: 0 4px 8px rgba(0,0,0,0.1);
}
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Noto Sans SC', sans-serif;
background: var(--background);
color: var(--text-primary);
line-height: 1.6;
}
.poster-container {
width: 720px;
min-height: 960px;
margin: 0 auto;
padding: 40px 20px;
background: linear-gradient(135deg, #e8f0fe 0%, #ffffff 100%);
position: relative;
overflow: hidden;
}
.bg-shape {
position: absolute;
border-radius: 50%;
opacity: 0.1;
z-index: 0;
}
.shape-1 {
width: 300px;
height: 300px;
background: var(--primary);
top: -100px;
right: -100px;
}
.shape-2 {
width: 200px;
height: 200px;
background: var(--secondary);
bottom: 100px;
left: -50px;
}
.grid-texture {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
background-image:
linear-gradient(rgba(255,255,255,0.1) 1px, transparent 1px),
linear-gradient(90deg, rgba(255,255,255,0.1) 1px, transparent 1px);
background-size: 20px 20px;
z-index: 0;
}
.content {
position: relative;
z-index: 1;
}
.header {
text-align: center;
margin-bottom: 30px;
padding-bottom: 20px;
border-bottom: 2px solid var(--primary-light);
}
.title {
font-size: 36px;
font-weight: 700;
color: var(--primary);
margin-bottom: 10px;
}
.subtitle {
font-size: 20px;
color: var(--secondary);
font-weight: 500;
}
.section {
margin-bottom: 30px;
background: var(--card-bg);
border-radius: var(--border-radius);
padding: 20px;
box-shadow: var(--shadow);
}
.section-title {
font-size: 24px;
font-weight: 700;
color: var(--primary);
margin-bottom: 15px;
display: flex;
align-items: center;
}
.section-title .material-icons {
margin-right: 10px;
color: var(--primary);
}
.point-list {
list-style-type: none;
padding-left: 10px;
}
.point-list li {
margin-bottom: 10px;
padding-left: 25px;
position: relative;
}
.point-list li:before {
content: "";
position: absolute;
left: 0;
top: 8px;
width: 8px;
height: 8px;
background-color: var(--secondary);
border-radius: 50%;
}
.highlight {
background-color: var(--secondary-light);
padding: 2px 5px;
border-radius: 4px;
font-weight: 500;
}
.stages {
display: flex;
justify-content: space-between;
margin-top: 20px;
}
.stage {
flex: 1;
padding: 15px;
background: var(--primary-light);
border-radius: var(--border-radius);
margin: 0 5px;
text-align: center;
}
.stage:first-child {
margin-left: 0;
}
.stage:last-child {
margin-right: 0;
}
.stage-title {
font-weight: 700;
color: var(--primary);
margin-bottom: 10px;
}
.results-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 15px;
margin-top: 20px;
}
.result-card {
background: var(--primary-light);
padding: 15px;
border-radius: var(--border-radius);
text-align: center;
}
.result-value {
font-size: 28px;
font-weight: 700;
color: var(--secondary);
margin-bottom: 5px;
}
.result-label {
font-size: 14px;
color: var(--text-secondary);
}
.applications {
display: flex;
flex-wrap: wrap;
gap: 15px;
margin-top: 20px;
}
.app-card {
flex: 1 0 45%;
background: var(--primary-light);
padding: 15px;
border-radius: var(--border-radius);
display: flex;
align-items: center;
}
.app-icon {
margin-right: 10px;
color: var(--primary);
}
.footer {
margin-top: 30px;
padding-top: 20px;
border-top: 1px solid var(--primary-light);
font-size: 14px;
color: var(--text-secondary);
text-align: center;
}
</style>
</head>
<body>
<div class="poster-container">
<div class="bg-shape shape-1"></div>
<div class="bg-shape shape-2"></div>
<div class="grid-texture"></div>
<div class="content">
<header class="header">
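<h1 class="title">REFRAG: An Efficient Decoding Framework from Meta and the National University of Singapore</h1>
<p class="subtitle">A three-stage optimization delivering up to 30.85× faster time-to-first-token and a 16× longer context window</p>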
</header>
<section class="section">
<h2 class="section-title">
<i class="material-icons">warning</i>
Long-Context Challenges in RAG
</h2>
<ul class="point-list">
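<li><span class="highlight">Time cost</span>: the cost of attention grows quadratically with sequence length, which drives up time-to-first-token (TTFT) latency</li>
<li><span class="highlight">Memory cost</span>: a large key-value (KV) cache must be stored, and its memory footprint grows linearly with sequence length, limiting batch size and throughput</li>
<li><span class="highlight">Information sparsity</span>: of the dozens of retrieved documents, only a few passages are actually relevant to the current query; the remaining tokens contribute almost nothing to generation yet still take part in the full attention computation</li>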
</ul>
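<p style="margin-top: 15px;">A rough cost model makes the two scaling behaviors above concrete. This sketch assumes a LLaMA-2-7B-like decoder (32 layers, hidden size 4096, fp16, no grouped-query attention) and counts only the two attention matmuls; the helper names are hypothetical and the figures are estimates, not measurements from the paper.</p>
<pre style="margin-top: 10px; padding: 12px; background: #f1f3f4; border-radius: 8px; overflow-x: auto; font-size: 12px; line-height: 1.5;"><code class="language-python"># Back-of-envelope cost model for the time and memory bottlenecks above.
# ASSUMPTION: a LLaMA-2-7B-like decoder (32 layers, hidden size 4096,
# fp16, no grouped-query attention). Illustrative only, not paper numbers.
LAYERS = 32
HIDDEN = 4096
BYTES_PER_VALUE = 2  # fp16


def attention_flops(seq_len):
    """Approximate FLOPs of QK^T and softmax(QK^T)V summed over all layers.

    Each matmul costs about 2 * seq_len**2 * HIDDEN FLOPs per layer, so the
    total grows quadratically with seq_len (causal masking is ignored).
    """
    return 4 * LAYERS * HIDDEN * seq_len ** 2


def kv_cache_bytes(seq_len, batch=1):
    """Keys plus values for every layer and every token: linear in seq_len."""
    return 2 * LAYERS * HIDDEN * BYTES_PER_VALUE * seq_len * batch


for n in (2048, 4096, 8192, 16384):
    print(f"{n:6d} tokens: attention {attention_flops(n) / 1e12:7.2f} TFLOPs, "
          f"KV cache {kv_cache_bytes(n) / 2**30:4.1f} GiB")
</code></pre>
<p style="margin-top: 10px;">Doubling the context quadruples the attention term but only doubles the KV cache. Under this model, shortening the decoder input by a factor of k cuts the attention term by roughly k² and the KV cache by roughly k.</p>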
</section>
<section class="section">
<h2 class="section-title">
<i class="material-icons">insights</i>
Block-Diagonal Attention and Redundant Computation
</h2>
<ul class="point-list">
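<li>Passages retrieved in RAG are usually only weakly related to one another, so the attention the model computes while generating tokens shows a <span class="highlight">block-diagonal sparsity pattern</span></li>
<li>Tokens within a single passage attend strongly to one another, while attention between tokens from different passages is very weak</li>
<li>Feeding every raw token into the LLM and attending over all of them is therefore unnecessary and inefficient</li>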
</ul>
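<p style="margin-top: 15px;">A toy version of this pattern: if every token is labeled with the passage it came from, the within-passage pairs form blocks on the diagonal of the attention matrix. The sketch below (hypothetical helper names, causal masking ignored) prints that mask for two short passages and computes what share of the full score matrix lies on those blocks. It illustrates the shape of the sparsity, not how REFRAG exploits it.</p>
<pre style="margin-top: 10px; padding: 12px; background: #f1f3f4; border-radius: 8px; overflow-x: auto; font-size: 12px; line-height: 1.5;"><code class="language-python"># Toy model of block-diagonal attention over concatenated passages.
# ASSUMPTION: passage lengths are known and causal masking is ignored;
# this only visualizes the pattern, it is not REFRAG's mechanism.


def passage_ids(lengths):
    """Label each token position with the index of its source passage."""
    ids = []
    for pid, length in enumerate(lengths):
        ids.extend([pid] * length)
    return ids


def block_diagonal_mask(lengths):
    """mask[i][j] is True only when tokens i and j share a passage."""
    ids = passage_ids(lengths)
    return [[qi == kj for kj in ids] for qi in ids]


def within_passage_fraction(lengths):
    """Share of all n * n query-key pairs that lie on the diagonal blocks."""
    total = sum(lengths)
    return sum(length * length for length in lengths) / (total * total)


mask = block_diagonal_mask([2, 3])
for row in mask:
    print("".join("#" if keep else "." for keep in row))

# 20 equally sized passages: only 1/20 of the score matrix is on-block.
print(within_passage_fraction([256] * 20))
</code></pre>
<p style="margin-top: 10px;">With 20 equally sized passages, only 5% of the query-key pairs fall inside a passage, so under this toy model most of the quadratic attention cost goes to the weak cross-passage interactions.</p>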
</section>
<section class="section">
<h2 class="section-title">
<i class="material-icons">architecture</i>
Core Design: Compress, Sense, Expand
</h2>
<div class="stages">
<div class="stage">
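<h3 class="stage-title">Compress</h3>
<p>A lightweight encoder (such as RoBERTa) compresses each chunk of k tokens into a single embedding vector, greatly shortening the sequence the decoder must process</p>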
</div>
<div class="stage">
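<h3 class="stage-title">Sense</h3>
<p>A projection layer (an MLP) maps each chunk embedding from the encoder into the main LLM's token-embedding space, aligning the "language" of the two models</p>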
</div>
<div class="stage">
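<h3 class="stage-title">Expand</h3>
<p>A reinforcement-learning policy selects the key chunks to expand back into full tokens, so critical details (such as exact numbers and dates) are not lost in compression</p>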
</div>
</div>
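<p style="margin-top: 15px;">The three stages can be pictured as a single pipeline. The sketch below is a toy under loudly stated assumptions: token embeddings come from a hypothetical <code>toy_embed</code> lookup, mean pooling stands in for the RoBERTa encoder, a fixed random linear map stands in for the MLP projection, and a hand-written rule (expand chunks containing digits) replaces the reinforcement-learning policy. It only shows how the decoder input mixes one position per compressed chunk with full tokens for expanded chunks.</p>
<pre style="margin-top: 10px; padding: 12px; background: #f1f3f4; border-radius: 8px; overflow-x: auto; font-size: 12px; line-height: 1.5;"><code class="language-python"># Toy pipeline: compress, sense, expand. ASSUMPTIONS (all hypothetical):
# * toy_embed is a deterministic stand-in for real token embeddings;
# * mean pooling replaces the RoBERTa encoder (compress);
# * a fixed random linear map replaces the MLP projection (sense);
# * a digit-matching rule replaces the RL expansion policy (expand).
import random

ENC_DIM, DEC_DIM = 4, 6
_rng = random.Random(0)
# Hypothetical projection weights, shape DEC_DIM x ENC_DIM.
W = [[_rng.uniform(-1, 1) for _ in range(ENC_DIM)] for _ in range(DEC_DIM)]


def toy_embed(token):
    """Deterministic pseudo-embedding for a token (seeded by the token text)."""
    local = random.Random(token)
    return [local.uniform(-1, 1) for _ in range(ENC_DIM)]


def compress(chunk):
    """Compress: mean-pool the chunk's token embeddings into one vector."""
    vecs = [toy_embed(t) for t in chunk]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sense(chunk_vec):
    """Sense: project a chunk embedding into the decoder's embedding space."""
    return [sum(w * x for w, x in zip(row, chunk_vec)) for row in W]


def should_expand(chunk):
    """Expand: heuristic stand-in for the learned policy (keep numbers exact)."""
    return any(ch.isdigit() for t in chunk for ch in t)


def build_decoder_input(tokens, k):
    """One position per compressed chunk, k positions per expanded chunk."""
    slots = []
    for start in range(0, len(tokens), k):
        chunk = tokens[start:start + k]
        if should_expand(chunk):
            slots.extend(("token", t) for t in chunk)
        else:
            slots.append(("chunk", sense(compress(chunk))))
    return slots


tokens = ("the report was published in 2023 and it covers revenue "
          "growth across all regions in detail").split()
slots = build_decoder_input(tokens, k=4)
print(f"{len(tokens)} tokens, {len(slots)} decoder positions")
</code></pre>
<p style="margin-top: 10px;">Here 16 tokens with k = 4 become 7 decoder positions: three compressed chunks plus the four tokens of the chunk containing "2023".</p>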
</section>
<section class="section">
<h2 class="section-title">
<i class="material-icons">speed</i>
Results: Large Speedups with Accuracy Preserved
</h2>
<div class="results-grid">
<div class="result-card">
<div class="result-value">30.85×</div>
<div class="result-label">Time-to-first-token speedup (k=32)</div>
</div>
<div class="result-card">
<div class="result-value">16×</div>
<div class="result-label">Context length extension</div>
</div>
<div class="result-card">
<div class="result-value">6.78×</div>
<div class="result-label">Throughput improvement</div>
</div>
<div class="result-card">
<div class="result-value">~k×</div>
<div class="result-label">KV cache memory reduction</div>
</div>
</div>
<ul class="point-list" style="margin-top: 15px;">
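<li>Across 16 RAG tasks, accuracy matches or exceeds that of a LLaMA model given the full, uncompressed context</li>
<li>On datasets such as Book and Arxiv, perplexity (PPL) is on average 9.3% lower than that of the baseline model, CEPE</li>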
</ul>
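<p style="margin-top: 15px;">The "~k×" figure follows from simple counting. Assuming each compressed chunk of k tokens occupies one decoder position and a hypothetical fraction p of chunks is expanded back to all k tokens, the average cost per chunk is (1 - p) + p * k positions. The sketch below evaluates that ratio; the 4096-position budget is an arbitrary illustration, not a figure from the paper.</p>
<pre style="margin-top: 10px; padding: 12px; background: #f1f3f4; border-radius: 8px; overflow-x: auto; font-size: 12px; line-height: 1.5;"><code class="language-python"># Counting argument behind "~k x" KV-cache savings. ASSUMPTIONS: every
# chunk holds k tokens; a compressed chunk takes 1 decoder position; a
# hypothetical fraction p of chunks is expanded back to all k positions.


def reduction_factor(k, p):
    """Raw tokens per decoder position.

    Each chunk covers k raw tokens and costs (1 - p) + p * k positions on
    average, so the KV cache shrinks by k / ((1 - p) + p * k).
    """
    return k / ((1 - p) + p * k)


BUDGET = 4096  # hypothetical decoder context budget, in positions

for k in (8, 16, 32):
    for p in (0.0, 0.1, 0.25):
        r = reduction_factor(k, p)
        print(f"k={k:2d} p={p:.2f}: KV cache ~{r:5.2f}x smaller, "
              f"{BUDGET} positions hold ~{BUDGET * r:6.0f} raw tokens")
</code></pre>
<p style="margin-top: 10px;">With no expansion, k = 16 lets a 4096-position budget cover 65,536 raw tokens, consistent with the 16× context extension reported above. Expanding a quarter of the chunks would cut the savings to about 3.4× under this model, which is why expansion has to be selective.</p>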
</section>
<section class="section">
<h2 class="section-title">
<i class="material-icons">apps</i>
A Broadly Applicable, Efficient RAG Solution
</h2>
<div class="applications">
<div class="app-card">
<i class="material-icons app-icon">business</i>
<div>
<strong>Enterprise knowledge-base question answering</strong>
<p>Fast responses over large-scale document retrieval</p>
</div>
</div>
<div class="app-card">
<i class="material-icons app-icon">forum</i>
<div>
<strong>Multi-turn dialogue</strong>
<p>Keeps conversations coherent without truncating the history</p>
</div>
</div>
<div class="app-card">
<i class="material-icons app-icon">description</i>
<div>
<strong>Long-document summarization</strong>
<p>Handles very long documents such as books and reports</p>
</div>
</div>
<div class="app-card">
<i class="material-icons app-icon">smart_toy</i>
<div>
<strong>Agent applications</strong>
<p>Supports complex reasoning and tool use</p>
</div>
</div>
</div>
</section>
<footer class="footer">
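<p>Authors: Xiaoqiang Lin (PhD student, National University of Singapore) et al.</p>
<p>Affiliations: Meta Superintelligence Labs, National University of Singapore, Rice University</p>
<p>Paper: <a href="https://arxiv.org/abs/2509.01092">https://arxiv.org/abs/2509.01092</a></p>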
</footer>
</div>
</div>
</body>
</html>