<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Monet: Reasoning in Latent Visual Space</title>
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@300;400;700;900&family=Roboto:wght@400;700&display=swap" rel="stylesheet">
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<style>
:root {
--bg-gradient: linear-gradient(135deg, #0f0c29 0%, #302b63 50%, #24243e 100%);
--card-bg: rgba(255, 255, 255, 0.08);
--card-border: 1px solid rgba(255, 255, 255, 0.15);
--text-primary: #ffffff;
--text-secondary: #b3b3b3;
--accent-color: #00d2ff;
--accent-secondary: #9d50bb;
--chart-color-1: rgba(255, 159, 64, 0.7);
--chart-color-2: rgba(75, 192, 192, 0.7);
--chart-color-3: rgba(13, 110, 253, 0.8);
}
* {
box-sizing: border-box;
margin: 0;
padding: 0;
}
body {
font-family: "Noto Sans SC", sans-serif;
background: var(--bg-gradient);
color: var(--text-primary);
width: 720px;
min-height: 960px;
margin: 0 auto;
overflow-x: hidden;
display: flex;
flex-direction: column;
}
.poster-container {
padding: 30px;
display: flex;
flex-direction: column;
gap: 20px;
flex-grow: 1;
}
/* Header */
header {
text-align: left;
border-bottom: 2px solid var(--accent-color);
padding-bottom: 15px;
margin-bottom: 10px;
}
h1 {
font-size: 36px;
font-weight: 900;
background: linear-gradient(to right, #fff, #00d2ff);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
margin-bottom: 8px;
line-height: 1.2;
}
.subtitle {
font-size: 16px;
color: var(--text-secondary);
display: flex;
align-items: center;
gap: 5px;
}
.affiliation {
font-size: 12px;
margin-top: 5px;
opacity: 0.8;
font-family: 'Roboto', sans-serif;
color: var(--accent-color);
}
/* Grid Layout */
.main-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 20px;
flex-grow: 1;
}
.full-width {
grid-column: 1 / -1;
}
/* Cards */
.card {
background: var(--card-bg);
backdrop-filter: blur(12px);
-webkit-backdrop-filter: blur(12px);
border: var(--card-border);
border-radius: 12px;
padding: 20px;
display: flex;
flex-direction: column;
box-shadow: 0 8px 32px 0 rgba(0, 0, 0, 0.3);
}
.card-title {
font-size: 18px;
font-weight: 700;
color: var(--accent-color);
margin-bottom: 12px;
display: flex;
align-items: center;
gap: 8px;
border-bottom: 1px solid rgba(255,255,255,0.1);
padding-bottom: 8px;
}
.card-content {
font-size: 13px;
line-height: 1.6;
color: #e0e0e0;
flex-grow: 1;
}
.highlight-text {
font-weight: 700;
color: #fff;
}
/* Image Styles */
.img-container {
width: 100%;
height: 140px;
overflow: hidden;
border-radius: 8px;
margin-bottom: 12px;
position: relative;
}
.img-container img {
width: 100%;
height: 100%;
object-fit: cover;
transition: transform 0.3s;
}
.img-overlay {
position: absolute;
bottom: 0;
left: 0;
width: 100%;
background: linear-gradient(transparent, rgba(0,0,0,0.8));
padding: 8px;
font-size: 10px;
color: rgba(255,255,255,0.9);
}
/* List Styles */
ul.feature-list {
list-style: none;
padding-left: 5px;
}
ul.feature-list li {
margin-bottom: 8px;
padding-left: 15px;
position: relative;
}
ul.feature-list li::before {
content: "•";
color: var(--accent-color);
position: absolute;
left: 0;
font-weight: bold;
}
/* Chart Area */
.chart-container {
height: 220px;
width: 100%;
position: relative;
}
/* Application Grid */
.app-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 15px;
}
.app-item {
position: relative;
height: 120px;
border-radius: 8px;
overflow: hidden;
}
.app-item img {
width: 100%;
height: 100%;
object-fit: cover;
filter: brightness(0.8);
}
.app-text {
position: absolute;
bottom: 0;
width: 100%;
background: rgba(0,0,0,0.6);
padding: 6px 10px;
font-size: 12px;
font-weight: 700;
}
/* Footer */
footer {
text-align: center;
font-size: 11px;
color: rgba(255, 255, 255, 0.5);
margin-top: auto;
padding-top: 10px;
border-top: 1px solid rgba(255, 255, 255, 0.1);
}
/* Tags */
.tag {
display: inline-block;
padding: 2px 8px;
border-radius: 4px;
font-size: 10px;
font-weight: bold;
margin-right: 5px;
}
.tag-sft { background: rgba(13, 110, 253, 0.3); color: #8ac4ff; }
.tag-rl { background: rgba(255, 99, 132, 0.3); color: #ffb3c1; }
.tag-theory { background: rgba(75, 192, 192, 0.3); color: #99ffeb; }
</style>
</head>
<body>
<div class="poster-container">
<header>
<h1>Monet: Reasoning in Latent Visual Space</h1>
<div class="subtitle">
<i class="material-icons" style="font-size:16px;">visibility</i>
<span>A breakthrough for AI visual reasoning in latent space</span>
</div>
<div class="affiliation">Peking University | Kuaishou | MIT (joint team)</div>
</header>
<div class="main-grid">
<!-- Introduction & Concept -->
<div class="card full-width">
<div class="card-title">
<i class="material-icons">lightbulb</i>
Core Concept: A "Mind's Eye" Beyond Pixels
</div>
<div class="card-content" style="display:flex; gap:20px; align-items:center;">
<div style="flex:1;">
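<p>Monet aims to move multimodal large language models (MLLMs) beyond the clumsy "describe what you see" mode and give them something closer to a human "mind's eye". Rather than stopping at pixel-level recognition, it carries out continuous mental simulation in a high-dimensional <span class="highlight-text">latent visual space</span>.</p>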
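<p style="margin-top:10px;"><span class="tag tag-theory">Manifold Hypothesis</span> High-dimensional data tends to concentrate near low-dimensional manifolds. Like a traveler who finds the one road linking the oases across a desert, Monet runs its "mental simulation" along such a low-dimensional manifold instead of searching the full space, which helps it avoid the curse of dimensionality.</p>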
</div>
<div style="width:180px; flex-shrink:0;">
<img src="https://sfile.chatglm.cn/image/4a/4a7c67c7.jpg" style="width:100%; border-radius:8px; border:1px solid rgba(255,255,255,0.2);" alt="Manifold Visualization">
</div>
</div>
</div>
<!-- Methodology -->
<div class="card">
<div class="card-title">
<i class="material-icons">architecture</i>
Core Technical Architecture
</div>
<div class="img-container">
<img src="https://sfile.chatglm.cn/image/e4/e47b8c1f.jpg" alt="Neural Network Structure">
<div class="img-overlay">Schematic of the SFT + RL framework</div>
</div>
<div class="card-content">
<p style="margin-bottom:10px;"><span class="tag tag-sft">SFT (distillation fine-tuning)</span></p>
<ul class="feature-list">
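<li><strong>Stage 1:</strong> Warm-up on interleaved image-text reasoning</li>
<li><strong>Stage 2:</strong> Obtain high-quality target latent embeddings</li>
<li><strong>Stage 3:</strong> Generate latent embeddings autonomously, without auxiliary images</li>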
</ul>
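<p style="margin-top:10px; margin-bottom:5px;"><span class="tag tag-rl">VLPO (policy optimization)</span></p>
<p>Brings the continuous latent variables into the reinforcement-learning policy gradient, so the model's "visual intuition" is optimized directly from reward signals.</p>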
</div>
</div>
<!-- Experimental Results -->
<div class="card">
<div class="card-title">
<i class="material-icons">bar_chart</i>
Experimental Results
</div>
<div class="card-content">
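<p style="margin-bottom:10px;">Monet outperforms the SFT + GRPO baseline on both standard reasoning tasks and <span class="highlight-text">out-of-distribution (OOD)</span> abstract reasoning tasks, by 6.0 and 11.7 accuracy points respectively.</p>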
<div class="chart-container">
<canvas id="monetChart"></canvas>
</div>
</div>
</div>
<!-- Applications -->
<div class="card full-width">
<div class="card-title">
<i class="material-icons">rocket_launch</i>
Outlook and Applications
</div>
<div class="card-content">
<div class="app-grid">
<div class="app-item">
<img src="https://sfile.chatglm.cn/image/4a/4a1c44e9.jpg" alt="Robot Rescue">
<div class="app-text">Disaster-response robots: simulate complex environments, plan safe paths</div>
</div>
<div class="app-item">
<img src="https://sfile.chatglm.cn/image/13/133d6267.jpg" alt="Medical AI">
<div class="app-text">Medical forecasting: simulate disease progression to support clinical decisions</div>
</div>
</div>
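<p style="margin-top:15px;">Once machines have a "mental model", they could rehearse the consequences of their actions in their heads, as humans do, opening a new chapter for AI in the physical world.</p>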
</div>
</div>
</div>
<footer>
© 2025 Monet Research Team | Visual Reasoning Revolution
</footer>
</div>
<script>
const ctx = document.getElementById('monetChart').getContext('2d');
new Chart(ctx, {
type: 'bar',
data: {
labels: ['Standard reasoning', 'OOD abstract reasoning'],
datasets: [
{
label: 'Baseline (SFT+GRPO)',
data: [48.5, 22.0],
backgroundColor: 'rgba(255, 159, 64, 0.6)',
borderColor: 'rgba(255, 159, 64, 1)',
borderWidth: 1
},
{
label: 'Monet (VLPO)',
data: [54.5, 33.7],
backgroundColor: 'rgba(0, 210, 255, 0.6)',
borderColor: 'rgba(0, 210, 255, 1)',
borderWidth: 1
}
]
},
options: {
responsive: true,
maintainAspectRatio: false,
plugins: {
legend: {
labels: { color: '#e0e0e0', font: { size: 10 } }
}
},
scales: {
x: {
ticks: { color: '#e0e0e0', font: { size: 10 } },
grid: { display: false }
},
y: {
beginAtZero: true,
max: 60,
ticks: { color: '#e0e0e0', font: { size: 10 } },
grid: { color: 'rgba(255,255,255,0.1)' },
title: { display: true, text: 'Accuracy (%)', color: '#b3b3b3', font: { size: 10 } }
}
}
}
});
</script>
</body>
</html>