<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Asking LLMs to Verify First is Almost Free Lunch</title>
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<link href="https://fonts.googleapis.com/css2?family=Futura:wght@400;500;700&display=swap" rel="stylesheet">
<style>
@font-face {
font-family: 'DingTalk JinBuTi';
src: local('DingTalk JinBuTi');
}
@font-face {
font-family: 'HarmonyOS Sans SC';
src: local('HarmonyOS Sans SC');
}
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'HarmonyOS Sans SC', sans-serif;
background: linear-gradient(135deg, #f5f7fa 0%, #e4eaf5 100%);
color: #1d3557;
line-height: 1.6;
}
.poster {
width: 720px;
min-height: 960px;
margin: 0 auto;
padding: 40px;
background: linear-gradient(145deg, #ffffff 0%, #f0f4f8 100%);
box-shadow: 0 10px 30px rgba(0, 0, 0, 0.1);
position: relative;
overflow: hidden;
}
.background-shape {
position: absolute;
border-radius: 50%;
filter: blur(80px);
z-index: 0;
opacity: 0.4;
}
.shape-1 {
width: 300px;
height: 300px;
background: #4361ee;
top: -100px;
right: -100px;
}
.shape-2 {
width: 250px;
height: 250px;
background: #3f37c9;
bottom: 100px;
left: -100px;
}
.grid-texture {
position: absolute;
top: 0;
left: 0;
right: 0;
bottom: 0;
background-image:
linear-gradient(rgba(255, 255, 255, 0.05) 1px, transparent 1px),
linear-gradient(90deg, rgba(255, 255, 255, 0.05) 1px, transparent 1px);
background-size: 20px 20px;
z-index: 0;
}
.content {
position: relative;
z-index: 1;
}
.header {
text-align: center;
margin-bottom: 30px;
padding-bottom: 20px;
border-bottom: 2px solid #4361ee;
}
.title {
font-family: 'DingTalk JinBuTi', sans-serif;
font-size: 40px;
font-weight: bold;
color: #1d3557;
margin-bottom: 10px;
letter-spacing: -1px;
}
.authors {
font-size: 18px;
margin-bottom: 5px;
color: #3a506b;
}
.affiliation {
font-size: 16px;
color: #3a506b;
font-style: italic;
}
.section {
margin-bottom: 30px;
background: rgba(255, 255, 255, 0.8);
border-radius: 12px;
padding: 20px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.05);
}
.section-title {
font-family: 'DingTalk JinBuTi', sans-serif;
font-size: 28px;
color: #1d3557;
margin-bottom: 15px;
display: flex;
align-items: center;
}
.section-title .material-icons {
margin-right: 10px;
color: #4361ee;
}
.abstract {
font-size: 16px;
line-height: 1.6;
}
.insights {
display: flex;
flex-wrap: wrap;
gap: 20px;
margin-top: 15px;
}
.insight-card {
flex: 1;
min-width: 280px;
background: rgba(255, 255, 255, 0.9);
border-radius: 10px;
padding: 15px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05);
border-left: 4px solid #4361ee;
}
.insight-title {
font-weight: bold;
margin-bottom: 8px;
color: #3a506b;
font-size: 18px;
}
.method-comparison {
display: flex;
gap: 20px;
margin-top: 20px;
}
.method-card {
flex: 1;
background: rgba(255, 255, 255, 0.9);
border-radius: 10px;
padding: 15px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05);
}
.method-title {
font-weight: bold;
margin-bottom: 10px;
color: #3a506b;
font-size: 18px;
text-align: center;
}
.method-content {
font-size: 14px;
padding: 10px;
background: rgba(67, 97, 238, 0.05);
border-radius: 8px;
font-family: monospace;
}
.results-container {
display: flex;
flex-direction: column;
gap: 20px;
}
.chart-container {
background: rgba(255, 255, 255, 0.9);
border-radius: 10px;
padding: 15px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05);
height: 200px;
display: flex;
align-items: center;
justify-content: center;
}
.chart-placeholder {
width: 100%;
height: 100%;
background: linear-gradient(135deg, #f0f4f8 0%, #e4eaf5 100%);
border-radius: 8px;
display: flex;
align-items: center;
justify-content: center;
flex-direction: column;
}
.bar {
height: 30px;
margin: 10px 0;
border-radius: 5px;
display: flex;
align-items: center;
padding-left: 10px;
color: white;
font-weight: bold;
}
.cot-bar {
background: #4cc9f0;
width: 70%;
}
.vf-bar {
background: #4361ee;
width: 85%;
}
.cost-table {
width: 100%;
border-collapse: collapse;
margin-top: 15px;
}
.cost-table th, .cost-table td {
padding: 10px;
text-align: left;
border-bottom: 1px solid #e0e0e0;
}
.cost-table th {
background: rgba(67, 97, 238, 0.1);
color: #3a506b;
}
.findings {
list-style-type: none;
}
.findings li {
margin-bottom: 10px;
padding-left: 30px;
position: relative;
}
.findings li:before {
content: "\e876";
font-family: 'Material Icons';
position: absolute;
left: 0;
color: #4361ee;
}
.conclusion {
font-size: 18px;
font-style: italic;
text-align: center;
margin-top: 20px;
padding: 15px;
background: rgba(67, 97, 238, 0.1);
border-radius: 10px;
}
.image-container {
display: flex;
justify-content: center;
margin: 20px 0;
}
.image-container img {
max-width: 100%;
border-radius: 10px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
}
.highlight {
background: linear-gradient(transparent 50%, rgba(67, 97, 238, 0.2) 50%);
padding: 0 4px;
}
</style>
</head>
<body>
<div class="poster">
<div class="background-shape shape-1"></div>
<div class="background-shape shape-2"></div>
<div class="grid-texture"></div>
<div class="content">
<div class="header">
<h1 class="title">Asking LLMs to Verify First is Almost Free Lunch</h1>
<p class="authors">Shiguang Wu, Quanming Yao</p>
<p class="affiliation">Department of Electronic Engineering, Tsinghua University</p>
</div>
<div class="section">
<h2 class="section-title">
<span class="material-icons">description</span>
Abstract
</h2>
<p class="abstract">
To enhance the reasoning capabilities of Large Language Models (LLMs) without costly training or extensive test-time sampling, we introduce <span class="highlight">Verification-First (VF)</span>, a strategy that prompts the model to verify a provided candidate answer, even a trivial or random one, before generating its own solution. Verification triggers a "reverse reasoning" process that is cognitively easier than, and complementary to, standard forward Chain-of-Thought (CoT), invoking the model's critical thinking to reduce logical errors.
</p>
</div>
<div class="section">
<h2 class="section-title">
<span class="material-icons">psychology</span>
Theoretical Foundations
</h2>
<div class="insights">
<div class="insight-card">
<div class="insight-title">Logical Insight</div>
<p>Verifying a candidate answer is easier than generating a correct one, and the reverse reasoning it triggers provides information complementary to standard forward CoT.</p>
</div>
<div class="insight-card">
<div class="insight-title">Psychological Insight</div>
<p>Asking someone to critique another person's answer invokes critical thinking by overcoming egocentrism.</p>
</div>
</div>
<div class="image-container">
<img src="https://sfile.chatglm.cn/moeSlide/image/13/13eb0d6e.jpg" alt="Brain and circuit combination showing chain-of-thought reasoning" width="500">
</div>
</div>
<div class="section">
<h2 class="section-title">
<span class="material-icons">lightbulb</span>
Methodology
</h2>
<div class="method-comparison">
<div class="method-card">
<div class="method-title">Standard CoT Prompting</div>
<div class="method-content">
"Q: [Problem Statement]<br>
A: Let's think step by step..."
</div>
</div>
<div class="method-card">
<div class="method-title">Verification-First Prompting</div>
<div class="method-content">
"Q: [Problem Statement]<br>
A possible answer of Q is A'. First verify if A' is correct, then think step by step to find the answer."
</div>
</div>
</div>
<div class="image-container">
<img src="https://sfile.chatglm.cn/moeSlide/image/13/1390913c.jpg" alt="Chain-of-thought process visualization" width="500">
</div>
<p><strong>Iter-VF Process:</strong> Iter-VF applies VF iteratively, using the model's previous answer as the next candidate to verify. Because each round depends only on that answer, the refinement loop is Markovian, which avoids context-length and error-propagation issues.</p>
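<p>The prompt template and the iterative loop described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: <code>ask_model</code> and <code>extract_answer</code> are hypothetical callables supplied by the caller, the number of rounds is an assumed parameter, and the prompt wording simply follows the template shown on this poster.</p>
<pre class="method-content" style="white-space: pre-wrap;">
```python
def build_vf_prompt(question, candidate):
    """Build a Verification-First prompt from the poster's template."""
    return (
        f"Q: {question}\n"
        f"A possible answer of Q is {candidate}. "
        f"First verify if {candidate} is correct, "
        "then think step by step to find the answer."
    )


def iter_vf(question, first_candidate, ask_model, extract_answer, rounds=3):
    """Iterative VF sketch: each round verifies the previous answer.

    ask_model and extract_answer are hypothetical callables (assumptions):
    ask_model(prompt) returns the model's reply as a string, and
    extract_answer(reply) returns the final answer or None.
    """
    candidate = first_candidate
    for _ in range(rounds):
        # Markovian step: the prompt carries only the question and the
        # latest candidate, never the earlier reasoning text.
        reply = ask_model(build_vf_prompt(question, candidate))
        answer = extract_answer(reply)
        if answer is not None:
            candidate = answer
    return candidate
```
</pre>
<p>Calling <code>iter_vf</code> with <code>rounds=1</code> reduces to plain VF. The first candidate can be a trivial or random answer, since the prompt only needs something to verify.</p>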
</div>
<div class="section">
<h2 class="section-title">
<span class="material-icons">analytics</span>
Experimental Results
</h2>
<div class="results-container">
<div class="chart-container">
<div class="chart-placeholder">
<div class="bar cot-bar">Standard CoT: 70% Accuracy</div>
<div class="bar vf-bar">Verification-First: 85% Accuracy</div>
<p>Performance comparison across various benchmarks</p>
</div>
</div>
<table class="cost-table">
<tr>
<th>Method</th>
<th>Output Tokens (Relative to CoT)</th>
<th>Performance Gain</th>
</tr>
<tr>
<td>Standard CoT</td>
<td>100%</td>
<td>Baseline</td>
</tr>
<tr>
<td>Verification-First</td>
<td>120-150%</td>
<td>+15% to +25%</td>
</tr>
<tr>
<td>Self-Consistency (N=5)</td>
<td>500%</td>
<td>+10% to +20%</td>
</tr>
</table>
</div>
</div>
<div class="section">
<h2 class="section-title">
<span class="material-icons">stars</span>
Key Findings
</h2>
<ul class="findings">
<li>VF with random answers consistently outperforms standard CoT with minimal computational overhead</li>
<li>Iter-VF outperforms existing test-time scaling (TTS) strategies under limited computational budgets</li>
<li>VF remains effective even with commercial LLM services that hide their reasoning traces</li>
<li>The verification process, not the quality of the candidate answer, is the key driver of improvement</li>
</ul>
<div class="conclusion">
Verification-First is a simple, universal, and powerful method for enhancing LLM reasoning at minimal additional cost: an almost "free lunch" in terms of cost versus benefit.
</div>
</div>
</div>
</div>
</body>
</html>