Asking LLMs to Verify First is Almost Free Lunch

✨步子哥 (steper) • December 7, 2025, 04:10
                        <!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Asking LLMs to Verify First is Almost Free Lunch</title>
    <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
    <link href="https://fonts.googleapis.com/css2?family=Futura:wght@400;500;700&display=swap" rel="stylesheet">
    <style>
        @font-face {
            font-family: 'DingTalk JinBuTi';
            src: local('DingTalk JinBuTi');
        }
        
        @font-face {
            font-family: 'HarmonyOS Sans SC';
            src: local('HarmonyOS Sans SC');
        }
        
        * {
            margin: 0;
            padding: 0;
            box-sizing: border-box;
        }
        
        body {
            font-family: 'HarmonyOS Sans SC', sans-serif;
            background: linear-gradient(135deg, #f5f7fa 0%, #e4eaf5 100%);
            color: #1d3557;
            line-height: 1.6;
        }
        
        .poster {
            width: 720px;
            min-height: 960px;
            margin: 0 auto;
            padding: 40px;
            background: linear-gradient(145deg, #ffffff 0%, #f0f4f8 100%);
            box-shadow: 0 10px 30px rgba(0, 0, 0, 0.1);
            position: relative;
            overflow: hidden;
        }
        
        .background-shape {
            position: absolute;
            border-radius: 50%;
            filter: blur(80px);
            z-index: 0;
            opacity: 0.4;
        }
        
        .shape-1 {
            width: 300px;
            height: 300px;
            background: #4361ee;
            top: -100px;
            right: -100px;
        }
        
        .shape-2 {
            width: 250px;
            height: 250px;
            background: #3f37c9;
            bottom: 100px;
            left: -100px;
        }
        
        .grid-texture {
            position: absolute;
            top: 0;
            left: 0;
            right: 0;
            bottom: 0;
            background-image: 
                linear-gradient(rgba(255, 255, 255, 0.05) 1px, transparent 1px),
                linear-gradient(90deg, rgba(255, 255, 255, 0.05) 1px, transparent 1px);
            background-size: 20px 20px;
            z-index: 0;
        }
        
        .content {
            position: relative;
            z-index: 1;
        }
        
        .header {
            text-align: center;
            margin-bottom: 30px;
            padding-bottom: 20px;
            border-bottom: 2px solid #4361ee;
        }
        
        .title {
            font-family: 'DingTalk JinBuTi', sans-serif;
            font-size: 40px;
            font-weight: bold;
            color: #1d3557;
            margin-bottom: 10px;
            letter-spacing: -1px;
        }
        
        .authors {
            font-size: 18px;
            margin-bottom: 5px;
            color: #3a506b;
        }
        
        .affiliation {
            font-size: 16px;
            color: #3a506b;
            font-style: italic;
        }
        
        .section {
            margin-bottom: 30px;
            background: rgba(255, 255, 255, 0.8);
            border-radius: 12px;
            padding: 20px;
            box-shadow: 0 4px 12px rgba(0, 0, 0, 0.05);
        }
        
        .section-title {
            font-family: 'DingTalk JinBuTi', sans-serif;
            font-size: 28px;
            color: #1d3557;
            margin-bottom: 15px;
            display: flex;
            align-items: center;
        }
        
        .section-title .material-icons {
            margin-right: 10px;
            color: #4361ee;
        }
        
        .abstract {
            font-size: 16px;
            line-height: 1.6;
        }
        
        .insights {
            display: flex;
            flex-wrap: wrap;
            gap: 20px;
            margin-top: 15px;
        }
        
        .insight-card {
            flex: 1;
            min-width: 280px;
            background: rgba(255, 255, 255, 0.9);
            border-radius: 10px;
            padding: 15px;
            box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05);
            border-left: 4px solid #4361ee;
        }
        
        .insight-title {
            font-weight: bold;
            margin-bottom: 8px;
            color: #3a506b;
            font-size: 18px;
        }
        
        .method-comparison {
            display: flex;
            gap: 20px;
            margin-top: 20px;
        }
        
        .method-card {
            flex: 1;
            background: rgba(255, 255, 255, 0.9);
            border-radius: 10px;
            padding: 15px;
            box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05);
        }
        
        .method-title {
            font-weight: bold;
            margin-bottom: 10px;
            color: #3a506b;
            font-size: 18px;
            text-align: center;
        }
        
        .method-content {
            font-size: 14px;
            padding: 10px;
            background: rgba(67, 97, 238, 0.05);
            border-radius: 8px;
            font-family: monospace;
        }
        
        .results-container {
            display: flex;
            flex-direction: column;
            gap: 20px;
        }
        
        .chart-container {
            background: rgba(255, 255, 255, 0.9);
            border-radius: 10px;
            padding: 15px;
            box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05);
            height: 200px;
            display: flex;
            align-items: center;
            justify-content: center;
        }
        
        .chart-placeholder {
            width: 100%;
            height: 100%;
            background: linear-gradient(135deg, #f0f4f8 0%, #e4eaf5 100%);
            border-radius: 8px;
            display: flex;
            align-items: center;
            justify-content: center;
            flex-direction: column;
        }
        
        .bar {
            height: 30px;
            margin: 10px 0;
            border-radius: 5px;
            display: flex;
            align-items: center;
            padding-left: 10px;
            color: white;
            font-weight: bold;
        }
        
        .cot-bar {
            background: #4cc9f0;
            width: 70%;
        }
        
        .vf-bar {
            background: #4361ee;
            width: 85%;
        }
        
        .cost-table {
            width: 100%;
            border-collapse: collapse;
            margin-top: 15px;
        }
        
        .cost-table th, .cost-table td {
            padding: 10px;
            text-align: left;
            border-bottom: 1px solid #e0e0e0;
        }
        
        .cost-table th {
            background: rgba(67, 97, 238, 0.1);
            color: #3a506b;
        }
        
        .findings {
            list-style-type: none;
        }
        
        .findings li {
            margin-bottom: 10px;
            padding-left: 30px;
            position: relative;
        }
        
        .findings li:before {
            content: "\e876";
            font-family: 'Material Icons';
            position: absolute;
            left: 0;
            color: #4361ee;
        }
        
        .conclusion {
            font-size: 18px;
            font-style: italic;
            text-align: center;
            margin-top: 20px;
            padding: 15px;
            background: rgba(67, 97, 238, 0.1);
            border-radius: 10px;
        }
        
        .image-container {
            display: flex;
            justify-content: center;
            margin: 20px 0;
        }
        
        .image-container img {
            max-width: 100%;
            border-radius: 10px;
            box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
        }
        
        .highlight {
            background: linear-gradient(transparent 50%, rgba(67, 97, 238, 0.2) 50%);
            padding: 0 4px;
        }
    </style>
</head>
<body>
    <div class="poster">
        <div class="background-shape shape-1"></div>
        <div class="background-shape shape-2"></div>
        <div class="grid-texture"></div>
        
        <div class="content">
            <div class="header">
                <h1 class="title">Asking LLMs to Verify First is Almost Free Lunch</h1>
                <p class="authors">Shiguang Wu, Quanming Yao</p>
                <p class="affiliation">Department of Electronic Engineering, Tsinghua University</p>
            </div>
            
            <div class="section">
                <h2 class="section-title">
                    <span class="material-icons">description</span>
                    Abstract
                </h2>
                <p class="abstract">
                    To enhance the reasoning capabilities of Large Language Models (LLMs) without costly training or extensive test-time sampling, we introduce <span class="highlight">Verification-First (VF)</span>, a strategy that prompts the model to verify a provided candidate answer, even a trivial or random one, before generating its own solution. This triggers a "reverse reasoning" process that is cognitively easier than, and complementary to, standard forward Chain-of-Thought (CoT), invoking the model's critical thinking to reduce logical errors.
                </p>
            </div>
            
            <div class="section">
                <h2 class="section-title">
                    <span class="material-icons">psychology</span>
                    Theoretical Foundations
                </h2>
                <div class="insights">
                    <div class="insight-card">
                        <div class="insight-title">Logical Insight</div>
                        <p>Verifying an answer is easier than generating a correct one, and the verification provides information complementary to standard CoT.</p>
                    </div>
                    <div class="insight-card">
                        <div class="insight-title">Psychological Insight</div>
                        <p>Asking someone to critique another person's answer invokes critical thinking by helping them overcome egocentrism.</p>
                    </div>
                </div>
                <div class="image-container">
                    <img src="https://sfile.chatglm.cn/moeSlide/image/13/13eb0d6e.jpg" alt="Brain and circuit combination showing chain-of-thought reasoning" width="500">
                </div>
            </div>
            
            <div class="section">
                <h2 class="section-title">
                    <span class="material-icons">lightbulb</span>
                    Methodology
                </h2>
                <div class="method-comparison">
                    <div class="method-card">
                        <div class="method-title">Standard CoT Prompting</div>
                        <div class="method-content">
                            "Q: [Problem Statement]<br>
                            A: Let's think step by step..."
                        </div>
                    </div>
                    <div class="method-card">
                        <div class="method-title">Verification-First Prompting</div>
                        <div class="method-content">
                            "Q: [Problem Statement]<br>
                            A possible answer of Q is A'. First verify if A' is correct, then think step by step to find the answer."
                        </div>
                    </div>
                </div>
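                <p>The two templates above can be sketched as small prompt builders. This is a minimal Python illustration, not the authors' code: the function names <code>build_cot_prompt</code> and <code>build_vf_prompt</code> are hypothetical, and the candidate answer can be any string, including a trivial or random one.</p>
                <pre class="method-content">
```python
def build_cot_prompt(question: str) -> str:
    """Standard CoT prompt, following the template on the poster."""
    return f"Q: {question}\nA: Let's think step by step..."


def build_vf_prompt(question: str, candidate: str) -> str:
    """Verification-First prompt: the model is handed a candidate answer
    (possibly trivial or random) and asked to verify it before solving."""
    return (
        f"Q: {question}\n"
        f"A possible answer of Q is {candidate}. "
        f"First verify if {candidate} is correct, "
        "then think step by step to find the answer."
    )


if __name__ == "__main__":
    # The candidate "1" is deliberately trivial; VF does not need a good guess.
    print(build_vf_prompt("What is 17 + 25?", "1"))
```
                </pre>
                <p>Only the prompt text changes between the two methods, so the sketch can be paired with any client that accepts a plain string prompt.</p>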
                
                <div class="image-container">
                    <img src="https://sfile.chatglm.cn/moeSlide/image/13/1390913c.jpg" alt="Chain-of-thought process visualization" width="500">
                </div>
                
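                <p><strong>Iter-VF Process:</strong> Iter-VF applies VF repeatedly, feeding the model's previous answer back in as the next candidate. Each round depends only on the question and the latest answer, which makes the refinement loop Markovian and avoids context-length growth and error propagation.</p>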
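                <p>The Iter-VF loop might look roughly like the following Python sketch. Everything here is an assumption for illustration: <code>ask</code> stands in for whatever function sends a prompt to an LLM and returns its text, <code>parse_final_answer</code> assumes the response ends with a "Final answer:" marker, and the round count and initial candidate are arbitrary.</p>
                <pre class="method-content">
```python
from typing import Callable


def vf_prompt(question: str, candidate: str) -> str:
    """Verification-First prompt built from the poster's template."""
    return (
        f"Q: {question}\n"
        f"A possible answer of Q is {candidate}. "
        f"First verify if {candidate} is correct, "
        "then think step by step to find the answer."
    )


def parse_final_answer(response: str) -> str:
    """Hypothetical parser: take the text after the last 'Final answer:'.
    Falls back to the whole response if the marker is missing."""
    return response.rsplit("Final answer:", 1)[-1].strip()


def iter_vf(
    question: str,
    ask: Callable[[str], str],
    extract: Callable[[str], str] = parse_final_answer,
    initial_candidate: str = "1",
    rounds: int = 3,
) -> str:
    """Apply VF repeatedly, carrying over only the latest answer."""
    candidate = initial_candidate
    for _ in range(rounds):
        # Markovian step: the prompt holds just the question and the
        # previous answer, never the earlier reasoning text.
        response = ask(vf_prompt(question, candidate))
        candidate = extract(response)
    return candidate


if __name__ == "__main__":
    # Stand-in for a real LLM call, used only to exercise the loop.
    def fake_model(prompt: str) -> str:
        return "The candidate looks wrong. 17 + 25 = 42. Final answer: 42"

    print(iter_vf("What is 17 + 25?", fake_model))
```
                </pre>
                <p>Because each prompt is rebuilt from scratch, its length stays roughly constant across rounds no matter how long the model's earlier responses were.</p>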
            </div>
            
            <div class="section">
                <h2 class="section-title">
                    <span class="material-icons">analytics</span>
                    Experimental Results
                </h2>
                <div class="results-container">
                    <div class="chart-container">
                        <div class="chart-placeholder">
                            <div class="bar cot-bar">Standard CoT: 70% Accuracy</div>
                            <div class="bar vf-bar">Verification-First: 85% Accuracy</div>
                            <p>Performance comparison across various benchmarks</p>
                        </div>
                    </div>
                    
                    <table class="cost-table">
                        <tr>
                            <th>Method</th>
                            <th>Output Tokens (Relative to CoT)</th>
                            <th>Performance Gain</th>
                        </tr>
                        <tr>
                            <td>Standard CoT</td>
                            <td>100%</td>
                            <td>Baseline</td>
                        </tr>
                        <tr>
                            <td>Verification-First</td>
                            <td>120-150%</td>
                            <td>+15% to +25%</td>
                        </tr>
                        <tr>
                            <td>Self-Consistency (N=5)</td>
                            <td>500%</td>
                            <td>+10% to +20%</td>
                        </tr>
                    </table>
                </div>
            </div>
            
            <div class="section">
                <h2 class="section-title">
                    <span class="material-icons">stars</span>
                    Key Findings
                </h2>
                <ul class="findings">
                    <li>VF with random answers consistently outperforms standard CoT with minimal computational overhead</li>
                    <li>Iter-VF outperforms existing test-time scaling (TTS) strategies under limited computational budgets</li>
                    <li>VF is effective even with commercial LLM services that hide their reasoning traces</li>
                    <li>The verification process, not the quality of the candidate answer, is the key driver of improvement</li>
                </ul>
                
                <div class="conclusion">
                    Verification-First is a simple, universal, and powerful method for enhancing LLM reasoning at minimal additional cost: an almost "free lunch" in terms of cost versus benefit.
                </div>
            </div>
        </div>
    </div>
</body>
</html>                    