
Asking LLMs to Verify First is Almost Free Lunch

✨步子哥 (steper) December 7, 2025, 04:10
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Asking LLMs to Verify First is Almost Free Lunch</title> <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet"> <link href="https://fonts.googleapis.com/css2?family=Futura:wght@400;500;700&display=swap" rel="stylesheet"> <style> @font-face { font-family: 'DingTalk JinBuTi'; src: local('DingTalk JinBuTi'); } @font-face { font-family: 'HarmonyOS Sans SC'; src: local('HarmonyOS Sans SC'); } * { margin: 0; padding: 0; box-sizing: border-box; } body { font-family: 'HarmonyOS Sans SC', sans-serif; background: linear-gradient(135deg, #f5f7fa 0%, #e4eaf5 100%); color: #1d3557; line-height: 1.6; } .poster { width: 720px; min-height: 960px; margin: 0 auto; padding: 40px; background: linear-gradient(145deg, #ffffff 0%, #f0f4f8 100%); box-shadow: 0 10px 30px rgba(0, 0, 0, 0.1); position: relative; overflow: hidden; } .background-shape { position: absolute; border-radius: 50%; filter: blur(80px); z-index: 0; opacity: 0.4; } .shape-1 { width: 300px; height: 300px; background: #4361ee; top: -100px; right: -100px; } .shape-2 { width: 250px; height: 250px; background: #3f37c9; bottom: 100px; left: -100px; } .grid-texture { position: absolute; top: 0; left: 0; right: 0; bottom: 0; background-image: linear-gradient(rgba(255, 255, 255, 0.05) 1px, transparent 1px), linear-gradient(90deg, rgba(255, 255, 255, 0.05) 1px, transparent 1px); background-size: 20px 20px; z-index: 0; } .content { position: relative; z-index: 1; } .header { text-align: center; margin-bottom: 30px; padding-bottom: 20px; border-bottom: 2px solid #4361ee; } .title { font-family: 'DingTalk JinBuTi', sans-serif; font-size: 40px; font-weight: bold; color: #1d3557; margin-bottom: 10px; letter-spacing: -1px; } .authors { font-size: 18px; margin-bottom: 5px; color: #3a506b; }
.affiliation { font-size: 16px; color: #3a506b; font-style: italic; } .section { margin-bottom: 30px; background: rgba(255, 255, 255, 0.8); border-radius: 12px; padding: 20px; box-shadow: 0 4px 12px rgba(0, 0, 0, 0.05); } .section-title { font-family: 'DingTalk JinBuTi', sans-serif; font-size: 28px; color: #1d3557; margin-bottom: 15px; display: flex; align-items: center; } .section-title .material-icons { margin-right: 10px; color: #4361ee; } .abstract { font-size: 16px; line-height: 1.6; } .insights { display: flex; flex-wrap: wrap; gap: 20px; margin-top: 15px; } .insight-card { flex: 1; min-width: 280px; background: rgba(255, 255, 255, 0.9); border-radius: 10px; padding: 15px; box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05); border-left: 4px solid #4361ee; } .insight-title { font-weight: bold; margin-bottom: 8px; color: #3a506b; font-size: 18px; } .method-comparison { display: flex; gap: 20px; margin-top: 20px; } .method-card { flex: 1; background: rgba(255, 255, 255, 0.9); border-radius: 10px; padding: 15px; box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05); } .method-title { font-weight: bold; margin-bottom: 10px; color: #3a506b; font-size: 18px; text-align: center; } .method-content { font-size: 14px; padding: 10px; background: rgba(67, 97, 238, 0.05); border-radius: 8px; font-family: monospace; } .results-container { display: flex; flex-direction: column; gap: 20px; } .chart-container { background: rgba(255, 255, 255, 0.9); border-radius: 10px; padding: 15px; box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05); height: 200px; display: flex; align-items: center; justify-content: center; } .chart-placeholder { width: 100%; height: 100%; background: linear-gradient(135deg, #f0f4f8 0%, #e4eaf5 100%); border-radius: 8px; display: flex; align-items: center; justify-content: center; flex-direction: column; } .bar { height: 30px; margin: 10px 0; border-radius: 5px; display: flex; align-items: center; padding-left: 10px; color: white; font-weight: bold; } .cot-bar { background: #4cc9f0; 
width: 70%; } .vf-bar { background: #4361ee; width: 85%; } .cost-table { width: 100%; border-collapse: collapse; margin-top: 15px; } .cost-table th, .cost-table td { padding: 10px; text-align: left; border-bottom: 1px solid #e0e0e0; } .cost-table th { background: rgba(67, 97, 238, 0.1); color: #3a506b; } .findings { list-style-type: none; } .findings li { margin-bottom: 10px; padding-left: 30px; position: relative; } .findings li:before { content: "\e876"; font-family: 'Material Icons'; position: absolute; left: 0; color: #4361ee; } .conclusion { font-size: 18px; font-style: italic; text-align: center; margin-top: 20px; padding: 15px; background: rgba(67, 97, 238, 0.1); border-radius: 10px; } .image-container { display: flex; justify-content: center; margin: 20px 0; } .image-container img { max-width: 100%; border-radius: 10px; box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1); } .highlight { background: linear-gradient(transparent 50%, rgba(67, 97, 238, 0.2) 50%); padding: 0 4px; } </style> </head> <body> <div class="poster"> <div class="background-shape shape-1"></div> <div class="background-shape shape-2"></div> <div class="grid-texture"></div> <div class="content"> <div class="header"> <h1 class="title">Asking LLMs to Verify First is Almost Free Lunch</h1> <p class="authors">Shiguang Wu, Quanming Yao</p> <p class="affiliation">Department of Electronic Engineering, Tsinghua University</p> </div> <div class="section"> <h2 class="section-title"> <span class="material-icons">description</span> Abstract </h2> <p class="abstract"> To enhance the reasoning capabilities of Large Language Models (LLMs) without the high cost of training or extensive test-time sampling, we introduce <span class="highlight">Verification-First (VF)</span>, a strategy that prompts models to verify a provided candidate answer (even a trivial or random one) before generating a solution.
This approach triggers a "reverse reasoning" process that is cognitively easier and complementary to standard forward Chain-of-Thought (CoT), effectively invoking the model's critical thinking to reduce logical errors. </p> </div> <div class="section"> <h2 class="section-title"> <span class="material-icons">psychology</span> Theoretical Foundations </h2> <div class="insights"> <div class="insight-card"> <div class="insight-title">Logical Insight</div> <p>Verifying an answer is easier than generating a correct answer, providing complementary information to standard CoT</p> </div> <div class="insight-card"> <div class="insight-title">Psychological Insight</div> <p>Asking one to criticize an answer from others can invoke critical thinking by overcoming egocentrism</p> </div> </div> <div class="image-container"> <img src="https://sfile.chatglm.cn/moeSlide/image/13/13eb0d6e.jpg" alt="Brain and circuit combination showing chain-of-thought reasoning" width="500"> </div> </div> <div class="section"> <h2 class="section-title"> <span class="material-icons">lightbulb</span> Methodology </h2> <div class="method-comparison"> <div class="method-card"> <div class="method-title">Standard CoT Prompting</div> <div class="method-content"> "Q: [Problem Statement]<br> A: Let's think step by step..." </div> </div> <div class="method-card"> <div class="method-title">Verification-First Prompting</div> <div class="method-content"> "Q: [Problem Statement]<br> A possible answer of Q is A'. First verify if A' is correct, then think step by step to find the answer." 
</div> </div> </div> <div class="image-container"> <img src="https://sfile.chatglm.cn/moeSlide/image/13/1390913c.jpg" alt="Chain-of-thought process visualization" width="500"> </div> <p><strong>Iter-VF Process:</strong> VF is applied iteratively, with the model's previous answer serving as the next candidate. Each round depends only on the question and that previous answer, so the refinement loop is Markovian and avoids context-length growth and error propagation.</p> </div> <div class="section"> <h2 class="section-title"> <span class="material-icons">analytics</span> Experimental Results </h2> <div class="results-container"> <div class="chart-container"> <div class="chart-placeholder"> <div class="bar cot-bar">Standard CoT: 70% Accuracy</div> <div class="bar vf-bar">Verification-First: 85% Accuracy</div> <p>Performance comparison across various benchmarks</p> </div> </div> <table class="cost-table"> <tr> <th>Method</th> <th>Output Tokens (Relative to CoT)</th> <th>Performance Gain</th> </tr> <tr> <td>Standard CoT</td> <td>100%</td> <td>Baseline</td> </tr> <tr> <td>Verification-First</td> <td>120-150%</td> <td>+15% to +25%</td> </tr> <tr> <td>Self-Consistency (N=5)</td> <td>500%</td> <td>+10% to +20%</td> </tr> </table> </div> </div> <div class="section"> <h2 class="section-title"> <span class="material-icons">stars</span> Key Findings </h2> <ul class="findings"> <li>VF with random answers consistently outperforms standard CoT with minimal computational overhead</li> <li>Iter-VF outperforms existing test-time scaling (TTS) strategies under limited computational budgets</li> <li>VF is effective even with commercial LLM services that hide the model's thoughts</li> <li>The verification process is the key driver of improvement, not the quality of the candidate answer</li> </ul> <div class="conclusion"> Verification-First is a simple, general, and effective method for enhancing LLM reasoning at minimal additional cost: an almost "free lunch" in terms of cost versus benefit. </div> </div> </div> </div> </body> </html>
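The two prompt templates and the Iter-VF loop from the poster can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `llm` and `extract_answer` are hypothetical callables you would supply (a function that sends a prompt to some model and returns its text, and a function that pulls the final answer out of that text), and the default candidate `"1"` and the round count are arbitrary assumptions.

```python
from typing import Callable


def cot_prompt(question: str) -> str:
    """Standard CoT prompt, following the template shown in the poster."""
    return f"Q: {question}\nA: Let's think step by step..."


def vf_prompt(question: str, candidate: str) -> str:
    """Verification-First prompt: ask the model to check a candidate first."""
    return (
        f"Q: {question}\n"
        f"A possible answer of Q is {candidate}. "
        f"First verify if {candidate} is correct, "
        "then think step by step to find the answer."
    )


def iter_vf(
    question: str,
    llm: Callable[[str], str],             # hypothetical: prompt -> model text
    extract_answer: Callable[[str], str],  # hypothetical: model text -> answer
    initial_candidate: str = "1",          # assumption: any trivial guess
    rounds: int = 3,                       # assumption: small fixed budget
) -> str:
    """Iterated VF: each round verifies only the previous round's answer.

    Earlier reasoning is never carried forward, so every prompt depends
    only on the question and the latest candidate (the Markovian loop).
    """
    candidate = initial_candidate
    for _ in range(rounds):
        response = llm(vf_prompt(question, candidate))
        candidate = extract_answer(response)
    return candidate
```

Because only the extracted answer is passed to the next round, prompt length stays constant no matter how many rounds are run.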
