An In-Depth Analysis of "Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation"
✨步子哥 (steper) · December 11, 2025, 07:32
Introduction
With the rise of large language models (LLMs), it has become possible to perform recommendation tasks through natural language prompts. Compared with traditional collaborative-filtering methods, LLM-driven recommendation shows distinctive advantages in cold-start, cross-domain, and zero-shot scenarios, while also supporting flexible input formats and generating explanations of user behavior. However, there has been little systematic study of how to design prompts effectively (i.e., prompt engineering) to realize the full potential of LLMs for recommendation. To fill this gap, Kusano et al. conducted a large-scale experimental evaluation in the paper "Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation".
Through large-scale experiments, the paper reveals best practices and potential pitfalls of prompt engineering for LLM-based personalized recommendation. The study shows that prompt engineering matters greatly in single-user recommendation settings, and that different prompts affect model performance in significant and complex ways. Key conclusions include: for smaller models, carefully designed prompts (such as rephrased instructions, supplementary background knowledge, and simplified reasoning) can substantially improve accuracy, whereas for larger models, simple prompts are often both effective and economical. Moreover, several prompting methods popular in NLP do not work well for recommendation; prompt design must fit the characteristics of the recommendation task.
Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation
Genki Kusano, Kosuke Akimoto, Kunihiro Takeoka
ACM RecSys 2025 • July 17, 2025
Research Context
Large Language Models (LLMs) can perform recommendation tasks using natural language prompts, offering advantages over traditional methods like collaborative filtering. This study focuses on single-user settings, particularly valuable for privacy-sensitive or data-limited applications where prompt engineering becomes crucial for controlling LLM outputs.
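To make the single-user setting concrete, the sketch below shows how one user's interaction history might be turned into a recommendation prompt. The template wording, example items, and the build_prompt helper are illustrative assumptions, not the paper's exact prompts.

```python
# A minimal sketch of single-user, prompt-based recommendation.
# Template wording and item names are hypothetical illustrations.

def build_prompt(history: list[str], candidates: list[str]) -> str:
    """Format one user's interaction history and candidate items
    into a plain-language recommendation prompt."""
    history_text = "\n".join(f"- {title}" for title in history)
    candidate_text = "\n".join(
        f"{i + 1}. {title}" for i, title in enumerate(candidates)
    )
    return (
        "You are a recommender system.\n"
        f"The user has recently interacted with:\n{history_text}\n\n"
        f"Candidate items:\n{candidate_text}\n\n"
        "Rank the candidates from most to least relevant to this user. "
        "Answer with the numbers only."
    )

prompt = build_prompt(
    history=["The Matrix", "Blade Runner", "Ghost in the Shell"],
    candidates=["Inception", "Notting Hill", "Akira"],
)
print(prompt)  # send to any chat LLM; only this one user's data is needed
```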
Methodology
We conducted a large-scale evaluation using statistical tests and linear mixed-effects models to assess both accuracy and inference cost.
The evaluation covered 23 prompt types, 8 public datasets, and 12 LLMs.
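As a rough illustration of this style of analysis, the sketch below fits a linear mixed-effects model with statsmodels, treating prompt type as a fixed effect and the dataset as a random grouping factor. The synthetic data and the exact formula are assumptions for demonstration; the paper's own model specification may differ.

```python
# Sketch: mixed-effects analysis of accuracy across prompt types.
# Synthetic data; the formula is illustrative, not the paper's spec.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
prompts = ["baseline", "rephrase", "step_by_step"]
datasets = [f"ds{i}" for i in range(8)]

rows = []
for ds in datasets:
    ds_offset = rng.normal(0, 0.05)      # per-dataset random effect
    for p in prompts:
        for _ in range(5):               # repeated runs per combination
            acc = (0.6 + ds_offset
                   + (0.03 if p == "rephrase" else 0.0)
                   + rng.normal(0, 0.02))
            rows.append({"dataset": ds, "prompt": p, "accuracy": acc})
df = pd.DataFrame(rows)

# Fixed effect: prompt type; random intercept per dataset.
model = smf.mixedlm("accuracy ~ C(prompt)", data=df, groups=df["dataset"])
result = model.fit()
print(result.summary())
```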
Key Findings
Cost-Efficient LLMs
Three prompt types proved especially effective (see the sketch after this list):
- Rephrased instructions
- Consideration of background knowledge
- Clearer reasoning processes
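The templates below sketch what these three prompt types might look like in practice. The wording is our paraphrase of the ideas, not the paper's exact prompt text.

```python
# Illustrative templates for the three effective prompt types above.
BASE_TASK = "Recommend the best next item for this user from the candidates."

PROMPTS = {
    # Rephrased instructions: have the model restate the task first.
    "rephrase": BASE_TASK
    + "\nFirst restate this task in your own words, then answer.",
    # Background knowledge: recall item knowledge before ranking.
    "background": BASE_TASK
    + "\nBefore answering, briefly recall what you know about each candidate item.",
    # Clearer reasoning: a short explicit procedure rather than
    # open-ended step-by-step reasoning.
    "clear_reasoning": BASE_TASK
    + "\nProcedure: (1) summarize the user's tastes in one sentence; "
      "(2) score each candidate against that summary; (3) output the top item.",
}

for name, text in PROMPTS.items():
    print(f"--- {name} ---\n{text}\n")
```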
High-Performance LLMs
Simple, direct prompts often outperformed elaborate prompting strategies while also reducing inference cost.
Ineffective Strategies
Common NLP prompting styles like step-by-step reasoning or the use of reasoning models frequently led to lower accuracy in recommendation tasks.
Cost-Accuracy Trade-off
Our analysis revealed significant differences in the cost-accuracy balance across different LLMs and prompt types, highlighting the importance of strategic selection based on application requirements.
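One practical way to act on this trade-off is to keep only the Pareto-optimal (LLM, prompt) configurations, i.e., those for which no alternative is both cheaper and more accurate. The sketch below illustrates the idea; the configuration names, costs, and accuracies are made up for the example, not taken from the paper.

```python
# Sketch: select Pareto-optimal configurations on (cost, accuracy).
# All numbers below are invented for illustration.

configs = [
    # (name, cost in USD per 1k requests, accuracy)
    ("small-llm + rephrase", 0.4, 0.62),
    ("small-llm + baseline", 0.3, 0.55),
    ("large-llm + baseline", 2.0, 0.70),
    ("large-llm + step-by-step", 4.5, 0.66),  # dominated: pricier and worse
]

def pareto_front(items):
    """Return configurations not dominated on (lower cost, higher accuracy)."""
    front = []
    for name, cost, acc in items:
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for _, c, a in items
        )
        if not dominated:
            front.append((name, cost, acc))
    return sorted(front, key=lambda t: t[1])

for name, cost, acc in pareto_front(configs):
    print(f"{name}: ${cost}/1k requests, accuracy {acc:.2f}")
```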
Practical Recommendations
- For cost-efficient LLMs: prioritize prompts with rephrased instructions, background knowledge, and clearer reasoning processes.
- For high-performance LLMs: use simple, direct prompts to maximize accuracy while minimizing cost.
- Avoid common NLP prompting styles such as step-by-step reasoning for recommendation tasks.
- Select LLMs based on the specific balance between accuracy requirements and computational constraints.
Implications
This study provides the first large-scale systematic evaluation of prompt engineering techniques for LLM-based recommendation systems. Our findings challenge conventional wisdom about prompt engineering in NLP and offer practical guidance for developing more effective and efficient recommendation systems in single-user settings.