<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>LLM Prompt Datasets: An In-depth Analysis and Insights</title>
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@300;400;500;700&family=Noto+Serif+SC:wght@400;700&display=swap" rel="stylesheet">
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<style>
:root {
--primary: #1565C0;
--primary-light: #5e92f3;
--primary-dark: #003c8f;
--secondary: #26A69A;
--secondary-light: #64D8CB;
--secondary-dark: #00766C;
--text-on-primary: #ffffff;
--text-primary: #212121;
--text-secondary: #757575;
--background: #f5f7fa;
--card-bg: #ffffff;
--accent: #FF5722;
}
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Noto Sans SC', sans-serif;
color: var(--text-primary);
background-color: var(--background);
line-height: 1.6;
}
.poster-container {
width: 720px;
min-height: 960px;
margin: 0 auto;
padding: 40px;
background: linear-gradient(135deg, #e3f2fd, #bbdefb);
position: relative;
overflow: hidden;
}
.poster-container::before {
content: "";
position: absolute;
top: -150px;
right: -150px;
width: 400px;
height: 400px;
border-radius: 50%;
background: radial-gradient(circle, rgba(38, 166, 154, 0.2) 0%, rgba(38, 166, 154, 0) 70%);
z-index: 0;
}
.poster-container::after {
content: "";
position: absolute;
bottom: -100px;
left: -100px;
width: 300px;
height: 300px;
border-radius: 50%;
background: radial-gradient(circle, rgba(21, 101, 192, 0.2) 0%, rgba(21, 101, 192, 0) 70%);
z-index: 0;
}
.header {
text-align: center;
margin-bottom: 30px;
position: relative;
z-index: 1;
}
.title {
font-family: 'Noto Serif SC', serif;
font-size: 40px;
font-weight: 700;
color: var(--primary-dark);
margin-bottom: 15px;
line-height: 1.2;
}
.authors {
font-size: 18px;
color: var(--text-secondary);
margin-bottom: 5px;
}
.affiliations {
font-size: 16px;
color: var(--text-secondary);
font-style: italic;
margin-bottom: 5px;
}
.date {
font-size: 16px;
color: var(--text-secondary);
}
.section {
background-color: var(--card-bg);
border-radius: 12px;
padding: 20px;
margin-bottom: 25px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.08);
position: relative;
z-index: 1;
}
.section-title {
font-family: 'Noto Serif SC', serif;
font-size: 24px;
font-weight: 700;
color: var(--primary);
margin-bottom: 15px;
display: flex;
align-items: center;
}
.section-title .material-icons {
margin-right: 10px;
color: var(--primary);
}
.section-content {
font-size: 16px;
color: var(--text-primary);
}
.highlight {
background-color: rgba(255, 235, 59, 0.3);
padding: 0 3px;
border-radius: 3px;
}
.stat-highlight {
font-size: 22px;
font-weight: 700;
color: var(--secondary-dark);
margin-right: 5px;
}
.data-sources {
display: flex;
flex-wrap: wrap;
gap: 10px;
margin-top: 10px;
}
.source-item {
display: flex;
align-items: center;
background-color: rgba(38, 166, 154, 0.1);
padding: 8px 12px;
border-radius: 20px;
font-size: 14px;
}
.source-item .material-icons {
font-size: 18px;
margin-right: 5px;
color: var(--secondary);
}
.taxonomy-list {
display: flex;
flex-wrap: wrap;
gap: 10px;
margin-top: 10px;
}
.taxonomy-item {
background-color: rgba(21, 101, 192, 0.1);
padding: 8px 12px;
border-radius: 8px;
font-size: 14px;
border-left: 4px solid var(--primary);
}
.analysis-levels {
display: flex;
justify-content: space-between;
margin-top: 15px;
}
.analysis-level {
flex: 1;
text-align: center;
padding: 10px;
background-color: rgba(21, 101, 192, 0.05);
border-radius: 8px;
margin: 0 5px;
}
.analysis-level-title {
font-weight: 500;
color: var(--primary);
margin-bottom: 5px;
}
.findings-list {
margin-top: 10px;
}
.finding-item {
display: flex;
margin-bottom: 8px;
}
.finding-item .material-icons {
color: var(--secondary);
margin-right: 10px;
flex-shrink: 0;
}
.optimization-diagram {
display: flex;
align-items: center;
justify-content: space-between;
margin: 20px 0;
padding: 15px;
background-color: rgba(255, 255, 255, 0.7);
border-radius: 8px;
border: 1px dashed var(--primary-light);
}
.diagram-step {
text-align: center;
flex: 1;
}
.diagram-step .material-icons {
font-size: 36px;
color: var(--primary);
margin-bottom: 5px;
}
.diagram-arrow {
color: var(--primary);
font-size: 24px;
}
.resource-link {
display: inline-flex;
align-items: center;
background-color: var(--primary);
color: white;
padding: 10px 15px;
border-radius: 8px;
text-decoration: none;
margin-top: 10px;
font-weight: 500;
}
.resource-link .material-icons {
margin-right: 8px;
}
</style>
</head>
<body>
<div class="poster-container">
<!-- Title section -->
<div class="header">
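<h1 class="title">LLM Prompt Datasets: An In-depth Analysis and Insights</h1>
<p class="authors">Yuanming Zhang*, Yan Lin*, Arijit Khan†, Huaiyu Wan</p>
<p class="affiliations">Beijing Jiaotong University, Aalborg University, Bowling Green State University</p>
<p class="date">October 10, 2025</p>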
</div>
<!-- Abstract / introduction section -->
<div class="section">
<h2 class="section-title">
<span class="material-icons">description</span>
Abstract
</h2>
<div class="section-content">
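A prompt is a natural language instruction that defines a specific task for a large language model (LLM) and serves as the primary interface for human-LLM interaction. As LLMs are deployed more widely, diverse prompt datasets are emerging from platforms such as GitHub and social media. These datasets span a wide range of applications and content types, supporting both broader use of LLMs and better prompt engineering.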
</div>
</div>
<!-- Data collection section -->
<div class="section">
<h2 class="section-title">
<span class="material-icons">storage</span>
Data Collection
</h2>
<div class="section-content">
<p>A comprehensive collection of <span class="stat-highlight">1.22 TB</span> of data, comprising <span class="stat-highlight">673M+</span> prompt instances from <span class="stat-highlight">129</span> heterogeneous sources:</p>
<div class="data-sources">
<div class="source-item">
<span class="material-icons">dataset</span>
Dataset platforms
</div>
<div class="source-item">
<span class="material-icons">school</span>
Academic publications
</div>
<div class="source-item">
<span class="material-icons">code</span>
Public repositories
</div>
<div class="source-item">
<span class="material-icons">forum</span>
Social media
</div>
</div>
</div>
</div>
<!-- Taxonomy section -->
<div class="section">
<h2 class="section-title">
<span class="material-icons">account_tree</span>
Taxonomy
</h2>
<div class="section-content">
<p>A hierarchical taxonomy of LLM prompt datasets, organized along the following dimensions:</p>
<div class="taxonomy-list">
<div class="taxonomy-item">Downstream tasks</div>
<div class="taxonomy-item">Languages</div>
<div class="taxonomy-item">Engineering techniques</div>
<div class="taxonomy-item">Attributes</div>
<div class="taxonomy-item">Modalities</div>
</div>
</div>
</div>
<!-- Analysis methods section -->
<div class="section">
<h2 class="section-title">
<span class="material-icons">analytics</span>
Analysis Methods
</h2>
<div class="section-content">
<p>Multi-level linguistic analysis of seven representative datasets across three dimensions:</p>
<div class="analysis-levels">
<div class="analysis-level">
<div class="analysis-level-title">Lexical level</div>
<div>Token distribution, vocabulary analysis</div>
</div>
<div class="analysis-level">
<div class="analysis-level-title">Syntactic level</div>
<div>Dependency parsing, part-of-speech tagging, TF-IDF</div>
</div>
<div class="analysis-level">
<div class="analysis-level-title">Semantic level</div>
<div>Topic modeling, semantic similarity</div>
</div>
</div>
</div>
</div>
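<!-- Illustrative sketch: TF-IDF weighting -->
<div class="section">
<div class="section-content">
<p>The analysis above lists TF-IDF among its techniques without giving details. As a rough illustration only, the sketch below computes one common TF-IDF variant over three made-up prompts. The whitespace tokenizer, the unsmoothed idf = log(N / df) weighting, the function name <code>tf_idf</code>, and the example prompts are all assumptions, not the authors' setup.</p>
<pre>
```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses term frequency normalised by document length and the plain
    idf = log(N / df) variant; real toolkits often smooth this.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up prompts, used only to exercise the function.
prompts = [
    "summarize the following article",
    "translate the following sentence",
    "write a poem about the sea",
]
scores = tf_idf(prompts)
```
</pre>
<p>A term that occurs in every prompt, such as "the" here, gets weight 0, while terms unique to a single prompt receive the largest weights in it.</p>
</div>
</div>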
<!-- Key findings section -->
<div class="section">
<h2 class="section-title">
<span class="material-icons">lightbulb</span>
Key Findings
</h2>
<div class="section-content">
<div class="findings-list">
<div class="finding-item">
<span class="material-icons">check_circle</span>
<div>Prompts exhibit distinctive compositional patterns compared with other text corpora</div>
</div>
<div class="finding-item">
<span class="material-icons">check_circle</span>
<div>Prompt construction shows domain-specific variation across applications</div>
</div>
<div class="finding-item">
<span class="material-icons">check_circle</span>
<div>Distinct linguistic properties set prompts apart from literary and web content</div>
</div>
<div class="finding-item">
<span class="material-icons">check_circle</span>
<div>Prompts tend to be more instructional and task-oriented than general text</div>
</div>
</div>
</div>
</div>
<!-- Optimization method section -->
<div class="section">
<h2 class="section-title">
<span class="material-icons">tune</span>
Optimization Method
</h2>
<div class="section-content">
<p>A novel prompt optimization method that leverages syntactic embeddings:</p>
<div class="optimization-diagram">
<div class="diagram-step">
<span class="material-icons">text_fields</span>
<div>Extract part-of-speech and dependency features</div>
</div>
<div class="diagram-arrow">→</div>
<div class="diagram-step">
<span class="material-icons">hub</span>
<div>Identify the centroid representation</div>
</div>
<div class="diagram-arrow">→</div>
<div class="diagram-step">
<span class="material-icons">edit</span>
<div>Guide an LLM to rewrite the prompt</div>
</div>
</div>
<p>The method improves the meaningfulness and quality of model outputs.</p>
</div>
</div>
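<!-- Illustrative sketch: centroid of syntactic feature vectors -->
<div class="section">
<div class="section-content">
<p>The three steps above can be sketched in miniature. The poster does not say how the syntactic embedding is built, so this example assumes a simple stand-in: each prompt is reduced to relative part-of-speech tag frequencies, the centroid is the mean of those vectors, and prompts are ranked by Euclidean distance to it. The tagged prompts, the tag set, and all function names are hypothetical, and the final rewriting step by an LLM is not shown.</p>
<pre>
```python
import math
from collections import Counter

# Hypothetical, hand-tagged input: each prompt is a list of POS tags.
# A real pipeline would obtain tags (and dependency labels) from a parser.
TAGGED_PROMPTS = {
    "p1": ["VERB", "DET", "NOUN"],
    "p2": ["VERB", "DET", "ADJ", "NOUN"],
    "p3": ["NOUN", "NOUN", "NOUN", "ADP"],
}
TAGSET = ["ADJ", "ADP", "DET", "NOUN", "VERB"]


def tag_vector(tags):
    """Relative frequency of each tag in TAGSET (a bag-of-tags stand-in)."""
    counts = Counter(tags)
    return [counts[tag] / len(tags) for tag in TAGSET]


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


vectors = {name: tag_vector(tags) for name, tags in TAGGED_PROMPTS.items()}
center = centroid(list(vectors.values()))
# Prompt names ordered from closest to farthest from the centroid.
by_distance = sorted(vectors, key=lambda name: distance(vectors[name], center))
```
</pre>
<p>In this toy data the noun-heavy prompt <code>p3</code> lies farthest from the centroid, so under these assumptions it would be the first candidate for rewriting.</p>
</div>
</div>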
<!-- Impact and applications section -->
<div class="section">
<h2 class="section-title">
<span class="material-icons">insights</span>
Impact and Applications
</h2>
<div class="section-content">
<div class="findings-list">
<div class="finding-item">
<span class="material-icons">star</span>
<div>First comprehensive compilation of prompt datasets</div>
</div>
<div class="finding-item">
<span class="material-icons">star</span>
<div>Provides a foundation for systematic prompt engineering research</div>
</div>
<div class="finding-item">
<span class="material-icons">star</span>
<div>Enables more effective prompt selection and optimization</div>
</div>
<div class="finding-item">
<span class="material-icons">star</span>
<div>Facilitates broad deployment of LLMs across diverse applications</div>
</div>
</div>
</div>
</div>
<!-- Resources section -->
<div class="section">
<h2 class="section-title">
<span class="material-icons">folder_open</span>
Resources
</h2>
<div class="section-content">
<p>Datasets and code are available for research use:</p>
<a href="https://anonymous.4open.science/r/LLM-Prompt-Datasets-7416" class="resource-link" target="_blank">
<span class="material-icons">link</span>
https://anonymous.4open.science/r/LLM-Prompt-Datasets-7416
</a>
<p style="margin-top: 10px;">The release includes over 1.22 TB of curated prompt data.</p>
</div>
</div>
</div>
</body>
</html>