Beijing Jiaotong University, Aalborg University, Bowling Green State University
October 10, 2025
Comprehensive collection of 1.22 TB of data, comprising 673M+ prompt instances from 129 heterogeneous sources:
Hierarchical categorization of LLM prompt datasets by:
Multi-level linguistic analysis across three dimensions on seven representative datasets:
Novel prompt optimization method leveraging syntactic embeddings:
Results in improved meaningfulness and quality of model outputs.
Datasets and code available for research use:
https://anonymous.4open.science/r/LLM-Prompt-Datasets-7416
Over 1.22 TB of curated prompt data for research use