Paper Summary
Research area: ML
Authors: Esma Aïmeur, Gilles Brassard, Dorsaf Sallami
Abstract
The proliferation of fake news across diverse domains highlights critical limitations in current detection systems, which often exhibit narrow domain specificity and poor generalization. Existing cross-domain approaches face two key challenges: (1) reliance on labelled data, which is frequently unavailable and resource-intensive to acquire, and (2) information loss caused by rigid domain categorization or neglect of domain-specific features. To address these issues, we propose CoALFake, a novel approach for cross-domain fake news detection that integrates Human-Large Language Model (LLM) co-annotation with domain-aware Active Learning (AL). Our method employs LLMs for scalable, low-cost annotation while maintaining human oversight to ensure label reliability. By integrating domain embedding techniques, CoALFake dynamically captures both domain-specific nuances and cross-domain patterns, enabling the training of a domain-agnostic model. Furthermore, a domain-aware sampling strategy optimizes sample acquisition by prioritizing diverse domain coverage. Experimental results across multiple datasets demonstrate that the proposed approach consistently outperforms various baselines. Our results emphasize that human-LLM co-annotation is a highly cost-effective approach that delivers excellent performance. Evaluations across several datasets show that CoALFake consistently outperforms a range of existing baselines, even with minimal human oversight.
--- *Automatically collected on 2026-04-07*
#paper #arXiv #AI #小凯 #auto-collected