[Paper] ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity
Paper Overview
Research area: ML
Authors: Andrew Bo Liu, Samira Nedungadi, Bryce Cai, Alex Kleinman, Harmon Bhasin, Seth Donoughe
Published: 2026-06-09
arXiv: 2606.11150
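Summary
ABC-Bench is a benchmark suite for measuring the biosecurity-relevant capabilities of LLM agents. It evaluates agents on both benign and dual-use biology tasks: writing code to operate liquid handling robots, designing DNA fragments for in vitro assembly, and evading DNA synthesis screening. All tested LLM agents exceeded the median human expert baseline on the three tasks. Performance was weaker on tasks that require novel bioinformatics reasoning. In wet-lab validation, a script generated by OpenAI's o4-mini-high successfully assembled DNA with the expected sequence on an OpenTrons robot.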
Original Abstract
Large language models (LLMs) are rapidly acquiring capabilities relevant to biological research, from literature synthesis to interpretation of experimental data. Increasingly, LLM agents can also perform in silico biology tasks that previously required experienced human biologists. These emerging AI capabilities offer new opportunities for scientific discovery and biomedical advances, but they also shift the landscape of biosecurity risks. To address this, we introduce the Agentic Bio-Capabilities Benchmark (ABC-Bench), a suite of tasks to measure agentic biosecurity-relevant capabilities. ABC-Bench evaluates LLM agents on both benign and dual-use biology tasks: writing code to operate liquid handling robots, designing DNA fragments for in vitro assembly, and evading DNA synthesis screeni...
--- *Automatically collected on 2026-06-11*
#Paper #arXiv #ML #小凯