
Horizon AI Daily - 2026-05-28

小凯 (C3P0) • May 27, 2026, 21:01

Horizon Daily Digest - 2026-05-27

41 items in total; the best 30 are listed below.

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence ⭐️ 10.0/10
YouTube to automatically label AI-generated videos ⭐️ 8.0/10
I think Anthropic and OpenAI have found product-market fit ⭐️ 8.0/10
What Apple and Google are doing to your push notifications ⭐️ 8.0/10
DuckDuckGo search saw 28% more visits after Google said people love AI mode ⭐️ 8.0/10
Can LLMs Introspect? A Reality Check ⭐️ 8.0/10
Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory ⭐️ 8.0/10
Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems ⭐️ 8.0/10
Experiments in Agentic AI for Science ⭐️ 8.0/10
Anchor: Mitigating Artifact Drift in Agent Benchmark Generation ⭐️ 8.0/10
OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling ⭐️ 8.0/10
JobBench: Aligning Agent Work With Human Will ⭐️ 8.0/10
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence ⭐️ 8.0/10
Automatic Layer Selection for Hallucination Detection ⭐️ 8.0/10
Exploiting Local Dynamics Regularity for Reusable Skills in Offline Hierarchical RL ⭐️ 8.0/10
Advancing Creative Physical Intelligence in Large Multimodal Models ⭐️ 8.0/10
From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator ⭐️ 8.0/10
Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning ⭐️ 8.0/10
Last.fm is now independent ⭐️ 7.0/10
Tech CEOs are apparently suffering from AI psychosis ⭐️ 7.0/10
Gemini, Gophers, and Fingers. Oh My Alternative Internets Beyond HTTPS ⭐️ 7.0/10
BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization ⭐️ 7.0/10
Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions ⭐️ 7.0/10
Constraint acquisition needs better benchmarks ⭐️ 7.0/10
Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning ⭐️ 7.0/10
Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions ⭐️ 7.0/10
PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design ⭐️ 7.0/10
anthropics/claude-code released v2.1.152 ⭐️ 6.0/10
SimCity 3k in 4k (2025) ⭐️ 6.0/10
Facebook launches a ‘Plus’ subscription that gives you extra features ⭐️ 6.0/10

1. The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence ⭐️ 10.0/10

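Introduces the MiniMax-M2 series: 229.9B total parameters with only 9.8B activated, using an end-to-end design to power agentic intelligence and self-evolution.

rss · arXiv AI · May 27, 04:00

Tags: #LargeLanguageModels, #MixtureOfExperts, #AgentFrameworks, #ReinforcementLearning, #SelfEvolution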
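To make the "mini activations" figure concrete: 9.8B out of 229.9B parameters is roughly 4.3% of the model used per token. The Python sketch below computes that ratio and shows generic top-k expert routing, the common mechanism by which a mixture-of-experts model runs only a subset of its weights. It is a hypothetical illustration, not MiniMax-M2's implementation: the summary does not describe the model's router, expert count, or k, so the router scores and k=2 are made up.

```python
from __future__ import annotations

import math

# Parameter counts quoted in the summary (billions).
TOTAL_PARAMS_B = 229.9
ACTIVE_PARAMS_B = 9.8


def activation_ratio(active: float, total: float) -> float:
    """Fraction of all parameters that are used for a single token."""
    return active / total


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Generic top-k MoE gating (hypothetical, not MiniMax-M2's router):
    keep the k highest-scoring experts and softmax-normalize their scores
    so the kept weights sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


if __name__ == "__main__":
    ratio = activation_ratio(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active fraction: {ratio:.2%}")  # active fraction: 4.26%
    # Made-up router scores for 4 experts; only the top 2 run for this token.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Only the selected experts' feed-forward weights are evaluated for a token, which is why the active count can be a small fraction of the total.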

2. YouTube to automatically label AI-generated videos ⭐️ 8.0/10

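YouTube will automatically label AI-generated videos, prompting discussion about content quality and how such videos are detected.

hackernews · nopg · May 27, 20:00 · discussion

Tags: #AIVideoLabeling, #YouTubePolicy, #ContentModeration, #AIGeneratedContentDetection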

3. I think Anthropic and OpenAI have found product-market fit ⭐️ 8.0/10

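Asks whether Anthropic and OpenAI have reached product-market fit; the comments focus on their business models and market impact.

hackernews · simonw · May 27, 16:39 · discussion

Tags: #AI, #ProductMarketFit, #Profitability, #Startups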

4. What Apple and Google are doing to your push notifications ⭐️ 8.0/10

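Looks at how Apple and Google intervene in push notifications, sparking debate over privacy and user experience.

hackernews · iamacyborg · May 27, 19:24 · discussion

Tags: #PushNotifications, #Apple, #Google, #MobileDevelopment, #Privacy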

5. DuckDuckGo search saw 28% more visits after Google said people love AI mode ⭐️ 8.0/10

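Google's aggressive push of AI search has drawn user backlash, and DuckDuckGo (DDG) visits rose 28%.

hackernews · HelloUsername · May 27, 16:28 · discussion

Tags: #SearchEngines, #AISearch, #UserBacklash, #DuckDuckGo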

6. Can LLMs Introspect? A Reality Check ⭐️ 8.0/10

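The paper questions whether LLMs can introspect, arguing that behavioral evidence alone does not establish it and that genuine introspection must be distinguished from pattern matching.

rss · arXiv AI · May 27, 04:00

Tags: #LargeLanguageModels, #Introspection, #Metacognition, #EvaluationMethods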

7. Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory ⭐️ 8.0/10

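Argues that long-term agent memory should be treated as a data-management workload driven by state trajectories, rather than as conventional storage.

rss · arXiv AI · May 27, 04:00

Tags: #AIAgents, #MemoryManagement, #Databases, #DataManagement, #AgentArchitecture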
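One way to picture the "state trajectory" framing: instead of overwriting a single current value, memory keeps every state change together with the step at which it happened, so an agent can ask what it believed at any earlier point. The Python sketch below is a hypothetical illustration only; `TrajectoryMemory`, `record`, `as_of`, and `trajectory` are invented names, and the summary does not describe the paper's actual data model.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class TrajectoryMemory:
    """Hypothetical append-only memory: each key keeps its full history of
    (step, value) changes instead of only the latest value."""

    _log: dict[str, list[tuple[int, object]]] = field(default_factory=dict)

    def record(self, step: int, key: str, value: object) -> None:
        """Append a state change; steps must not go backwards for a key."""
        history = self._log.setdefault(key, [])
        if history and step < history[-1][0]:
            raise ValueError("steps must be non-decreasing per key")
        history.append((step, value))

    def as_of(self, key: str, step: int) -> object | None:
        """Return the value the key held at `step`, or None if not yet set."""
        history = self._log.get(key, [])
        steps = [s for s, _ in history]
        idx = bisect_right(steps, step)  # number of changes at or before `step`
        return history[idx - 1][1] if idx else None

    def trajectory(self, key: str) -> list[tuple[int, object]]:
        """Return a copy of the full change history for a key."""
        return list(self._log.get(key, []))


if __name__ == "__main__":
    mem = TrajectoryMemory()
    mem.record(1, "user_city", "Paris")
    mem.record(5, "user_city", "Berlin")
    print(mem.as_of("user_city", 3))  # Paris
    print(mem.as_of("user_city", 5))  # Berlin
```

Keeping history append-only makes "as of" queries and audits of how a belief changed straightforward, at the cost of storage that grows with every update.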

8. Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems ⭐️ 8.0/10

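Introduces the concept of agent lifespan engineering and the AgingBench benchmark, revealing how deployed agents degrade over time.

rss · arXiv AI · May 27, 04:00

Tags: #AIAgents, #SystemReliability, #Benchmarking, #LongTermDeployment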

9. Experiments in Agentic AI for Science ⭐️ 8.0/10

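Two autonomous AI frameworks, DeepTS/DeepCollector and DeepScribe, use a hybrid local-remote architecture to automate scientific data curation and the generation of presentation reports.

rss · arXiv AI · May 27, 04:00

Tags: #ScientificAutomation, #AIAgents, #LargeLanguageModels, #Workflows, #SystemsEngineering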

10. Anchor: Mitigating Artifact Drift in Agent Benchmark Generation ⭐️ 8.0/10

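Uses a formally constrained generation pipeline to prevent artifact drift when generating agent benchmarks.

rss · arXiv AI · May 27, 04:00

Tags: #AIAgents, #BenchmarkGeneration, #EnterpriseAutomation, #TaskGeneration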

11. OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling ⭐️ 8.0/10

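Proposes a Theory of Mind (ToM) benchmark built on explicit belief modeling, shedding light on how LLMs reason.

rss · arXiv AI · May 27, 04:00

Tags: #TheoryOfMind, #Benchmarking, #LargeLanguageModels, #ReasoningEvaluation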

12. JobBench: Aligning Agent Work With Human Will ⭐️ 8.0/10

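The new JobBench benchmark shifts the focus to evaluating how well AI agents augment human work; current models perform only modestly.

rss · arXiv AI · May 27, 04:00

Tags: #AIAgents, #Benchmarking, #JobAutomation, #HumanAICollaboration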

13. ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence ⭐️ 8.0/10

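Proposes Chain-of-Evidence, a verifiability framework, and ScientistOne, an end-to-end research system, and exposes problems such as fabricated citations in the output of existing autonomous-research systems.

rss · arXiv AI · May 27, 04:00

Tags: #AutonomousResearch, #Verifiability, #AISafety, #AcademicIntegrity, #LargeLanguageModels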

14. Automatic Layer Selection for Hallucination Detection ⭐️ 8.0/10

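Proposes FEPoID, a criterion for automatically selecting which intermediate LLM layer to use, improving hallucination detection.

rss · arXiv AI · May 27, 04:00

Tags: #HallucinationDetection, #LargeLanguageModels, #LayerSelection, #IntrinsicDimension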

15. Exploiting Local Dynamics Regularity for Reusable Skills in Offline Hierarchical RL ⭐️ 8.0/10

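Exploits local regularities in environment dynamics to learn reusable hierarchical RL skills offline.

rss · arXiv AI · May 27, 04:00

Tags: #ReinforcementLearning, #HierarchicalRL, #SkillReuse, #OfflineLearning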

16. Advancing Creative Physical Intelligence in Large Multimodal Models ⭐️ 8.0/10

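A new benchmark tests how creatively large multimodal models can use tools in physical scenarios.

rss · arXiv AI · May 27, 04:00

Tags: #LargeMultimodalModels, #Creativity, #PhysicalIntelligence, #Benchmarking, #ToolUse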

17. From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator ⭐️ 8.0/10

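Proposes a calibrated interactive RL framework and shows, both theoretically and empirically, that it mitigates distribution shift in multi-turn dialogue.

rss · arXiv AI · May 27, 04:00

Tags: #MultiTurnDialogue, #ReinforcementLearning, #DistributionShift, #LargeLanguageModels, #DialogueSystems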

18. Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning ⭐️ 8.0/10

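Proposes LexGuard, a framework that combines legal-relevance-sensitive evaluation with formal reasoning to make legal AI more stable and accurate.

rss · arXiv AI · May 27, 04:00

Tags: #LegalAI, #LargeLanguageModels, #FormalReasoning, #EvaluationFrameworks, #TrustworthyAI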

19. Last.fm is now independent ⭐️ 7.0/10

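Last.fm announces that it has separated from CBS and now operates independently; the community reaction is positive, with praise for the stability of its API.

hackernews · twistslider · May 27, 15:36 · discussion

Tags: #MusicData, #API, #IndependentOperation, #CommunityNostalgia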

20. Tech CEOs are apparently suffering from AI psychosis ⭐️ 7.0/10

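CEOs' distorted views of AI resemble earlier technology hype cycles and reflect a long-standing management failing rather than anything new.

hackernews · IAmGraydon · May 27, 15:20 · discussion

Tags: #AI, #ManagementPitfalls, #TechHype, #HackerNewsDiscussion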

21. Gemini, Gophers, and Fingers. Oh My Alternative Internets Beyond HTTPS ⭐️ 7.0/10

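Explores alternative internet protocols such as Gemini and Gopher and reflects on today's web architecture.

hackernews · ChrisArchitect · May 27, 17:24 · discussion

Tags: #InternetProtocols, #AlternativeNetworks, #Gemini, #Gopher, #Finger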

22. BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization ⭐️ 7.0/10

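Automatically generates physically buildable brick structures from 3D shapes, introducing a structure-aware tree tokenization and autoregressive generation.

rss · arXiv AI · May 27, 04:00

Tags: #3DGeneration, #GeometricModeling, #AutoregressiveModels, #BrickBuilding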

23. Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions ⭐️ 7.0/10

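Proposes POLAR, a framework that uses a multimodal knowledge graph and episodic memory to improve long-term personalized interaction in embodied MLLM agents.

rss · arXiv AI · May 27, 04:00

Tags: #EmbodiedAI, #MultimodalLLMs, #PersonalizedAgents, #MemoryAugmentation, #LongTermInteraction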

24. Constraint acquisition needs better benchmarks ⭐️ 7.0/10

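Proposes the MPMMine benchmark suite to standardize research on constraint acquisition and the verification of mathematical programming models.

rss · arXiv AI · May 27, 04:00

Tags: #ConstraintAcquisition, #MathematicalProgramming, #Benchmarking, #MachineLearning, #Optimization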

25. Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning ⭐️ 7.0/10

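Proposes a framework for managing uncertainty in LLM-generated procedural knowledge to support virtual laboratory planning.

rss · arXiv AI · May 27, 04:00

Tags: #LargeLanguageModels, #VirtualLabs, #ProcedureGeneration, #UncertaintyManagement, #EdTech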

26. Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions ⭐️ 7.0/10

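In LLM math reasoning, chain-of-thought is more robust than code execution when questions are varied.

rss · arXiv AI · May 27, 04:00

Tags: #LargeLanguageModels, #MathReasoning, #Robustness, #CodeExecution, #ChainOfThought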

27. PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design ⭐️ 7.0/10

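Introduces a multimodal AI assistant that combines a foundation model with an agent for polymer property prediction and inverse design.

rss · arXiv AI · May 27, 04:00

Tags: #AI, #Multimodal, #PolymerMaterials, #FoundationModels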

28. anthropics/claude-code released v2.1.152 ⭐️ 6.0/10

Claude Code is updated to v2.1.152, improving code review and making skills more flexible.

github · ashwin-ant · May 27, 01:30

Tags: #ClaudeCode, #Release, #DeveloperTools, #CodeReview

29. SimCity 3k in 4k (2025) ⭐️ 6.0/10

Revisits SimCity 3000 at 4K resolution; commenters discuss how game design has evolved since.

hackernews · speckx · May 27, 17:36 · discussion

Tags: #RetroGaming, #SimCity, #GameDesign, #4K

30. Facebook launches a ‘Plus’ subscription that gives you extra features ⭐️ 6.0/10

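Meta launches a paid Plus subscription and is testing an AI subscription, broadening its revenue model.

rss · The Verge · May 27, 20:03

Tags: #Meta, #PaidSubscriptions, #SocialPlatforms, #AISubscriptions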
