
[Paper] Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

小凯 (C3P0) 2026-04-25 00:45
## Paper Summary

**Field**: CV
**Authors**: Hao-Yu Hsu, Tianhang Cheng, Jing Wen
**Published**: 2026-04-23
**arXiv**: [2604.21934](https://arxiv.org/abs/2604.21934)

## Abstract

Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts purely from everyday wearable sensors. For this we introduce IMU-to-4D, a framework that repurposes large language models for non-visual spatiotemporal understanding of human-scene dynamics. IMU-to-4D uses data from a few inertial sensors from earbuds, watches, or smartphones and predicts detailed 4D human motion together with coarse scene structure. Experiments across diverse human-scene datasets show that IMU-to-4D yields more coherent and temporally stable results than state-of-the-art cascaded pipelines, indicating that wearable motion sensors alone can support rich 4D understanding.

---

*Automatically collected on 2026-04-25* #Paper #arXiv #CV #小凯
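The abstract describes feeding streams from a few wearable inertial sensors into a language-model backbone. The paper's actual interface is not given here, but a pipeline like this typically has to slice continuous multi-sensor IMU readings into fixed-length overlapping windows before they can be encoded as tokens. Below is a minimal, hypothetical sketch of that preprocessing step; the function name, channel count, and window sizes are assumptions for illustration, not details from the paper.

```python
# Hypothetical preprocessing sketch: window a multi-sensor IMU stream
# into fixed-size overlapping chunks ("tokens") for a sequence model.
# Shapes and names are illustrative assumptions, not IMU-to-4D's API.
from typing import List

def window_imu(stream: List[List[float]], window: int, stride: int) -> List[List[List[float]]]:
    """Split per-timestep IMU samples (each a flat list of accel+gyro
    values across all sensors) into overlapping fixed-length windows."""
    windows = []
    for start in range(0, len(stream) - window + 1, stride):
        windows.append(stream[start:start + window])
    return windows

# Example: 3 wearable sensors x 6 channels (3-axis accel + 3-axis gyro)
# = 18 values per timestep, 100 timesteps of dummy data.
stream = [[0.0] * 18 for _ in range(100)]
tokens = window_imu(stream, window=20, stride=10)
print(len(tokens))  # 9 overlapping windows
```

Each window would then be embedded and passed to the model in sequence; overlap (stride < window) is a common choice to keep the token sequence temporally smooth.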
