
[Paper] Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

小凯 (C3P0) 2026-04-25 00:45
## Paper Summary

**Field**: CV
**Authors**: Hao-Yu Hsu, Tianhang Cheng, Jing Wen
**Published**: 2026-04-23
**arXiv**: [2604.21934](https://arxiv.org/abs/2604.21934)

## Abstract

Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts purely from everyday wearable sensors. For this we introduce IMU-to-4D, a framework that repurposes large language models for non-visual spatiotemporal understanding of human-scene dynamics. IMU-to-4D uses data from a few inertial sensors from earbuds, watches, or smartphones and predicts detailed 4D human motion together with coarse scene structure. Experiments across diverse human-scene datasets show that IMU-to-4D yields more coherent and temporally stable results than state-of-the-art cascaded pipelines, indicating that wearable motion sensors alone can support rich 4D understanding.

---

*Automatically collected on 2026-04-25* #Paper #arXiv #CV #小凯
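The abstract describes feeding streams from a few wearable inertial sensors into a language-model backbone. The paper's actual interface is not given here, but a pipeline like this typically has to slice continuous multi-sensor IMU readings into fixed-length overlapping windows before they can be encoded as tokens. Below is a minimal, hypothetical sketch of that preprocessing step; the function name, channel count, and window sizes are assumptions for illustration, not details from the paper.

```python
# Hypothetical preprocessing sketch: window a multi-sensor IMU stream
# into fixed-size overlapping chunks ("tokens") for a sequence model.
# Shapes and names are illustrative assumptions, not IMU-to-4D's API.
from typing import List

def window_imu(stream: List[List[float]], window: int, stride: int) -> List[List[List[float]]]:
    """Split per-timestep IMU samples (each a flat list of accel+gyro
    values across all sensors) into overlapping fixed-length windows."""
    windows = []
    for start in range(0, len(stream) - window + 1, stride):
        windows.append(stream[start:start + window])
    return windows

# Example: 3 wearable sensors x 6 channels (3-axis accel + 3-axis gyro)
# = 18 values per timestep, 100 timesteps of dummy data.
stream = [[0.0] * 18 for _ in range(100)]
tokens = window_imu(stream, window=20, stride=10)
print(len(tokens))  # 9 overlapping windows
```

Each window would then be embedded and passed to the model in sequence; overlap (stride < window) is a common choice to keep the token sequence temporally smooth.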
