Human-aware navigation data and environments

HA-VLN

A public resource for vision-and-language navigation in dynamic scenes with humans, activities, social constraints, and grounded instructions.

arXiv:2503.14229 | University of Washington and collaborators
Annotated human subjects from the proposed HAPS 2.0 dataset, shown as overall scene views and as single-human instances, illustrating a variety of well-aligned motions, movements, and interactions.

Interactive demos

Explore human-aware navigation examples up front.

Simulator clips, motion cases, and trajectory visualizations show how humans shape the navigation task before the page moves into dataset details.


Demo gallery

Watch the human dynamics that shape the navigation task.

Short clips show the variety of human movements, interaction cues, and navigation cases that HA-VLN makes available to researchers.

Scene diversity

Scene diversity across human-aware navigation cases

Dynamic people and activity references make navigation decisions depend on the evolving context around the agent.

Navigation visualization

Socially aware behavior is visible in the trajectory.

These examples pair instructions with agent trajectories in scenes where human motion changes what a reasonable route looks like.

Navigation Instruction: Start by moving forward in the lounge area, where an individual is engaged in a phone conversation while pacing back and forth. Navigate carefully to avoid crossing their path. As you proceed, you will pass by a television mounted on the wall. Continue your movement, observing people relaxing and watching the TV, some seated comfortably on sofas. Further along, notice a group of friends raising their glasses in a toast, enjoying cocktails together. Maintain a steady course, ensuring you do not disrupt their gathering. Finally, reach the end of your path where a potted plant is situated next to a door. Stop at this location, positioning yourself near the plant and door without obstructing access.
Navigation Instruction: Exit the room and make a left turn. Proceed down the hallway where an individual is ironing clothes, carefully smoothing out wrinkles on garments. Continue walking and make another left turn. Enter the next room, which is a bedroom. Inside, someone is comfortably seated in bed, engrossed in reading a book. Move past the bed, ensuring not to disturb the reader. Turn left again to enter the bathroom. Once inside, position yourself near the sink and wait there, observing the surroundings without interfering with any activities.

Overview

Human-aware VLN needs data where people matter.

Many VLN setups focus on static environments. HA-VLN shifts attention to scenes where people move, interact, occlude views, and shape what a safe navigation decision means.

Dynamic Humans

Scenes with moving people, not just static obstacles

Human activities, motion, partial observability, and local crowd behavior create navigation decisions that static VLN scenes usually miss.

Social Grounding

Instructions tied to human activities and spatial cues

Agents must interpret language that refers to people, activities, rooms, objects, and social context rather than only final goal locations.

Data + Environment

A practical public resource for human-aware VLN

HA-VLN releases human-populated navigation data, environment support, baseline references, and evaluation metrics for reproducible study.

Safety Signals

Metrics that make social navigation visible

Strict success, collision rate, trajectory collision rate, and navigation error help separate reaching goals from reaching them safely.
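The four metrics can be made concrete with a small scoring sketch. The definitions below are assumptions for illustration and are not taken from the HA-VLN evaluation code: the `EpisodeResult` record and `summarize` helper are hypothetical, the 3 m success radius is borrowed from the common R2R convention, navigation error uses straight-line rather than geodesic distance, collision rate is read as the share of episodes with any collision, and trajectory collision rate as collisions per step averaged over episodes.

```python
import math
from dataclasses import dataclass

# Assumed success threshold in meters (common R2R convention); HA-VLN's may differ.
SUCCESS_RADIUS_M = 3.0


@dataclass
class EpisodeResult:
    """Hypothetical per-episode record; not HA-VLN's release format."""
    final_pos: tuple[float, float]  # agent (x, y) where it stopped, in meters
    goal_pos: tuple[float, float]   # goal (x, y), in meters
    collisions: int                 # steps on which the agent collided with a human
    steps: int                      # total steps taken in the episode


def navigation_error(ep: EpisodeResult) -> float:
    # Straight-line distance to the goal; a simulator would typically use geodesic distance.
    return math.dist(ep.final_pos, ep.goal_pos)


def summarize(episodes: list[EpisodeResult]) -> dict[str, float]:
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    errors = [navigation_error(e) for e in episodes]
    reached = [err <= SUCCESS_RADIUS_M for err in errors]
    return {
        # Mean distance to the goal at the stopping point.
        "NE": sum(errors) / n,
        # Assumed definition: share of episodes with at least one collision.
        "CR": sum(e.collisions > 0 for e in episodes) / n,
        # Assumed definition: collisions per step, averaged over episodes.
        "TCR": sum(e.collisions / max(e.steps, 1) for e in episodes) / n,
        # Plain success: stopped within the radius, collisions ignored.
        "SR": sum(reached) / n,
        # Strict success: within the radius AND collision-free.
        "strict_SR": sum(r and e.collisions == 0 for r, e in zip(reached, episodes)) / n,
    }


if __name__ == "__main__":
    runs = [
        EpisodeResult((0, 0), (1, 0), collisions=0, steps=10),  # close and safe
        EpisodeResult((0, 0), (0, 2), collisions=2, steps=20),  # close, but bumped into people
        EpisodeResult((0, 0), (4, 3), collisions=0, steps=10),  # safe, but stopped 5 m short
    ]
    print(summarize(runs))
```

In this toy run SR is 2/3 while strict_SR is 1/3: the second agent reaches the goal region but collides on the way, so only the strict metric penalizes it.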

Data and environments

A human-populated resource for socially grounded navigation.

HA-VLN brings together navigation instructions, dynamic human motions, activity cues, and Habitat-based environments so agents can be tested beyond static shortest-path behavior.

16,844 socially grounded instructions
172 human activity categories
486 detailed 3D motion models
58,320 human motion frames
90 annotated scans
910 single-human motion instances
HA-VLN navigation scenario with instruction cues, dynamic humans, and RGB-depth observations.
A navigation instruction can refer to human activities, spatial cues, and decision points. The agent must progress while respecting nearby people.
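One way to picture this pairing is a record that links an instruction to the human activities annotated along its route, plus a check of which activities the text actually mentions. This is a minimal sketch under assumptions: `HumanCue`, `NavEpisode`, and `referenced_cues` are hypothetical names, not the HA-VLN data schema, and the substring match is deliberately naive. The example phrases come from the bedroom instruction shown earlier on this page.

```python
from dataclasses import dataclass, field


@dataclass
class HumanCue:
    """Hypothetical annotation of one human activity near the route."""
    activity: str  # free-text activity label, e.g. "ironing clothes"
    region: str    # room or area where it happens, e.g. "hallway"


@dataclass
class NavEpisode:
    """Hypothetical pairing of an instruction with the human cues along its route."""
    instruction: str
    cues: list[HumanCue] = field(default_factory=list)


def referenced_cues(ep: NavEpisode) -> list[HumanCue]:
    # Naive case-insensitive substring match; real grounding would need to handle
    # paraphrases such as "on the phone" vs "phone conversation".
    text = ep.instruction.lower()
    return [c for c in ep.cues if c.activity.lower() in text]


if __name__ == "__main__":
    ep = NavEpisode(
        instruction=(
            "Proceed down the hallway where an individual is ironing clothes. "
            "Inside, someone is comfortably seated in bed, engrossed in reading a book."
        ),
        cues=[
            HumanCue("ironing clothes", "hallway"),
            HumanCue("reading a book", "bedroom"),
            HumanCue("watching TV", "lounge"),  # annotated, but not mentioned in the text
        ],
    )
    print([c.activity for c in referenced_cues(ep)])
```

Cues the instruction never mentions (here, the lounge activity) are still part of the scene, so an agent can meet people the language did not warn it about.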

Environment support

Habitat-based scenes with annotated human activity.

The simulator support is a practical companion to the dataset: it places dynamic humans into photorealistic environments and exposes the state needed for human-aware navigation research.
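To make "dynamic humans" concrete, a person can be modeled as a looping sequence of motion frames that advances once per simulator step, with the agent checking its distance to every person. This is a minimal sketch under stated assumptions: `DynamicHuman`, `too_close`, and the 0.5 m radius are hypothetical, 2D root positions stand in for full 3D body meshes, and none of it is the Habitat or HA-VLN simulator API.

```python
import math
from dataclasses import dataclass


@dataclass
class DynamicHuman:
    """Hypothetical human whose root position follows a looping motion clip."""
    frames: list[tuple[float, float]]  # per-frame (x, y) root position, in meters
    t: int = 0                         # index of the current frame

    def step(self) -> None:
        # Advance one frame, wrapping around so the motion repeats indefinitely.
        self.t = (self.t + 1) % len(self.frames)

    @property
    def position(self) -> tuple[float, float]:
        return self.frames[self.t]


def too_close(agent_xy: tuple[float, float], humans: list[DynamicHuman],
              radius: float = 0.5) -> bool:
    # radius is an assumed personal-space threshold, not an HA-VLN constant.
    return any(math.dist(agent_xy, h.position) < radius for h in humans)


if __name__ == "__main__":
    # A person pacing back and forth along the x-axis.
    pacer = DynamicHuman(frames=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 0.0)])
    agent = (2.0, 0.3)
    for _ in range(4):
        print(pacer.t, pacer.position, too_close(agent, [pacer]))
        pacer.step()
```

The agent's position never changes in this loop, yet it is safe on some steps and too close on another, which is why a static obstacle map cannot capture human-aware navigation.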

Annotation and rendering pipeline for dynamic human activities in HA-VLN environments.
Overview of annotated HA-VLN scenes populated with humans.
What to take away

HA-VLN is best understood as data and environment infrastructure for studying human-aware navigation. The Habitat-based simulator and baseline code are supporting resources, not the main novelty claim.

Resources

Everything you need to inspect, reproduce, and build on HA-VLN.

Team and citation

Built by researchers across embodied AI and navigation.

Yifei Dong, Fengyi Wu, Qi He, Lingdong Kong, Bohan Xiong, Yetong Sha, Heng Li, Minghan Li, Zebang Cheng, Yuxuan Zhou, Jingdong Sun, Qi Dai, Alexander G. Hauptmann, Zhi-Qi Cheng
@misc{dong2026havln20openbenchmark,
      title={{HA-VLN} 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions},
      author={Yifei Dong and Fengyi Wu and Qi He and Lingdong Kong and Heng Li and Minghan Li and Zebang Cheng and Yuxuan Zhou and Jingdong Sun and Qi Dai and Alexander G. Hauptmann and Zhi-Qi Cheng},
      year={2026},
      eprint={2503.14229},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2503.14229},
}