Researcher · Engineer

Chaoda Zheng

I build generative models that imagine how the physical world may unfold, then use those imagined futures to help intelligent agents reason and act.

01 · About

From understanding geometry
to simulating futures.

My research began with 3D perception: point-cloud tracking, segmentation, occupancy, and geometry-aware representations. Today, I focus on action-conditioned world models, causal video generation, VLA systems, and closed-loop simulation.

I was the first author of, and a core contributor to, X-World. Alongside hands-on research, I define problems, design experiments, and mentor early-career researchers across both product and academic projects.

2,000+

Scholar Citations

2 Oral

CVPR / ICCV Papers

1 Spotlight

NeurIPS Paper
02 · Research Interests

What I work on

I

Generative World Models

Action-conditioned, multi-view video models for controllable, reproducible, long-horizon simulation.

Video Diffusion · Action Control · Causal Generation
II

VLA & Embodied Decision-Making

Multimodal action decoding, latent future reasoning, and reinforcement-learning post-training.

VLA · Diffusion Policy · RL
III

3D Perception

Object-centric occupancy, point-cloud tracking, segmentation, and geometry-aware representation.

Occupancy · 3D Tracking · Point Clouds
IV

Scalable Model Systems

Efficient autoregressive inference, sequence parallelism, and high-throughput closed-loop evaluation.

Self-Forcing · KV Cache · Sequence Parallelism
03 · Featured Work

X-World

Object-centric occupancy completion visualizations
NeurIPS 2024 · First Author

Object-Centric Occupancy Completion

Recovering detailed object geometry from sparse observations to augment flexible 3D detection.

FutureX latent chain-of-thought world model overview
arXiv 2025 · Corresponding Author

FutureX

An end-to-end driving policy that reasons through possible futures in latent space before choosing an action.

Motion-centric paradigm for 3D single object tracking
CVPR 2022 Oral · TPAMI 2023 · First Author

Motion-Centric 3D Tracking

Reframing point-cloud tracking around motion rather than appearance matching for greater robustness.

ICCV 2023 Oral · Second Author

LATR

Transformer-based 3D lane detection from monocular images, bridging image features and lane geometry in 3D space.

05 · Academic Community

Service