Chaoda Zheng

01 · About

From understanding geometry
to simulating futures.

My research began with 3D perception: point-cloud tracking, segmentation, occupancy, and geometry-aware representations. Today, I focus on action-conditioned world models, causal video generation, VLA systems, and closed-loop simulation.

I was the first author and a core contributor to X-World. Alongside hands-on research, I define problems, design experiments, and mentor early-career researchers across both product and academic projects.

2,000+ Google Scholar citations

2 oral papers at CVPR / ICCV

1 spotlight paper at NeurIPS

02 · Research Interests

What I work on

I

Generative World Models

Action-conditioned, multi-view video models for controllable, reproducible, long-horizon simulation.

Video Diffusion · Action Control · Causal Generation

II

VLA & Embodied Decision

Multimodal action decoding, latent future reasoning, and reinforcement-learning post-training.

VLA · Diffusion Policy · RL

III

3D Perception

Object-centric occupancy, point-cloud tracking, segmentation, and geometry-aware representation.

Occupancy · 3D Tracking · Point Clouds

IV

Scalable Model Systems

Efficient autoregressive inference, sequence parallelism, and high-throughput closed-loop evaluation.

Self-Forcing · KV Cache · Sequence Parallelism

03 · Featured Work

X-World

arXiv 2026 · First Author

Controllable ego-centric multi-camera world models.

A seven-camera, action-conditioned world model that sustains stable generation for 30+ seconds and supports closed-loop VLA evaluation. I led causalization, few-step distillation, long-horizon post-training, and the first simulation inference pipeline.

I · 30+ s of stable, controlled generation
II · 50 → 4 denoising steps per block
III · ~49× throughput gain over the bidirectional baseline

Paper · Project

NeurIPS 2024 · First Author

Object-Centric Occupancy Completion

Recovering detailed object geometry from sparse observations to augment flexible 3D detection.

Paper · Code

arXiv 2025 · Corresponding Author

FutureX

An end-to-end driving policy that reasons through possible futures in latent space before choosing an action.

Paper


CVPR 2022 Oral · TPAMI 2023 · First Author

Motion-Centric 3D Tracking

Reframing point-cloud tracking around motion rather than appearance matching for greater robustness.

Paper · Project

ICCV 2023 Oral · Second Author

LATR

Transformer-based 3D lane detection from monocular images, bridging image features and lane geometry in 3D space.

Paper

04 · Selected Publications

Papers

2026
X-World: Controllable Ego-Centric Multi-Camera World Models · arXiv · First Author
2025
FutureX: End-to-End Driving via Latent Chain-of-Thought World Model · arXiv · Corresponding Author
2024
Object-Centric Occupancy Completion Augments 3D Object Detection · NeurIPS · First Author
2023
A Motion-Centric Paradigm for 3D Single Object Tracking · TPAMI / CVPR Oral · First Author
2023
LATR: 3D Lane Detection from Monocular Images with Transformer · ICCV Oral · Second Author
2022
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds · ECCV · Co-First Author
2021
Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds · ICCV · First Author

View the complete publication record on Google Scholar.

05 · Academic Community

Service

ECCV 2026 · Area Chair

Reviewer for top venues: CVPR · ICCV · ECCV · NeurIPS
