About Me#
I am a senior undergraduate in the School of Artificial Intelligence / Qian Xuesen Honors College at Xi’an Jiaotong University. I am an incoming Ph.D. student at Shanghai Jiao Tong University and will join the Shanghai Artificial Intelligence Laboratory under the supervision of Research Scientist Jiangmiao Pang.
My research philosophy is simple: I build things because it’s fun.
Currently, I am focused on building a scalable VLA training framework together with the community and on improving the underlying training infrastructure. Before this, I contributed to large-scale VLA technical reports, and my work in simulation has produced significant simulation datasets and benchmarks. The pursuit of scaling laws, and the deep connection between data and models, continues to push me deeper and further. You can find more of my thoughts on the development of VLA and Embodied AI on my blog.
Here is my academic CV; feel free to download it.
Download CV
Research Interests#
Robotic Manipulation
Robotic Manipulation, Grasping, Dexterous Control, Object Interaction
VLA
Vision-Language-Action Models, Multi-modal Learning, Embodied AI
Simulation Platform
Virtual Environments, Physics Simulation, Training Platforms
Selected Publications#
2025
InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy
InternData-A1 is a large-scale synthetic robotic dataset (630k trajectories, 7,433 hours across 4 embodiments and 70 tasks) generated through a fully autonomous and compositional simulation pipeline. Using the same architecture as π₀, we show—for the first time—that a VLA model trained entirely on synthetic data can match the strongest real-robot datasets, achieving comparable performance across 49 simulation tasks, 5 real-world tasks, and long-horizon dexterous manipulation. The model also demonstrates zero-shot sim-to-real transfer, highlighting the substantial value of scalable simulation for embodied AI. The dataset and generation pipeline are released to enable broader access to large-scale robotic data creation.
InternVLA-M1: Latent Spatial Grounding for Instruction-Following Robotic Manipulation
InternVLA-M1 is a unified framework for spatial grounding and robot control that advances instruction-following robots toward general-purpose intelligence. Its core idea is spatially guided vision-language-action training, where spatial grounding serves as the critical link between instructions and robot actions. InternVLA-M1 employs a two-stage pipeline: (i) spatial grounding pre-training on over 2.3M spatial reasoning examples to determine “where to act” by aligning instructions with visual, embodiment-agnostic positions, and (ii) spatially guided action post-training to decide “how to act” by generating embodiment-aware actions through plug-and-play spatial prompting.
GenManip: A Simulation Platform for Generalizable TableTop Manipulation in the Era of MLLM
An embodied manipulation benchmark built on Isaac Sim, featuring automatic demonstration/layout generation and closed-loop testing. I served as a core developer and supported subsequent research at SHAILAB. To be released later.
2024
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
Proposed a semi-supervised learning framework for medical image segmentation using Mean Teachers to enhance model diversity and regularization. Achieved state-of-the-art results and demonstrated generalization across datasets.
Open-source Projects#
StarVLA
VLA Model — StarVLA is a modular and flexible codebase for developing Vision-Language Models (VLMs) into Vision-Language-Action (VLA) models. Each component (model, data, trainer, configuration, evaluation) is designed for high cohesion and low coupling, enabling plug-and-play research and fast iteration.
InternData-M1
Robotics Dataset — InternData-M1 is a comprehensive embodied robotics dataset containing ~250,000 simulation demonstrations with rich frame-level annotations, including 2D/3D bounding boxes, trajectories, grasp points, and semantic masks.
InternData-A1
Robotics Dataset — InternData-A1 is a hybrid synthetic-real manipulation dataset containing over 630k trajectories and 7,433 hours across 4 embodiments, 18 skills, 70 tasks, and 227 scenes, covering rigid, articulated, deformable, and fluid-object manipulation.