Puhao Li   |   李浦豪

I am currently a Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Song-Chun Zhu. I am also a research intern in the General Vision Lab at the Beijing Institute for General Artificial Intelligence (BIGAI), where I am grateful to be advised by Dr. Tengyu Liu and Dr. Siyuan Huang. Previously, I obtained my B.Eng. degree from Tsinghua University in 2023.

My research interests lie at the intersection of robotic manipulation and 3D computer vision. My long-term goal is to develop embodied intelligent systems that can interpret human intent and interact naturally with people in diverse environments, learning reusable, ever-expanding low-level skill sets and high-level common sense. Currently, I am working on 3D scene understanding and robotic manipulation learning, pushing the boundaries of how robots operate in complex settings.

Email  /  CV  /  Google Scholar  /  GitHub  /  Twitter

Research
ControlVLA: Few-shot Object-centric Adaptation for Pre-trained VLA models
Puhao Li, Yingying Wu, Ziheng Xi, Wanlin Li, Yuzhe Huang, Zhiyuan Zhang, Yinghan Chen, Jianan Wang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang
arXiv 2025
[Paper] [Code] [Project Page]
We introduce ControlVLA, a few-shot object-centric adaptation method for pre-trained VLA models. By reducing demonstration requirements, ControlVLA lowers the barrier to deploying robots in diverse scenarios.
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
Yuyang Li, Wenxin Du*, Chang Yu*, Puhao Li, Zihang Zhao, Tengyu Liu, Chenfanfu Jiang, Yixin Zhu, Siyuan Huang
arXiv 2025
[Paper] [Code] [Docs]
We develop Taccel, a high-performance GPU-based simulator that combines ABD and IPC to simulate robots with vision-based tactile sensors.
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
Kailin Li, Puhao Li, Tengyu Liu, Yuyang Li, Siyuan Huang
CVPR 2025
[Paper] [Code] [Data] [Project Page]
We introduce ManipTrans, a novel method for efficiently transferring human skills to dexterous robotic hands in simulation. Leveraging ManipTrans, we contribute DexManipNet, a large-scale dexterous manipulation dataset with diverse tasks.
MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
Huangyue Yu*, Baoxiong Jia*, Yixin Chen*, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang
CVPR 2025
[Paper] [Code] [Data] [Project Page]
We present MetaScenes, a large-scale 3D scene dataset constructed from real-world scans. It features 706 scenes with 15,366 objects across a wide range of types, with realistic layouts, visually accurate appearances and physical plausibility.
PhysPart: Physically Plausible Part Completion for Interactable Objects
Rundong Luo*, Haoran Geng*, Congyue Deng, Puhao Li, Zan Wang, Baoxiong Jia, Leonidas Guibas, Siyuan Huang
ICRA 2025
[Paper] [Project Page]
We propose a diffusion-based part generation model that utilizes geometric conditioning through classifier-free guidance and formulates physical constraints as a set of stability and mobility losses to guide the sampling process.
PhyRecon: Physically Plausible Neural Scene Reconstruction
Junfeng Ni*, Yixin Chen*, Bohan Jing, Nan Jiang, Bing Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
NeurIPS 2024
[Paper] [Code] [Project Page]
We introduce PhyRecon, which enables physically plausible 3D scene reconstruction. PhyRecon features a joint optimization framework incorporating both differentiable rendering and physics-based objectives.
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Puhao Li*, Tengyu Liu*, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
IROS 2024 (Oral Pitch)
[Paper] [Code] [Project Page]
We introduce Ag2Manip, which enables learning diverse robotic manipulation tasks without any domain-specific demonstrations. Ag2Manip also supports robust imitation learning of manipulation skills in the real world.
Grasp Multiple Objects with One Hand
Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang
RA-L, presented at IROS 2024 (Oral Presentation)
[Paper] [Code] [Data] [Project Page]
We introduce MultiGrasp, a two-stage framework for simultaneous multi-object grasping with multi-finger dexterous hands. In addition, we contribute Grasp'Em, a large-scale synthetic multi-object grasping dataset.
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang
CVPR 2024 (Highlight)
[Paper] [Code] [Project Page]
We introduce a novel two-stage framework that employs scene affordance as an intermediate representation, effectively linking 3D scene grounding and conditional motion generation.
An Embodied Generalist Agent in 3D World
Jiangyong Huang*, Silong Yong*, Xiaojian Ma*, Xiongkun Linghu*, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang
ICML 2024
ICLR 2024 @ LLMAgents Workshop
[Paper] [Code] [Data] [Project Page]
We introduce LEO, an embodied multi-modal and multi-task generalist agent that excels in perceiving, grounding, reasoning, planning, and acting in the 3D world.
Diffusion-based Generation, Optimization, and Planning in 3D Scenes
Siyuan Huang*, Zan Wang*, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, Song-Chun Zhu
CVPR 2023
[Paper] [Code] [Project Page] [Hugging Face]
We introduce SceneDiffuser, a unified conditional generative model for 3D scene understanding. In contrast to prior work, SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented.
GenDexGrasp: Generalizable Dexterous Grasping
Puhao Li*, Tengyu Liu*, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang
ICRA 2023
[Paper] [Code] [Data] [Project Page]
We introduce GenDexGrasp, a versatile dexterous grasping method that can generalize to out-of-domain robotic hands. In addition, we contribute MultiDex, a large-scale synthetic dexterous grasping dataset.
DexGraspNet: A Large-Scale Robotic Dexterous Grasp Dataset for General Objects Based on Simulation
Ruicheng Wang*, Jialiang Zhang*, Jiayi Chen, Yinzhen Xu, Puhao Li, Tengyu Liu, He Wang
ICRA 2023 (Oral Presentation, Outstanding Manipulation Paper Finalist)
[Paper] [Code] [Data] [Project Page]
We introduce DexGraspNet, a large-scale dexterous grasping dataset built in simulation. DexGraspNet features greater physical stability and higher diversity than previous grasping datasets.
Experience
Tsinghua University, China
2023.09 - present

Ph.D. Student
Advisor: Prof. Song-Chun Zhu
Beijing Institute for General Artificial Intelligence (BIGAI), China
2021.09 - present

Research Intern
Advisors: Dr. Tengyu Liu and Dr. Siyuan Huang
Tsinghua University, China
2019.08 - 2023.06

Undergraduate Student

Feel free to contact me if you have any questions. Thanks for visiting 😊
This page is based on Jon Barron's website and deployed on GitHub Pages.