Multimodal Learning
Connecting visual, textual, and spatial signals so models can answer richer questions about scenes.
Peking University / Wangxuan Institute of Computer Technology
PhD student at Peking University working on multimodal learning, computer vision, and embodied AI.
I joined the Wangxuan Institute of Computer Technology (WICT) at Peking University in September 2022 as a PhD student, supervised by Prof. Yang Liu. My work explores how vision, language, and 3D scene understanding can support stronger reasoning in embodied environments.
Research
Studying 2D and 3D scene understanding with an emphasis on practical visual question answering.
Designing evaluation and pre-training resources that help agents reason in grounded environments.
Publications
Wentao Mo, Yang Liu
Wentao Mo, Qingchao Chen, Yuxin Peng, Siyuan Huang, Yang Liu
Wentao Mo, Yang Liu
Dejie Yang, Zhu Xu, Wentao Mo, Qingchao Chen, Siyuan Huang, Yang Liu
Contact