Peking University / Wangxuan Institute of Computer Technology

Wentao Mo

PhD student at Peking University working on multimodal learning, computer vision, and embodied AI.

I joined the Wangxuan Institute of Computer Technology (WICT) at Peking University (PKU) in September 2022 as a PhD student, supervised by Prof. Yang Liu. My work explores how vision, language, and 3D scene understanding can support stronger reasoning in embodied environments.

Research

Building blocks for visual reasoning

Multimodal Learning

Connecting visual, textual, and spatial signals so models can answer richer questions about scenes.

Computer Vision

Studying 2D and 3D scene understanding with an emphasis on practical visual question answering.

Embodied AI

Designing evaluation and pre-training resources that help agents reason in grounded environments.

Publications

Selected work

ACM Multimedia 2025

Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset

Wentao Mo, Qingchao Chen, Yuxin Peng, Siyuan Huang, Yang Liu

AAAI 2024

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Wentao Mo, Yang Liu

IJCAI 2024

3D Vision and Language Pretraining with Large-Scale Synthetic Data

Dejie Yang, Zhu Xu, Wentao Mo, Qingchao Chen, Siyuan Huang, Yang Liu

Contact

Open to research conversations and collaboration.