Peking University / Wangxuan Institute of Computer Technology

Wentao Mo

PhD student at Peking University working on multimodal learning, computer vision, and embodied AI.

I joined the Wangxuan Institute of Computer Technology (WICT) at Peking University in September 2022 as a PhD student, supervised by Prof. Yang Liu. My work explores how vision, language, and 3D scene understanding can support stronger reasoning in embodied environments.


Multimodal Learning

Connecting visual, textual, and spatial signals so models can answer richer questions about scenes.

Computer Vision

Studying 2D and 3D scene understanding with an emphasis on practical visual question answering.

Embodied AI

Designing evaluation and pre-training resources that help agents reason in grounded environments.

ICML 2026

Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs

Wentao Mo, Yang Liu

PDF, code, arXiv, and project page coming soon.

ACM Multimedia 2025

Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset

Wentao Mo, Qingchao Chen, Yuxin Peng, Siyuan Huang, Yang Liu

PDF Code Project

AAAI 2024

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Wentao Mo, Yang Liu

PDF Code Project

IJCAI 2024

3D Vision and Language Pretraining with Large-Scale Synthetic Data

Dejie Yang, Zhu Xu, Wentao Mo, Qingchao Chen, Siyuan Huang, Yang Liu

PDF Code

Contact

Open to research conversations and collaboration.

Email: mowt@pku.edu.cn | GitHub: github.com/matthewdm0816 | X: x.com/Kagurazaka_L