Xinyuan Wang

Hi! I am Xinyuan Wang (王心远).

me.png

I am a Ph.D. student at HKU, mentored by Prof. Tao Yu. I obtained my master’s degree from the University of California, San Diego (UCSD), where I was fortunate to be mentored by two distinguished professors in Natural Language Processing and Computer Vision - Prof. Zhiting Hu and Prof. Zhuowen Tu. Prior to my studies at UCSD, I graduated from Central South University (CSU) in Hunan, China, where I was mentored by Prof. Ying Zhao.

Research Interests

  • Agent Foundation Models: Designing and developing LLM/VLM-based agent foundation models capable of interpreting and executing actions across real-world, digital, and simulated environments (OpenCUA, Kimi-VL).
  • Language Model Reasoning: Improving the planning, reasoning, and decision-making capabilities of LLMs/VLMs (LLM Reasoners).
  • Foundation Model Prompting: Employing interpretable prompting to bridge the gap between user objectives and the outputs of foundation models, effectively boosting their performance on complex tasks through efficient prompting (PromptAgent).

Research Overview

I am currently working on agent foundation models, especially computer-use agent models, including OpenCUA and Kimi-VL. At UCSD, I worked on automatic LLM prompt optimization (PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization) and LLM reasoning (LLM Reasoners). I also worked in Prof. Zhuowen Tu’s group, exploring how to improve diffusion models’ conceptual performance with an end-to-end loss. During my undergraduate years, I was mentored by Prof. Ying Zhao and worked on the interpretability and visualization of convolutional neural networks. Here is my undergraduate thesis: The Research on The Interpretability Method of Deep Neural Network Based on Average Image

How to contact me

Email: xywang626@gmail.com

News

Aug 13, 2025 OpenCUA: Open Foundations for Computer-Use Agents is published on arXiv! It is the first open-source foundation for computer-use agents, including infrastructure, a dataset, a training recipe, models, and a benchmark.
Apr 15, 2025 The Kimi-VL Technical Report is published on arXiv! I worked on its computer-use capability as a core contributor.
Apr 8, 2024 LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models is accepted by an ICLR 2024 workshop!
Jan 16, 2024 PromptAgent is accepted by ICLR 2024 (The Twelfth International Conference on Learning Representations)!
Nov 17, 2023 PromptAgent’s poster is presented at SoCal NLP 2023 at UCLA, Los Angeles, CA!

Selected Publications

  1. opencua_main_fig.png
    OpenCUA: Open foundations for computer-use agents
    Xinyuan Wang, Bowen Wang, Dunjie Lu, Junlin Yang, Tianbao Xie, and 6 more authors
    arXiv preprint arXiv:2508.09123, 2025
  2. jedi.png
    Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
    Tianbao Xie, Jiaqi Deng, Xiaochuan Li, Junlin Yang, Haoyuan Wu, and 6 more authors
    arXiv preprint arXiv:2505.13227, 2025
  3. kimivl.png
    Kimi-VL technical report
    Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, and 6 more authors
    arXiv preprint arXiv:2504.07491, 2025