Letian Ruan
4th-year Undergrad
University of Michigan & Shanghai Jiao Tong University
About Me

I'm a senior in the CSE Department at the University of Michigan and the Global College at Shanghai Jiao Tong University, pursuing dual Bachelor's degrees.

I'm also doing research with the Catalyst Group at Carnegie Mellon University, advised by Zhihao Jia, and with SymbioticLab at the University of Michigan, advised by Mosharaf Chowdhury. Previously, I was fortunate to work with Shixuan Sun at the EPCC Lab at Shanghai Jiao Tong University.

My research interests lie in Machine Learning and Systems, including serving systems for LLMs, robotics, and multimodal models, post-training systems, deep learning compilers, and agentic workflows.

Education
  • Catalyst Group, CMU
    Visiting Student Researcher, Megakernel Compiler and Agentic Serving Systems
    Apr. 2026 - present
  • University of Michigan, Ann Arbor
    B.S.E. in Computer Science
    Aug. 2025 - present
  • Shanghai Jiao Tong University
    B.S. in Electrical and Computer Engineering (dual degree)
    Sep. 2022 - present
Experience
  • MiniMax
    System Software Intern, RL Infra
    Dec. 2025 - Feb. 2026
  • SymbioticLab, U-M
    Research Intern, Serving Systems for Robotics/Multimodal
    Aug. 2025 - present
  • EPCC Lab, SJTU
    Research Intern, Multi-LoRA Serving and Serverless Graph Computing
    Oct. 2024 - Nov. 2025
News
2026
Glad to share our recent work with the SGLang-RL team on optimizing refitting for large-scale RL training. Check out our blog on LMSYS.Org.
Apr 29
I'm attending ASPLOS 2026 in Pittsburgh. Feel free to reach out!
Mar 01
Excited to announce the release of Forge, a scalable Agent RL framework powering the M2~M3 series models.
Feb 01
Our paper on IDP has been accepted to EuroSys 2026. Congratulations!
Jan 15
2025
Our work FaaSBoard has been accepted to SIGMOD 2026. Check out the paper!
Dec 15
Selected Publications
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models

Hongyu Chen, Letian Ruan, Zilin Xu, Yuchen Li, Xinyu Chen, Jingwen Leng, Bingsheng He, Minyi Guo, Shixuan Sun

ArXiv Preprint

FaaSBoard: Efficient Graph Processing with a Disaggregated Architecture on Serverless Services

Yushi Liu*, Yikang Ruan*, Letian Ruan, Zijun Li, Sen Gao, Weihao Cui, Shixuan Sun, Quan Chen, Shuo Quan, Jie Wu, Bingsheng He, Minyi Guo (* equal contribution)

SIGMOD 2026

Bridging the GPU Utilization Gap: Predictive Multi-Dimensional Resource Scheduling for AI Workloads

Yilei Lu, Dongbiao He, Teng Ma, Zhe Liu, Letian Ruan, Jinlei Jiang, Yongwei Wu

EuroSys 2026

Selected Blogs
Updating 1T parameters in seconds — P2P weight transfer in Large-Scale Distributed RL

Jiadong Guo, Xin Ji, Letian Ruan, Teng Ma, Chenyang Zhao, Yueming Yuan, Zhichen Zeng

SGLang-RL Team, LMSYS.Org, April 2026

Forge: Scalable Agent RL Framework and Algorithm

MiniMax Team

MiniMax M2.5 Tech Report, Feb. 2026

Open Source Projects
Mirage Persistent Kernel 2.3k 2 x OSDI

Mirage Persistent Kernel (MPK) is a compiler and runtime system that automatically transforms LLM inference into a single megakernel.
Working on: Speculative Decoding, Runtime Optimization.

SGLang 27.9k 6k

SGLang is a high-performance serving framework for large language models and multimodal models.
Working on: Peer-to-peer communication in RL.

Mooncake 5.3k FAST 2025 Best Paper

Mooncake is a KVCache-centric disaggregated architecture separating the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated KVCache pool.
Working on: KVCache Transfer Pipeline.