Homepage - Letian Ruan

Letian Ruan

4th-year Undergrad
University of Michigan & Shanghai Jiao Tong University

About Me

I'm a senior in the CSE Department at the University of Michigan and the Global College at Shanghai Jiao Tong University, pursuing dual Bachelor's degrees.

I'm also doing research at the Catalyst Group at Carnegie Mellon University, advised by Zhihao Jia, and at SymbioticLab at the University of Michigan, advised by Mosharaf Chowdhury. Previously, I was fortunate to work with Shixuan Sun at the EPCC Lab at Shanghai Jiao Tong University.

My research interests lie in Machine Learning and Systems, including serving systems for LLMs, robotics, and multimodal models; post-training systems; deep learning compilers; and agentic workflows.

Education

Catalyst Group, CMU

Visiting Student Researcher, Megakernel Compiler and Agentic Serving Systems

Apr. 2026 - present

University of Michigan, Ann Arbor

B.S.E. in Computer Science

Aug. 2025 - present

Shanghai Jiao Tong University

B.S. in Electrical and Computer Engineering (dual degree)

Sep. 2022 - present

Experience

MiniMax

System Software Intern, RL Infra

Dec. 2025 - Feb. 2026

SymbioticLab, U-M

Research Intern, Serving Systems for Robotics/Multimodal

Aug. 2025 - present

EPCC Lab, SJTU

Research Intern, Multi-LoRA Serving and Serverless Graph Computing

Oct. 2024 - Nov. 2025

News

2026

Glad to share our recent work with the SGLang-RL team on optimizing weight refitting for large-scale RL training. Check out our blog on LMSYS.Org.

Apr 29

I'm attending ASPLOS 2026 in Pittsburgh. Feel free to reach out!

Mar 01

Excited to announce the release of Forge, a scalable Agent RL framework powering the M2~M3 series models.

Feb 01

Our paper on IDP has been accepted to EuroSys 2026. Congratulations!

Jan 15

2025

Our work FaaSBoard has been accepted to SIGMOD 2026. Check out the paper!

Dec 15

Selected Publications

InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models

Hongyu Chen, Letian Ruan, Zilin Xu, Yuchen Li, Xinyu Chen, Jingwen Leng, Bingsheng He, Minyi Guo, Shixuan Sun

arXiv Preprint

[Preprint] [Github]

FaaSBoard: Efficient Graph Processing with a Disaggregated Architecture on Serverless Services

Yushi Liu*, Yikang Ruan*, Letian Ruan, Zijun Li, Sen Gao, Weihao Cui, Shixuan Sun, Quan Chen, Shuo Quan, Jie Wu, Bingsheng He, Minyi Guo (* equal contribution)

SIGMOD 2026

[Paper] [Github]

Bridging the GPU Utilization Gap: Predictive Multi-Dimensional Resource Scheduling for AI Workloads

Yilei Lu, Dongbiao He, Teng Ma, Zhe Liu, Letian Ruan, Jinlei Jiang, Yongwei Wu

EuroSys 2026

[Paper] [Github]

Selected Blogs

Updating 1T parameters in seconds — P2P weight transfer in Large-Scale Distributed RL

Jiadong Guo, Xin Ji, Letian Ruan, Teng Ma, Chenyang Zhao, Yueming Yuan, Zhichen Zeng

SGLang-RL Team, LMSYS.Org, April 2026

[Post]

Forge: Scalable Agent RL Framework and Algorithm

MiniMax Team

MiniMax M2.5 Tech Report, Feb. 2026

[Tech Report]

Open Source Projects

Mirage Persistent Kernel (2.3k GitHub stars, 2 x OSDI)

Mirage Persistent Kernel (MPK) is a compiler and runtime system that automatically transforms LLM inference into a single megakernel.
Working on: Speculative Decoding, Runtime Optimization.

[GitHub] [Doc] [Blog] [Paper]

SGLang (27.9k GitHub stars, 6k forks)

SGLang is a high-performance serving framework for large language models and multimodal models.
Working on: Peer-to-peer communication in RL.

[GitHub] [Doc] [Blog]

Mooncake (5.3k GitHub stars, FAST 2025 Best Paper)

Mooncake is a KVCache-centric disaggregated architecture separating the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated KVCache pool.
Working on: KVCache Transfer Pipeline.

[GitHub] [Doc] [Paper]
