Qinlong works on AI infrastructure at ByteDance Seed, focusing on robust and elastic distributed systems for data processing, training, and inference. At ByteDance, he contributes to the Robust LLM Training Infrastructure, especially NCCL hang and straggler detection. He was also responsible for data and RL training engineering in Seed3D. Before joining ByteDance, he initiated DLRover, an open-source project to stabilize LLM training, and was the core contributor to ElasticDL at Ant Group.
Open Source Software (Python, Golang, C++)
- DLRover, An Automatic Distributed Deep Learning System, #1 contributor, 1251 commits
- ElasticDL, A Kubernetes-native Deep Learning Framework, #1 contributor, 320 commits
Papers
- Handling Network Faults in Distributed AI Training: Failover is Now an Option, EuroSys 2026
- Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training, SOSP 2025
- Robust LLM Training Infrastructure at ByteDance, SOSP 2025
- DLRover-RM: An Automatic Resource Optimization System for Recommendation Model Training, VLDB 2024
Tech Reports
- Seed3D 2.0: Advancing High-Fidelity Simulation-Ready 3D Content Generation, 2026
- Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets, 2025
- Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning, 2025
- Seedance 1.0: Exploring the Boundaries of Video Generation Models, 2025