The Large Model Systems Organization develops large models and systems that are open, accessible, and scalable.
Latest Blog
HiSparse: Turbocharging Sparse Attention with Hierarchical Memory
Self-attention has become a major bottleneck in scaling LLMs to long contexts because of its quadratic compute and memory/IO cost. This has driven growing interest in efficient attention mechanisms...

Highlights of SGLang at NVIDIA GTC 2026
SGLang came to NVIDIA GTC 2026 with panels, a happy hour, a 200-person meetup, and a hands-on training lab: three days, five events, one packed week at the center of the LLM ecosystem...

Elastic EP in SGLang: Achieving Partial Failure Tolerance for DeepSeek MoE Deployments
To serve massive Mixture-of-Experts (MoE) models efficiently, deploying a "wide" Expert Parallelism (EP) strategy, often spanning 32 GPUs or more per inference instance, is not just an option; it is a necessity...
Projects
Our Sponsors & Partners
Backed by leading companies and institutions advancing AI research.
Voltage Park, NVIDIA, Nebius, Google Cloud, AtlasCloud, a16z, AMD, InnoMatrix, Laude Institute, Hyperbolic, NovitaAI, Verda Cloud, Sky9, Kaggle, MBZUAI, Together, RunPod, Anyscale, HuggingFace