BLOG

Latest updates and releases by LMSYS Org are announced through our blogpost series.

Unified FP8: Moving Beyond Mixed Precision for Stable and Accelerated MoE RL

by: InfiXAI Team, Ant Group AQ Team, SGLang RL Team, miles Team, slime Team, November 24, 2025

TL;DR: We have implemented fully FP8-based sampling and training in RL. Experiments show that for MoE models, the larger the model, the more severe the train–inference discrepancy becomes when using BF16 training with FP8 rollout. In contrast, using unified FP8 for both training and rollout effecti...

LMSYS Fellowship Program

by: LMSYS Board, November 23, 2025

We're proud to launch the LMSYS Fellowship Program! This year, the program will provide funding to full-time PhD students in the United States who have made significant contributions to the open-source AI infrastructure community, especially to our flagship projects. Fellowship recipients will recei...

Introducing Miles — RL Framework To Fire Up Large-Scale MoE Training

by: RadixArk Team, November 19, 2025

A journey of a thousand miles is made one small step at a time. Today, we are releasing Miles, an enterprise-grade reinforcement learning framework tailored for large-scale MoE training and production workloads. Miles is built on top of slime, the lightweight RL framework that has quietly powered ...

🚀 AutoRound Meets SGLang: Enabling Quantized Model Inference with AutoRound

by: By Intel Neural Compressor Team, November 14, 2025

Overview We are thrilled to announce an official collaboration between SGLang and AutoRound, enabling low-bit quantization for efficient LLM inference. Through this integration, developers can now quantize large models with AutoRound’s signed-gradient optimization and directly deploy them in SGLang’...

SGLang Diffusion: Accelerating Video and Image Generation

by: The SGLang Diffusion Team, November 7, 2025

We are excited to introduce SGLang Diffusion, which brings SGLang's state-of-the-art performance to accelerate image and video generation for diffusion models. SGLang Diffusion supports major open-source video and image generation models (Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux) while provid...

"No Free Lunch": Deconstruct Efficient Attention with MiniMax M2

by: MiniMax LLM Team together with Xinyuan Tong, Kangyan Zhou, Mingyi Lu, and Chenyang Zhao, November 4, 2025

We are excited to announce day-one support for the new flagship model, MiniMax M2, on SGLang. The MiniMax M2 redefines efficiency for agents: it is a compact, fast, and cost-effective Mixture of Experts (MoE) model (230 billion total parameters, 10 billion active) built for elite performance in codi...

Optimizing GPT-OSS on NVIDIA DGX Spark: Getting the Most Out of Your Spark

by: Jerry Zhou, November 3, 2025

We’ve got some exciting updates about the NVIDIA DGX Spark! In the week following the official launch, we collaborated closely with NVIDIA and successfully brought GPT-OSS 20B and GPT-OSS 120B support to SGLang on the DGX Spark. The results are impressive: around 70 tokens/s on GPT-OSS 20B and 50 to...

SGLang-Jax: An Open-Source Solution for Native TPU Inference

by: The SGLang-Jax Team, October 29, 2025

We're excited to introduce SGLang-Jax, a state-of-the-art open-source inference engine built entirely on Jax and XLA. It leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, this project delivers fast, native TPU infer...

Accelerating Hybrid Inference in SGLang with KTransformers CPU Kernels

by: KVCache.AI and Approaching AI, October 22, 2025

Background: Hybrid Inference for Sparse MoE Models Modern Mixture-of-Experts (MoE) language models such as DeepSeek-V3 contain hundreds of billions of parameters, but only a small subset of experts are activated per token. This sparse activation pattern makes MoE models ideal for CPU/GPU hybrid infe...

SGLang and NVIDIA Accelerating SemiAnalysis InferenceMAX and GB200 Together

by: NVIDIA and community SGLang developers, Oct 14, 2025

The SGLang and NVIDIA teams have a strong track record of collaboration, consistently delivering inference optimizations and system-level improvements to ensure exceptional performance of the SGLang framework. Most recently, this collaboration has been centered on the NVIDIA Blackwell architecture, ...

NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference

by: Jerry Zhou and Richard Chen, October 13, 2025

Thanks to NVIDIA’s early access program, we are thrilled to get our hands on the NVIDIA DGX™ Spark. It’s quite an unconventional system, as NVIDIA rarely releases compact, all-in-one machines that bring supercomputing-class performance to a desktop workstation form factor. Over the past year, SGLang...

SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention

by: The SGLang Team, September 29, 2025

We are excited to announce that SGLang supports DeepSeek-V3.2 on Day 0! According to the DeepSeek tech report, it equips DeepSeek-V3.1-Terminus with DeepSeek Sparse Attention (DSA) through continued training. With DSA, a fine-grained sparse attention mechanism powered by a lightning indexer, DeepSee...

PD-Multiplexing: Unlocking High-Goodput LLM Serving with GreenContext

by: Weihao Cui, Yukang Chen, Xiaoze Fan, Han Zhao, Ziyi Xu, Xusheng Chen, Bingsheng He, Quan Chen, September 28, 2025

This post highlights our initial efforts to support a new serving paradigm, PD-Multiplexing, in SGLang. It is designed to deliver higher goodput in LLM serving. PD-Multiplexing leverages GreenContext, a new NVIDIA GPU capability that allows lightweight and fine-grained partitioning of GPU resources ...

Together with SGLang: Best Practices for Serving DeepSeek-R1 on H20-96G

by: Tianyu Zhang*, Peng Zhang*, Yusong Gao, Yun Zhang, September 26, 2025

Introduction Operationalizing scaled Mixture-of-Experts (MoE) models such as DeepSeek-R1 requires a careful balance of latency, throughput, and cost. The challenge is especially acute on hardware with asymmetric performance profiles—for example, the H20 GPU, which offers high memory bandwidth but co...

Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput

by: The SGLang Team, September 25, 2025

The GB200 NVL72 is one of the most powerful hardware for deep learning. In this blog post, we share our progress after our previous blog post to optimize the inference performance of DeepSeek V3/R1 with FP8 attention, NVFP4 MoE, large-scale expert parallelism, prefill-decode disaggregation, and vari...

Towards Deterministic Inference in SGLang and Reproducible RL Training

by: The SGLang Team, September 22, 2025 (Updated on September 24)

TL;DR: This post shares our efforts to enable deterministic inference in SGLang and our collaboration with slime to work towards reproducible RL training. <br /> Recently, the Thinking Machines Lab published a blog detailing their findings. Since this blog was published, the industry has respo...

Optimizing FP4 Mixed-Precision Inference on AMD GPUs

by: Haohui Mai, Lei Zhang, September 21, 2025

Introduction As frontier large language models (LLMs) continue scaling to unprecedented sizes, they demand increasingly more compute power and memory bandwidth from GPUs. Both GPU manufacturers and model developers are shifting toward low-precision floating-point formats. FP4 (4-bit floating point) ...

SGLang HiCache: Fast Hierarchical KV Caching with Your Favorite Storage Backends

by: Zhiqiang Xie, September 10, 2025

From the community: In a coding agent scenario using Qwen3-Coder-480B, the observed dialogues often stretched past 25K tokens around 8 turns per session. Without full KV cache retention, nearly every request required costly re-computation. By integrating SGLang HiCache with DeepSeek 3FS KVStore for ...

LongCat-Flash: Deploying Meituan's Agentic Model with SGLang

by: Meituan LongCat Team, September 01, 2025

1. Introduction: Deploying Meituan's Agentic Open-Source MoE Model LongCat-Flash, Meituan's open-source Agentic Mixture-of-Experts (MoE) model is now available from huggingface LongCat-Flash-Chat. Released by Meituan LongCat Team, it features: 560B total params 18.6B–31.3B (27B on average) per toke...

Fine-tune and deploy gpt-oss MXFP4: ModelOpt + SGLang

by: NVIDIA ModelOpt Team, Aug 28, 2025

(Updated on Aug 29) OpenAI recently released gpt-oss, the first open source model family from OpenAI's lab since GPT-2. These models demonstrate strong math, coding, and general capabilities. Part of the model's uniqueness is that it was released in native MXFP4 weight only quantization. This allows...

SGLang for gpt-oss: From Day 0 Support to Enhanced Performance

by: Liangsheng Yin, Ke Bao, August 27, 2025

We are excited to announce a major update for SGLang, focusing on deep performance optimizations and new features for the recently released openai/gpt-oss-120b model. While we had support from day zero, we took the last few weeks to enhance our engine to ensure you get the best possible performance....

GLM-4.5 Meets SGLang: Reasoning, Coding, and Agentic Abilities

by: GLM Team, July 31, 2025

Today, we are excited to introduce our latest flagship models GLM-4.5 and GLM-4.5-Air, along with their FP8 variants. All models are now available with day-one support on SGLang. GLM-4.5 and GLM-4.5-Air are both powerful models designed to unify reasoning, coding, and agentic capabilities, with 355B...

SpecForge: Accelerating Speculative Decoding Training for SGLang

by: The SGLang Team, July 25, 2025

Speculative decoding is a powerful technique for accelerating Large Language Model (LLM) inference. In this blog post, we are excited to announce the open-sourcing of SpecForge, our new training framework for Eagle3-based speculative decoding. SpecForge is designed for ease of use and is tightly int...

Deploying Kimi K2 with PD Disaggregation and Large-Scale Expert Parallelism on 128 H200 GPUs

by: The Mooncake Team, July 20, 2025

1️⃣ Introduction: Deploying the Most Advanced Open-Source MoE Model Kimi K2 is currently the most advanced open-source Mixture-of-Experts (MoE) model available. Released by Moonshot AI in 2025, it features: 1 trillion total parameters 32 billion activated parameters per token 384 experts with dynam...

Accelerating SGLang with Multiple Token Prediction

by: Eigen AI Team, July 17, 2025

TL;DR SGLang now supports smooth combination of these advanced features: Multiple Token Prediction (MTP), Large-Scale Expert Parallelism (EP), and Prefill-Decode disaggregation. This integration delivers up to 60% higher output throughput through a new decoding paradigm, better parallelism, and more...

How to support new VLMs into SGLang: A Case Study with NVILA

by: The NVILA Team, July 16, 2025

The world of LLMs is evolving at a remarkable pace, with Visual Language Models (VLMs) at the forefront of this revolution. These models power applications that can understand and reason about both images and text. There are tons of new VLM models emerging daily, and we want to integrate them into S...

Cost Effective Deployment of DeepSeek R1 with Intel® Xeon® 6 CPU on SGLang

by: Intel PyTorch Team, July 14, 2025

The impressive performance of DeepSeek R1 marked a rise of giant Mixture of Experts (MoE) models in Large Language Models (LLM). However, its massive model size and unique architecture have posed new challenges on deployment. The significant memory requirements will normally require 8x or even 16x h...

slime: An SGLang-Native Post-Training Framework for RL Scaling

by: The slime Team, July 9, 2025

Vision That Drives slime We believe in RL. We believe RL is the final piece toward AGI. If you feel the same way, you'll share our vision: Every field should be end-to-end RLed and every task should become an agent environment. Every RL run should last longer, and every model should scale larger. R...

OME: Revolutionizing LLM Infrastructure with Model-Driven Architecture

by: The Oracle Team, July 8, 2025

The Tale of Two Teams: Why Model Serving Is Broken In any large organization deploying LLMs, two distinct teams emerge with conflicting needs: The ML Engineers spend months benchmarking models, experimenting with serving technologies, and crafting optimal deployment strategies. Each model demands di...

Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part I): 2.7x Higher Decoding Throughput

by: The SGLang Team, June 16, 2025

The GB200 NVL72 is the world's most advanced hardware for AI training and inference. In this blog post, we're excited to share early results from running DeepSeek 671B with prefill-decode disaggregation and large-scale expert parallelism on the GB200 NVL72. By leveraging Blackwell-specific features ...

Deploying DeepSeek with PD Disaggregation and Large-Scale Expert Parallelism on 96 H100 GPUs

by: The SGLang Team, May 5, 2025

DeepSeek is a popular open-source large language model (LLM) praised for its strong performance. However, its large size and unique architecture, which uses Multi-head Latent Attention (MLA) and Mixture of Experts (MoE), require an advanced system for efficient serving at scale. In this blog, we exp...

SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs

by: The SGLang Team, December 4, 2024

We’re excited to release SGLang v0.4, featuring significant performance improvements and new features: Zero-overhead batch scheduler: 1.1x increase in throughput. Cache-aware load balancer: up to 1.9x increase in throughput with 3.8x higher cache hit rate. Data parallelism attention for DeepSeek mo...

Announcing a New Site for Chatbot Arena

by: LMSys Team, Sep 20, 2024

We’re excited to share that Chatbot Arena now has its own dedicated website: lmarena.ai and blog! You might be wondering why we’re making this change. Over the past year, with the incredible support of our community, Chatbot Arena has evolved into a mature ecosystem and platform. We believe it’s tim...

RedTeam Arena: An Open-Source, Community-driven Jailbreaking Platform

by: Anastasios Angelopoulos*, Luca Vivona*, Wei-Lin Chiang*, Aryan Vichare, Lisa Dunlap, Salvivona, Pliny, Ion Stoica, Sep 13, 2024

We are excited to launch RedTeam Arena, a community-driven redteaming platform, built in collaboration with Pliny and the BASI community! <img src="/images/blog/redteam_arena/badwords.png" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom:...

SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision

by: The SGLang Team, September 4, 2024

We're excited to announce the release of SGLang v0.3, which brings significant performance enhancements and expanded support for novel model architectures. Here are the key updates: Up to 7x higher throughput for DeepSeek Multi-head Latent Attention (MLA) Up to 1.5x lower latency with torch.compile...

Does style matter? Disentangling style and substance in Chatbot Arena

by: Tianle Li*, Anastasios Angelopoulos*, Wei-Lin Chiang*, Aug 29, 2024

Why is GPT-4o-mini so good? Why does Claude rank so low, when anecdotal experience suggests otherwise? We have answers for you. We controlled for the effect of length and markdown, and indeed, the ranking changed. This is just a first step towards our larger goal of disentangling substance and style...

Achieving Faster Open-Source Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM)

by: The SGLang Team, Jul 25, 2024

At LMSYS.org, we've been running the Chatbot Arena platform for over a year, serving millions of users. We know firsthand how crucial efficient serving is for AI products and research. Through our operational experiences and in-depth research, we've continuously enhanced the underlying serving syste...

RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing

by: Isaac Ong*, Amjad Almahairi*, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica, July 1, 2024

LLMs have demonstrated remarkable capabilities across a range of tasks, but there exists wide variation in their costs and capabilities, as seen from the plot of performance against cost in Figure 1. Very broadly, more capable models tend to be more expensive than less capable models. This leads to ...

The Multimodal Arena is Here!

by: Christopher Chou*, Lisa Dunlap*, Wei-Lin Chiang, Ying Sheng, Lianmin Zheng, Anastasios Angelopoulos, Trevor Darrell, Ion Stoica, Joseph E. Gonzalez, June 27, 2024

Multimodal Chatbot Arena We added image support to Chatbot Arena! You can now chat with your favorite vision-language models from OpenAI, Anthropic, Google, and most other major LLM providers to help discover how these models stack up against eachother. In just two weeks, we have collected over 17,0...

Introducing Hard Prompts Category in Chatbot Arena

by: Tianle Li, Wei-Lin Chiang, Lisa Dunlap, May 20, 2024

Background Introducing Hard Prompts, a new and challenging category in the Chatbot Arena Leaderboard. Over the past few months, the community has shown a growing interest in more challenging prompts that push the limits of current language models. To meet this demand, we are excited to introduce the...

What’s up with Llama 3? Arena data analysis

by: Lisa Dunlap, Evan Frick, Tianle Li, Isaac Ong, Joseph E. Gonzalez, Wei-Lin Chiang, May 8, 2024

On April 18th, Meta released Llama 3, their newest open-weight large language model. Since then, Llama 3-70B has quickly risen to the top of the English Chatbot Arena leaderboard with over 50,000 battles. This remarkable achievement by Meta is excellent news for the open-source community. In this bl...

LMSYS Kaggle Competition – Predicting Human Preference with $100,000 in Prizes

by: LMSYS Arena Team, May 2, 2024

Overview LMSYS and Kaggle are launching a human preference prediction competition! You are challenged to predict which responses users will prefer in head-to-head battles between Large Language Models (LLMs). You'll work with a dataset from the Chatbot Arena, containing conversations and user prefer...

From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline

by: Tianle Li*, Wei-Lin Chiang*, Evan Frick, Lisa Dunlap, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica, April 19, 2024

Building an affordable and reliable benchmark for LLM chatbots has become a critical challenge. A high-quality benchmark should 1) robustly separate model capability, 2) reflect human preference in real-world use cases, and 3) frequently update to avoid over-fitting or test set leakage. Traditional ...

LMSYS Chatbot Arena: Live and Community-Driven LLM Evaluation

by: LMSYS Arena Team, Mar 1, 2024

Our Mission Chatbot Arena (lmarena.ai) is an open-source project developed by members from LMSYS and UC Berkeley SkyLab. Our mission is to advance LLM development and understanding through live, open, and community-driven evaluations. We maintain the open evaluation platform for any user to rate LLM...

Fast JSON Decoding for Local LLMs with Compressed Finite State Machine

by: Liangsheng Yin, Ying Sheng, Lianmin Zheng, Feb 5, 2024

Constraining an LLM to consistently generate valid JSON or YAML that adheres to a specific schema is a critical feature for many applications. In this blog post, we introduce an optimization that significantly accelerates this type of constrained decoding. Our approach utilizes a compressed finite s...

Fast and Expressive LLM Inference with RadixAttention and SGLang

by: Lianmin Zheng*, Liangsheng Yin, Zhiqiang Xie, Jeff Huang, Chuyue Sun, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng*, Jan 17, 2024

Large Language Models (LLMs) are increasingly utilized for complex tasks that require multiple chained generation calls, advanced prompting techniques, control flow, and interaction with external environments. However, there is a notable deficiency in efficient systems for programming and executing ...

Chatbot Arena: New models & Elo system update

by: Wei-Lin Chiang, Tim Li, Joseph E. Gonzalez, Ion Stoica, Dec 7, 2023

Welcome to our latest update on the Chatbot Arena, our open evaluation platform to test the most advanced LLMs. We're excited to share that over 130,000 votes that are now collected to rank the most capable 40+ models! In this blog post, we'll cover the results of several new models: Tulu-2-DPO-70B...

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

by: Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang, November 21, 2023

TL;DR: We introduce lookahead decoding, a new, exact, and parallel decoding algorithm to accelerate LLM inference. Lookahead decoding breaks the sequential dependency in autoregressive decoding by concurrently extracting and verifying n-grams directly with the LLM, utilizing the Jacobi iteration me...

Recipe for Serving Thousands of Concurrent LoRA Adapters

by: Ying Sheng*, Shiyi Cao*, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica, November 15, 2023

In this blog post, we introduce S-LoRA (code), a system designed for the scalable serving of many LoRA adapters. S-LoRA adopts the idea of Unified Paging for KV cache and adapter weights to reduce memory fragmentation. Heterogeneous Batching of LoRA computation with different ranks leveraging optim...

Catch me if you can! How to beat GPT-4 with a 13B model

by: Shuo Yang*, Wei-Lin Chiang*, Lianmin Zheng*, Joseph E. Gonzalez, Ion Stoica, Nov 14, 2023

Announcing Llama-rephraser: 13B models reaching GPT-4 performance in major benchmarks (MMLU/GSK-8K/HumanEval)! To ensure result validity, we followed OpenAI's decontamination method and found no evidence of data contamination. <img src="/images/blog/decontaminator/llama-rephraser.png" s...

ToxicChat: A Benchmark for Content Moderation in Real-world User-AI Interactions

by: Zi Lin*, Zihan Wang*, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang, Jingbo Shang, October 30, 2023

In this blogpost, we introduce ToxicChat, a benchmark consisting of 10K high-quality data for content moderation in real-world user-AI interactions. Evaluation results show that fine-tuning on this benchmark notably improves a baseline model’s ability to detect toxic queries in user-AI interactions....

Chatbot Arena Conversation Dataset Release

by: LMSYS Org, July 20, 2023

Since its launch three months ago, Chatbot Arena has become a widely cited LLM evaluation platform that emphasizes large-scale, community-based, and interactive human evaluation. In that short time span, we collected around 53K votes from 19K unique IP addresses for 22 models. In this blog post, we ...

How Long Can Open-Source LLMs Truly Promise on Context Length?

by: The LongChat Team, June 29, 2023

In this blogpost, we introduce our latest series of chatbot models, LongChat-7B and LongChat-13B, featuring a new level of extended context length up to 16K tokens. Evaluation results show that the long-range retrieval accuracy of LongChat-13B is up to 2x higher than other long-context open models s...

Chatbot Arena Leaderboard Week 8: Introducing MT-Bench and Vicuna-33B

by: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Hao Zhang, June 22, 2023

In this blog post, we share the latest update on Chatbot Arena leaderboard, which now includes more open models and three metrics: Chatbot Arena Elo, based on 42K anonymous votes from Chatbot Arena using the Elo rating system. MT-Bench score, based on a challenging multi-turn benchmark and GPT-4 gr...

Building a Truly "Open" OpenAI API Server with Open Models Locally

by: Shuo Yang and Siyuan Zhuang, June 9, 2023

Many applications have been built on closed-source OpenAI APIs, but now you can effortlessly port them to use open-source alternatives without modifying the code. FastChat's OpenAI-compatible API server enables this seamless transition. In this blog post, we show how you can do this and use LangChai...

Chatbot Arena Leaderboard Updates (Week 4)

by: LMSYS Org, May 25, 2023

In this update, we are excited to welcome the following models joining the Chatbot Arena: Google PaLM 2, chat-tuned with the code name chat-bison@001 on Google Cloud Vertex AI Anthropic Claude-instant-v1 MosaicML MPT-7B-chat Vicuna-7B A new Elo rating leaderboard based on the 27K anonymous voting ...

Chatbot Arena Leaderboard Updates (Week 2)

by: LMSYS Org, May 10, 2023

We release an updated leaderboard with more models and new data we collected last week, after the announcement of the anonymous Chatbot Arena. We are actively iterating on the design of the arena and leaderboard scores. In this update, we have added 4 new yet strong players into the Arena, including...

Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings

by: Lianmin Zheng*, Ying Sheng*, Wei-Lin Chiang, Hao Zhang, Joseph E. Gonzalez, Ion Stoica, May 3, 2023

We present Chatbot Arena, a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner. In this blog post, we are releasing our initial results and a leaderboard based on the Elo rating system, which is a widely-used rating system in ches...

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

by: The Vicuna Team, March 30, 2023

We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LL...