PROJECTS

LMSYS Org develops open models, datasets, systems, and evaluation tools for large models.

EVALUATION

Chatbot Arena

A benchmark platform for large language models (LLMs) that runs anonymous, randomized, crowdsourced battles between models. It comes with a leaderboard based on Elo ratings.
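
To make the Elo-based leaderboard concrete, here is a minimal sketch of the textbook Elo update applied to a stream of battle outcomes. It is an illustration only, not necessarily how the Chatbot Arena leaderboard is computed: the K-factor of 32, the starting rating of 1000, the model names, and the function names `expected_score` and `update_elo` are all assumptions made for this example.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, model_a, model_b, outcome, k=32.0, initial=1000.0):
    """Update ratings in place after one battle.

    outcome is 1.0 if model_a wins, 0.0 if model_b wins, 0.5 for a tie.
    k and initial are assumed values, chosen only for illustration.
    """
    ra = ratings.get(model_a, initial)
    rb = ratings.get(model_b, initial)
    ea = expected_score(ra, rb)
    # Each side moves by k times (actual score - expected score).
    ratings[model_a] = ra + k * (outcome - ea)
    ratings[model_b] = rb + k * ((1.0 - outcome) - (1.0 - ea))
    return ratings


# Hypothetical battle log: (model_a, model_b, outcome for model_a).
battles = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-x", "model-z", 1.0),
]

ratings = {}
for a, b, outcome in battles:
    update_elo(ratings, a, b, outcome)

# Print a leaderboard sorted by rating, highest first.
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each update adds to one model exactly what it subtracts from the other, the total rating across models stays constant. Sequential updates like this depend on the order of battles, which is one reason a production leaderboard may prefer a different estimation procedure.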

Arena Hard Auto

An automatic pipeline that converts live data into high-quality benchmarks for evaluating chat assistants. The questions are more difficult than those in MT-Bench.

MT-Bench

A set of challenging, multi-turn, and open-ended questions for evaluating chat assistants. It uses LLM-as-a-judge to evaluate model responses.

SYSTEMS

FastChat

An open and scalable platform for training, finetuning, serving, and evaluating LLM-based chatbots.

SGLang

A fast serving engine for LLMs and VLMs.

S-LoRA

A system for serving thousands of concurrent LoRA adapters.

RouteLLM

A framework for serving and evaluating LLM routers.

Lookahead Decoding

An exact, fast, parallel decoding algorithm without the need for draft models or data stores.

DATASETS

LMSYS-Chat-1M

This dataset contains one million real-world conversations with 25 state-of-the-art LLMs.

Chatbot Arena Conversations

This dataset contains 33K cleaned conversations with pairwise human preferences collected on Chatbot Arena.

ToxicChat

This dataset contains 10K high-quality samples for content moderation in real-world user-AI interactions, based on user queries from the Vicuna online demo.

MODELS

Vicuna

Base: Llama

Size: 7B, 13B, 33B

An open-source chatbot that reaches 90%* of ChatGPT quality, as judged by GPT-4.

LongChat

Base: Llama

Size: 7B, 13B

A series of open-source chatbots with long context lengths (16K to 32K tokens).

FastChat-T5

Base: Flan-T5

Size: 3B

A commercial-friendly, compact, yet powerful chat assistant.