PROJECTS
LMSYS Org develops open models, datasets, systems, and evaluation tools for large models.
EVALUATION
Chatbot Arena
A benchmark platform for large language models (LLMs) that crowdsources anonymous, randomized battles between models. It comes with a leaderboard based on Elo ratings.
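As a sketch of how such ratings evolve, the standard Elo update after a single head-to-head battle looks like the following; the K-factor and starting ratings are illustrative choices, not the leaderboard's exact parameters.

    # Standard Elo update for one model-vs-model battle (illustrative K and ratings).
    def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
        expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
        score_a = 1.0 if a_wins else 0.0
        return r_a + k * (score_a - expected_a), r_b - k * (score_a - expected_a)

    # Model A (rated 1100) beats model B (rated 1000):
    print(elo_update(1100.0, 1000.0, a_wins=True))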
Arena-Hard-Auto
An automatic pipeline that converts live data into high-quality benchmarks for evaluating chat assistants. Its questions are more difficult than those in MT-Bench.
MT-Bench
A set of challenging, multi-turn, and open-ended questions for evaluating chat assistants. It uses LLM-as-a-judge to evaluate model responses.
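A minimal illustration of the LLM-as-a-judge idea using the OpenAI Python client; the judge prompt, 1-10 scale, and model name are simplified placeholders, not the actual MT-Bench judge prompts.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def judge(question: str, answer: str) -> str:
        # Ask a strong model to grade a candidate answer (single-answer grading).
        prompt = (
            "Rate the following answer on a scale of 1 to 10 and explain briefly.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content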
SYSTEMS
FastChat
An open and scalable platform for training, finetuning, serving, and evaluating LLM-based chatbots.
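As one usage sketch, a running FastChat OpenAI-compatible API server can be queried with the standard OpenAI Python client; the port and model name below are illustrative and assume the controller, a model worker, and the API server were launched as described in the FastChat documentation.

    from openai import OpenAI

    # Point the client at the local FastChat API server instead of api.openai.com.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    resp = client.chat.completions.create(
        model="vicuna-7b-v1.5",
        messages=[{"role": "user", "content": "Hello! Who are you?"}],
    )
    print(resp.choices[0].message.content)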
SGLang
A fast serving engine for LLMs and vision-language models (VLMs).
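A small example of SGLang's Python frontend, assuming an SGLang server is already running at the default local endpoint; the question and token budget are arbitrary.

    import sglang as sgl

    @sgl.function
    def qa(s, question):
        # Build a chat turn and generate the assistant's reply.
        s += sgl.user(question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=128))

    sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
    state = qa.run(question="What is the capital of France?")
    print(state["answer"])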
S-LoRA
A system for serving thousands of concurrent LoRA adapters.
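Conceptually, every request selects its own low-rank adapter while all requests share the frozen base weights; the toy NumPy sketch below shows that per-request computation (shapes, names, and the batching loop are illustrative, not S-LoRA's actual batched kernels).

    import numpy as np

    d, r = 16, 4                        # hidden size, LoRA rank
    W = np.random.randn(d, d)           # frozen base weight, shared by all requests
    adapters = {                        # adapter_id -> (A, B) low-rank factors
        "adapter_0": (np.random.randn(r, d), np.random.randn(d, r)),
        "adapter_1": (np.random.randn(r, d), np.random.randn(d, r)),
    }

    def forward(x: np.ndarray, adapter_id: str) -> np.ndarray:
        A, B = adapters[adapter_id]
        return W @ x + B @ (A @ x)      # base path plus the adapter's low-rank update

    # One batch can mix requests that target different adapters.
    batch = [(np.random.randn(d), "adapter_0"), (np.random.randn(d), "adapter_1")]
    outputs = [forward(x, aid) for x, aid in batch]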
RouteLLM
A framework for serving and evaluating LLM routers.
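The core idea of an LLM router can be sketched as a thresholded choice between a strong and a weak model; the word-count difficulty heuristic and model names below are placeholders, not RouteLLM's learned routers.

    def route(query: str, threshold: float = 0.5) -> str:
        # Toy difficulty proxy; real routers use learned quality predictors.
        difficulty = min(len(query.split()) / 100.0, 1.0)
        return "strong-model" if difficulty >= threshold else "weak-model"

    print(route("What is 2 + 2?"))                        # -> weak-model
    print(route("Summarize this paper: " + "word " * 80)) # -> strong-model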
Lookahead Decoding
An exact, fast, parallel decoding algorithm without the need for draft models or data stores.
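A toy guess-and-verify loop in the same spirit: n-grams collected from the sequence so far act as draft continuations, the drafted tokens are checked against the model (in parallel in the real algorithm), and only the longest matching prefix is accepted, so the output is exactly what greedy decoding would produce. The target function below is a deterministic stand-in for an LLM.

    def target(prefix):                  # toy "LLM": deterministic next token
        return (sum(prefix) + len(prefix)) % 5

    def ngram_guess(seq, n=2, length=3):
        # Reuse what followed the last time this n-gram appeared.
        key = tuple(seq[-n:])
        for i in range(len(seq) - n):
            if tuple(seq[i:i + n]) == key:
                return seq[i + n:i + n + length]
        return []

    def decode(seq, steps=20):
        for _ in range(steps):
            accepted = []
            for g in ngram_guess(seq):   # sequential here; parallel in practice
                if target(seq + accepted) == g:
                    accepted.append(g)
                else:
                    break
            seq = seq + accepted + [target(seq + accepted)]  # always gain >= 1 token
        return seq

    print(decode([0, 1]))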
DATASETS
LMSYS-Chat-1M
This dataset contains one million real-world conversations with 25 state-of-the-art LLMs.
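The dataset can be loaded from the Hugging Face Hub; access is gated, so accepting the dataset's terms and authenticating with a token is required. The field names below follow the dataset card.

    from datasets import load_dataset

    ds = load_dataset("lmsys/lmsys-chat-1m", split="train")
    print(ds[0]["model"])             # which LLM produced this conversation
    print(ds[0]["conversation"][0])   # first turn: {"role": ..., "content": ...}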
Chatbot Arena Conversations
This dataset contains 33K cleaned conversations with pairwise human preferences collected on Chatbot Arena.
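As one example of working with the preferences, per-model win counts can be tallied as below; the winner, model_a, and model_b fields follow the dataset card.

    from collections import Counter
    from datasets import load_dataset

    ds = load_dataset("lmsys/chatbot_arena_conversations", split="train")
    wins = Counter()
    for row in ds:
        if row["winner"] in ("model_a", "model_b"):   # skip ties
            wins[row[row["winner"]]] += 1
    print(wins.most_common(5))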
ToxicChat
This dataset contains 10K high-quality examples for content moderation in real-world user-AI interactions, based on user queries from the Vicuna online demo.
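The dataset is published on the Hugging Face Hub under versioned configurations; the configuration and field names below follow the dataset card and may differ across versions.

    from datasets import load_dataset

    ds = load_dataset("lmsys/toxic-chat", "toxicchat0124", split="train")
    print(ds[0]["user_input"], ds[0]["toxicity"])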