PROJECTS
LMSYS Org develops open models, datasets, systems, and evaluation tools for large models.
EVALUATION
Chatbot Arena
A crowdsourced benchmark platform for large language models (LLMs) that pits models against each other in anonymous, randomized battles. User votes feed a leaderboard based on Elo ratings.
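To illustrate how pairwise battle outcomes can translate into a leaderboard, here is a minimal sketch of the classic Elo update rule. This is a generic illustration, not Chatbot Arena's exact ranking procedure (the live leaderboard's methodology may differ), and the function name and K-factor are assumptions:

```python
def elo_update(r_a, r_b, winner, k=32):
    """Apply one Elo update after a head-to-head battle.

    r_a, r_b : current ratings of models A and B
    winner   : 'a', 'b', or 'tie'
    k        : K-factor controlling update size (32 is a common choice)
    """
    # Expected score of A under the logistic Elo model
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    # Actual score of A from the battle outcome
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    # Winner's rating rises, loser's falls, by symmetric amounts
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# Two equally rated models; A wins, gaining half the K-factor
print(elo_update(1000, 1000, "a"))  # → (1016.0, 984.0)
```

Repeating this update over many crowdsourced battles yields ratings whose differences predict head-to-head win probabilities.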
MT-Bench
A set of challenging, multi-turn, open-ended questions for evaluating chat assistants. Model responses are scored with the LLM-as-a-judge approach, in which a strong LLM grades each answer.
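To make the LLM-as-a-judge idea concrete, here is a small sketch of building a grading prompt and parsing the judge's verdict. The prompt wording and the `[[N]]` rating format are illustrative assumptions, not the exact MT-Bench template, and the judge call itself is left abstract:

```python
import re


def build_judge_prompt(question, answer):
    # Hypothetical single-answer grading prompt; MT-Bench's real
    # template differs in wording and structure.
    return (
        "Please act as an impartial judge and rate the quality of the "
        "assistant's response on a scale of 1 to 10. "
        "Output your rating in the format [[N]].\n\n"
        f"[Question]\n{question}\n\n[Answer]\n{answer}\n"
    )


def parse_rating(judge_output):
    # Extract the first [[N]] rating from the judge model's output.
    m = re.search(r"\[\[(\d+(?:\.\d+)?)\]\]", judge_output)
    return float(m.group(1)) if m else None


prompt = build_judge_prompt("What is 2+2?", "2+2 equals 4.")
# In practice, `prompt` would be sent to a strong judge LLM;
# here we parse a mock reply instead.
print(parse_rating("The answer is correct and concise. Rating: [[9]]"))  # → 9.0
```

Scores from such judgments are then aggregated per model and per conversation turn to compare chat assistants.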