LMSYS Org develops open models, datasets, systems, and evaluation tools for large models.
LMSYS-Chat-1M
This dataset contains one million real-world conversations with 25 state-of-the-art LLMs.
Chatbot Arena Conversations
This dataset contains 33K cleaned conversations with pairwise human preferences collected on Chatbot Arena.
ToxicChat
This dataset contains 10K high-quality examples for content moderation in real-world user-AI interactions, based on user queries from the Vicuna online demo.
Chatbot Arena
A benchmark platform for large language models (LLMs) that features anonymous, randomized, crowdsourced battles between models. It comes with a leaderboard based on Elo ratings.
MT-Bench
A set of challenging, multi-turn, and open-ended questions for evaluating chat assistants. It uses LLM-as-a-judge to evaluate model responses.