BLOG
Chatbot Arena Conversation Dataset Release
by: LMSYS Org, July 20, 2023
Since its launch three months ago, Chatbot Arena has become a widely cited LLM evaluation platform that emphasizes large-scale, community-based, and interactive human evaluation. In that short time span, we collected around 53K votes from 19K unique IP addresses for 22 models. In this blog post, we ...
How Long Can Open-Source LLMs Truly Promise on Context Length?
by: The LongChat Team, June 29, 2023
In this blog post, we introduce our latest series of chatbot models, LongChat-7B and LongChat-13B, featuring a new level of extended context length up to 16K tokens. Evaluation results show that the long-range retrieval accuracy of LongChat-13B is up to 2x higher than other long-context open models s...
Chatbot Arena Leaderboard Week 8: Introducing MT-Bench and Vicuna-33B
by: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Hao Zhang, June 22, 2023
In this blog post, we share the latest update on the Chatbot Arena leaderboard, which now includes more open models and three metrics: (1) Chatbot Arena Elo, based on 42K anonymous votes from Chatbot Arena using the Elo rating system; (2) MT-Bench score, based on a challenging multi-turn benchmark and GPT-4 gr...
Building a Truly "Open" OpenAI API Server with Open Models Locally
by: Shuo Yang and Siyuan Zhuang, June 9, 2023
Many applications have been built on closed-source OpenAI APIs, but now you can effortlessly port them to use open-source alternatives without modifying the code. FastChat's OpenAI-compatible API server enables this seamless transition. In this blog post, we show how you can do this and use LangChai...
Chatbot Arena Leaderboard Updates (Week 4)
by: LMSYS Org, May 25, 2023
In this update, we are excited to welcome the following models joining the Chatbot Arena: Google PaLM 2, chat-tuned with the code name chat-bison@001 on Google Cloud Vertex AI; Anthropic Claude-instant-v1; MosaicML MPT-7B-chat; Vicuna-7B. A new Elo rating leaderboard based on the 27K anonymous voting ...
Chatbot Arena Leaderboard Updates (Week 2)
by: LMSYS Org, May 10, 2023
We release an updated leaderboard with more models and new data we collected last week, after the announcement of the anonymous Chatbot Arena. We are actively iterating on the design of the arena and leaderboard scores. In this update, we have added 4 strong new players to the Arena, including...
Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings
by: Lianmin Zheng*, Ying Sheng*, Wei-Lin Chiang, Hao Zhang, Joseph E. Gonzalez, Ion Stoica, May 3, 2023
We present Chatbot Arena, a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner. In this blog post, we are releasing our initial results and a leaderboard based on the Elo rating system, which is a widely used rating system in ches...
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
by: The Vicuna Team, March 30, 2023
We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LL...