Who's GPT-4's Favorite? Battles Between State-of-the-Art Chatbots

by: The Vicuna Team, 30 Mar, 2023

We compiled 80 challenging questions spanning 9 categories, including writing, roleplay, math, coding, and knowledge. We then asked each LLM to answer these questions and used GPT-4 to judge which LLM produced the better responses. Explore the questions, responses, and judgement results below! The code of this evaluation pipeline is available here.
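
To make the setup concrete, here is a minimal sketch of how pairwise judging like this could be tallied. It is not the team's actual pipeline: the function names, the "a"/"b"/"tie" verdict format, and the toy judge are all assumptions for illustration. In a real run, the judge would wrap a call to GPT-4 instead of comparing answer lengths.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical judge signature (an assumption, not the published pipeline):
# judge(question, answer_a, answer_b) returns "a", "b", or "tie".
Judge = Callable[[str, str, str], str]


def tally_pairwise_wins(
    questions: List[str],
    answers: Dict[str, List[str]],
    judge: Judge,
) -> Dict[str, int]:
    """Count how many head-to-head comparisons each model wins.

    answers maps a model name to its list of responses, one per question,
    in the same order as questions.
    """
    wins = Counter({model: 0 for model in answers})
    # Compare every unordered pair of models on every question.
    for model_a, model_b in combinations(answers, 2):
        for i, question in enumerate(questions):
            verdict = judge(question, answers[model_a][i], answers[model_b][i])
            if verdict == "a":
                wins[model_a] += 1
            elif verdict == "b":
                wins[model_b] += 1
            # Any other verdict (such as "tie") awards no win.
    return dict(wins)


def longer_answer_judge(question: str, answer_a: str, answer_b: str) -> str:
    """Toy stand-in for an LLM judge: prefers the longer answer."""
    if len(answer_a) == len(answer_b):
        return "tie"
    return "a" if len(answer_a) > len(answer_b) else "b"


if __name__ == "__main__":
    demo_questions = ["q1", "q2"]
    demo_answers = {
        "model_1": ["a detailed answer", "same"],
        "model_2": ["short", "same"],
    }
    print(tally_pairwise_wins(demo_questions, demo_answers, longer_answer_judge))
```

Passing the judge in as a plain callable keeps the tallying logic independent of whichever model does the judging, so the same loop can be checked with a deterministic stand-in before any API calls are made.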