GPQA Diamond

Graduate-Level Google-Proof Q&A

Model ranking

#  Model              Organization     Score (%)
1  Claude 3.5 Sonnet  Anthropic        65.0
2  Gemini 1.5 Pro     Google DeepMind  59.1
3  GPT-4o             OpenAI           53.6
4  Llama 3.1 405B     Meta AI          51.1