HumanEval

Hand-written programming problems

Model ranking

#  Model              Organization     Score (pass@1)
1  Claude 3.5 Sonnet  Anthropic        92.0
2  GPT-4o             OpenAI           90.2
3  Llama 3.1 405B     Meta AI          89.0
4  Gemini 1.5 Pro     Google DeepMind  84.1
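
The scores in the ranking are reported as pass@1. The page does not say how each score was computed, so the following is only a minimal sketch, assuming the commonly used unbiased pass@k estimator: for each problem, n samples are generated, c of them pass the unit tests, and pass@k = 1 - C(n-c, k) / C(n, k). The function names and the (n, c) input layout are hypothetical and chosen for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem: n samples drawn, c of them pass."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a percentage.

    `results` holds one (n, c) pair per problem (hypothetical layout).
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, so with one sample per problem the score is simply the percentage of problems solved on the first attempt.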