SWE-Bench Verified

Software Engineering Benchmark (Verified)

Model ranking

#  Model              Organization     Score (%)
1  Claude 3.5 Sonnet  Anthropic        49.0
2  Gemini 1.5 Pro     Google DeepMind  38.0
3  GPT-4o             OpenAI           33.2
4  Llama 3.1 405B     Meta AI          24.0