AI Benchmarks

How the leading AI models perform across reasoning, coding, math and vision benchmarks.

GPQA Diamond

Graduate-Level Google-Proof Q&A

Massive Multitask Language Understanding (Pro)

Software Engineering Benchmark (Verified)

Hand-written programming problems

American Invitational Mathematics Examination