AI BENCHY
AI Benchmark Leaderboard
Last updated at: 2026-05-22
Models Evaluated: 153
Score: summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.

Tests Correct: shows how many tests are fully passed. A test is fully passed only if every run passed for that test.

| Rank | Model | Score | Company | Total Cost | Response Time (avg) | Response Time (max) | Response Time (total) | Tests Correct (failures by reason) |
|---|---|---|---|---|---|---|---|---|
| #151 | Qwen3.5-9B (medium) | 4.2… | Qwen | $0.035… | 80.10s | 226.38s | 1281.62s | Timed out: 10; Wrong answer: 3; No answer: 2; Extra formatting: 1; Did not follow instructions: 1 |
| #152 | LFM2-24B-A2B (none), archived | 4.2… | Liquid | $0.001… | 811ms | 2.88s | 11.35s | Wrong answer: 8; API error: 4; Did not follow instructions: 2 |
| #153 | Granite 4.1 8B (none) | 4.1… | IBM Granite | $0.003… | 723ms | 2.17s | 14.45s | Wrong answer: 12; Did not follow instructions: 5; Invalid tool call: 1 |

Archived model: this model is no longer updated or tested on new tests.
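
The "Tests Correct" rule stated in the table (a test is fully passed only if every run passed for that test) can be illustrated with a minimal sketch. The data layout, the function names `fully_passed` and `summarize`, and the choice to attribute a failed test to its first failing run are all assumptions made for illustration; the leaderboard does not document how it records runs or assigns failure reasons.

```python
from collections import Counter

PASS = "pass"  # hypothetical marker for a passing run


def fully_passed(runs):
    """Return True only if the test has at least one run and every run passed."""
    return len(runs) > 0 and all(outcome == PASS for outcome in runs)


def summarize(results):
    """Count fully passed tests and tally one failure reason per failed test.

    `results` maps a test name to a list of run outcomes, where each outcome
    is either PASS or a failure reason such as "Wrong answer".
    Assumption: a failed test is attributed to its first failing run.
    """
    passed = 0
    reasons = Counter()
    for runs in results.values():
        if fully_passed(runs):
            passed += 1
        else:
            first_failure = next(
                (outcome for outcome in runs if outcome != PASS),
                "no runs recorded",
            )
            reasons[first_failure] += 1
    return passed, dict(reasons)


# Hypothetical results for three tests with two runs each.
example = {
    "test_a": [PASS, PASS],
    "test_b": [PASS, "Wrong answer"],
    "test_c": ["Timed out", "Timed out"],
}
passed_count, failure_reasons = summarize(example)
```

Under this rule a single failing run is enough to keep a test out of the "fully passed" count, which is why `test_b` above does not count as passed even though one of its runs succeeded.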
Quick Compare
- Gemini 3 Flash Preview (medium) vs Gemini 3.5 Flash (high)
- Gemini 3 Flash Preview (medium) vs Gemini 3.5 Flash (low)
- Gemini 3 Flash Preview (medium) vs Gemini 3.1 Pro Preview (medium)
- Gemini 3 Flash Preview (medium) vs Qwen3.7 Max (medium)
- Gemini 3 Flash Preview (medium) vs Gemini 3.5 Flash (medium)
- Gemini 3 Flash Preview (medium) vs Claude Opus 4.7 (medium)
- Gemini 3 Flash Preview (medium) vs Ring-2.6-1T (medium)
- Gemini 3.5 Flash (high) vs Gemini 3.5 Flash (low)
- Gemini 3.5 Flash (low) vs Gemini 3.1 Pro Preview (medium)
- Gemini 3.1 Pro Preview (medium) vs Qwen3.7 Max (medium)
- Qwen3.7 Max (medium) vs Gemini 3.5 Flash (medium)
- Gemini 3.5 Flash (medium) vs Claude Opus 4.7 (medium)