#61
Mistral · Release: 2026-03-16 · mistralai/mistral-small-2603::none
Flaky tests: 1
A flaky test is one with mixed outcomes across runs (at least one pass and at least one fail).
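The flaky-test definition above (at least one pass and one fail across runs) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `find_flaky_tests`, the dictionary layout, and the sample test names are hypothetical and are not taken from the benchmark's own tooling.

```python
from typing import Dict, List


def find_flaky_tests(runs: Dict[str, List[bool]]) -> List[str]:
    """Return the names of tests with mixed outcomes across runs.

    `runs` maps a test name to its per-run outcomes (True = pass, False = fail).
    A test counts as flaky when it has at least one pass and at least one fail.
    """
    return sorted(
        name
        for name, outcomes in runs.items()
        if any(outcomes) and not all(outcomes)
    )


# Hypothetical outcomes, for illustration only.
example_runs = {
    "test_a": [True, True, True],      # always passes: not flaky
    "test_b": [True, False, True],     # mixed outcomes: flaky
    "test_c": [False, False, False],   # always fails: not flaky
}

flaky = find_flaky_tests(example_runs)
print(len(flaky), flaky)
```

A test that fails on every run is consistently wrong rather than flaky, which is why the check requires both a pass and a fail.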
Charts
Score vs Total Cost, Response Time (avg), Score vs Response Time (avg), Total Output Tokens, Score vs Total Output Tokens
Quick Compare
- Mistral Small 4 (none) vs Kimi K2.5 (none)
- Mistral Small 4 (none) vs Grok 4.20 Beta (none)
- Mistral Small 4 (none) vs Nemotron 3 Super 120b A12b (none), Free Available
- Mistral Small 4 (none) vs GLM 4.7 Flash (none)
- Mistral Small 4 (none) vs GPT-4o-mini (none)
- Mistral Small 4 (none) vs Gemini 3 Flash Preview (medium)
- Mistral Small 4 (none) vs Gemini 3.1 Pro Preview (medium)
- Mistral Small 4 (none) vs Step 3.5 Flash (medium), Free Available
Category Breakdown
| Category | Score | Consistency | Tests Correct |
|---|---|---|---|
| Anti-AI Tricks | 3.4 | 7.9 | |
| Combined | 3.0 | 10.0 | |
| Data parsing and extraction | 10.0 | 10.0 | |
| Domain specific | 5.3 | 10.0 | |
| General Intelligence | 4.0 | 10.0 | |
| Instructions following | 6.5 | 10.0 | |
| Puzzle Solving | 3.1 | 9.9 | |
| Tool Calling | 10.0 | 10.0 | |