#7
Anthropic
Release: 2026-04-16
Tested on: 2026-05-22 00:18
anthropic/claude-opus-4.7::medium
(medium)
(none)
8.9
Consistency
10.0
10.0
$0.624
Total Output Tokens: 12,637
Input Price: $5.000 / 1M
Output Price: $25.000 / 1M
Flaky tests: 0
Flaky tests had mixed outcomes across runs (at least one pass and one fail).
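The flaky-test definition above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual implementation: the function name `flaky_tests`, the input shape (test name mapped to a list of pass/fail booleans, one per run), and the sample data are all assumptions made for the example.

```python
from typing import Dict, List


def flaky_tests(outcomes: Dict[str, List[bool]]) -> List[str]:
    """Return the names of tests that have at least one pass and one fail."""
    return sorted(
        name
        for name, results in outcomes.items()
        # any(results): at least one pass; not all(results): at least one fail
        if any(results) and not all(results)
    )


# Hypothetical outcomes (True = pass, False = fail); not data from this report.
example = {
    "test_a": [True, True, True],     # always passes: not flaky
    "test_b": [True, False, True],    # mixed outcomes: flaky
    "test_c": [False, False, False],  # always fails: not flaky
}

print(flaky_tests(example))  # ['test_b']
```

Under this definition a test that fails on every run is consistently wrong rather than flaky, so it does not count toward the flaky total.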
Run history
| Tested on | Score | Reliability | Tests Correct | Total Cost | Compare |
|---|---|---|---|---|---|
| 2026-05-22 00:18 (re-test) | 8.9 | 10.0 | | $0.625 | Current run |
| 2026-04-16 15:59 (first recorded run) | 9.2 | N/A | | $0.447 | |
Charts
Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens
Category Breakdown
| Category | Score | Consistency | Tests Correct |
|---|---|---|---|
| Anti-AI Tricks | 8.3 | 10.0 | |
| Coding | 10.0 | 10.0 | |
| Combined | 10.0 | 10.0 | |
| Data parsing and extraction | 10.0 | 10.0 | |
| Domain specific | 7.7 | 10.0 | |
| General Intelligence | 10.0 | 10.0 | |
| Instructions following | 10.0 | 10.0 | |
| Puzzle Solving | 10.0 | 10.0 | |
| Tool Calling | 10.0 | 10.0 | |
| Trivia | 3.0 | 10.0 | |