#83
DeepSeek
Release: 2026-04-24
Tested on: 2026-04-29 14:46
deepseek/deepseek-v4-pro::none
(high)
(none)
Input Price: $0.435 / 1M tokens
Output Price: $0.870 / 1M tokens
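As a rough illustration of how the listed prices translate into a per-request cost, the sketch below multiplies token counts by the per-million rates. It assumes that "1M" means one million tokens and that billing is strictly linear (no caching discounts or minimum charges); the function name `estimate_cost` and the token counts are hypothetical, not taken from this page.

```python
# Listed prices in USD per 1M tokens, copied from this page.
INPUT_PRICE_PER_M = 0.435
OUTPUT_PRICE_PER_M = 0.870


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a request, assuming simple linear pricing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical token counts, chosen only for illustration.
    print(f"${estimate_cost(20_000, 10_000):.4f}")
```

Because the output rate is twice the input rate, output-heavy workloads dominate the bill under this assumption.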
Flaky tests: 3
Flaky tests had mixed outcomes across runs (at least one pass and one fail).
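The flaky-test definition above can be expressed as a small check. This is a minimal sketch, not the benchmark's actual implementation; the helper `is_flaky`, the test names, and the run outcomes are all hypothetical.

```python
def is_flaky(outcomes: list[bool]) -> bool:
    """Return True if the runs include at least one pass and at least one fail."""
    return any(outcomes) and not all(outcomes)


# Hypothetical results: test name -> pass (True) / fail (False) for each run.
runs = {
    "test_a": [True, True, True],      # always passes
    "test_b": [True, False, True],     # mixed outcomes, so flaky
    "test_c": [False, False, False],   # always fails
}

flaky = [name for name, outcomes in runs.items() if is_flaky(outcomes)]
```

Under this definition a test that fails on every run counts as failed rather than flaky, and a test with no recorded runs is not flagged.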
Run history
| Tested on | Score | Reliability | Tests Correct | Total Cost | Compare |
|---|---|---|---|---|---|
| 2026-04-29 14:46 (Re-test) | 6.2 | 7.9 | | $0.043 | Current run |
| 2026-04-24 09:19 (Initial run) | 3.1 | N/A | | $0.009 | Compare |
Charts
Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens
Category Breakdown
| Category | Score | Consistency | Tests Correct |
|---|---|---|---|
| Anti-AI Tricks | 3.5 | 8.0 | |
| Coding | 7.1 | 3.7 | |
| Combined | 9.5 | 10.0 | |
| Data parsing and extraction | 8.8 | 10.0 | |
| Domain specific | 5.3 | 10.0 | |
| General Intelligence | 4.3 | 9.9 | |
| Instructions following | 6.3 | 10.0 | |
| Puzzle Solving | 7.6 | 7.2 | |
| Tool Calling | 10.0 | 10.0 | |