AI BENCHY
โค๏ธ Made by XCS

#40

Mercury 2

Inception · Release: 2026-02-24 · inception/mercury-2::medium

Avg Score: 48
Cost per result: 0.726
Consistency: 83
Total Cost: $0.044
Tests Correct: 6 (a test is fully passed only if every run passed for that test)
Wrong Tests: 9
Attempt pass rate: 51.1%
Flaky tests: 3
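The page states only the rule for a fully passed test. The sketch below shows how the headline counts could be derived from per-run outcomes, assuming each test is run several times with one pass/fail result per run. Two definitions here are assumptions the page does not spell out: a flaky test is one with at least one passing and at least one failing run, and the attempt pass rate is passed runs divided by all runs. The `summarize` function and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    tests_correct: int        # every run passed
    wrong_tests: int          # at least one run failed (or no runs recorded)
    flaky_tests: int          # mixed outcomes (assumed definition)
    attempt_pass_rate: float  # passed runs / all runs (assumed definition)


def summarize(runs_by_test: dict[str, list[bool]]) -> Summary:
    """Summarize per-test run outcomes, where True means the run passed."""
    correct = wrong = flaky = 0
    passed_runs = total_runs = 0
    for outcomes in runs_by_test.values():
        passed = sum(outcomes)
        passed_runs += passed
        total_runs += len(outcomes)
        if outcomes and passed == len(outcomes):
            correct += 1  # fully passed: every run passed
        else:
            wrong += 1
            if passed > 0:
                flaky += 1  # some runs passed, some failed
    rate = passed_runs / total_runs if total_runs else 0.0
    return Summary(correct, wrong, flaky, rate)


# Hypothetical outcomes for three tests with three runs each.
example = {
    "test_a": [True, True, True],
    "test_b": [True, False, True],
    "test_c": [False, False, False],
}
print(summarize(example))
```

Under these assumptions a flaky test never counts as correct, so the 3 flaky tests fall within the 9 wrong tests, and Tests Correct plus Wrong Tests (6 + 9 = 15) equals the number of tests in the category breakdown.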

Response time: avg 2.47s · total 34.56s · max 14.63s
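The response-time line reports a mean, a sum and a maximum. A minimal sketch, assuming latencies are recorded per response in seconds and the average is a plain arithmetic mean; `latency_summary` and `implied_response_count` are hypothetical helpers. Under that assumption, dividing the total by the average recovers the number of timed responses.

```python
def latency_summary(seconds: list[float]) -> dict[str, float]:
    """Return avg, total and max of per-response latencies, in seconds."""
    if not seconds:
        raise ValueError("no latencies recorded")
    total = sum(seconds)
    return {
        "avg": round(total / len(seconds), 2),
        "total": round(total, 2),
        "max": round(max(seconds), 2),
    }


def implied_response_count(avg: float, total: float) -> int:
    """Estimate how many responses were timed, assuming avg = total / count."""
    return round(total / avg)


# Published figures for this model: avg 2.47 s, total 34.56 s.
print(implied_response_count(2.47, 34.56))
```

With the published figures, 34.56 / 2.47 ≈ 13.99, which rounds to 14 responses; as a check, 34.56 / 14 ≈ 2.469, which rounds back to the displayed 2.47.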

Wrong answer: 5 · Did not follow instructions: 3 · API error: 1
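The three failure causes add up to 5 + 3 + 1 = 9, the same as Wrong Tests. The check below assumes each wrong test is assigned exactly one cause; that is an inference from the matching totals, not something the page states.

```python
from collections import Counter

# Failure causes as listed on this page; assumes each wrong test
# is assigned exactly one cause.
failure_causes = Counter({
    "Wrong answer": 5,
    "Did not follow instructions": 3,
    "API error": 1,
})
WRONG_TESTS = 9  # "Wrong Tests" figure from the same page

total = sum(failure_causes.values())
print(total, total == WRONG_TESTS)

# Share of wrong tests per cause, largest first.
for cause, count in failure_causes.most_common():
    print(f"{cause}: {count / WRONG_TESTS:.0%}")
```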

Category Breakdown

Category                      Avg Score  Consistency  Tests Correct
Anti-AI Tricks                       73           98            2/3
Combined                            100          100            1/1
Data parsing and extraction          55           59            1/2
Domain specific                     100           72            0/3
Instructions following               55          100            1/2
Puzzle Solving                       17           75            0/3
Tool Calling                        100          100            1/1
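The per-category Tests Correct column can be aggregated to cross-check the headline numbers. This sketch copies the rows of the table above and sums the passed/total fractions; `overall_tests_correct` is a hypothetical helper.

```python
# Rows from the Category Breakdown table:
# (category, avg score, consistency, tests correct as "passed/total").
ROWS: list[tuple[str, int, int, str]] = [
    ("Anti-AI Tricks", 73, 98, "2/3"),
    ("Combined", 100, 100, "1/1"),
    ("Data parsing and extraction", 55, 59, "1/2"),
    ("Domain specific", 100, 72, "0/3"),
    ("Instructions following", 55, 100, "1/2"),
    ("Puzzle Solving", 17, 75, "0/3"),
    ("Tool Calling", 100, 100, "1/1"),
]


def overall_tests_correct(rows: list[tuple[str, int, int, str]]) -> tuple[int, int]:
    """Sum the passed/total fractions across all categories."""
    passed = total = 0
    for _name, _score, _consistency, fraction in rows:
        p, t = (int(x) for x in fraction.split("/"))
        passed += p
        total += t
    return passed, total


print(overall_tests_correct(ROWS))  # (6, 15)
```

The result, 6 of 15, matches Tests Correct (6) and Tests Correct plus Wrong Tests (6 + 9 = 15).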