
❤️ Made by XCS

#50

Mercury 2

Inception · Release date: 2026-02-24 · inception/mercury-2::none

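Average score

31

Cost per result

0.196

Consistency

89

Total cost

$0.006

Tests correct

3

A test counts as fully passed only when all of its runs pass.

Failed tests

12

Attempt pass rate: 26.7%

Flaky tests

2

Response time: average 594 ms · total 8.91 s · max 1.27 s

Wrong answer: 11 · Instructions not followed: 1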
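
The counting rule above (a test is fully passed only when every one of its runs passes) can be illustrated with a minimal sketch. The page does not define "flaky" or say how the failed-test count is derived, so two assumptions are made here and repeated in the comments: a flaky test is one whose runs disagree, and a failed test is any test that is not fully passed. The test ids, the `Summary` type, and the `summarize` function are hypothetical, and the sample data is made up rather than taken from this benchmark.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    tests_correct: int        # tests where every run passed
    failed_tests: int         # assumption: every test that is not fully passed
    flaky_tests: int          # assumption: tests with both passing and failing runs
    attempt_pass_rate: float  # percentage of individual runs that passed


def summarize(runs: dict[str, list[bool]]) -> Summary:
    """Aggregate per-run outcomes; `runs` maps a hypothetical test id to its results."""
    tests_correct = sum(1 for r in runs.values() if r and all(r))
    flaky_tests = sum(1 for r in runs.values() if any(r) and not all(r))
    failed_tests = len(runs) - tests_correct
    attempts = sum(len(r) for r in runs.values())
    passed = sum(sum(r) for r in runs.values())
    rate = 100.0 * passed / attempts if attempts else 0.0
    return Summary(tests_correct, failed_tests, flaky_tests, round(rate, 1))


# Made-up sample: one stable pass, one flaky test, one stable failure.
example = {
    "t1": [True, True, True],
    "t2": [True, False, True],
    "t3": [False, False, False],
}
print(summarize(example))
```

Under these assumptions a flaky test lowers the "tests correct" count even though some of its runs succeed, which is why the attempt pass rate can be higher than the share of fully passed tests.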

Models ranked by score


#43 Qwen3.5-35B-A3B 45

#44 GPT-5.4 45

#45 Trinity Large Preview (free) 41

#46 GPT-4o-mini 41

#47 GLM 4.7 Flash 38

#48 Kimi K2.5 37

#49 Qwen3 Coder Next 34

#50 Mercury 2 31

#51 Qwen3 Coder Next 31

#52 Grok 4.1 Fast 29

#53 GLM 4.7 Flash 29

#54 MiMo-V2-Flash 27

#55 LFM2-24B-A2B 23

Quick comparison

Mercury 2 (none) vs Qwen3 Coder Next (none)

Mercury 2 (none) vs Qwen3 Coder Next (medium)

Mercury 2 (none) vs Kimi K2.5 (none)

Mercury 2 (none) vs Grok 4.1 Fast (none)

Mercury 2 (none) vs GLM 4.7 Flash (none)

Mercury 2 (none) vs GLM 4.7 Flash (medium)

Mercury 2 (none) vs Gemini 3 Flash Preview (medium)

Mercury 2 (none) vs Gemini 3.1 Pro Preview (medium)

Mercury 2 (none) vs Step 3.5 Flash (medium) · available for free

Category breakdown

Category	Average score	Consistency	Tests correct
Anti-AI Tricks	100	100	0/3
Combined	100	100	0/1
Data parsing and extraction	55	59	1/2
Domain specific	40	72	1/3
Instructions following	35	100	0/2
Puzzle Solving	100	100	0/3
Tool Calling	100	100	1/1
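
As a quick consistency check, the last column of the table above can be summed across categories. The sketch below copies the category names and passed/total fractions from the table; the `totals` helper is a hypothetical name introduced only for illustration.

```python
# Passed/total fractions copied from the last column of the category table.
TABLE = [
    ("Anti-AI Tricks", "0/3"),
    ("Combined", "0/1"),
    ("Data parsing and extraction", "1/2"),
    ("Domain specific", "1/3"),
    ("Instructions following", "0/2"),
    ("Puzzle Solving", "0/3"),
    ("Tool Calling", "1/1"),
]


def totals(rows: list[tuple[str, str]]) -> tuple[int, int]:
    """Return (fully passed tests, total tests) summed over all categories."""
    passed = total = 0
    for _, fraction in rows:
        p, t = fraction.split("/")
        passed += int(p)
        total += int(t)
    return passed, total


print(totals(TABLE))  # (3, 15)
```

The result, 3 fully passed tests out of 15, agrees with the overall figures reported earlier on the page: 3 tests correct plus 12 failed tests.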