#13
Stepfun · Release date: 2026-02-01 · stepfun/step-3.5-flash::medium
Flaky tests
2
A flaky test produces mixed results across runs (at least one pass and at least one failure).
Instructions not followed: 3 · Wrong answer: 3
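The flaky-test definition above can be sketched as a small check. This is a minimal illustration, not the benchmark's own code: the function name `is_flaky` and the representation of a test's runs as a list of booleans (`True` for a pass) are assumptions.

```python
from typing import Sequence


def is_flaky(run_results: Sequence[bool]) -> bool:
    """Return True when the runs include at least one pass and at least one failure.

    `run_results` is a hypothetical per-test list of outcomes, True meaning a pass.
    """
    has_pass = any(run_results)          # at least one run passed
    has_failure = not all(run_results)   # at least one run failed
    return has_pass and has_failure


# A test that passes on two runs and fails on one counts as flaky.
mixed = is_flaky([True, False, True])
# A test that passes (or fails) on every run does not.
always_pass = is_flaky([True, True, True])
always_fail = is_flaky([False, False, False])
```

Under this sketch a test with no recorded runs is not flaky, since it has neither a pass nor a failure.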
Category breakdown
| Category | Average score | Consistency | Tests correct |
|---|---|---|---|
| Anti-AI Tricks | 10.0 | 10.0 | |
| Combined | 10.0 | 10.0 | |
| Data parsing and extraction | 10.0 | 10.0 | |
| Domain specific | 4.0 | 7.2 | |
| General Intelligence | 6.0 | 10.0 | |
| Instructions following | 9.0 | 6.8 | |
| Puzzle Solving | 4.0 | 10.0 | |
| Tool Calling | 10.0 | 10.0 | |