#130
Arcee AI
Release: 2026-01-27
Tested on: 2026-05-08 15:30
arcee-ai/trinity-large-preview::none
Score
4.8
Consistency
8.9
Reliability
10.0
Total Cost
$0.001
Total Output Tokens
2,190
Total Input Tokens
0
Input Price
$0.150 / 1M
Output Price
$0.450 / 1M
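The total cost shown for this run is consistent with a simple tokens-times-price calculation. The page does not state its formula, so the sketch below is an assumption: it multiplies each token count by its per-million price and sums the two. The function name `run_cost` and its parameters are hypothetical.

```python
def run_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the dollar cost of a run, given prices per one million tokens.

    Assumed formula (not stated by the source page):
    cost = (input_tokens * input_price + output_tokens * output_price) / 1e6
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Figures taken from this page: 0 input tokens, 2,190 output tokens,
# $0.150 / 1M input, $0.450 / 1M output.
cost = run_cost(
    input_tokens=0,
    output_tokens=2190,
    input_price_per_m=0.150,
    output_price_per_m=0.450,
)

# Rounded to three decimals, as the page displays it.
print(f"${cost:.3f}")  # $0.001
```

Under this assumption the unrounded figure is roughly $0.00099, which rounds to the $0.001 reported.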
Flaky tests
3
Flaky tests had mixed outcomes across runs (at least one pass and one fail).
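The flaky-test definition above can be sketched as a small predicate. The test names and outcome lists below are hypothetical illustration data, not results from this benchmark, and `is_flaky` is an assumed helper name.

```python
def is_flaky(outcomes):
    """True when the runs include at least one pass and at least one fail."""
    return any(outcomes) and not all(outcomes)


# Hypothetical per-test outcomes across repeated runs (True = pass).
runs = {
    "test_a": [True, True, True],     # always passes: not flaky
    "test_b": [True, False, True],    # mixed outcomes: flaky
    "test_c": [False, False, False],  # always fails: not flaky
}

flaky = [name for name, outcomes in runs.items() if is_flaky(outcomes)]
print(flaky)  # ['test_b']
```

A test that fails on every run is counted as incorrect, not flaky, under this reading.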
Run history
| Tested on | Score | Reliability | Tests Correct | Total Cost |
|---|---|---|---|---|
| 2026-05-22 00:42 (suite changed) | 4.8 | 10.0 | | $0.001 |
| 2026-05-08 15:30 (suite changed, current run) | 4.8 | 10.0 | | $0.001 |
| 2026-04-23 10:54 (first recorded run) | 5.3 | N/A | | $0.000 |
This run used a different benchmark suite. Keep suite changes in mind when reading historical movement.
Charts
[Charts: Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]
Category Breakdown
| Category | Score | Consistency | Tests Correct |
|---|---|---|---|
| Anti-AI Tricks | 3.1 | 10.0 | |
| Coding | 4.9 | 3.2 | |
| Combined | 3.0 | 10.0 | |
| Data parsing and extraction | 10.0 | 10.0 | |
| Domain specific | 5.3 | 10.0 | |
| General Intelligence | 4.5 | 10.0 | |
| Instructions following | 3.4 | 6.2 | |
| Puzzle Solving | 3.6 | 7.7 | |
| Tool Calling | 10.0 | 10.0 | |
| Trivia | 3.0 | 10.0 | |