AI BENCHY
❤️ Made by XCS

AI BENCHY Compare

Anthropic: Claude Opus 4.6 vs Trinity Large Preview

Last updated at: 2026-03-06

| Metric | Anthropic: Claude Opus 4.6 (medium, Release: 2026-02-05) | Trinity Large Preview (none, Release: 2026-01-27, Free Available) |
| --- | --- | --- |
| Rank | #26 | #45 |
| Avg Score | 6.6 | 4.2 |
| Consistency | 9.0 | 9.6 |
| Cost per result | 13.118 | 0.000 |
| Total Cost | $1.312 | $0.000 |
| Tests Correct | | |
| Attempt pass rate | 66.7% | 33.3% |
| Flaky tests | 2 | 1 |
| Total Runs | 48 | 48 |
| Output Tokens | 26,254 | 1,837 |
| Reasoning Tokens | 17,363 | 0 |
| Response Time (avg) | 22.86s | 3.15s |
| Response Time (max) | 83.40s | 8.91s |
| Response Time (total) | 205.71s | 50.46s |

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg)

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Anthropic: Claude Opus 4.6 | 4.0 | 4.4 | 55.6% | 2 | | 11.88s | 897 | 1,000 |
| Anti-AI Tricks | Trinity Large Preview | 10.0 | 10.0 | 0.0% | 0 | | 3.59s | 587 | 0 |
| Combined | Anthropic: Claude Opus 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 76.66s | 8,178 | 5,194 |
| Combined | Trinity Large Preview | 10.0 | 10.0 | 0.0% | 0 | | 8.91s | 294 | 0 |
| Data parsing and extraction | Anthropic: Claude Opus 4.6 | 9.9 | 10.0 | 100.0% | 0 | | 7.37s | 691 | 757 |
| Data parsing and extraction | Trinity Large Preview | 9.9 | 10.0 | 100.0% | 0 | | 3.26s | 186 | 0 |
| Domain specific | Anthropic: Claude Opus 4.6 | 10.0 | 10.0 | 0.0% | 0 | | 83.40s | 14,642 | 8,687 |
| Domain specific | Trinity Large Preview | 4.0 | 10.0 | 33.3% | 0 | | 877ms | 25 | 0 |
| General Intelligence | Anthropic: Claude Opus 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 5.04s | 188 | 292 |
| General Intelligence | Trinity Large Preview | 3.0 | 9.9 | 0.0% | 0 | | 2.86s | 124 | 0 |
| Instructions following | Anthropic: Claude Opus 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 2.43s | 266 | 467 |
| Instructions following | Trinity Large Preview | 3.5 | 6.7 | 16.7% | 1 | | 1.09s | 63 | 0 |
| Puzzle Solving | Anthropic: Claude Opus 4.6 | 7.0 | 10.0 | 66.7% | 0 | | 4.60s | 531 | 637 |
| Puzzle Solving | Trinity Large Preview | 4.0 | 10.0 | 33.3% | 0 | | 3.30s | 291 | 0 |
| Tool Calling | Anthropic: Claude Opus 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 9.73s | 861 | 329 |
| Tool Calling | Trinity Large Preview | 10.0 | 10.0 | 100.0% | 0 | | 6.67s | 267 | 0 |
