AI BENCHY Compare

Trinity Large Preview vs Inception: Mercury 2

Last updated at: 2026-03-06

| Metric | Trinity Large Preview (none; Release: 2026-01-27; Free Available) | Inception: Mercury 2 (medium; Release: 2026-02-24) |
| --- | --- | --- |
| Rank | #45 | #36 |
| Avg Score | 4.2 | 5.3 |
| Consistency | 9.6 | 8.4 |
| Cost per result | 0.000 | 0.631 |
| Total Cost | $0.000 | $0.045 |
| Tests Correct | | |
| Attempt pass rate | 33.3% | 54.2% |
| Flaky tests | 1 | 3 |
| Total Runs | 48 | 48 |
| Output Tokens | 1,837 | 3,708 |
| Reasoning Tokens | 0 | 45,921 |
| Response Time (avg) | 3.15s | 2.36s |
| Response Time (max) | 8.91s | 14.63s |
| Response Time (total) | 50.46s | 35.39s |

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg)]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Trinity Large Preview | 10.0 | 10.0 | 0.0% | 0 | | 3.59s | 587 | 0 |
| Anti-AI Tricks | Inception: Mercury 2 | 7.3 | 9.8 | 66.7% | 0 | | 1.30s | 2,531 | 2,410 |
| Combined | Trinity Large Preview | 10.0 | 10.0 | 0.0% | 0 | | 8.91s | 294 | 0 |
| Combined | Inception: Mercury 2 | 10.0 | 10.0 | 100.0% | 0 | | 3.28s | 268 | 4,887 |
| Data parsing and extraction | Trinity Large Preview | 9.9 | 10.0 | 100.0% | 0 | | 3.26s | 186 | 0 |
| Data parsing and extraction | Inception: Mercury 2 | 5.5 | 5.9 | 83.3% | 1 | | 1.11s | 183 | 1,656 |
| Domain specific | Trinity Large Preview | 4.0 | 10.0 | 33.3% | 0 | | 877ms | 25 | 0 |
| Domain specific | Inception: Mercury 2 | 10.0 | 7.2 | 11.1% | 1 | | 6.48s | 41 | 30,754 |
| General Intelligence | Trinity Large Preview | 3.0 | 9.9 | 0.0% | 0 | | 2.86s | 124 | 0 |
| General Intelligence | Inception: Mercury 2 | 4.0 | 10.0 | 0.0% | 0 | | 821ms | 137 | 542 |
| Instructions following | Trinity Large Preview | 3.5 | 6.7 | 16.7% | 1 | | 1.09s | 63 | 0 |
| Instructions following | Inception: Mercury 2 | 10.0 | 10.0 | 100.0% | 0 | | 1.07s | 14 | 958 |
| Puzzle Solving | Trinity Large Preview | 4.0 | 10.0 | 33.3% | 0 | | 3.30s | 291 | 0 |
| Puzzle Solving | Inception: Mercury 2 | 1.7 | 7.5 | 22.2% | 1 | | 934ms | 354 | 2,758 |
| Tool Calling | Trinity Large Preview | 10.0 | 10.0 | 100.0% | 0 | | 6.67s | 267 | 0 |
| Tool Calling | Inception: Mercury 2 | 10.0 | 10.0 | 100.0% | 0 | | 1.89s | 180 | 1,956 |
