Anthropic: Claude Opus 4.7 vs Tencent: Hy3 preview
Summary
Claude Opus 4.7 vs Hy3 preview benchmark comparison: Claude Opus 4.7 leads on average score, 7.4 vs 6.8. Hy3 preview has the lower total benchmark cost, $0.059 vs $0.505. Claude Opus 4.7 is faster, with an average response time of 3.02s vs 56.57s, and has the higher attempt pass rate, 76.2% vs 55.6%.
Recommended model: Claude Opus 4.7. It has the higher score here (7.4) and responds about 18.7x faster than Hy3 preview.
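The 18.7x figure follows from the two average response times quoted in the summary. A minimal Python sketch of that arithmetic is shown here; the function name `speedup` is illustrative and not something the benchmark site defines.

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the quicker model responds on average."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


# Average response times from the summary:
# Hy3 preview 56.57s, Claude Opus 4.7 3.02s.
ratio = speedup(56.57, 3.02)
print(f"{ratio:.1f}x")  # prints "18.7x"
```

The same helper applies to any pair of averages on the page, such as the per-category response times.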
Last updated at: 2026-06-18
Metric | Claude Opus 4.7 (none) | Hy3 preview (high)
Release | 2026-04-16 | 2026-04-22
Score | 7.4 | 6.8
Rank | #49 | #74
Reliability | 10.0 | 10.0
Consistency | 9.0 | 9.2
Tests Correct | Wrong answer: 3 | API error: 7, Wrong answer: 3
Attempt pass rate | 76.2% | 55.6%
Flaky tests | 0 | 2
Total Runs | 57 | 63
Cost per result (cents) | 3.154 | 0.000
Total Cost (Current Price) | $0.505 | $0.059
Input Price | $5.000 / 1M | $0.066 / 1M
Output Price | $25.000 / 1M | $0.260 / 1M
Total Input Tokens | 69,576 | 25,987
Output Tokens | 6,265 | 216,719
Reasoning Tokens | 0 | 0
Response Time (avg) | 3.02s | 56.57s
Response Time (max) | 18.27s | 149.94s
Response Time (total) | 57.44s | 848.59s

Both models are archived: they are no longer updated or tested on new tests.

Metric definitions:
Score: Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.
Reliability: First-attempt success score. 10.0 means no retryable target API or rate-limit failures before successful calls; tracked failures lower the score.
Consistency: Reflects run-to-run stability (10 = very consistent, even if consistently wrong).
Tests Correct: A test is fully passed only if every run passed for that test.
Attempt pass rate: Passed attempts / total attempts across runs.
Flaky tests: Tests that had mixed outcomes across runs (at least one pass and one fail).
Cost per result: The average cost per correct benchmark answer in cents (lower is better).
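The Total Cost row can be roughly cross-checked against the token counts and per-1M-token prices listed in the same table. The sketch below assumes that cost is simply input tokens times input price plus output tokens times output price; the page does not state its exact formula, and `total_cost_usd` is a hypothetical helper name.

```python
def total_cost_usd(input_tokens: int, output_tokens: int,
                   input_price_per_million: float,
                   output_price_per_million: float) -> float:
    """Estimate total cost in USD from token counts and per-1M-token prices.

    Assumption: cost = input tokens * input price + output tokens * output price.
    """
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Token counts and prices copied from the comparison table.
claude_cost = total_cost_usd(69_576, 6_265, 5.000, 25.000)
hy3_cost = total_cost_usd(25_987, 216_719, 0.066, 0.260)

print(f"Claude Opus 4.7: ${claude_cost:.3f}")
print(f"Hy3 preview:     ${hy3_cost:.3f}")
```

Under this assumption the Claude Opus 4.7 estimate rounds to the listed $0.505. The Hy3 preview estimate comes out near $0.058, slightly below the listed $0.059, so the site may round differently or price some runs individually.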
Generation showcase
Hamster playing table tennis
Prompt: Create a detailed SVG illustration of a hamster playing table tennis.
#49 Claude Opus 4.7 (none)
Cost: $0.051 | Time: 24.2s | Tokens: 2,181 tok
#74 Hy3 preview (high)
Hy3 preview is no longer available as a free model. It has transitioned to a paid model. Continue using it here: https://openrouter.ai/tencent/hy3-preview
Cost: $0.000 | Time: 0.0s | Tokens: 0 tok
[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]
Category Breakdown
Anti-AI Tricks
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 8.3 | 10.0 | 75.0% | 0 | Wrong answer: 1 | 2.12s | 3.75s | 8.50s | 894 | 522 | 0
Hy3 preview | 6.4 | 7.9 | 58.3% | 1 | API error: 2 | 15.12s | 19.99s | 45.37s | 373 | 6,839 | 0
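The attempt pass rate and flaky-test counts follow the page's definitions: pass rate is passed attempts divided by total attempts, a test is fully passed only if every run passed, and a flaky test has at least one pass and one fail. The sketch below applies those definitions to made-up run outcomes. The test names and outcomes are hypothetical, chosen only so that the result lands on a 58.3% pass rate with one flaky test; the benchmark's real per-run data is not published on this page.

```python
from typing import Dict, List


def attempt_pass_rate(runs: Dict[str, List[bool]]) -> float:
    """Passed attempts / total attempts across all runs of all tests."""
    attempts = [ok for outcomes in runs.values() for ok in outcomes]
    if not attempts:
        raise ValueError("no attempts recorded")
    return sum(attempts) / len(attempts)


def classify(outcomes: List[bool]) -> str:
    """Label one test from its per-run outcomes."""
    if not outcomes:
        raise ValueError("test has no runs")
    if all(outcomes):
        return "passed"   # fully passed: every run passed
    if any(outcomes):
        return "flaky"    # mixed: at least one pass and one fail
    return "failed"       # every run failed


# Hypothetical outcomes: 4 tests, 3 runs each (True = pass).
runs = {
    "test_a": [True, True, True],
    "test_b": [True, True, True],
    "test_c": [True, False, False],
    "test_d": [False, False, False],
}

labels = {name: classify(outcomes) for name, outcomes in runs.items()}
flaky_count = sum(1 for label in labels.values() if label == "flaky")

print(f"{attempt_pass_rate(runs):.1%}")  # prints "58.3%"
print(flaky_count)                       # prints 1
```

Note that a flaky test still contributes its passing runs to the attempt pass rate even though it does not count as fully passed.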
Coding
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 3.3 | 3.3 | 33.3% | 0 | No failed answers. | 2.84s | 2.84s | 2.84s | 1,176 | 494 | 0
Hy3 preview | 5.3 | 10.0 | 33.3% | 0 | API error: 2 | 99.76s | 99.76s | 99.76s | 741 | 38,167 | 0
Combined
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 9.5 | 10.0 | 100.0% | 0 | No failed answers. | 18.27s | 18.27s | 18.27s | 37,740 | 3,504 | 0
Hy3 preview | 10.0 | 10.0 | 100.0% | 0 | No failed answers. | 113.09s | 113.09s | 113.09s | 13,119 | 31,319 | 0
Data parsing and extraction
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | No failed answers. | 2.15s | 2.33s | 4.29s | 10,533 | 324 | 0
Hy3 preview | 6.5 | 10.0 | 50.0% | 0 | API error: 1 | 12.11s | 12.11s | 12.11s | 2,316 | 4,323 | 0
Domain specific
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 7.7 | 10.0 | 66.7% | 0 | Wrong answer: 1 | 1.19s | 1.40s | 3.58s | 1,020 | 78 | 0
Hy3 preview | 5.3 | 7.2 | 44.4% | 1 | Wrong answer: 2 | 109.04s | 149.94s | 327.11s | 747 | 87,559 | 0
General Intelligence
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | No failed answers. | 3.47s | 3.47s | 3.47s | 723 | 257 | 0
Hy3 preview | 3.0 | 10.0 | 0.0% | 0 | API error: 1 | 0ms | 0ms | 0ms | 0 | 0 | 0
Instructions following
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | No failed answers. | 1.46s | 1.68s | 2.91s | 939 | 114 | 0
Hy3 preview | 10.0 | 10.0 | 100.0% | 0 | No failed answers. | 34.36s | 41.83s | 68.73s | 675 | 13,483 | 0
Puzzle Solving
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | No failed answers. | 2.46s | 3.72s | 7.38s | 939 | 597 | 0
Hy3 preview | 7.7 | 10.0 | 66.7% | 0 | API error: 1 | 27.94s | 45.06s | 55.89s | 390 | 15,567 | 0
Tool Calling
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | No failed answers. | 4.74s | 4.74s | 4.74s | 15,339 | 372 | 0
Hy3 preview | 10.0 | 10.0 | 100.0% | 0 | No failed answers. | 78.83s | 78.83s | 78.83s | 7,410 | 10,370 | 0
Trivia
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Response Time (max) | Response Time (total) | Input Tokens | Output Tokens | Reasoning Tokens
Claude Opus 4.7 | 3.0 | 10.0 | 0.0% | 0 | Wrong answer: 1 | 1.46s | 1.46s | 1.46s | 273 | 3 | 0
Hy3 preview | 3.0 | 10.0 | 0.0% | 0 | Wrong answer: 1 | 47.71s | 47.71s | 47.71s