| Metric | LFM2-24B-A2B (none) | Hy3 preview (high) | Definition |
|---|---|---|---|
| Status | Archived | Archived | Archived model: this model is no longer updated or tested on new tests. |
| Release | 2026-02-24 | 2026-04-22 | |
| Score | 4.2 | 8.0 | Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.… |
| Rank | #152 | #22 | |
| Reliability | N/A | 10.0 | First-attempt success score: 10.0 means no retryable target API or rate-limit failures before successful calls; tracked failures lower the score.… |
| Consistency | 9.0 | 9.5 | Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).… |
| Tests Correct (failure breakdown) | Wrong answer: 8; API error: 4; Did not follow instructions: 2 | Wrong answer: 3; API error: 1 | A test is fully passed only if every run passed for that test.… |
| Attempt pass rate | 18.8% | 77.1% | Attempt pass rate = passed attempts / total attempts across runs.… |
| Flaky tests | 2 | 1 | Flaky tests had mixed outcomes across runs (at least one pass and one fail).… |
| Total Runs | 48 | 60 | |
| Cost per result (cents) | 0.024 | 0.000 | Shows the average cost per correct benchmark answer in cents (lower is better).… |
| Total Cost | $0.001 | $0.000 | |
| Input Price | $0.030 / 1M | $0.066 / 1M | |
| Output Price | $0.120 / 1M | $0.260 / 1M | |
| Output Tokens | 1,185 | 216,503 | |
| Reasoning Tokens | 0 | 0 | |
| Response Time (avg) | 811ms | 56.77s | |
| Response Time (max) | 2.88s | 149.94s | |
| Response Time (total) | 11.35s | 851.49s | |
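The definitions of attempt pass rate, fully passed tests, and flaky tests given above can be expressed as a small calculation. The sketch below is not the benchmark's own code: the function name `summarize_runs`, the `(test_id, passed)` record shape, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_runs(runs):
    """Summarize per-attempt outcomes.

    `runs` is an iterable of (test_id, passed) pairs, one per attempt.
    This record shape is hypothetical, not the benchmark's real schema.
    """
    outcomes_by_test = defaultdict(list)
    for test_id, passed in runs:
        outcomes_by_test[test_id].append(bool(passed))

    total_attempts = sum(len(v) for v in outcomes_by_test.values())
    passed_attempts = sum(sum(v) for v in outcomes_by_test.values())

    return {
        # passed attempts / total attempts across runs
        "attempt_pass_rate": passed_attempts / total_attempts if total_attempts else 0.0,
        # a test is fully passed only if every run passed
        "tests_fully_passed": sum(1 for v in outcomes_by_test.values() if all(v)),
        # flaky: at least one pass and at least one fail
        "flaky_tests": sum(1 for v in outcomes_by_test.values() if any(v) and not all(v)),
    }


# Hypothetical data: three tests, three runs each.
example_runs = [
    ("t1", True), ("t1", True), ("t1", True),     # fully passed
    ("t2", True), ("t2", False), ("t2", False),   # flaky
    ("t3", False), ("t3", False), ("t3", False),  # consistently failing
]
summary = summarize_runs(example_runs)
print(summary)
```

In this made-up sample, 4 of 9 attempts pass, one test is fully passed, and one is flaky. A consistently failing test lowers the pass rate without counting as flaky, which matches the note that a model can be very consistent even if consistently wrong.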
[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]
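Prices in the comparison are quoted per 1M tokens, and cost per result is described as the average cost per correct answer in cents. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and the second call uses made-up inputs. Input token counts are not reported on this page, so the sketch does not attempt to reproduce the Total Cost figures.

```python
def token_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD for `tokens` tokens at a price quoted per 1M tokens."""
    return tokens / 1_000_000 * price_per_million_usd


def cost_per_result_cents(total_cost_usd: float, correct_results: int):
    """Average cost per correct answer, in cents; None if nothing was correct."""
    if correct_results == 0:
        return None
    return total_cost_usd * 100 / correct_results


# Output-token cost only, using the LFM2-24B-A2B figures listed on this page
# (1,185 output tokens at $0.120 / 1M).
lfm_output_cost = token_cost_usd(1_185, 0.120)
print(f"{lfm_output_cost:.6f}")  # 0.000142

# Hypothetical inputs, for illustration only: $0.002 spent, 4 correct answers.
example_cents = cost_per_result_cents(0.002, 4)
print(example_cents)
```

Returning `None` when there are no correct results avoids a division by zero; how the page itself displays that case is not stated.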
Category Breakdown
Anti-AI Tricks
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | 3.3 | 8.9 |
| Consistency | 9.8 | 10.0 |
| Attempt pass rate | 0.0% | 100.0% |
| Flaky tests | 0 | 0 |
| Tests Correct (failure breakdown) | Wrong answer: 3 | No failed answers. |
| Response Time (avg) | 471ms | 15.12s |
| Response Time (max) | 872ms | 19.99s |
| Response Time (total) | 1.41s | 45.37s |
| Output Tokens | 490 | 6,839 |
| Reasoning Tokens | 0 | 0 |
Combined
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | 3.0 | 10.0 |
| Consistency | 10.0 | 10.0 |
| Attempt pass rate | 0.0% | 100.0% |
| Flaky tests | 0 | 0 |
| Tests Correct (failure breakdown) | API error: 1 | No failed answers. |
| Response Time (avg) | 0ms | 113.09s |
| Response Time (max) | 0ms | 113.09s |
| Response Time (total) | 0ms | 113.09s |
| Output Tokens | 0 | 31,319 |
| Reasoning Tokens | 0 | 0 |
Data parsing and extraction
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | 3.0 | 6.5 |
| Consistency | 10.0 | 10.0 |
| Attempt pass rate | 0.0% | 50.0% |
| Flaky tests | 0 | 0 |
| Tests Correct (failure breakdown) | Wrong answer: 2 | API error: 1 |
| Response Time (avg) | 714ms | 12.11s |
| Response Time (max) | 987ms | 12.11s |
| Response Time (total) | 1.43s | 12.11s |
| Output Tokens | 219 | 4,323 |
| Reasoning Tokens | 0 | 0 |
Domain specific
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | 5.9 | 5.3 |
| Consistency | 7.2 | 7.2 |
| Attempt pass rate | 55.6% | 44.4% |
| Flaky tests | 1 | 1 |
| Tests Correct (failure breakdown) | API error: 1; Wrong answer: 1 | Wrong answer: 2 |
| Response Time (avg) | 287ms | 109.04s |
| Response Time (max) | 334ms | 149.94s |
| Response Time (total) | 860ms | 327.11s |
| Output Tokens | 30 | 87,559 |
| Reasoning Tokens | 0 | 0 |
General Intelligence
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | 4.0 | 0.0 |
| Consistency | 10.0 | 0.0 |
| Attempt pass rate | 0.0% | 0.0% |
| Flaky tests | 0 | 0 |
| Tests Correct (failure breakdown) | Did not follow instructions: 1 | No failed answers. |
| Response Time (avg) | 395ms | 0ms |
| Response Time (max) | 395ms | 0ms |
| Response Time (total) | 395ms | 0ms |
| Output Tokens | 72 | 0 |
| Reasoning Tokens | 0 | 0 |
Instructions following
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | 6.3 | 9.9 |
| Consistency | 10.0 | 10.0 |
| Attempt pass rate | 50.0% | 100.0% |
| Flaky tests | 0 | 0 |
| Tests Correct (failure breakdown) | Wrong answer: 1 | No failed answers. |
| Response Time (avg) | 1.09s | 34.02s |
| Response Time (max) | 1.90s | 41.83s |
| Response Time (total) | 2.18s | 68.04s |
| Output Tokens | 60 | 13,331 |
| Reasoning Tokens | 0 | 0 |
Puzzle Solving
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | 3.7 | 10.0 |
| Consistency | 7.7 | 10.0 |
| Attempt pass rate | 11.1% | 100.0% |
| Flaky tests | 1 | 0 |
| Tests Correct (failure breakdown) | API error: 1; Did not follow instructions: 1; Wrong answer: 1 | No failed answers. |
| Response Time (avg) | 1.69s | 29.74s |
| Response Time (max) | 2.88s | 45.06s |
| Response Time (total) | 5.08s | 59.48s |
| Output Tokens | 314 | 15,503 |
| Reasoning Tokens | 0 | 0 |
Tool Calling
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | 3.0 | 10.0 |
| Consistency | 10.0 | 10.0 |
| Attempt pass rate | 0.0% | 100.0% |
| Flaky tests | 0 | 0 |
| Tests Correct (failure breakdown) | API error: 1 | No failed answers. |
| Response Time (avg) | 0ms | 78.83s |
| Response Time (max) | 0ms | 78.83s |
| Response Time (total) | 0ms | 78.83s |
| Output Tokens | 0 | 10,370 |
| Reasoning Tokens | 0 | 0 |
Coding
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | - | 10.0 |
| Consistency | - | 10.0 |
| Attempt pass rate | - | 100.0% |
| Flaky tests | - | 0 |
| Tests Correct (failure breakdown) | - | No failed answers. |
| Response Time (avg) | - | 99.76s |
| Response Time (max) | - | 99.76s |
| Response Time (total) | - | 99.76s |
| Output Tokens | - | 38,167 |
| Reasoning Tokens | - | 0 |
Trivia
| Metric | LFM2-24B-A2B | Hy3 preview |
|---|---|---|
| Score | - | 3.0 |
| Consistency | - | 10.0 |
| Attempt pass rate | - | 0.0% |
| Flaky tests | - | 0 |
| Tests Correct (failure breakdown) | - | Wrong answer: 1 |
| Response Time (avg) | - | 47.71s |
| Response Time (max) | - | 47.71s |
| Response Time (total) | - | 47.71s |