8.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
8.4Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
Rank
#21
#16
Reliability
10.0First-attempt success score: 10.0 means no retryable target API or rate-limit failures before successful calls; tracked failures lower the score.…
9.9First-attempt success score: 10.0 means no retryable target API or rate-limit failures before successful calls; tracked failures lower the score.…
Consistency
8.8Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
9.2Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
Tests Correct
A test is fully passed only if every run passed for that test.API error: 3Wrong answer: 3Response Time (avg)9.05sResponse Time (max)64.36sResponse Time (total)153.86sA test is fully passed only if every run passed for that test.…
A test is fully passed only if every run passed for that test.Wrong answer: 4Timed out: 1Response Time (avg)36.84sResponse Time (max)178.04sResponse Time (total)736.86sA test is fully passed only if every run passed for that test.…
Attempt pass rate
76.7%Attempt pass rate = passed attempts / total attempts across runs.…
80.0%Attempt pass rate = passed attempts / total attempts across runs.…
Flaky tests
3Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
2Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
Total Runs
60Total Runs…
60Total Runs…
Cost per result
6.594Shows the average cost per correct benchmark answer in cents (lower is better).…
1.324Shows the average cost per correct benchmark answer in cents (lower is better).…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)2.53sResponse Time (max)3.43sResponse Time (total)10.12sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)8.58sResponse Time (max)12.75sResponse Time (total)34.33sA test is fully passed only if every run passed for that test.…
8.2Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
6.7Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
83.3%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)39.62sResponse Time (max)64.36sResponse Time (total)79.24sA test is fully passed only if every run passed for that test.…
6.5Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
5.9Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
66.7%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Timed out: 1Response Time (avg)122.40sResponse Time (max)178.04sResponse Time (total)244.81sA test is fully passed only if every run passed for that test.…
3.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
0.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.API error: 1Response Time (avg)0msResponse Time (max)0msResponse Time (total)0msA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)65.24sResponse Time (max)65.24sResponse Time (total)65.24sA test is fully passed only if every run passed for that test.…
6.5Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
50.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.API error: 1Response Time (avg)8.10sResponse Time (max)8.10sResponse Time (total)8.10sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)21.75sResponse Time (max)23.18sResponse Time (total)43.49sA test is fully passed only if every run passed for that test.…
7.6Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
7.2Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
77.8%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)10.64sResponse Time (max)14.00sResponse Time (total)31.92sA test is fully passed only if every run passed for that test.…
3.6Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
7.2Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
22.2%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 3Response Time (avg)45.35sResponse Time (max)88.89sResponse Time (total)136.04sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)3.46sResponse Time (max)3.46sResponse Time (total)3.46sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)25.48sResponse Time (max)25.48sResponse Time (total)25.48sA test is fully passed only if every run passed for that test.…
9.8Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)3.38sResponse Time (max)3.40sResponse Time (total)6.76sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)16.13sResponse Time (max)17.18sResponse Time (total)32.26sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)3.13sResponse Time (max)3.33sResponse Time (total)9.39sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)16.38sResponse Time (max)19.42sResponse Time (total)49.14sA test is fully passed only if every run passed for that test.…
3.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
0.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.API error: 1Response Time (avg)0msResponse Time (max)0msResponse Time (total)0msA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)15.02sResponse Time (max)15.02sResponse Time (total)15.02sA test is fully passed only if every run passed for that test.…
2.8Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
1.6Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
33.3%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)4.87sResponse Time (max)4.87sResponse Time (total)4.87sA test is fully passed only if every run passed for that test.…
3.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
0.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)91.07sResponse Time (max)91.07sResponse Time (total)91.07sA test is fully passed only if every run passed for that test.…