A test is fully passed only if every run passed for that test.

Anti-AI Tricks
: 7.3 | Wrong answer: 2 | Response time (avg/max/total): 4.75s / 7.62s / 19.00s
Coding
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Combined
: 4.7 | Timed out: 1 | Response time (avg/max/total): 30.53s / 30.53s / 30.53s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 23.16s / 26.55s / 46.33s
Instructions following
: 9.9 | No failed answers | Response time (avg/max/total): 4.18s / 4.46s / 8.36s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 17.33s / 17.33s / 17.33s

Anti-AI Tricks
: 6.5 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 4.87s / 6.30s / 14.62s
Coding
: 4.3 | Did not follow instructions: 1 | Response time (avg/max/total): 35.61s / 35.61s / 35.61s
Combined
: 3.0 | No answer: 1 | Response time (avg/max/total): 53.14s / 53.14s / 53.14s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 4.93s / 5.03s / 9.86s
Domain specific
: 5.3 | Wrong answer: 2 | Response time (avg/max/total): 24.14s / 45.83s / 72.43s
General Intelligence
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Instructions following
: 10.0 | No failed answers | Response time (avg/max/total): 4.30s / 6.00s / 8.59s
Puzzle Solving
: 5.3 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 10.19s / 14.92s / 20.37s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 6.31s / 6.31s / 6.31s
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Overall
: API error: 8, Wrong answer: 2 | Response time (avg/max/total): 15.25s / 43.55s / 182.96s…
Anti-AI Tricks
: 8.3 | API error: 1 | Response time (avg/max/total): 11.69s / 19.37s / 35.08s
Coding
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Combined
: 10.0 | No failed answers | Response time (avg/max/total): 34.95s / 34.95s / 34.95s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 14.95s / 15.40s / 29.90s
Domain specific
: 3.0 | Wrong answer: 2, API error: 1 | Response time (avg/max/total): 22.08s / 43.55s / 66.23s
General Intelligence
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Instructions following
: 6.5 | API error: 1 | Response time (avg/max/total): 3.40s / 3.40s / 3.40s
Puzzle Solving
: 5.3 | API error: 2 | Response time (avg/max/total): 7.52s / 7.52s / 7.52s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 5.87s / 5.87s / 5.87s
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Anti-AI Tricks
: 4.8 | Wrong answer: 3 | Response time (avg/max/total): 3.97s / 7.48s / 15.89s
Coding
: 6.6 | API error: 1 | Response time (avg/max/total): 19.08s / 30.81s / 38.16s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 10.01s / 10.01s / 10.01s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 21.64s / 29.16s / 43.28s
Domain specific
: 5.3 | Wrong answer: 2 | Response time (avg/max/total): 8.58s / 9.48s / 25.74s
Instructions following
: 6.5 | Wrong answer: 1 | Response time (avg/max/total): 10.15s / 15.94s / 20.30s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 8.26s / 8.26s / 8.26s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 2.38s / 2.38s / 2.38s

Anti-AI Tricks
: 6.9 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 2.68s / 3.09s / 8.04s
Coding
: 6.3 | Wrong answer: 1 | Response time (avg/max/total): 14.36s / 14.36s / 14.36s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 15.92s / 15.92s / 15.92s
Data parsing and extraction
: 7.1 | No answer: 1 | Response time (avg/max/total): 9.34s / 16.71s / 18.68s
Domain specific
: 4.1 | Wrong answer: 2, No answer: 1 | Response time (avg/max/total): 11.12s / 29.11s / 33.35s
General Intelligence
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Instructions following
: 10.0 | No failed answers | Response time (avg/max/total): 1.68s / 2.03s / 3.36s
Puzzle Solving
: 5.3 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 1.93s / 1.97s / 3.87s
Tool Calling
: 4.7 | Invalid tool call: 1 | Response time (avg/max/total): 3.39s / 3.39s / 3.39s
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Anti-AI Tricks
: 8.7 | Wrong answer: 1 | Response time (avg/max/total): 10.00s / 11.53s / 39.99s
Combined
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 47.38s / 47.38s / 47.38s
Data parsing and extraction
: 6.3 | Wrong answer: 1 | Response time (avg/max/total): 17.36s / 26.57s / 34.71s
Domain specific
: 2.9 | Wrong answer: 3 | Response time (avg/max/total): 128.15s / 309.02s / 384.46s
Instructions following
: 9.8 | No failed answers | Response time (avg/max/total): 11.60s / 14.49s / 23.20s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 11.19s / 11.19s / 11.19s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 36.98s / 36.98s / 36.98s

Coding
: 7.0 | Extra formatting: 1 | Response time (avg/max/total): 39.68s / 47.10s / 79.37s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 21.74s / 21.74s / 21.74s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 3.60s / 3.92s / 7.19s
Domain specific
: 5.3 | Wrong answer: 2 | Response time (avg/max/total): 3.00s / 4.69s / 8.99s
Instructions following
: 6.4 | Wrong answer: 1 | Response time (avg/max/total): 2.63s / 2.77s / 5.27s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 22.78s / 22.78s / 22.78s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 2.50s / 2.50s / 2.50s

Anti-AI Tricks
: 3.5 | Wrong answer: 4 | Response time (avg/max/total): 3.81s / 6.85s / 15.23s
Coding
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 15.17s / 15.17s / 15.17s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 8.49s / 14.02s / 16.98s
Domain specific
: 5.3 | Wrong answer: 2 | Response time (avg/max/total): 2.33s / 2.94s / 6.99s
Instructions following
: 6.4 | Wrong answer: 1 | Response time (avg/max/total): 2.82s / 2.92s / 5.65s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 6.02s / 6.02s / 6.02s

Anti-AI Tricks
: 6.6 | Wrong answer: 2 | Response time (avg/max/total): 1.19s / 2.04s / 4.75s
Coding
: 4.0 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 1.30s / 1.30s / 1.30s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 3.70s / 3.70s / 3.70s
Data parsing and extraction
: 6.5 | Wrong answer: 1 | Response time (avg/max/total): 979ms / 1.02s / 1.96s
Domain specific
: 3.0 | Wrong answer: 3 | Response time (avg/max/total): 925ms / 1.16s / 2.77s
Instructions following
: 9.8 | No failed answers | Response time (avg/max/total): 987ms / 1.13s / 1.97s
Tool Calling
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 2.83s / 2.83s / 2.83s
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Coding
: 4.7 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 1.39s / 1.39s / 1.39s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 3.81s / 3.81s / 3.81s
Data parsing and extraction
: 6.5 | Wrong answer: 1 | Response time (avg/max/total): 1.04s / 1.05s / 2.08s
Domain specific
: 3.0 | Wrong answer: 3 | Response time (avg/max/total): 927ms / 1.17s / 2.78s
Instructions following
: 9.8 | No failed answers | Response time (avg/max/total): 1.03s / 1.17s / 2.07s
Tool Calling
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 2.79s / 2.79s / 2.79s
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Anti-AI Tricks
: 3.4 | Wrong answer: 3, API error: 1 | Response time (avg/max/total): 705ms / 975ms / 2.12s
Coding
: 7.5 | Wrong answer: 1 | Response time (avg/max/total): 2.93s / 2.93s / 2.93s
Combined
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 4.32s / 4.32s / 4.32s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 3.37s / 5.76s / 6.73s
Domain specific
: 3.6 | Wrong answer: 3 | Response time (avg/max/total): 5.50s / 15.42s / 16.50s
General Intelligence
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Instructions following
: 6.3 | Wrong answer: 1 | Response time (avg/max/total): 683ms / 691ms / 1.37s
Puzzle Solving
: 3.0 | Wrong answer: 2, API error: 1 | Response time (avg/max/total): 891ms / 1.21s / 1.78s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 7.54s / 7.54s / 7.54s
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Coding
: 2.5 | Wrong answer: 1 | Response time (avg/max/total): 1.96s / 1.96s / 1.96s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 2.01s / 2.01s / 2.01s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 646ms / 658ms / 1.29s
Domain specific
: 5.3 | Wrong answer: 2 | Response time (avg/max/total): 371ms / 419ms / 1.11s
General Intelligence
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Instructions following
: 6.5 | Wrong answer: 1 | Response time (avg/max/total): 439ms / 448ms / 878ms
Puzzle Solving
: 5.3 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 650ms / 843ms / 1.30s
Tool Calling
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 1.93s / 1.93s / 1.93s
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Anti-AI Tricks
: 6.4 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 1.20s / 1.48s / 3.59s
Coding
: 3.3 | Wrong answer: 1 | Response time (avg/max/total): 38.09s / 38.09s / 38.09s
Combined
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Data parsing and extraction
: 7.3 | Wrong answer: 1 | Response time (avg/max/total): 2.72s / 3.88s / 5.43s
Domain specific
: 2.9 | Wrong answer: 2, No answer: 1 | Response time (avg/max/total): 56.67s / 147.45s / 170.02s
General Intelligence
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Puzzle Solving
: 2.9 | Wrong answer: 2, API error: 1 | Response time (avg/max/total): 1.40s / 1.57s / 2.79s
Tool Calling
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Anti-AI Tricks
: 4.8 | Wrong answer: 2, API error: 1 | Response time (avg/max/total): 584ms / 772ms / 1.75s
Coding
: 10.0 | No failed answers | Response time (avg/max/total): 1.27s / 1.27s / 1.27s
Combined
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Data parsing and extraction
: 3.8 | Wrong answer: 2 | Response time (avg/max/total): 1.42s / 2.21s / 2.84s
Domain specific
: 3.6 | Wrong answer: 3 | Response time (avg/max/total): 489ms / 513ms / 1.47s
General Intelligence
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Tool Calling
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Trivia
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Anti-AI Tricks
: 3.3 | Wrong answer: 3 | Response time (avg/max/total): 471ms / 872ms / 1.41s
Combined
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Data parsing and extraction
: 3.0 | Wrong answer: 2 | Response time (avg/max/total): 714ms / 987ms / 1.43s
Domain specific
: 5.9 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 287ms / 334ms / 860ms
Instructions following
: 6.3 | Wrong answer: 1 | Response time (avg/max/total): 752ms / 1.22s / 1.50s
Puzzle Solving
: 3.8 | Wrong answer: 2, API error: 1 | Response time (avg/max/total): 1.78s / 3.15s / 5.34s
Tool Calling
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms

Coding
: 6.5 | API error: 1 | Response time (avg/max/total): 11.21s / 11.21s / 11.21s
Combined
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 35.34s / 35.34s / 35.34s
Data parsing and extraction
: 6.5 | Wrong answer: 1 | Response time (avg/max/total): 8.48s / 12.71s / 16.96s
Domain specific
: 3.0 | Wrong answer: 3 | Response time (avg/max/total): 4.95s / 7.65s / 14.84s
General Intelligence
: 4.0 | Wrong answer: 1 | Response time (avg/max/total): 1.45s / 1.45s / 1.45s
Instructions following
: 9.8 | No failed answers | Response time (avg/max/total): 5.52s / 8.19s / 11.04s
Tool Calling
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 18.80s / 18.80s / 18.80s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 1.06s / 1.06s / 1.06s

Coding
: 2.7 | API error: 1, Wrong answer: 1 | Response time (avg/max/total): 4.56s / 4.56s / 4.56s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 35.84s / 35.84s / 35.84s
Data parsing and extraction
: 6.5 | API error: 1 | Response time (avg/max/total): 2.85s / 2.85s / 2.85s
Domain specific
: 3.6 | Wrong answer: 2, API error: 1 | Response time (avg/max/total): 17.61s / 25.68s / 52.82s
Instructions following
: 6.3 | Extra formatting: 1 | Response time (avg/max/total): 12.98s / 23.51s / 25.95s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 33.76s / 33.76s / 33.76s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 2.71s / 2.71s / 2.71s

Anti-AI Tricks
: 3.1 | Wrong answer: 4 | Response time (avg/max/total): 1.71s / 3.79s / 6.84s
Coding
: 4.4 | Wrong answer: 2 | Response time (avg/max/total): 5.39s / 5.69s / 10.78s
Combined
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 5.91s / 5.91s / 5.91s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 847ms / 1.09s / 1.69s
Domain specific
: 3.0 | Wrong answer: 3 | Response time (avg/max/total): 464ms / 622ms / 1.39s
Instructions following
: 6.5 | Wrong answer: 1 | Response time (avg/max/total): 514ms / 582ms / 1.03s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 1.27s / 1.27s / 1.27s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 2.32s / 2.32s / 2.32s

Combined
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 1.88s / 1.88s / 1.88s
Data parsing and extraction
: 3.0 | Wrong answer: 2 | Response time (avg/max/total): 575ms / 583ms / 1.15s
Domain specific
: 3.0 | Wrong answer: 3 | Response time (avg/max/total): 357ms / 463ms / 1.07s
General Intelligence
: 4.0 | Wrong answer: 1 | Response time (avg/max/total): 499ms / 499ms / 499ms
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 2.17s / 2.17s / 2.17s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 306ms / 306ms / 306ms

Anti-AI Tricks
: 6.5 | Wrong answer: 2 | Response time (avg/max/total): 1.85s / 4.45s / 7.40s
Coding
: 6.8 | Wrong answer: 1 | Response time (avg/max/total): 14.84s / 26.13s / 29.68s
Combined
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 2.25s / 3.02s / 4.51s
Domain specific
: 7.7 | Wrong answer: 1 | Response time (avg/max/total): 3.22s / 4.68s / 9.67s
General Intelligence
: 10.0 | No failed answers | Response time (avg/max/total): 2.09s / 2.09s / 2.09s
Instructions following
: 6.5 | Wrong answer: 1 | Response time (avg/max/total): 2.84s / 4.45s / 5.68s
Tool Calling
: 3.0 | API error: 1 | Response time (avg/max/total): 0ms / 0ms / 0ms
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 1.25s / 1.25s / 1.25s

Anti-AI Tricks
: 8.3 | Wrong answer: 1 | Response time (avg/max/total): 1.28s / 2.09s / 5.13s
Coding
: 4.1 | Timed out: 1, Wrong answer: 1 | Response time (avg/max/total): 3.83s / 7.07s / 7.66s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 30.53s / 30.53s / 30.53s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 1.70s / 2.21s / 3.41s
Domain specific
: 3.6 | Wrong answer: 3 | Response time (avg/max/total): 2.49s / 4.23s / 7.48s
Instructions following
: 6.3 | Wrong answer: 1 | Response time (avg/max/total): 690ms / 878ms / 1.38s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 57.10s / 57.10s / 57.10s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 778ms / 778ms / 778ms

Anti-AI Tricks
: 5.2 | Wrong answer: 3 | Response time (avg/max/total): 5.51s / 6.59s / 11.02s
Coding
: 5.0 | Wrong answer: 2 | Response time (avg/max/total): 3.35s / 5.57s / 6.70s
Combined
: 3.0 | Invalid tool call: 1 | Response time (avg/max/total): 3.22s / 3.22s / 3.22s
Data parsing and extraction
: 7.3 | Wrong answer: 1 | Response time (avg/max/total): 4.82s / 4.82s / 4.82s
Domain specific
: 7.7 | Wrong answer: 1 | Response time (avg/max/total): 744ms / 744ms / 744ms
General Intelligence
: 4.0 | Wrong answer: 1 | Response time (avg/max/total): 1.59s / 1.59s / 1.59s
Instructions following
: 6.5 | Wrong answer: 1 | Response time (avg/max/total): 888ms / 888ms / 888ms
Tool Calling
: 2.8 | Wrong answer: 1 | Response time (avg/max/total): 7.05s / 7.05s / 7.05s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 692ms / 692ms / 692ms

Overall
: Wrong answer: 12 | Response time (avg/max/total): 3.74s / 27.18s / 74.71s…
Anti-AI Tricks
: 3.5 | Wrong answer: 4 | Response time (avg/max/total): 1.32s / 3.89s / 5.30s
Coding
: 6.8 | Wrong answer: 1 | Response time (avg/max/total): 993ms / 1.29s / 1.99s
Combined
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 6.22s / 6.22s / 6.22s
Data parsing and extraction
: 10.0 | No failed answers | Response time (avg/max/total): 1.57s / 1.83s / 3.14s
Domain specific
: 7.7 | Wrong answer: 1 | Response time (avg/max/total): 905ms / 1.10s / 2.71s
General Intelligence
: 10.0 | No failed answers | Response time (avg/max/total): 803ms / 803ms / 803ms
Instructions following
: 6.3 | Wrong answer: 1 | Response time (avg/max/total): 8.81s / 13.73s / 17.61s
Puzzle Solving
: 3.1 | Wrong answer: 3 | Response time (avg/max/total): 10.89s / 27.18s / 32.68s
Tool Calling
: 10.0 | No failed answers | Response time (avg/max/total): 3.67s / 3.67s / 3.67s
Trivia
: 3.0 | Wrong answer: 1 | Response time (avg/max/total): 588ms / 588ms / 588ms
Anti-AI Tricks
: 3.4 · Wrong answer: 4 · Response time (avg / max / total): 6.55s / 9.41s / 26.19s
Coding
: 4.2 · API error: 1, Wrong answer: 1 · Response time (avg / max / total): 10.57s / 10.57s / 10.57s
Combined
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 23.53s / 23.53s / 23.53s
Data parsing and extraction
: 10.0 · No failed answers · Response time (avg / max / total): 1.37s / 1.37s / 2.73s
Domain specific
: 3.0 · Wrong answer: 3 · Response time (avg / max / total): 1.04s / 1.08s / 3.11s
Instructions following
: 6.4 · Wrong answer: 1 · Response time (avg / max / total): 5.36s / 9.81s / 10.73s
Tool Calling
: 3.0 · Invalid tool call: 1 · Response time (avg / max / total): 25.72s / 25.72s / 25.72s
Trivia
: 3.0 · API error: 1 · Response time (avg / max / total): 0ms / 0ms / 0ms
Wrong answer: 14, Did not follow instructions: 1 · Response time (avg / max / total): 1.85s / 7.58s / 24.00s…
Anti-AI Tricks
: 4.8 · Wrong answer: 3 · Response time (avg / max / total): 1.34s / 1.83s / 2.67s
Coding
: 3.2 · Wrong answer: 2 · Response time (avg / max / total): 2.05s / 2.55s / 4.10s
Combined
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 7.58s / 7.58s / 7.58s
Data parsing and extraction
: 10.0 · No failed answers · Response time (avg / max / total): 1.27s / 1.27s / 1.27s
Domain specific
: 3.0 · Wrong answer: 3 · Response time (avg / max / total): 637ms / 637ms / 637ms
General Intelligence
: 4.0 · Wrong answer: 1 · Response time (avg / max / total): 909ms / 909ms / 909ms
Instructions following
: 6.3 · Wrong answer: 1 · Response time (avg / max / total): 1.11s / 1.11s / 1.11s
Tool Calling
: 10.0 · No failed answers · Response time (avg / max / total): 2.51s / 2.51s / 2.51s
Trivia
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 794ms / 794ms / 794ms
Wrong answer: 13, Did not follow instructions: 2 · Response time (avg / max / total): 5.47s / 16.45s / 109.43s…
Anti-AI Tricks
: 4.8 · Wrong answer: 3 · Response time (avg / max / total): 4.46s / 9.94s / 17.83s
Coding
: 3.4 · Wrong answer: 2 · Response time (avg / max / total): 3.02s / 3.05s / 6.04s
Combined
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 16.45s / 16.45s / 16.45s
Data parsing and extraction
: 10.0 · No failed answers · Response time (avg / max / total): 7.92s / 13.23s / 15.84s
Domain specific
: 3.6 · Wrong answer: 3 · Response time (avg / max / total): 6.23s / 14.38s / 18.70s
General Intelligence
: 4.6 · Wrong answer: 1 · Response time (avg / max / total): 950ms / 950ms / 950ms
Instructions following
: 6.3 · Wrong answer: 1 · Response time (avg / max / total): 804ms / 921ms / 1.61s
Tool Calling
: 4.7 · Did not follow instructions: 1 · Response time (avg / max / total): 16.00s / 16.00s / 16.00s
Trivia
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 8.94s / 8.94s / 8.94s
Anti-AI Tricks
: 3.5 · Wrong answer: 4 · Response time (avg / max / total): 2.19s / 6.85s / 8.74s
Coding
: 6.8 · Wrong answer: 1 · Response time (avg / max / total): 3.74s / 5.52s / 7.47s
Combined
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 2.36s / 2.36s / 2.36s
Data parsing and extraction
: 6.5 · Extra formatting: 1 · Response time (avg / max / total): 1.01s / 1.18s / 2.03s
Domain specific
: 3.0 · Wrong answer: 3 · Response time (avg / max / total): 756ms / 877ms / 2.27s
General Intelligence
: 4.4 · Wrong answer: 1 · Response time (avg / max / total): 6.86s / 6.86s / 6.86s
Instructions following
: 6.5 · Wrong answer: 1 · Response time (avg / max / total): 751ms / 821ms / 1.50s
Tool Calling
: 10.0 · No failed answers · Response time (avg / max / total): 2.43s / 2.43s / 2.43s
Trivia
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 3.89s / 3.89s / 3.89s
Wrong answer: 14, Did not follow instructions: 1 · Response time (avg / max / total): 629ms / 1.72s / 12.59s…
Anti-AI Tricks
: 3.4 · Wrong answer: 4 · Response time (avg / max / total): 395ms / 769ms / 1.58s
Coding
: 4.0 · Wrong answer: 2 · Response time (avg / max / total): 1.03s / 1.28s / 2.07s
Combined
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 1.72s / 1.72s / 1.72s
Data parsing and extraction
: 10.0 · No failed answers · Response time (avg / max / total): 822ms / 1.08s / 1.64s
Domain specific
: 5.3 · Wrong answer: 2 · Response time (avg / max / total): 367ms / 388ms / 1.10s
General Intelligence
: 4.0 · Wrong answer: 1 · Response time (avg / max / total): 729ms / 729ms / 729ms
Instructions following
: 6.5 · Wrong answer: 1 · Response time (avg / max / total): 380ms / 380ms / 759ms
Tool Calling
: 10.0 · No failed answers · Response time (avg / max / total): 1.40s / 1.40s / 1.40s
Trivia
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 397ms / 397ms / 397ms
Anti-AI Tricks
: 3.0 · Wrong answer: 4 · Response time (avg / max / total): 20.18s / 26.54s / 80.73s
Coding
: 4.8 · Wrong answer: 2 · Response time (avg / max / total): 24.47s / 24.90s / 48.94s
Combined
: 4.5 · Invalid tool call: 1 · Response time (avg / max / total): 111.96s / 111.96s / 111.96s
Data parsing and extraction
: 10.0 · No failed answers · Response time (avg / max / total): 23.79s / 23.85s / 47.57s
Domain specific
: 5.3 · Wrong answer: 2 · Response time (avg / max / total): 19.73s / 27.66s / 59.18s
General Intelligence
: 4.2 · Wrong answer: 1 · Response time (avg / max / total): 23.74s / 23.74s / 23.74s
Instructions following
: 6.5 · Extra formatting: 1 · Response time (avg / max / total): 17.54s / 18.51s / 35.08s
Tool Calling
: 10.0 · No failed answers · Response time (avg / max / total): 77.93s / 77.93s / 77.93s
Trivia
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 3.07s / 3.07s / 3.07s
Coding
: 4.1 · Timed out: 1, Wrong answer: 1 · Response time (avg / max / total): 1.17s / 1.69s / 2.34s
Combined
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 4.28s / 4.28s / 4.28s
Data parsing and extraction
: 6.5 · Wrong answer: 1 · Response time (avg / max / total): 81.80s / 81.80s / 81.80s
Domain specific
: 5.3 · Wrong answer: 2 · Response time (avg / max / total): 638ms / 638ms / 638ms
Instructions following
: 6.3 · Wrong answer: 1 · Response time (avg / max / total): 7.49s / 13.67s / 14.99s
Tool Calling
: 10.0 · No failed answers · Response time (avg / max / total): 2.64s / 2.64s / 2.64s
Trivia
: 3.0 · Wrong answer: 1 · Response time (avg / max / total): 399ms / 399ms / 399ms