A test is fully passed only if every run passed for that test.
Anti-AI Tricks: 3.2 (Wrong answer: 4; response time: avg 1.19s, max 2.73s, total 4.76s)
Combined: 3.0 (Wrong answer: 1; response time: avg 2.87s, max 2.87s, total 2.87s)
Domain specific: 5.3 (Wrong answer: 2; response time: avg 564ms, max 564ms, total 564ms)
Instructions following: 6.5 (Wrong answer: 1; response time: avg 857ms, max 955ms, total 1.71s)
Puzzle Solving: 5.3 (Wrong answer: 2; response time: avg 1.86s, max 2.70s, total 3.71s)
Tool Calling: 10.0 (No failed answers; response time: avg 2.28s, max 2.28s, total 2.28s)
Trivia: 3.0 (Wrong answer: 1; response time: avg 1.82s, max 1.82s, total 1.82s)
Anti-AI Tricks: 3.6 (Wrong answer: 4; response time: avg 2.10s, max 6.15s, total 8.41s)
Coding: 6.8 (Wrong answer: 1; response time: avg 12.29s, max 22.52s, total 24.58s)
Combined: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Data parsing and extraction: 10.0 (No failed answers; response time: avg 1.46s, max 2.03s, total 2.93s)
Domain specific: 3.5 (Wrong answer: 3; response time: avg 7.45s, max 12.46s, total 22.35s)
General Intelligence: 4.4 (Wrong answer: 1; response time: avg 3.51s, max 3.51s, total 3.51s)
Instructions following: 6.2 (Wrong answer: 1; response time: avg 1.86s, max 2.83s, total 3.73s)
Tool Calling: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Trivia: 3.0 (Wrong answer: 1; response time: avg 414ms, max 414ms, total 414ms)
Wrong answer: 15; Did not follow instructions: 1; response time: avg 614ms, max 1.27s, total 12.28s…
Anti-AI Tricks: 3.0 (Wrong answer: 4; response time: avg 483ms, max 716ms, total 1.93s)
Coding: 3.5 (Wrong answer: 2; response time: avg 831ms, max 969ms, total 1.66s)
Combined: 3.0 (Wrong answer: 1; response time: avg 606ms, max 606ms, total 606ms)
Data parsing and extraction: 7.3 (Wrong answer: 1; response time: avg 667ms, max 819ms, total 1.33s)
Domain specific: 5.3 (Wrong answer: 2; response time: avg 534ms, max 733ms, total 1.60s)
Instructions following: 6.5 (Wrong answer: 1; response time: avg 551ms, max 622ms, total 1.10s)
Puzzle Solving: 3.1 (Wrong answer: 3; response time: avg 535ms, max 642ms, total 1.60s)
Tool Calling: 10.0 (No failed answers; response time: avg 1.27s, max 1.27s, total 1.27s)
Trivia: 3.0 (Wrong answer: 1; response time: avg 548ms, max 548ms, total 548ms)
Coding: 2.7 (API error: 1; Wrong answer: 1; response time: avg 4.56s, max 4.56s, total 4.56s)
Combined: 3.0 (Wrong answer: 1; response time: avg 35.84s, max 35.84s, total 35.84s)
Data parsing and extraction: 6.5 (API error: 1; response time: avg 2.85s, max 2.85s, total 2.85s)
Domain specific: 3.6 (Wrong answer: 2; API error: 1; response time: avg 17.61s, max 25.68s, total 52.82s)
Instructions following: 6.3 (Extra formatting: 1; response time: avg 12.98s, max 23.51s, total 25.95s)
Tool Calling: 10.0 (No failed answers; response time: avg 33.76s, max 33.76s, total 33.76s)
Trivia: 3.0 (Wrong answer: 1; response time: avg 2.71s, max 2.71s, total 2.71s)
Coding: 3.4 (No answer: 1; Timed out: 1; response time: avg 55.33s, max 89.40s, total 110.66s)
Combined: 2.8 (Invalid tool call: 1; response time: avg 65.57s, max 65.57s, total 65.57s)
Data parsing and extraction: 6.3 (No answer: 1; response time: avg 1.51s, max 1.51s, total 1.51s)
Domain specific: 3.5 (Wrong answer: 2; No answer: 1; response time: avg 174.55s, max 174.55s, total 174.55s)
General Intelligence: 3.6 (Wrong answer: 1; response time: avg 18.14s, max 18.14s, total 18.14s)
Instructions following: 6.2 (Wrong answer: 1; response time: avg 2.97s, max 2.97s, total 2.97s)
Tool Calling: 10.0 (No failed answers; response time: avg 15.95s, max 15.95s, total 15.95s)
Trivia: 3.0 (Wrong answer: 1; response time: avg 11.13s, max 11.13s, total 11.13s)
Wrong answer: 13; Did not follow instructions: 3; response time: avg 1.62s, max 5.51s, total 19.48s…
Coding: 5.3 (Wrong answer: 1; response time: avg 1.79s, max 1.79s, total 1.79s)
Combined: 3.0 (Wrong answer: 1; response time: avg 3.33s, max 3.33s, total 3.33s)
Data parsing and extraction: 10.0 (No failed answers; response time: avg 943ms, max 943ms, total 943ms)
Domain specific: 5.9 (Wrong answer: 2; response time: avg 1.06s, max 1.06s, total 1.06s)
Puzzle Solving: 3.0 (Wrong answer: 3; response time: avg 1.10s, max 1.36s, total 2.21s)
Tool Calling: 2.8 (Wrong answer: 1; response time: avg 5.51s, max 5.51s, total 5.51s)
Trivia: 3.0 (Wrong answer: 1; response time: avg 731ms, max 731ms, total 731ms)
Anti-AI Tricks: 3.4 (Wrong answer: 4; response time: avg 6.55s, max 9.41s, total 26.19s)
Coding: 4.2 (API error: 1; Wrong answer: 1; response time: avg 10.57s, max 10.57s, total 10.57s)
Combined: 3.0 (Wrong answer: 1; response time: avg 23.53s, max 23.53s, total 23.53s)
Data parsing and extraction: 10.0 (No failed answers; response time: avg 1.37s, max 1.37s, total 2.73s)
Domain specific: 3.0 (Wrong answer: 3; response time: avg 1.04s, max 1.08s, total 3.11s)
Instructions following: 6.4 (Wrong answer: 1; response time: avg 5.36s, max 9.81s, total 10.73s)
Tool Calling: 3.0 (Invalid tool call: 1; response time: avg 25.72s, max 25.72s, total 25.72s)
Trivia: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Anti-AI Tricks: 5.1 (Timed out: 2; Wrong answer: 1; response time: avg 34.44s, max 57.86s, total 103.31s)
Coding: 2.8 (Did not follow instructions: 1; Timed out: 1; response time: avg 135.61s, max 135.61s, total 135.61s)
Combined: 3.0 (Timed out: 1; response time: avg 0ms, max 0ms, total 0ms)
Domain specific: 3.6 (Timed out: 3; response time: avg 137.75s, max 202.61s, total 413.24s)
General Intelligence: 2.8 (Timed out: 1; response time: avg 226.38s, max 226.38s, total 226.38s)
Instructions following: 6.5 (No answer: 1; response time: avg 5.75s, max 5.75s, total 5.75s)
Puzzle Solving: 3.0 (Timed out: 2; Wrong answer: 1; response time: avg 32.27s, max 47.31s, total 96.80s)
Tool Calling: 10.0 (No failed answers; response time: avg 4.31s, max 4.31s, total 4.31s)
Trivia: 3.0 (API error: 1; response time: avg 177.02s, max 177.02s, total 177.02s)
Anti-AI Tricks: 3.3 (Wrong answer: 3; response time: avg 471ms, max 872ms, total 1.41s)
Combined: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Data parsing and extraction: 3.0 (Wrong answer: 2; response time: avg 714ms, max 987ms, total 1.43s)
Domain specific: 5.9 (API error: 1; Wrong answer: 1; response time: avg 287ms, max 334ms, total 860ms)
Instructions following: 6.3 (Wrong answer: 1; response time: avg 752ms, max 1.22s, total 1.50s)
Puzzle Solving: 3.8 (Wrong answer: 2; API error: 1; response time: avg 1.78s, max 3.15s, total 5.34s)
Tool Calling: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Anti-AI Tricks: 4.8 (Wrong answer: 2; API error: 1; response time: avg 584ms, max 772ms, total 1.75s)
Coding: 10.0 (No failed answers; response time: avg 1.27s, max 1.27s, total 1.27s)
Combined: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Data parsing and extraction: 3.8 (Wrong answer: 2; response time: avg 1.42s, max 2.21s, total 2.84s)
Domain specific: 3.6 (Wrong answer: 3; response time: avg 489ms, max 513ms, total 1.47s)
General Intelligence: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Tool Calling: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Trivia: 3.0 (API error: 1; response time: avg 0ms, max 0ms, total 0ms)
Combined: 3.0 (Invalid tool call: 1; response time: avg 1.88s, max 1.88s, total 1.88s)
Data parsing and extraction: 3.0 (Wrong answer: 2; response time: avg 575ms, max 583ms, total 1.15s)
Domain specific: 3.0 (Wrong answer: 3; response time: avg 357ms, max 463ms, total 1.07s)
General Intelligence: 4.0 (Wrong answer: 1; response time: avg 499ms, max 499ms, total 499ms)
Tool Calling: 10.0 (No failed answers; response time: avg 2.17s, max 2.17s, total 2.17s)
Trivia: 3.0 (Wrong answer: 1; response time: avg 306ms, max 306ms, total 306ms)