A test is fully passed only if every run passed for that test. Wrong answer: 3. Response Time: avg 13.83s, max 33.37s, total 276.53s…
Anti-AI Tricks: 10.0 (No failed answers). Response Time: avg 6.36s, max 8.75s, total 25.44s
Coding: 10.0 (No failed answers). Response Time: avg 22.98s, max 32.31s, total 45.96s
Combined: 10.0 (No failed answers). Response Time: avg 19.60s, max 19.60s, total 19.60s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 8.80s, max 10.25s, total 17.60s
Domain specific: 5.9 (Wrong answer: 2). Response Time: avg 24.94s, max 29.00s, total 74.81s
General Intelligence: 10.0 (No failed answers). Response Time: avg 11.70s, max 11.70s, total 11.70s
Instructions following: 10.0 (No failed answers). Response Time: avg 7.46s, max 10.17s, total 14.92s
Puzzle Solving: 10.0 (No failed answers). Response Time: avg 8.84s, max 11.71s, total 26.51s
Tool Calling: 10.0 (No failed answers). Response Time: avg 6.63s, max 6.63s, total 6.63s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 33.37s, max 33.37s, total 33.37s
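The pass rule and the three response-time figures shown on each card can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Run` record, the function names, and the sample timings are hypothetical and do not come from the benchmark's own code, and treating a test with no runs as not passed is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    """One execution of a test (hypothetical record, not the benchmark's schema)."""
    passed: bool
    seconds: float


def fully_passed(runs: List[Run]) -> bool:
    # A test counts as fully passed only if every run passed.
    # Assumption: a test with no runs is treated as not passed.
    return bool(runs) and all(r.passed for r in runs)


def response_time_summary(runs: List[Run]) -> Dict[str, float]:
    # Compute the avg / max / total figures listed on each card.
    times = [r.seconds for r in runs]
    total = sum(times)
    return {"avg": total / len(times), "max": max(times), "total": total}


# Made-up sample data: three runs, one of which failed.
sample = [Run(True, 5.0), Run(False, 7.0), Run(True, 9.0)]
print(fully_passed(sample))           # one failed run means the test is not fully passed
print(response_time_summary(sample))  # avg, max and total over the three runs
```

A single failed run is enough to keep a test from counting as fully passed, which is consistent with categories scoring below 10.0 while listing only one failure.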

Anti-AI Tricks: 8.2 (Wrong answer: 1). Response Time: avg 3.95s, max 5.68s, total 15.80s
Coding: 4.4 (Wrong answer: 2). Response Time: avg 65.07s, max 105.80s, total 130.13s
Combined: 10.0 (No failed answers). Response Time: avg 17.40s, max 17.40s, total 17.40s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 4.17s, max 5.02s, total 8.34s
Instructions following: 9.8 (No failed answers). Response Time: avg 4.26s, max 4.46s, total 8.52s
Puzzle Solving: 7.7 (Wrong answer: 1). Response Time: avg 6.22s, max 11.63s, total 18.66s
Tool Calling: 3.0 (Did not follow instructions: 1). Response Time: avg 13.68s, max 13.68s, total 13.68s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 63.48s, max 63.48s, total 63.48s

Coding: 6.8 (Wrong answer: 1). Response Time: avg 54.83s, max 95.88s, total 109.65s
Combined: 6.9 (Invalid tool call: 1). Response Time: avg 15.06s, max 15.06s, total 15.06s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 9.60s, max 9.92s, total 19.19s
Domain specific: 5.3 (Wrong answer: 2). Response Time: avg 38.15s, max 67.08s, total 114.45s
General Intelligence: 10.0 (No failed answers). Response Time: avg 11.09s, max 11.09s, total 11.09s
Instructions following: 9.9 (No failed answers). Response Time: avg 3.74s, max 5.23s, total 7.47s
Puzzle Solving: 7.7 (Wrong answer: 1). Response Time: avg 10.24s, max 16.95s, total 30.72s
Tool Calling: 7.0 (Invalid tool call: 1). Response Time: avg 12.53s, max 12.53s, total 12.53s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 40.96s, max 40.96s, total 40.96s

Wrong answer: 6, Did not follow instructions: 2. Response Time: avg 6.13s, max 18.33s, total 122.61s…
Coding: 6.9 (Wrong answer: 1). Response Time: avg 10.52s, max 11.72s, total 21.03s
Combined: 10.0 (No failed answers). Response Time: avg 11.96s, max 11.96s, total 11.96s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 2.21s, max 2.52s, total 4.42s
Domain specific: 3.5 (Wrong answer: 3). Response Time: avg 13.01s, max 18.33s, total 39.04s
Instructions following: 9.8 (No failed answers). Response Time: avg 3.51s, max 4.60s, total 7.01s
Puzzle Solving: 10.0 (No failed answers). Response Time: avg 2.99s, max 3.16s, total 8.97s
Tool Calling: 10.0 (No failed answers). Response Time: avg 8.36s, max 8.36s, total 8.36s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 4.38s, max 4.38s, total 4.38s

API error: 3, Wrong answer: 3. Response Time: avg 9.05s, max 26.24s, total 90.53s…
Anti-AI Tricks: 10.0 (No failed answers). Response Time: avg 14.99s, max 26.24s, total 29.99s
Coding: 3.0 (API error: 2). Response Time: avg 0ms, max 0ms, total 0ms
Combined: 3.0 (Wrong answer: 1). Response Time: avg 10.37s, max 10.37s, total 10.37s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 10.84s, max 10.84s, total 10.84s
Domain specific: 5.3 (Wrong answer: 2). Response Time: avg 7.01s, max 7.01s, total 7.01s
General Intelligence: 10.0 (No failed answers). Response Time: avg 9.34s, max 9.34s, total 9.34s
Instructions following: 9.8 (No failed answers). Response Time: avg 3.26s, max 3.26s, total 3.26s
Puzzle Solving: 10.0 (No failed answers). Response Time: avg 3.88s, max 4.23s, total 7.77s
Tool Calling: 10.0 (No failed answers). Response Time: avg 11.96s, max 11.96s, total 11.96s
Trivia: 3.0 (API error: 1). Response Time: avg 0ms, max 0ms, total 0ms

Anti-AI Tricks: 8.3 (Wrong answer: 1). Response Time: avg 12.62s, max 18.61s, total 50.50s
Coding: 6.6 (No answer: 1). Response Time: avg 165.39s, max 168.22s, total 330.78s
Combined: 7.0 (Invalid tool call: 1). Response Time: avg 83.07s, max 83.07s, total 83.07s
Data parsing and extraction: 3.5 (No answer: 2). Response Time: avg 37.30s, max 54.01s, total 74.60s
Domain specific: 2.9 (Wrong answer: 3). Response Time: avg 73.38s, max 101.55s, total 220.15s
Instructions following: 10.0 (No failed answers). Response Time: avg 37.96s, max 47.48s, total 75.92s
Puzzle Solving: 7.7 (Wrong answer: 1). Response Time: avg 61.14s, max 97.76s, total 183.42s
Tool Calling: 10.0 (No failed answers). Response Time: avg 16.88s, max 16.88s, total 16.88s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 80.99s, max 80.99s, total 80.99s

Anti-AI Tricks: 10.0 (No failed answers). Response Time: avg 21.13s, max 34.96s, total 84.53s
Coding: 6.5 (No answer: 1). Response Time: avg 244.54s, max 409.98s, total 489.08s
Combined: 4.7 (No answer: 1). Response Time: avg 75.34s, max 75.34s, total 75.34s
Data parsing and extraction: 7.3 (API error: 1). Response Time: avg 59.33s, max 97.12s, total 118.65s
Domain specific: 4.1 (Timed out: 2, Wrong answer: 1). Response Time: avg 88.34s, max 106.00s, total 265.01s
General Intelligence: 2.8 (Timed out: 1). Response Time: avg 30.30s, max 30.30s, total 30.30s
Instructions following: 10.0 (No failed answers). Response Time: avg 24.45s, max 43.36s, total 48.89s
Puzzle Solving: 8.2 (Timed out: 1). Response Time: avg 33.13s, max 64.81s, total 99.38s
Tool Calling: 10.0 (No failed answers). Response Time: avg 4.65s, max 4.65s, total 4.65s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 177.35s, max 177.35s, total 177.35s

Wrong answer: 6, Did not follow instructions: 1. Response Time: avg 15.57s, max 95.48s, total 311.47s…
Anti-AI Tricks: 8.4 (Wrong answer: 1). Response Time: avg 6.30s, max 15.56s, total 25.21s
Coding: 6.6 (Wrong answer: 1). Response Time: avg 54.56s, max 92.88s, total 109.12s
Combined: 10.0 (No failed answers). Response Time: avg 28.44s, max 28.44s, total 28.44s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 4.06s, max 5.06s, total 8.11s
Domain specific: 5.9 (Wrong answer: 2). Response Time: avg 37.34s, max 95.48s, total 112.01s
Instructions following: 9.8 (No failed answers). Response Time: avg 2.62s, max 2.78s, total 5.24s
Puzzle Solving: 7.7 (Wrong answer: 1). Response Time: avg 3.18s, max 4.05s, total 9.54s
Tool Calling: 10.0 (No failed answers). Response Time: avg 6.20s, max 6.20s, total 6.20s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 2.76s, max 2.76s, total 2.76s

Wrong answer: 6, Did not follow instructions: 1. Response Time: avg 6.82s, max 38.52s, total 136.34s…
Anti-AI Tricks: 8.7 (Wrong answer: 1). Response Time: avg 3.40s, max 4.78s, total 13.59s
Coding: 8.2 (Wrong answer: 1). Response Time: avg 8.05s, max 8.97s, total 16.09s
Combined: 10.0 (No failed answers). Response Time: avg 9.12s, max 9.12s, total 9.12s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 3.05s, max 3.33s, total 6.10s
Domain specific: 5.3 (Wrong answer: 2). Response Time: avg 17.78s, max 38.52s, total 53.33s
Instructions following: 9.8 (No failed answers). Response Time: avg 5.51s, max 6.55s, total 11.02s
Puzzle Solving: 7.7 (Wrong answer: 1). Response Time: avg 4.10s, max 5.04s, total 12.31s
Tool Calling: 10.0 (No failed answers). Response Time: avg 4.68s, max 4.68s, total 4.68s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 6.89s, max 6.89s, total 6.89s

Anti-AI Tricks: 8.7 (Did not follow instructions: 1). Response Time: avg 9.65s, max 35.08s, total 38.62s
Coding: 8.2 (Wrong answer: 1). Response Time: avg 10.64s, max 12.69s, total 21.28s
Combined: 10.0 (No failed answers). Response Time: avg 9.06s, max 9.06s, total 9.06s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 2.75s, max 3.35s, total 5.49s
Domain specific: 7.7 (Wrong answer: 1). Response Time: avg 48.27s, max 97.10s, total 144.81s
General Intelligence: 4.0 (Wrong answer: 1). Response Time: avg 6.85s, max 6.85s, total 6.85s
Instructions following: 9.8 (No failed answers). Response Time: avg 1.83s, max 2.21s, total 3.65s
Puzzle Solving: 5.7 (Wrong answer: 2). Response Time: avg 6.19s, max 12.51s, total 18.56s
Tool Calling: 10.0 (No failed answers). Response Time: avg 4.16s, max 4.16s, total 4.16s
Trivia: 3.0 (No answer: 1). Response Time: avg 113.98s, max 113.98s, total 113.98s

Wrong answer: 7, No answer: 1. Response Time: avg 16.06s, max 124.75s, total 321.11s…
Anti-AI Tricks: 8.7 (Wrong answer: 1). Response Time: avg 4.02s, max 12.52s, total 16.10s
Coding: 10.0 (No failed answers). Response Time: avg 9.43s, max 12.69s, total 18.86s
Combined: 10.0 (No failed answers). Response Time: avg 7.98s, max 7.98s, total 7.98s
Data parsing and extraction: 7.3 (Wrong answer: 1). Response Time: avg 2.29s, max 3.15s, total 4.58s
Domain specific: 5.3 (Wrong answer: 2). Response Time: avg 43.31s, max 72.27s, total 129.92s
General Intelligence: 3.4 (Wrong answer: 1). Response Time: avg 7.00s, max 7.00s, total 7.00s
Instructions following: 9.8 (No failed answers). Response Time: avg 1.58s, max 1.80s, total 3.16s
Puzzle Solving: 5.5 (Wrong answer: 2). Response Time: avg 1.84s, max 3.42s, total 5.52s
Tool Calling: 10.0 (No failed answers). Response Time: avg 3.25s, max 3.25s, total 3.25s
Trivia: 3.0 (No answer: 1). Response Time: avg 124.75s, max 124.75s, total 124.75s

Anti-AI Tricks: 10.0 (No failed answers). Response Time: avg 2.86s, max 3.92s, total 11.45s
Coding: 7.5 (Wrong answer: 1). Response Time: avg 94.21s, max 136.29s, total 188.41s
Combined: 4.7 (Wrong answer: 1). Response Time: avg 64.71s, max 64.71s, total 64.71s
Data parsing and extraction: 7.3 (Wrong answer: 1). Response Time: avg 17.20s, max 17.44s, total 34.40s
Domain specific: 5.3 (Timed out: 1, Wrong answer: 1). Response Time: avg 8.82s, max 14.48s, total 26.47s
General Intelligence: 10.0 (No failed answers). Response Time: avg 4.92s, max 4.92s, total 4.92s
Instructions following: 9.9 (No failed answers). Response Time: avg 3.36s, max 4.35s, total 6.72s
Tool Calling: 10.0 (No failed answers). Response Time: avg 8.19s, max 8.19s, total 8.19s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 82.71s, max 82.71s, total 82.71s

Coding: 6.8 (Extra formatting: 1). Response Time: avg 6.73s, max 9.79s, total 13.46s
Combined: 9.5 (No failed answers). Response Time: avg 23.84s, max 23.84s, total 23.84s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 3.43s, max 3.43s, total 3.43s
Domain specific: 7.7 (Wrong answer: 1). Response Time: avg 3.54s, max 3.54s, total 3.54s
Instructions following: 6.5 (Wrong answer: 1). Response Time: avg 1.96s, max 1.96s, total 1.96s
Puzzle Solving: 7.7 (Extra formatting: 1). Response Time: avg 2.53s, max 2.54s, total 5.06s
Tool Calling: 10.0 (No failed answers). Response Time: avg 4.11s, max 4.11s, total 4.11s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 4.67s, max 4.67s, total 4.67s

Coding: 3.5 (Timed out: 1, Wrong answer: 1). Response Time: avg 125.80s, max 125.80s, total 125.80s
Combined: 4.5 (Invalid tool call: 1). Response Time: avg 60.39s, max 60.39s, total 60.39s
Data parsing and extraction: 4.6 (Wrong answer: 2). Response Time: avg 7.48s, max 7.48s, total 7.48s
Domain specific: 2.9 (Wrong answer: 2, Timed out: 1). Response Time: avg 237.27s, max 237.27s, total 237.27s
Puzzle Solving: 5.3 (Timed out: 1, Wrong answer: 1). Response Time: avg 11.21s, max 17.37s, total 22.43s
Tool Calling: 10.0 (No failed answers). Response Time: avg 15.35s, max 15.35s, total 15.35s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 80.79s, max 80.79s, total 80.79s

Wrong answer: 8. Response Time: avg 43.65s, max 189.38s, total 872.90s…
Anti-AI Tricks: 10.0 (No failed answers). Response Time: avg 10.84s, max 15.11s, total 43.36s
Coding: 4.3 (Wrong answer: 2). Response Time: avg 137.55s, max 189.38s, total 275.10s
Combined: 10.0 (No failed answers). Response Time: avg 92.41s, max 92.41s, total 92.41s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 38.32s, max 41.70s, total 76.63s
Domain specific: 2.9 (Wrong answer: 3). Response Time: avg 53.10s, max 90.70s, total 159.30s
General Intelligence: 4.9 (Wrong answer: 1). Response Time: avg 25.30s, max 25.30s, total 25.30s
Instructions following: 10.0 (No failed answers). Response Time: avg 20.25s, max 21.65s, total 40.50s
Puzzle Solving: 8.2 (Wrong answer: 1). Response Time: avg 17.67s, max 24.83s, total 53.02s
Tool Calling: 10.0 (No failed answers). Response Time: avg 14.72s, max 14.72s, total 14.72s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 92.57s, max 92.57s, total 92.57s

Wrong answer: 2. Response Time: avg 2.98s, max 6.44s, total 59.59s…
Anti-AI Tricks: 10.0 (No failed answers). Response Time: avg 2.52s, max 5.40s, total 10.08s
Coding: 6.8 (Wrong answer: 1). Response Time: avg 5.54s, max 5.59s, total 11.08s
Combined: 10.0 (No failed answers). Response Time: avg 6.44s, max 6.44s, total 6.44s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 1.81s, max 2.32s, total 3.63s
Domain specific: 7.7 (Wrong answer: 1). Response Time: avg 3.39s, max 4.44s, total 10.16s
General Intelligence: 10.0 (No failed answers). Response Time: avg 2.27s, max 2.27s, total 2.27s
Instructions following: 9.9 (No failed answers). Response Time: avg 1.86s, max 2.10s, total 3.73s
Puzzle Solving: 10.0 (No failed answers). Response Time: avg 2.35s, max 3.25s, total 7.06s
Tool Calling: 10.0 (No failed answers). Response Time: avg 3.27s, max 3.27s, total 3.27s
Trivia: 10.0 (No failed answers). Response Time: avg 1.88s, max 1.88s, total 1.88s

Anti-AI Tricks: 10.0 (No failed answers). Response Time: avg 4.82s, max 7.69s, total 19.26s
Coding: 7.3 (No answer: 1). Response Time: avg 53.92s, max 95.57s, total 107.83s
Combined: 10.0 (No failed answers). Response Time: avg 13.88s, max 13.88s, total 13.88s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 6.19s, max 6.42s, total 12.38s
Domain specific: 2.9 (Wrong answer: 2, Timed out: 1). Response Time: avg 71.07s, max 194.23s, total 213.22s
General Intelligence: 6.1 (Wrong answer: 1). Response Time: avg 10.05s, max 10.05s, total 10.05s
Instructions following: 10.0 (No failed answers). Response Time: avg 5.38s, max 5.70s, total 10.77s
Puzzle Solving: 8.7 (Did not follow instructions: 1). Response Time: avg 5.23s, max 7.26s, total 15.69s
Tool Calling: 10.0 (No failed answers). Response Time: avg 9.84s, max 9.84s, total 9.84s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 40.17s, max 40.17s, total 40.17s

Wrong answer: 4, Timed out: 2. Response Time: avg 67.58s, max 266.69s, total 878.57s…
Anti-AI Tricks: 8.2 (Wrong answer: 1). Response Time: avg 45.78s, max 81.20s, total 91.57s
Coding: 7.6 (Wrong answer: 1). Response Time: avg 193.80s, max 266.69s, total 387.60s
Combined: 10.0 (No failed answers). Response Time: avg 46.85s, max 46.85s, total 46.85s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 46.91s, max 46.91s, total 46.91s
Domain specific: 5.3 (Timed out: 1, Wrong answer: 1). Response Time: avg 17.50s, max 17.50s, total 17.50s
General Intelligence: 4.7 (Timed out: 1). Response Time: avg 79.86s, max 79.86s, total 79.86s
Instructions following: 10.0 (No failed answers). Response Time: avg 31.93s, max 31.93s, total 31.93s
Puzzle Solving: 10.0 (No failed answers). Response Time: avg 32.50s, max 49.12s, total 65.01s
Tool Calling: 10.0 (No failed answers). Response Time: avg 7.54s, max 7.54s, total 7.54s
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 103.81s, max 103.81s, total 103.81s

Anti-AI Tricks: 10.0 (No failed answers). Response Time: avg 8.31s, max 14.20s, total 33.24s
Coding: 4.6 (No answer: 1, Timed out: 1). Response Time: avg 145.56s, max 172.60s, total 291.12s
Combined: 9.5 (No failed answers). Response Time: avg 43.11s, max 43.11s, total 43.11s
Data parsing and extraction: 10.0 (No failed answers). Response Time: avg 9.33s, max 9.40s, total 18.66s
Domain specific: 5.3 (Timed out: 1, Wrong answer: 1). Response Time: avg 29.77s, max 32.22s, total 89.30s
General Intelligence: 10.0 (No failed answers). Response Time: avg 20.95s, max 20.95s, total 20.95s
Instructions following: 6.4 (Wrong answer: 1). Response Time: avg 7.47s, max 10.16s, total 14.94s
Puzzle Solving: 8.2 (Wrong answer: 1). Response Time: avg 31.64s, max 46.04s, total 94.91s
Tool Calling: 3.0 (API error: 1). Response Time: avg 0ms, max 0ms, total 0ms
Trivia: 3.0 (Wrong answer: 1). Response Time: avg 29.40s, max 29.40s, total 29.40s
Wrong answer: 7, Did not follow instructions: 1; Response Time: avg 18.97s, max 122.87s, total 379.49s…
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 6.10s, max 9.60s, total 24.39s
Coding
: 5.1; Wrong answer: 2; Response Time: avg 51.92s, max 78.01s, total 103.85s
Combined
: 10.0; No failed answers; Response Time: avg 20.28s, max 20.28s, total 20.28s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 9.65s, max 10.35s, total 19.31s
Domain specific
: 3.5; Wrong answer: 3; Response Time: avg 14.65s, max 26.85s, total 43.95s
General Intelligence
: 4.8; Wrong answer: 1; Response Time: avg 9.88s, max 9.88s, total 9.88s
Instructions following
: 10.0; No failed answers; Response Time: avg 6.05s, max 6.94s, total 12.10s
Puzzle Solving
: 8.2; Did not follow instructions: 1; Response Time: avg 6.29s, max 8.18s, total 18.87s
Tool Calling
: 10.0; No failed answers; Response Time: avg 4.00s, max 4.00s, total 4.00s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 122.87s, max 122.87s, total 122.87s
Anti-AI Tricks
: 7.3; No answer: 1, Wrong answer: 1; Response Time: avg 51.38s, max 85.28s, total 102.75s
Coding
: 4.1; No answer: 1, Timed out: 1; Response Time: avg 215.89s, max 281.00s, total 431.77s
Combined
: 10.0; No failed answers; Response Time: avg 71.37s, max 71.37s, total 71.37s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 49.78s, max 49.78s, total 49.78s
Domain specific
: 3.5; Wrong answer: 2, Timed out: 1; Response Time: avg 137.29s, max 137.29s, total 137.29s
Instructions following
: 10.0; No failed answers; Response Time: avg 92.47s, max 92.47s, total 92.47s
Tool Calling
: 10.0; No failed answers; Response Time: avg 31.74s, max 31.74s, total 31.74s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 83.95s, max 83.95s, total 83.95s
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 9.90s, max 19.37s, total 39.60s
Coding
: 4.1; API error: 1, Wrong answer: 1; Response Time: avg 201.68s, max 201.68s, total 201.68s
Combined
: 10.0; No failed answers; Response Time: avg 34.95s, max 34.95s, total 34.95s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 14.95s, max 15.40s, total 29.90s
Domain specific
: 2.9; Wrong answer: 3; Response Time: avg 29.59s, max 43.55s, total 88.77s
Instructions following
: 10.0; No failed answers; Response Time: avg 7.54s, max 11.67s, total 15.07s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 6.34s, max 7.52s, total 19.03s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.87s, max 5.87s, total 5.87s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 47.51s, max 47.51s, total 47.51s
Wrong answer: 10; Response Time: avg 1.93s, max 5.56s, total 38.64s…
Anti-AI Tricks
: 6.9; Wrong answer: 2; Response Time: avg 1.31s, max 2.08s, total 5.25s
Coding
: 6.8; Wrong answer: 1; Response Time: avg 1.52s, max 2.05s, total 3.04s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 5.56s, max 5.56s, total 5.56s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.18s, max 1.24s, total 2.37s
Domain specific
: 2.9; Wrong answer: 3; Response Time: avg 1.31s, max 1.39s, total 3.92s
General Intelligence
: 10.0; No failed answers; Response Time: avg 3.41s, max 3.41s, total 3.41s
Instructions following
: 6.2; Wrong answer: 1; Response Time: avg 1.15s, max 1.19s, total 2.31s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 1.29s, max 1.56s, total 3.87s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.90s, max 3.90s, total 3.90s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 5.01s, max 5.01s, total 5.01s
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 23.66s, max 25.06s, total 47.32s
Coding
: 10.0; No failed answers; Response Time: avg 89.47s, max 99.85s, total 178.93s
Combined
: 10.0; No failed answers; Response Time: avg 28.96s, max 28.96s, total 28.96s
Data parsing and extraction
: 7.1; No answer: 1; Response Time: avg 8.90s, max 8.90s, total 8.90s
Domain specific
: 3.5; Wrong answer: 2, Timed out: 1; Response Time: avg 0ms, max 0ms, total 0ms
Instructions following
: 10.0; No failed answers; Response Time: avg 7.25s, max 7.25s, total 7.25s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 11.33s, max 16.34s, total 22.66s
Tool Calling
: 10.0; No failed answers; Response Time: avg 15.93s, max 15.93s, total 15.93s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 67.37s, max 67.37s, total 67.37s
Wrong answer: 4, Timed out: 1; Response Time: avg 36.84s, max 178.04s, total 736.86s…
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 8.58s, max 12.75s, total 34.33s
Coding
: 6.5; Timed out: 1; Response Time: avg 122.40s, max 178.04s, total 244.81s
Combined
: 10.0; No failed answers; Response Time: avg 65.24s, max 65.24s, total 65.24s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 21.75s, max 23.18s, total 43.49s
Domain specific
: 3.6; Wrong answer: 3; Response Time: avg 45.35s, max 88.89s, total 136.04s
General Intelligence
: 10.0; No failed answers; Response Time: avg 25.48s, max 25.48s, total 25.48s
Instructions following
: 10.0; No failed answers; Response Time: avg 16.13s, max 17.18s, total 32.26s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 16.38s, max 19.42s, total 49.14s
Tool Calling
: 10.0; No failed answers; Response Time: avg 15.02s, max 15.02s, total 15.02s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 91.07s, max 91.07s, total 91.07s
Wrong answer: 5, Did not follow instructions: 2; Response Time: avg 36.67s, max 168.71s, total 733.46s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response Time: avg 17.99s, max 48.33s, total 71.98s
Coding
: 7.0; Wrong answer: 1; Response Time: avg 107.65s, max 140.81s, total 215.30s
Combined
: 10.0; No failed answers; Response Time: avg 37.67s, max 37.67s, total 37.67s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 9.07s, max 12.19s, total 18.14s
Domain specific
: 5.9; Wrong answer: 2; Response Time: avg 88.74s, max 168.71s, total 266.21s
Instructions following
: 10.0; No failed answers; Response Time: avg 7.26s, max 9.02s, total 14.52s
Puzzle Solving
: 9.0; Did not follow instructions: 1; Response Time: avg 10.23s, max 11.54s, total 30.68s
Tool Calling
: 10.0; No failed answers; Response Time: avg 12.38s, max 12.38s, total 12.38s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 48.32s, max 48.32s, total 48.32s
Coding
: 10.0; No failed answers; Response Time: avg 30.74s, max 38.31s, total 61.49s
Combined
: 10.0; No failed answers; Response Time: avg 88.15s, max 88.15s, total 88.15s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 12.58s, max 13.87s, total 25.16s
Domain specific
: 3.6; Wrong answer: 2, Timed out: 1; Response Time: avg 44.63s, max 82.55s, total 133.89s
Instructions following
: 10.0; No failed answers; Response Time: avg 11.59s, max 13.66s, total 23.18s
Tool Calling
: 10.0; No failed answers; Response Time: avg 18.64s, max 18.64s, total 18.64s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 9.99s, max 9.99s, total 9.99s
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 6.02s, max 8.79s, total 24.07s
Coding
: 6.6; No answer: 1; Response Time: avg 59.35s, max 86.11s, total 118.69s
Combined
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 12.99s, max 13.75s, total 25.99s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 22.50s, max 45.02s, total 67.51s
Instructions following
: 10.0; No failed answers; Response Time: avg 7.50s, max 10.22s, total 15.00s
Puzzle Solving
: 8.0; Wrong answer: 1; Response Time: avg 5.95s, max 8.42s, total 17.84s
Tool Calling
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 32.90s, max 32.90s, total 32.90s
Coding
: 6.8; Timed out: 1; Response Time: avg 185.58s, max 218.40s, total 371.16s
Combined
: 10.0; No failed answers; Response Time: avg 65.30s, max 65.30s, total 65.30s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 14.92s, max 16.89s, total 29.85s
Domain specific
: 5.5; Timed out: 2; Response Time: avg 233.13s, max 431.03s, total 466.26s
Instructions following
: 9.8; No failed answers; Response Time: avg 6.14s, max 6.80s, total 12.27s
Puzzle Solving
: 7.9; Wrong answer: 1; Response Time: avg 49.91s, max 128.09s, total 149.74s
Tool Calling
: 10.0; No failed answers; Response Time: avg 11.91s, max 11.91s, total 11.91s
Trivia
: 3.0; No answer: 1; Response Time: avg 100.80s, max 100.80s, total 100.80s
Wrong answer: 12, Did not follow instructions: 1; Response Time: avg 1.45s, max 2.95s, total 29.00s…
Anti-AI Tricks
: 3.2; Wrong answer: 4; Response Time: avg 1.21s, max 2.58s, total 4.85s
Coding
: 6.8; Wrong answer: 1; Response Time: avg 1.99s, max 2.95s, total 3.97s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 2.89s, max 2.89s, total 2.89s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.04s, max 1.06s, total 2.08s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 1.07s, max 1.54s, total 3.22s
General Intelligence
: 4.4; Wrong answer: 1; Response Time: avg 1.78s, max 1.78s, total 1.78s
Instructions following
: 6.5; Wrong answer: 1; Response Time: avg 1.07s, max 1.17s, total 2.15s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.75s, max 2.75s, total 2.75s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 990ms, max 990ms, total 990ms