A test is fully passed only if every run passed for that test.
Anti-AI Tricks: 8.7 | Wrong answer: 1 | Response Time: avg 37.16s, max 140.53s, total 148.65s
Coding: 10.0 | No failed answers | Response Time: avg 137.63s, max 137.63s, total 137.63s
Combined: 10.0 | No failed answers | Response Time: avg 149.23s, max 149.23s, total 149.23s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 4.49s, max 4.96s, total 8.98s
Domain specific: 3.6 | Wrong answer: 3 | Response Time: avg 139.90s, max 141.40s, total 419.69s
Instructions following: 7.3 | No answer: 1 | Response Time: avg 23.26s, max 43.87s, total 46.51s
Puzzle Solving: 5.7 | Did not follow instructions: 2 | Response Time: avg 50.83s, max 144.85s, total 152.49s
Tool Calling: 10.0 | No failed answers | Response Time: avg 6.44s, max 6.44s, total 6.44s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 54.83s, max 95.88s, total 109.65s
Combined: 6.9 | Invalid tool call: 1 | Response Time: avg 15.06s, max 15.06s, total 15.06s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 9.60s, max 9.92s, total 19.19s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 38.15s, max 67.08s, total 114.45s
General Intelligence: 10.0 | No failed answers | Response Time: avg 11.09s, max 11.09s, total 11.09s
Instructions following: 9.9 | No failed answers | Response Time: avg 3.74s, max 5.23s, total 7.47s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response Time: avg 10.24s, max 16.95s, total 30.72s
Tool Calling: 7.0 | Invalid tool call: 1 | Response Time: avg 12.53s, max 12.53s, total 12.53s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 40.96s, max 40.96s, total 40.96s
Wrong answer: 6, Did not follow instructions: 3 | Response Time: avg 22.10s, max 138.75s, total 442.09s…
Anti-AI Tricks: 8.6 | Wrong answer: 1 | Response Time: avg 4.05s, max 6.69s, total 16.20s
Coding: 7.5 | Wrong answer: 1 | Response Time: avg 73.25s, max 138.75s, total 146.51s
Combined: 10.0 | No failed answers | Response Time: avg 17.81s, max 17.81s, total 17.81s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 2.43s, max 3.39s, total 4.87s
Domain specific: 4.1 | Wrong answer: 3 | Response Time: avg 65.31s, max 102.91s, total 195.92s
Instructions following: 9.8 | No failed answers | Response Time: avg 2.13s, max 2.45s, total 4.25s
Puzzle Solving: 7.8 | Did not follow instructions: 1 | Response Time: avg 4.37s, max 7.27s, total 13.11s
Tool Calling: 4.7 | Did not follow instructions: 1 | Response Time: avg 9.62s, max 9.62s, total 9.62s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 30.10s, max 30.10s, total 30.10s
Anti-AI Tricks: 10.0 | No failed answers | Response Time: avg 21.13s, max 34.96s, total 84.53s
Coding: 6.5 | No answer: 1 | Response Time: avg 244.54s, max 409.98s, total 489.08s
Combined: 4.7 | No answer: 1 | Response Time: avg 75.34s, max 75.34s, total 75.34s
Data parsing and extraction: 7.3 | API error: 1 | Response Time: avg 59.33s, max 97.12s, total 118.65s
Domain specific: 4.1 | Timed out: 2, Wrong answer: 1 | Response Time: avg 88.34s, max 106.00s, total 265.01s
General Intelligence: 2.8 | Timed out: 1 | Response Time: avg 30.30s, max 30.30s, total 30.30s
Instructions following: 10.0 | No failed answers | Response Time: avg 24.45s, max 43.36s, total 48.89s
Puzzle Solving: 8.2 | Timed out: 1 | Response Time: avg 33.13s, max 64.81s, total 99.38s
Tool Calling: 10.0 | No failed answers | Response Time: avg 4.65s, max 4.65s, total 4.65s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 177.35s, max 177.35s, total 177.35s
Wrong answer: 7, Did not follow instructions: 2 | Response Time: avg 11.79s, max 94.06s, total 235.81s…
Anti-AI Tricks: 8.3 | Wrong answer: 1 | Response Time: avg 4.52s, max 7.74s, total 18.10s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 21.10s, max 28.80s, total 42.21s
Combined: 9.8 | No failed answers | Response Time: avg 24.13s, max 24.13s, total 24.13s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 2.54s, max 3.33s, total 5.08s
Domain specific: 5.9 | Wrong answer: 2 | Response Time: avg 38.18s, max 94.06s, total 114.53s
Instructions following: 9.8 | No failed answers | Response Time: avg 1.88s, max 2.61s, total 3.75s
Tool Calling: 10.0 | No failed answers | Response Time: avg 7.71s, max 7.71s, total 7.71s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 4.81s, max 4.81s, total 4.81s
Wrong answer: 6, No answer: 3 | Response Time: avg 49.43s, max 192.75s, total 988.58s…
Anti-AI Tricks: 10.0 | No failed answers | Response Time: avg 13.40s, max 45.73s, total 53.58s
Coding: 3.7 | No answer: 1, Wrong answer: 1 | Response Time: avg 126.82s, max 192.75s, total 253.65s
Combined: 10.0 | No failed answers | Response Time: avg 13.01s, max 13.01s, total 13.01s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 14.72s, max 24.97s, total 29.43s
Domain specific: 4.1 | Wrong answer: 2, No answer: 1 | Response Time: avg 149.64s, max 163.21s, total 448.91s
General Intelligence: 5.5 | Wrong answer: 1 | Response Time: avg 4.17s, max 4.17s, total 4.17s
Instructions following: 9.8 | No failed answers | Response Time: avg 1.52s, max 1.89s, total 3.03s
Puzzle Solving: 5.3 | Wrong answer: 2 | Response Time: avg 10.22s, max 23.65s, total 30.66s
Tool Calling: 10.0 | No failed answers | Response Time: avg 2.79s, max 2.79s, total 2.79s
Trivia: 3.0 | No answer: 1 | Response Time: avg 149.34s, max 149.34s, total 149.34s
Coding: 6.8 | Timed out: 1 | Response Time: avg 185.58s, max 218.40s, total 371.16s
Combined: 10.0 | No failed answers | Response Time: avg 65.30s, max 65.30s, total 65.30s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 14.92s, max 16.89s, total 29.85s
Domain specific: 5.5 | Timed out: 2 | Response Time: avg 233.13s, max 431.03s, total 466.26s
Instructions following: 9.8 | No failed answers | Response Time: avg 6.14s, max 6.80s, total 12.27s
Puzzle Solving: 7.9 | Wrong answer: 1 | Response Time: avg 49.91s, max 128.09s, total 149.74s
Tool Calling: 10.0 | No failed answers | Response Time: avg 11.91s, max 11.91s, total 11.91s
Trivia: 3.0 | No answer: 1 | Response Time: avg 100.80s, max 100.80s, total 100.80s
Coding: 10.0 | No failed answers | Response Time: avg 30.74s, max 38.31s, total 61.49s
Combined: 10.0 | No failed answers | Response Time: avg 88.15s, max 88.15s, total 88.15s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 12.58s, max 13.87s, total 25.16s
Domain specific: 3.6 | Wrong answer: 2, Timed out: 1 | Response Time: avg 44.63s, max 82.55s, total 133.89s
Instructions following: 10.0 | No failed answers | Response Time: avg 11.59s, max 13.66s, total 23.18s
Tool Calling: 10.0 | No failed answers | Response Time: avg 18.64s, max 18.64s, total 18.64s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 9.99s, max 9.99s, total 9.99s
Wrong answer: 9 | Response Time: avg 3.31s, max 20.51s, total 66.17s…
Anti-AI Tricks: 5.2 | Wrong answer: 3 | Response Time: avg 2.63s, max 5.57s, total 10.53s
Coding: 4.2 | Wrong answer: 2 | Response Time: avg 3.06s, max 3.45s, total 6.12s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 20.51s, max 20.51s, total 20.51s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 2.87s, max 3.54s, total 5.74s
Domain specific: 7.7 | Wrong answer: 1 | Response Time: avg 1.22s, max 1.25s, total 3.67s
General Intelligence: 4.3 | Wrong answer: 1 | Response Time: avg 1.62s, max 1.62s, total 1.62s
Instructions following: 9.8 | No failed answers | Response Time: avg 1.40s, max 1.46s, total 2.79s
Puzzle Solving: 10.0 | No failed answers | Response Time: avg 2.65s, max 3.59s, total 7.94s
Tool Calling: 10.0 | No failed answers | Response Time: avg 5.27s, max 5.27s, total 5.27s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 1.97s, max 1.97s, total 1.97s
Anti-AI Tricks: 8.1 | Extra formatting: 1 | Response Time: avg 15.85s, max 20.83s, total 47.55s
Coding: 4.1 | Timed out: 1, Wrong answer: 1 | Response Time: avg 7.20s, max 13.03s, total 14.41s
Combined: 9.8 | No failed answers | Response Time: avg 75.68s, max 75.68s, total 75.68s
Data parsing and extraction: 6.5 | API error: 1 | Response Time: avg 0ms, max 0ms, total 0ms
Domain specific: 5.9 | Wrong answer: 2 | Response Time: avg 96.01s, max 96.01s, total 96.01s
Instructions following: 10.0 | No failed answers | Response Time: avg 4.28s, max 7.37s, total 8.55s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response Time: avg 3.87s, max 5.26s, total 7.74s
Tool Calling: 10.0 | No failed answers | Response Time: avg 27.78s, max 27.78s, total 27.78s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 1.96s, max 1.96s, total 1.96s
Anti-AI Tricks: 6.6 | Timed out: 1, Wrong answer: 1 | Response Time: avg 74.75s, max 182.10s, total 298.98s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 220.48s, max 243.66s, total 440.97s
Combined: 10.0 | No failed answers | Response Time: avg 262.83s, max 262.83s, total 262.83s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 24.27s, max 27.52s, total 48.54s
Domain specific: 3.0 | Timed out: 3 | Response Time: avg 0ms, max 0ms, total 0ms
Instructions following: 10.0 | No failed answers | Response Time: avg 17.47s, max 19.46s, total 34.93s
Puzzle Solving: 8.2 | Wrong answer: 1 | Response Time: avg 31.79s, max 50.78s, total 95.38s
Tool Calling: 10.0 | No failed answers | Response Time: avg 88.68s, max 88.68s, total 88.68s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 56.76s, max 56.76s, total 56.76s
Coding: 6.8 | Extra formatting: 1 | Response Time: avg 6.73s, max 9.79s, total 13.46s
Combined: 9.5 | No failed answers | Response Time: avg 23.84s, max 23.84s, total 23.84s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 3.43s, max 3.43s, total 3.43s
Domain specific: 7.7 | Wrong answer: 1 | Response Time: avg 3.54s, max 3.54s, total 3.54s
Instructions following: 6.5 | Wrong answer: 1 | Response Time: avg 1.96s, max 1.96s, total 1.96s
Puzzle Solving: 7.7 | Extra formatting: 1 | Response Time: avg 2.53s, max 2.54s, total 5.06s
Tool Calling: 10.0 | No failed answers | Response Time: avg 4.11s, max 4.11s, total 4.11s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 4.67s, max 4.67s, total 4.67s
Anti-AI Tricks: 8.2 | Wrong answer: 1 | Response Time: avg 3.95s, max 5.68s, total 15.80s
Coding: 4.4 | Wrong answer: 2 | Response Time: avg 65.07s, max 105.80s, total 130.13s
Combined: 10.0 | No failed answers | Response Time: avg 17.40s, max 17.40s, total 17.40s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 4.17s, max 5.02s, total 8.34s
Instructions following: 9.8 | No failed answers | Response Time: avg 4.26s, max 4.46s, total 8.52s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response Time: avg 6.22s, max 11.63s, total 18.66s
Tool Calling: 3.0 | Did not follow instructions: 1 | Response Time: avg 13.68s, max 13.68s, total 13.68s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 63.48s, max 63.48s, total 63.48s
Anti-AI Tricks: 10.0 | No failed answers | Response Time: avg 42.21s, max 89.34s, total 168.84s
Coding: 6.5 | API error: 1 | Response Time: avg 59.65s, max 59.65s, total 59.65s
Combined: 10.0 | No failed answers | Response Time: avg 304.19s, max 304.19s, total 304.19s
Data parsing and extraction: 6.5 | Wrong answer: 1 | Response Time: avg 37.36s, max 54.24s, total 74.71s
Domain specific: 3.5 | Wrong answer: 3 | Response Time: avg 64.92s, max 150.55s, total 194.76s
Instructions following: 9.8 | No failed answers | Response Time: avg 11.78s, max 17.75s, total 23.55s
Tool Calling: 10.0 | No failed answers | Response Time: avg 104.44s, max 104.44s, total 104.44s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 113.91s, max 113.91s, total 113.91s
API error: 6, Wrong answer: 3 | Response Time: avg 56.57s, max 149.94s, total 848.59s…
Anti-AI Tricks: 6.4 | API error: 2 | Response Time: avg 15.12s, max 19.99s, total 45.37s
Coding: 6.5 | API error: 1 | Response Time: avg 99.76s, max 99.76s, total 99.76s
Combined: 10.0 | No failed answers | Response Time: avg 113.09s, max 113.09s, total 113.09s
Data parsing and extraction: 6.5 | API error: 1 | Response Time: avg 12.11s, max 12.11s, total 12.11s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 109.04s, max 149.94s, total 327.11s
General Intelligence: 3.0 | API error: 1 | Response Time: avg 0ms, max 0ms, total 0ms
Instructions following: 10.0 | No failed answers | Response Time: avg 34.36s, max 41.83s, total 68.73s
Puzzle Solving: 7.7 | API error: 1 | Response Time: avg 27.94s, max 45.06s, total 55.89s
Tool Calling: 10.0 | No failed answers | Response Time: avg 78.83s, max 78.83s, total 78.83s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 47.71s, max 47.71s, total 47.71s
Anti-AI Tricks: 10.0 | No failed answers | Response Time: avg 2.75s, max 4.59s, total 10.98s
Coding: 3.4 | No answer: 1, Wrong answer: 1 | Response Time: avg 183.89s, max 299.23s, total 367.78s
Combined: 10.0 | No failed answers | Response Time: avg 25.87s, max 25.87s, total 25.87s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 3.04s, max 4.12s, total 6.07s
General Intelligence: 5.4 | Wrong answer: 1 | Response Time: avg 3.61s, max 3.61s, total 3.61s
Tool Calling: 10.0 | No failed answers | Response Time: avg 13.98s, max 13.98s, total 13.98s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 234.19s, max 234.19s, total 234.19s
Anti-AI Tricks: 8.2 | API error: 1 | Response Time: avg 24.23s, max 29.86s, total 96.93s
Coding: 3.9 | Timed out: 1, Wrong answer: 1 | Response Time: avg 184.97s, max 189.03s, total 369.94s
Combined: 10.0 | No failed answers | Response Time: avg 93.11s, max 93.11s, total 93.11s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 36.09s, max 39.12s, total 72.18s
Domain specific: 2.9 | Wrong answer: 2, Timed out: 1 | Response Time: avg 24.27s, max 33.91s, total 72.82s
General Intelligence: 3.4 | API error: 1 | Response Time: avg 58.29s, max 58.29s, total 58.29s
Instructions following: 10.0 | No failed answers | Response Time: avg 35.78s, max 47.30s, total 71.56s
Tool Calling: 10.0 | No failed answers | Response Time: avg 34.81s, max 34.81s, total 34.81s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 83.99s, max 83.99s, total 83.99s
Anti-AI Tricks: 6.5 | Wrong answer: 2 | Response Time: avg 1.85s, max 4.45s, total 7.40s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 14.84s, max 26.13s, total 29.68s
Combined: 3.0 | API error: 1 | Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 2.25s, max 3.02s, total 4.51s
Domain specific: 7.7 | Wrong answer: 1 | Response Time: avg 3.22s, max 4.68s, total 9.67s
General Intelligence: 10.0 | No failed answers | Response Time: avg 2.09s, max 2.09s, total 2.09s
Instructions following: 6.5 | Wrong answer: 1 | Response Time: avg 2.84s, max 4.45s, total 5.68s
Tool Calling: 3.0 | API error: 1 | Response Time: avg 0ms, max 0ms, total 0ms
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 1.25s, max 1.25s, total 1.25s
Wrong answer: 7, Did not follow instructions: 3 | Response Time: avg 1.37s, max 4.49s, total 27.32s…
Anti-AI Tricks: 8.3 | Wrong answer: 1 | Response Time: avg 1.10s, max 1.65s, total 4.42s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 951ms, max 1.31s, total 1.90s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 2.53s, max 2.53s, total 2.53s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.04s, max 1.32s, total 2.07s
Domain specific: 2.9 | Wrong answer: 3 | Response Time: avg 1.02s, max 1.16s, total 3.06s
Instructions following: 10.0 | No failed answers | Response Time: avg 932ms, max 1.00s, total 1.86s
Puzzle Solving: 6.0 | Did not follow instructions: 2 | Response Time: avg 2.15s, max 4.49s, total 6.45s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.51s, max 3.51s, total 3.51s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 724ms, max 724ms, total 724ms
Wrong answer: 9, Did not follow instructions: 1 | Response Time: avg 2.95s, max 29.38s, total 58.96s…
Anti-AI Tricks: 6.5 | Wrong answer: 2 | Response Time: avg 1.38s, max 2.69s, total 5.51s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 2.77s, max 4.39s, total 5.54s
Combined: 10.0 | No failed answers | Response Time: avg 29.38s, max 29.38s, total 29.38s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.43s, max 1.57s, total 2.86s
Domain specific: 3.0 | Wrong answer: 3 | Response Time: avg 868ms, max 1.02s, total 2.60s
Instructions following: 6.3 | Wrong answer: 1 | Response Time: avg 929ms, max 1.05s, total 1.86s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response Time: avg 1.71s, max 2.65s, total 5.13s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.54s, max 3.54s, total 3.54s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 1.21s, max 1.21s, total 1.21s
Anti-AI Tricks
: 10.0; No failed answers; Response Time (avg): 34.99s; Response Time (max): 109.60s; Response Time (total): 139.95s
Coding
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Combined
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Data parsing and extraction
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Domain specific
: 10.0; No failed answers; Response Time (avg): 34.54s; Response Time (max): 34.54s; Response Time (total): 34.54s
Instructions following
: 10.0; No failed answers; Response Time (avg): 9.30s; Response Time (max): 9.30s; Response Time (total): 9.30s
Tool Calling
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 114.12s; Response Time (max): 114.12s; Response Time (total): 114.12s
API error: 6; Wrong answer: 4; Response Time (avg): 24.56s; Response Time (max): 78.74s; Response Time (total): 368.35s…
Anti-AI Tricks
: 8.3; API error: 1; Response Time (avg): 9.32s; Response Time (max): 12.36s; Response Time (total): 27.96s
Coding
: 6.5; API error: 1; Response Time (avg): 27.94s; Response Time (max): 27.94s; Response Time (total): 27.94s
Combined
: 10.0; No failed answers; Response Time (avg): 78.74s; Response Time (max): 78.74s; Response Time (total): 78.74s
Data parsing and extraction
: 6.5; API error: 1; Response Time (avg): 5.85s; Response Time (max): 5.85s; Response Time (total): 5.85s
Domain specific
: 5.9; Wrong answer: 2; Response Time (avg): 40.44s; Response Time (max): 46.32s; Response Time (total): 121.31s
General Intelligence
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Instructions following
: 10.0; No failed answers; Response Time (avg): 15.98s; Response Time (max): 22.24s; Response Time (total): 31.97s
Puzzle Solving
: 5.3; API error: 1; Wrong answer: 1; Response Time (avg): 7.51s; Response Time (max): 7.86s; Response Time (total): 15.02s
Tool Calling
: 2.8; API error: 1; Response Time (avg): 17.84s; Response Time (max): 17.84s; Response Time (total): 17.84s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 41.74s; Response Time (max): 41.74s; Response Time (total): 41.74s
Wrong answer: 10; Response Time (avg): 1.93s; Response Time (max): 5.56s; Response Time (total): 38.64s…
Anti-AI Tricks
: 6.9; Wrong answer: 2; Response Time (avg): 1.31s; Response Time (max): 2.08s; Response Time (total): 5.25s
Coding
: 6.8; Wrong answer: 1; Response Time (avg): 1.52s; Response Time (max): 2.05s; Response Time (total): 3.04s
Combined
: 3.0; Wrong answer: 1; Response Time (avg): 5.56s; Response Time (max): 5.56s; Response Time (total): 5.56s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 1.18s; Response Time (max): 1.24s; Response Time (total): 2.37s
Domain specific
: 2.9; Wrong answer: 3; Response Time (avg): 1.31s; Response Time (max): 1.39s; Response Time (total): 3.92s
General Intelligence
: 10.0; No failed answers; Response Time (avg): 3.41s; Response Time (max): 3.41s; Response Time (total): 3.41s
Instructions following
: 6.2; Wrong answer: 1; Response Time (avg): 1.15s; Response Time (max): 1.19s; Response Time (total): 2.31s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time (avg): 1.29s; Response Time (max): 1.56s; Response Time (total): 3.87s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 3.90s; Response Time (max): 3.90s; Response Time (total): 3.90s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 5.01s; Response Time (max): 5.01s; Response Time (total): 5.01s
Anti-AI Tricks
: 8.7; Wrong answer: 1; Response Time (avg): 3.81s; Response Time (max): 5.65s; Response Time (total): 7.62s
Coding
: 2.3; Did not follow instructions: 1; Response Time (avg): 23.58s; Response Time (max): 23.58s; Response Time (total): 23.58s
Combined
: 10.0; No failed answers; Response Time (avg): 37.64s; Response Time (max): 37.64s; Response Time (total): 37.64s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 6.63s; Response Time (max): 6.63s; Response Time (total): 6.63s
Domain specific
: 5.8; Timed out: 1; Wrong answer: 1; Response Time (avg): 121.79s; Response Time (max): 121.79s; Response Time (total): 121.79s
Tool Calling
: 2.8; No answer: 1; Response Time (avg): 27.71s; Response Time (max): 27.71s; Response Time (total): 27.71s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 25.52s; Response Time (max): 25.52s; Response Time (total): 25.52s
Anti-AI Tricks
: 6.5; API error: 1; Wrong answer: 1; Response Time (avg): 4.87s; Response Time (max): 6.30s; Response Time (total): 14.62s
Coding
: 4.3; Did not follow instructions: 1; Response Time (avg): 35.61s; Response Time (max): 35.61s; Response Time (total): 35.61s
Combined
: 3.0; No answer: 1; Response Time (avg): 53.14s; Response Time (max): 53.14s; Response Time (total): 53.14s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 4.93s; Response Time (max): 5.03s; Response Time (total): 9.86s
Domain specific
: 5.3; Wrong answer: 2; Response Time (avg): 24.14s; Response Time (max): 45.83s; Response Time (total): 72.43s
General Intelligence
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Instructions following
: 10.0; No failed answers; Response Time (avg): 4.30s; Response Time (max): 6.00s; Response Time (total): 8.59s
Puzzle Solving
: 5.3; API error: 1; Wrong answer: 1; Response Time (avg): 10.19s; Response Time (max): 14.92s; Response Time (total): 20.37s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 6.31s; Response Time (max): 6.31s; Response Time (total): 6.31s
Trivia
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
API error: 8; Wrong answer: 2; Response Time (avg): 15.25s; Response Time (max): 43.55s; Response Time (total): 182.96s…
Anti-AI Tricks
: 8.3; API error: 1; Response Time (avg): 11.69s; Response Time (max): 19.37s; Response Time (total): 35.08s
Coding
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Combined
: 10.0; No failed answers; Response Time (avg): 34.95s; Response Time (max): 34.95s; Response Time (total): 34.95s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 14.95s; Response Time (max): 15.40s; Response Time (total): 29.90s
Domain specific
: 3.0; Wrong answer: 2; API error: 1; Response Time (avg): 22.08s; Response Time (max): 43.55s; Response Time (total): 66.23s
General Intelligence
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Instructions following
: 6.5; API error: 1; Response Time (avg): 3.40s; Response Time (max): 3.40s; Response Time (total): 3.40s
Puzzle Solving
: 5.3; API error: 2; Response Time (avg): 7.52s; Response Time (max): 7.52s; Response Time (total): 7.52s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 5.87s; Response Time (max): 5.87s; Response Time (total): 5.87s
Trivia
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Anti-AI Tricks
: 7.3; No answer: 1; Wrong answer: 1; Response Time (avg): 51.38s; Response Time (max): 85.28s; Response Time (total): 102.75s
Coding
: 4.1; No answer: 1; Timed out: 1; Response Time (avg): 215.89s; Response Time (max): 281.00s; Response Time (total): 431.77s
Combined
: 10.0; No failed answers; Response Time (avg): 71.37s; Response Time (max): 71.37s; Response Time (total): 71.37s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 49.78s; Response Time (max): 49.78s; Response Time (total): 49.78s
Domain specific
: 3.5; Wrong answer: 2; Timed out: 1; Response Time (avg): 137.29s; Response Time (max): 137.29s; Response Time (total): 137.29s
Instructions following
: 10.0; No failed answers; Response Time (avg): 92.47s; Response Time (max): 92.47s; Response Time (total): 92.47s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 31.74s; Response Time (max): 31.74s; Response Time (total): 31.74s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 83.95s; Response Time (max): 83.95s; Response Time (total): 83.95s
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response Time (avg): 12.62s; Response Time (max): 18.61s; Response Time (total): 50.50s
Coding
: 6.6; No answer: 1; Response Time (avg): 165.39s; Response Time (max): 168.22s; Response Time (total): 330.78s
Combined
: 7.0; Invalid tool call: 1; Response Time (avg): 83.07s; Response Time (max): 83.07s; Response Time (total): 83.07s
Data parsing and extraction
: 3.5; No answer: 2; Response Time (avg): 37.30s; Response Time (max): 54.01s; Response Time (total): 74.60s
Domain specific
: 2.9; Wrong answer: 3; Response Time (avg): 73.38s; Response Time (max): 101.55s; Response Time (total): 220.15s
Instructions following
: 10.0; No failed answers; Response Time (avg): 37.96s; Response Time (max): 47.48s; Response Time (total): 75.92s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time (avg): 61.14s; Response Time (max): 97.76s; Response Time (total): 183.42s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 16.88s; Response Time (max): 16.88s; Response Time (total): 16.88s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 80.99s; Response Time (max): 80.99s; Response Time (total): 80.99s
Wrong answer: 10; Did not follow instructions: 1; Response Time (avg): 1.09s; Response Time (max): 2.97s; Response Time (total): 21.79s…
Anti-AI Tricks
: 7.5; Wrong answer: 2; Response Time (avg): 1.07s; Response Time (max): 1.91s; Response Time (total): 4.27s
Coding
: 6.8; Wrong answer: 1; Response Time (avg): 1.13s; Response Time (max): 1.59s; Response Time (total): 2.26s
Combined
: 3.0; Wrong answer: 1; Response Time (avg): 2.73s; Response Time (max): 2.73s; Response Time (total): 2.73s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 843ms; Response Time (max): 907ms; Response Time (total): 1.69s
Domain specific
: 2.9; Wrong answer: 3; Response Time (avg): 762ms; Response Time (max): 814ms; Response Time (total): 2.29s
General Intelligence
: 4.0; Wrong answer: 1; Response Time (avg): 992ms; Response Time (max): 992ms; Response Time (total): 992ms
Instructions following
: 10.0; No failed answers; Response Time (avg): 859ms; Response Time (max): 975ms; Response Time (total): 1.72s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 2.97s; Response Time (max): 2.97s; Response Time (total): 2.97s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 733ms; Response Time (max): 733ms; Response Time (total): 733ms
Wrong answer: 8; Did not follow instructions: 3; Response Time (avg): 2.27s; Response Time (max): 14.63s; Response Time (total): 43.20s…
Coding
: 7.2; Wrong answer: 1; Response Time (avg): 2.29s; Response Time (max): 3.06s; Response Time (total): 4.58s
Combined
: 10.0; No failed answers; Response Time (avg): 3.28s; Response Time (max): 3.28s; Response Time (total): 3.28s
Data parsing and extraction
: 7.3; Wrong answer: 1; Response Time (avg): 1.11s; Response Time (max): 1.47s; Response Time (total): 2.21s
Domain specific
: 2.9; Wrong answer: 3; Response Time (avg): 6.48s; Response Time (max): 14.63s; Response Time (total): 19.43s
Instructions following
: 10.0; No failed answers; Response Time (avg): 1.07s; Response Time (max): 1.07s; Response Time (total): 1.07s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 1.89s; Response Time (max): 1.89s; Response Time (total): 1.89s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 2.58s; Response Time (max): 2.58s; Response Time (total): 2.58s