Test your models across 32 challenges that expose the limits of token prediction.