@fchollet
This is similar to how basic changes to the encoding of ARC tasks considerably degrade frontier model performance. If you're seeing the test for the first time, it really shouldn't matter what the encoding is. Unless you've studied specifically for the test, using a