Linear readability vs nonlinear heads: run summary

Interpretation: nonlinear heads are most informative as a diagnostic. Large positive deltas (nonlinear-head accuracy minus linear-probe accuracy) mean the frozen representation still contains label structure that is not linearly readable. Small deltas at high linear-probe accuracy mean the backbone has already done the geometric work. Conformal prediction adds a second diagnostic: at fixed coverage, did the nonlinear head actually reduce uncertainty, or did it only flip a few borderline point predictions?
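The conformal comparison can be sketched as follows. This is a minimal illustration, not the procedure used in this run: it assumes split conformal prediction with the nonconformity score 1 - p(true label), and the function names (conformal_threshold, prediction_sets, summarize) are hypothetical. Inputs are arrays of predicted class probabilities from one head, plus integer labels, on a held-out calibration split and a test split.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold for the score 1 - p(true label).

    Assumption: this score function is chosen for illustration only.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample corrected (1 - alpha) quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, qhat):
    """Boolean mask: class j is in the set when 1 - p_j <= qhat."""
    return (1.0 - test_probs) <= qhat

def summarize(cal_probs, cal_labels, test_probs, test_labels, alpha=0.1):
    """Point accuracy, empirical coverage, and mean set size for one head."""
    qhat = conformal_threshold(cal_probs, cal_labels, alpha)
    sets = prediction_sets(test_probs, qhat)
    rows = np.arange(len(test_labels))
    return {
        "accuracy": float((test_probs.argmax(axis=1) == test_labels).mean()),
        "coverage": float(sets[rows, test_labels].mean()),
        "mean_set_size": float(sets.sum(axis=1).mean()),
    }
```

Calling summarize once with the linear probe's probabilities and once with the nonlinear head's, at the same alpha and on the same splits, separates the two cases: if accuracy rises but mean_set_size barely moves at matched coverage, the head mostly flipped borderline point predictions; if mean_set_size drops, it reduced uncertainty.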