Boris Cherny (@boris-cherny)
Three layers of AI safety: (1) alignment/mechanistic interpretability — watching what neurons are doing, (2) evals — a lab Petri dish with synthetic scenarios, and (3) behavior in the wild. As models get more capable, layer 3 becomes the most important.
Mar 20, 2026