Boris Cherny on lennyfeed

Three layers of AI safety: (1) alignment and mechanistic interpretability, i.e. watching what the neurons are doing; (2) evals, a Petri dish in the lab with synthetic scenarios; and (3) behavior in the wild. As models get more capable, layer 3 becomes the most important.

Mar 20, 2026