Anthropic says its most advanced systems may be learning not just to reason, but to reflect internally on how they reason.
Anthropic’s Claude can now describe its own reasoning about 20% of the time, drastically cutting interpretability time, but it demands continuous oversight.