Radar

Reads I found worth sharing.

Teaching Claude why

In this post, Anthropic discusses a few of the updates they've made to alignment training.

Addiction, emotional distress, dread of dull tasks: AI models ‘seem to increasingly behave’ as though they’re sentient

A new paper from the Center for AI Safety, an AI safety nonprofit, suggests that more is going on under the surface.