Is AI Alignment Just Scaled Ethics?

Alignment is one of those things people love to debate on Twitter but rarely agree on. I've been thinking about it more lately, and honestly it just feels like ethics... but with GPUs.

We're basically trying to get models to behave nicely. Not break things. Not say awful stuff. Not go rogue. But the methods are weird. Reinforcement learning from human feedback (RLHF)? Like... okay. Under the hood, that means training a reward model on pairwise human preferences, so whatever is inconsistent or fuzzy in those judgments gets baked straight into the objective. What even counts as "good" feedback? How consistent are people, even?
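To make that concrete, here's a toy sketch of the reward-modeling step, a Bradley-Terry model fit to pairwise preferences, which is the standard setup in RLHF. The responses and labels are entirely made up; the point is just to show what happens when annotators disagree: the conflict gets averaged into the scores, not resolved.

```python
import math

# Toy Bradley-Terry preference model -- the core of RLHF reward modeling.
# Annotators compare two responses; the model learns a scalar score per
# response so that sigmoid(r_winner - r_loser) matches how often humans
# preferred the winner. All data here is hypothetical.

# (winner, loser) pairs from imagined annotators. Note the conflict:
# two prefer "polite" over "blunt", one prefers the reverse.
comparisons = [
    ("polite", "blunt"),
    ("polite", "blunt"),
    ("blunt", "polite"),   # the disagreeing annotator
    ("polite", "evasive"),
    ("blunt", "evasive"),
]

scores = {"polite": 0.0, "blunt": 0.0, "evasive": 0.0}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Gradient ascent on the Bradley-Terry log-likelihood:
#   log p(winner > loser) = log sigmoid(r_winner - r_loser)
# The gradient of log sigmoid(d) with respect to d is (1 - sigmoid(d)).
lr = 0.1
for _ in range(2000):
    for winner, loser in comparisons:
        p = sigmoid(scores[winner] - scores[loser])
        scores[winner] += lr * (1.0 - p)
        scores[loser] -= lr * (1.0 - p)

for name, r in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} reward ~ {r:+.2f}")

# "polite" ends up only modestly above "blunt": the 2-vs-1 disagreement
# becomes a small score gap, not a principled resolution of the conflict.
```

Run it and "evasive" sinks well below the other two, while the polite/blunt gap settles near log 2, i.e. the model just encodes "humans picked polite two-thirds of the time." The disagreement doesn't get adjudicated; it becomes a probability.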

Philosophy folks have argued about morality for centuries and still haven't nailed it. We're now trying to turn that into code. Not sure if that's brave or naive.