Coherent extrapolated volition
Coherent extrapolated volition (CEV) is a theoretical framework in the field of AI alignment, originally proposed by Eliezer Yudkowsky in the early 2000s as part of his work on friendly AI. It describes an approach by which a superintelligent AI would act not according to humanity's current individual or collective preferences, but instead based on what humans would want—if they were more knowledgeable, more rational, had more time to think, and had matured together as a society.[1]
Concept
CEV proposes that an advanced AI system should derive its goals by extrapolating the idealized volition of humanity. This means aggregating and projecting human preferences into a coherent utility function that reflects what people would desire under ideal epistemic and moral conditions. The aim is to ensure that AI systems are aligned with humanity's true interests, rather than with transient or poorly informed preferences.[2]
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
— Eliezer Yudkowsky, Coherent Extrapolated Volition[3]
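The aggregation step can be made concrete with a toy sketch. The code below is an illustration only and does not come from the CEV paper: it assumes, purely for the sake of example, that each person's preferences can be encoded as a numeric vector, that "extrapolation" amounts to removing a modelled bias term standing in for ignorance or hasty reasoning, and that "coherence" means low disagreement across the extrapolated vectors. Every function name, value, and threshold here is hypothetical.

```python
# Toy illustration only: numeric preference vectors, a hypothetical "bias"
# standing in for ignorance or hasty reasoning, and a spread threshold for
# coherence. None of these modelling choices come from the CEV proposal.
import numpy as np

def extrapolate(raw_prefs: np.ndarray, modelled_bias: np.ndarray) -> np.ndarray:
    """Idealise each person's preferences by removing a modelled bias term
    ("if we knew more, thought faster"). Estimating that bias is precisely
    the open problem CEV leaves unsolved."""
    return raw_prefs - modelled_bias

def coherent_volition(raw_prefs, modelled_bias, tolerance=0.3):
    """Aggregate extrapolated preferences into a single utility vector, but
    only on dimensions where they converge; diverging dimensions stay NaN."""
    ideal = extrapolate(raw_prefs, modelled_bias)
    spread = ideal.std(axis=0)        # disagreement after extrapolation
    aggregate = ideal.mean(axis=0)    # candidate coherent utility function
    return np.where(spread < tolerance, aggregate, np.nan)

# Three "people", preferences over three hypothetical outcome dimensions.
raw = np.array([[ 0.9, -0.2,  1.0],
                [ 0.1,  0.1, -1.0],
                [ 0.8, -0.1,  0.9]])
bias = np.array([[ 0.0, -0.1,  0.0],
                 [-0.7,  0.2,  0.0],
                 [ 0.0,  0.0,  0.0]])
print(coherent_volition(raw, bias))
# -> approximately [0.83, -0.1, nan]: the first two wishes cohere after
#    extrapolation, while the third diverges and is left undefined.
```

In this toy framing, all of the difficulty hides in the bias term: CEV offers no procedure for obtaining it, which is part of what the criticisms below address.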
Criticism
Despite its philosophical appeal, CEV faces significant theoretical and practical challenges.
One central critique is that human values are not stable or fixed; they are deeply shaped by context, culture, and environment. Extrapolation could therefore distort them, since increasing rationality might change or even replace original desires. Critics have warned that using rationality as a tool to define ends might inadvertently overwrite the very volition the AI is supposed to serve, producing a mismatch between the AI's actions and genuine human values.[4]
Another criticism, developed through a series of thought experiments, questions CEV's assumptions about wisdom and extrapolation. It notes that CEV lacks a theory of which kinds of entities can become wise, or of how to model their volition meaningfully. The concern is that not all agents, human or otherwise, can be extrapolated toward rational or moral idealization, and that CEV does not adequately account for these limitations.[5]
A further philosophical analysis examines CEV through the lens of social trust in autonomous systems. Drawing on Anthony Giddens' concept of "active trust", its author proposes extending CEV into "Coherent, Extrapolated and Clustered Volition" (CECV). This formulation aims to better reflect the moral preferences of diverse cultural groups, offering a more pragmatic ethical framework for designing AI systems that earn public trust while accommodating societal diversity.[6]
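Read computationally, the "clustered" modification can be understood as computing a separate aggregate for each cultural group rather than forcing a single global one. The sketch below is one possible reading of that idea, not code from the cited paper; the group labels, preference vectors, and threshold are all hypothetical.

```python
# Toy reading of "clustered" volition: aggregate extrapolated preferences
# per (hypothetical) cultural cluster instead of across all of humanity.
# Cluster labels are assumed given; nothing here is from the cited paper.
import numpy as np

def clustered_volition(prefs: np.ndarray, labels: np.ndarray, tolerance=0.3):
    """Return one candidate utility vector per cluster, keeping only the
    dimensions on which that cluster's members converge (others stay NaN)."""
    result = {}
    for label in np.unique(labels):
        group = prefs[labels == label]
        spread = group.std(axis=0)
        result[label] = np.where(spread < tolerance, group.mean(axis=0), np.nan)
    return result

# Four "people" in two hypothetical cultural clusters, three outcome dimensions.
prefs = np.array([[ 0.9,  0.5, -0.8],
                  [ 0.8,  0.4, -0.9],
                  [-0.7,  0.5,  0.9],
                  [-0.9,  0.6,  0.8]])
labels = np.array([0, 0, 1, 1])
print(clustered_volition(prefs, labels))
# Each cluster keeps its own view on the first and third dimensions, where a
# single global aggregate would largely cancel out.
```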
Yudkowsky's later view
Almost immediately after publishing the idea in 2004, Eliezer Yudkowsky himself described the concept as outdated. He warned against conflating it with a practical strategy for AI alignment. While CEV may serve as a philosophical ideal, Yudkowsky emphasized that real-world alignment mechanisms must grapple with greater complexity, including the difficulty of defining and implementing extrapolated values in a reliable way.[7]
References
- ^ "Coherent Extrapolated Volition". LessWrong. Retrieved 17 May 2025.
- ^ Bostrom, Nick (2014). "Coherent extrapolated volition". Superintelligence: paths, dangers, strategies. Oxford, United Kingdom: Oxford University Press. ISBN 978-0-19-967811-2.
- ^ Yudkowsky, Eliezer (2004). "Coherent Extrapolated Volition" (PDF).
- ^ XiXiDu (22 November 2011). "Objections to Coherent Extrapolated Volition". LessWrong. Retrieved 17 May 2025.
- ^ "Coherent Extrapolated Dreaming". Alignment Forum. Retrieved 17 May 2025.
- ^ Sołoducha, Krzysztof. "Analysis of the implications of the Moral Machine project as an implementation of the concept of coherent extrapolated volition for building clustered trust in autonomous machines". CEEOL. Copernicus Center Press. Retrieved 17 May 2025.
- ^ "Coherent Extrapolated Volition". LessWrong. Retrieved 17 May 2025.