Personal-Values Alignment Tech: Some Initial Motivations
AIs don't understand our individual values. We should change that.
In We’re Arguing About AI Safety Wrong, Helen Toner puts out a call for “dynamist vision[s] for safe superhuman AI” — visions of the future in which AI contributes to individual autonomy, allowing society to self-organize and rapidly adapt to future challenges. She presents dynamism in opposition to stasism (both terms from Virginia Postrel’s The Future and Its Enemies), with stasism being the approach to governance that favors top-down control. Toner asserts that many in the AI-alignment community once advanced stasist agendas (and that many still do), but that discourse has “shifted somewhat in [the] direction” of dynamism, citing writings such as a LessWrong post titled (and I quote) Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt.
I’m generally on board with dynamism, but what might an AI-empowered dynamist future look like in practice? And what kinds of tech/policy might support the evolution of a yet more patchy and polychromatic society? Co-author to that LessWrong post Seb Krier, who leads policy development and strategy at Google DeepMind, laid out one partial vision in his September 2025 essay Coasean Bargaining at Scale:
[…] consider AGI deployed as a vast ecology of personalized agents and systems. This emerging ecosystem is what Tomašev et al. (2025) characterize as the “virtual agent economy,” a new economic layer where agents transact and coordinate at scales and speeds beyond direct human oversight. While this ecology will contain countless specialized agents, let’s focus on the one that matters most from an individual’s perspective: your personal advocate. Think of it as a fiduciary extension of yourself: a tireless, extremely competent digital representative, closely tied to you, its principal.
What could such an agent do? In principle, it can negotiate, calculate, compare, coordinate, verify, monitor, and much more in a split second. Through many multi-turn conversations, tweaking knobs and sliders, and continuous learning, it could also develop an increasingly sophisticated (though never perfect) model of who you are, your preferences, personal circumstances, values, resources, and more. This should evolve over time - an agent’s alignment should follow the principal’s own evolution.
Krier spends much of the essay discussing certain bargaining scenarios, before returning to this core technical challenge:
The user should have immense freedom to tune their agent to their unique preferences and values. They should also have complete privacy and control over their “cognitive profile” developed by the agent, for obvious reasons. Practically speaking though, this is the hardest part: how do you design and evidence an agent (mostly) aligned to a user?
I might call this necessary piece of software a “values profile” and not a “cognitive profile,” as the former more strictly describes what’s required of it (to speak on behalf of the user’s values).
In any case, Krier doesn’t offer a solution and rather just points to some theoretical research on individual alignment preferences. And indeed this problem of aligning AIs to individual values remains largely unsolved! So I’d argue that this deserves more attention.
While I’ve backed my way into this motivation statement by starting from a “dynamist vision for safe superhuman AI,” one does not need to be AI-safety-concerned to recognize the value here. Indeed basically every AI application, from coding to personal projects, would straightforwardly benefit from this tech, since basically every AI use-case exists in the complex world of the user, and thus is ultimately judged in terms of alignment to their values (and to the values of organizations in which they participate). If built, AI agents that align well to individual values would be a win-win technology, existing in the sweet spot where alignment and capabilities overlap.
So how might an automated advocate be built?
High-level components of an automated personal advocate
This section is not a concrete proposal; rather a framework for discussing various approaches
When designing automated advocates we can leverage two methods of personalization:
Context engineering - influencing actions via providing value-related context that steers the AI towards the correct actions.
Training - influencing actions via training on data related to the user’s values. This could take many forms, for example, deliberative alignment where the specification used is user-specific.
An advocate would need to be paired with an advocate-maintenance system — a system which is responsible for efficiently bringing the advocate into sync with the user’s values, and maintaining that synchrony. (I believe that this aspect of the overall system is most ripe for innovation).
Advocate maintenance (or “value maintenance”) could be passive or interactive, and realistically would be some combination of both:
Passive maintenance - extracting value-relevant information automatically from personal data (e.g., computer interactions, conversations with AI, previous writings).
Interactive maintenance - eliciting value-relevant responses directly from the user.
Also, value maintenance likely involves both collection and synthesis of value-relevant data — processes which both could involve varying levels of interactivity.
We likely also want evaluation systems (evals) that can quantify alignment between the advocate and its principal.
(Interactive) Value Representations
Automated advocates might leverage interactive value-representations — representations of the user’s values that are both transparent to the user and easily modifiable — for use in context-engineering or training setups. We would likely also want version control such that the user can review changes made by automated maintainers, helping the user feel a sense of ownership.
The simplest value representation is a single plain-text document describing the interests of the principal, which we can call a “constitution,” just like the “constitutions” that frontier labs use to impart their values into their models (see Claude’s). However, there are many other kinds of value representations that might be useful, including:
Semi-structured text-based representations, such as moral graphs;
Non-text representations, such as diagrams, maps/routes, or various other forms of media.
The most effective value representations might compose a variety of textual and non-textual forms, though expressivity trades off against complexity.
To clarify, interactive value-representations are not strictly necessary for automated advocates, but I expect them to be useful and I believe they’re at least worth exploring.
Drawbacks of existing products
Earlier I said “this problem of aligning AIs to individual values remains largely unsolved,” but I didn’t provide evidence. It’s worth taking a look at what products already exist to better understand the gaps in the landscape. My initial research reveals the following main categories:
Mass-market personal-AI memory: Most personal-AI users rely on features built into their chat clients (e.g., ChatGPT, Claude) including “custom instructions,” “memories,” and in-context learning (i.e., statements of value within a single conversation thread). These features do not meaningfully add up to a value-aligned system: custom instructions are flexible and user-owned, but they are devoid of the sort of maintenance and evaluation systems I described above. “Memories” are, at least in ChatGPT, gradually maintained and synthesize large amounts of user context, but most memory systems seem oriented towards synthesizing facts rather than values, and critically these memories are opaque or largely opaque to the user — there is no sense of ownership. And finally, in-context learning is convenient, but is insufficient for more automated scenarios in which an agent needs to act on a user’s behalf with minimal oversight, or in situations with complex success criteria. Overall, the big labs are not solving for personal-values alignment.
Niche memory products, e.g., Mem0 or Letta, don’t offer much more: these memory layers might be more sophisticated in some ways than the default memory layers of chat apps, but still they serve to make facts accessible, not values.
Digital clones: Some products (also niche) try to create AI systems that speak in the style of a particular user. For example, soul.md from the OpenClaw ecosystem creates a “soul document” that’s extracted from the user’s online writings. Delphi uses similar methods to create public-facing simulacra of thought leaders. To evaluate these solutions we would need to answer two questions: first, to what extent are they actually recreating the speech patterns of the principal? And second, to what extent does that recreation yield something that makes value-aligned decisions? The latter question likely depends on whether the public writings and/or speech of the individual conveys their values well, and I believe that is not the case for most people.
Creed.space is a rare product that targets values directly — they describe their value-context protocol as “an open protocol that separates your values from the model's training, so your preferences travel with you to participating providers.” My understanding is that Creed’s products are aimed towards aligning AI agents to already-cooked values, but that they do not tackle the critical components of value synthesis and maintenance, so there are large gaps to fill before this kind of tech could enable automated personal advocates.
So I find the product landscape to be overall lacking, and very few products are even trying to act as good advocates for their users’ values. I think there’s a real market opportunity here, and as cheaper and more coherent AI agents become available, the question of how to best align these autonomous systems to our individual values will grow in urgency.
Open design questions
The design overview I presented leaves the following critical questions unanswered:
What kinds of data are needed to represent individual values well? And what kinds of data let us effectively evaluate individual-values alignment?
How would a product maximize the benefits of value elicitation while minimizing user friction? Notably I think some amount of friction here is necessary and even desirable: a good product will have people reflecting on their values in ways that they previously were not.
What kinds of context engineering or training are most effective for aligning a model to an individual’s values?
How much needs to be invested in globally enabling steerable alignment1 during model training? Or are current models sufficiently steerable already?
Thankfully, people have already begun thinking through these questions via proposals for personal-advocate-shaped software and via academic research into related areas such as preference modeling and value elicitation. I plan to review some of these solutions in future blog posts, in order to present a more concrete vision of a path forward.
See the pluralistic alignment paper for definitions of steerable vs. Overton vs. distributional alignment. Steerable alignment indicates aligning to a user’s preferences (as opposed to aligning to the average preferences of all users, for example).


