Vitalik's new article: "AI engine + human steering wheel", a new paradigm for future governance
Original title: "AI as the engine, humans as the steering wheel"
Author: Vitalik, founder of Ethereum
Translation: Baishui, Golden Finance
If you ask people what they like about democratic structures, whether governments, workplaces, or blockchain-based DAOs, you will often hear the same arguments: they avoid concentration of power, they give users strong guarantees because no single person can unilaterally change the system's direction, and they can make higher-quality decisions by gathering the perspectives and wisdom of many people.
If you ask people what they dislike about democratic structures, they often give the same complaints: ordinary voters are not sophisticated, because each voter has only a tiny chance of influencing the outcome; few voters put high-quality thought into their decisions; and you often get either low participation (leaving the system vulnerable to attack) or de facto centralization, because everyone defaults to trusting and copying the views of a few influential figures.
The goal of this article is to explore a paradigm in which AI might let us benefit from democratic structures without their downsides: "AI is the engine, humans are the steering wheel." Humans provide only a small amount of information to the system, perhaps only a few hundred bits, but each bit is well considered and of extremely high quality. The AI treats this data as an "objective function" and tirelessly makes a large number of decisions in a best effort to achieve these goals. In particular, this article explores an interesting question: can we do this without enshrining a single AI at the center, relying instead on a competitive open market that any AI (or human-machine hybrid) can freely enter?
Contents
Why not just put a single AI in charge?
Futarchy
Distilling human judgment
Deep funding
Adding privacy
Advantages of engine + steering wheel design
Why not just put a single AI in charge?
The simplest way to incorporate human preferences into an AI-based mechanism is to create a single AI model and have humans feed their preferences into it in some way. There is an easy way to do this: you just put a text file containing a list of people's instructions into the system prompt. Then you use one of the many "agentic AI frameworks" to give the AI access to the internet, hand it the keys to your organization's assets and social media profiles, and you're done.
After a few iterations, this may well be good enough for many use cases, and I fully expect that in the near future we will see many structures that involve an AI reading instructions given by a group (or even reading a group chat in real time) and taking action accordingly.
Where this structure is not ideal is as a governance mechanism for long-lived institutions. One valuable property for long-lived institutions to have is credible neutrality. In my post introducing this concept, I listed four valuable properties of credible neutrality:
Do not write specific people or specific outcomes into the mechanism
Open and verifiable execution
Keep it simple
Do not change frequently
An LLM (or AI agent) satisfies 0 of the 4. The model inevitably encodes a huge number of preferences about specific people and outcomes during training. Sometimes this leads to AI preferences pointing in surprising directions; see, for example, a recent study showing that major LLMs value lives in Pakistan more highly than lives in the United States (!!). The model can be open-weight, but that is far from open source: we really do not know what demons are hiding deep inside the model. It is the opposite of simple: the Kolmogorov complexity of an LLM is tens of billions of bits, roughly the same as all US law (federal + state + local) combined. And because AI is evolving so fast, you would have to change it every three months.
For this reason, the approach I favor, and which has been explored in many use cases, is to make a simple mechanism the rules of the game, and let AIs be the players. This is the same insight that makes markets so effective: the rules are a relatively dumb system of property rights, with edge cases adjudicated by a court system that slowly accumulates and adjusts precedent, and all the intelligence comes from entrepreneurs operating "at the edge".
A single "player" can be an LLM, a swarm of LLMs that interact with each other and call various internet services, various AI + human combinations, and many other constructions; as the mechanism designer, you do not need to know. The ideal goal is a mechanism that runs automatically: if the goal of the mechanism is to choose what gets funded, then it should feel as much as possible like Bitcoin or Ethereum block rewards.
The advantages of this approach are:
It avoids enshrining any single model in the mechanism; instead, you get an open market made up of many different players and architectures, each with its own biases. Open models, closed models, agent swarms, human + AI hybrids, bots, infinite monkeys, and so on are all fair game; the mechanism discriminates against no one.
The mechanism is open source. Even though the players are not, the game is open source, and this is a pattern that is already fairly well understood (political parties and markets, for example, both work this way).
The mechanism is simple, so there are relatively few ways for the mechanism designer to encode their own biases into the design.
The mechanism does not change, even though, between now and the singularity, the architectures of the underlying participants may need to be redesigned every three months.
The goal of the steering mechanism is to faithfully reflect the participants' underlying goals. It only needs to provide a small amount of information, but that information should be of high quality.
You can think of this mechanism as exploiting the asymmetry between proposing an answer and verifying it. It is similar to how a Sudoku puzzle is hard to solve but easy to check. You (i) create an open market of players to act as "solvers", and then (ii) maintain a human-run mechanism that performs the much simpler task of verifying proposed solutions.
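To make the solve/verify asymmetry concrete, here is a minimal sketch (my own illustration, not from the original article) of a Sudoku verifier: checking a proposed solution takes a fixed, tiny amount of work, while finding one may require an exponential search.

```python
# Verifying a proposed Sudoku solution is cheap, even though
# producing one requires an expensive search.

def verify_sudoku(grid: list[list[int]]) -> bool:
    """Check that a 9x9 grid is a valid completed Sudoku."""
    expected = set(range(1, 10))
    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[3 * br + r][3 * bc + c] for r in range(3) for c in range(3)]
        for br in range(3) for bc in range(3)
    ]
    # Every row, column, and 3x3 box must contain the digits 1..9 exactly once.
    return all(set(unit) == expected for unit in rows + cols + boxes)

# A known-valid grid built by a standard shift construction, for demonstration.
demo = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(verify_sudoku(demo))  # True
```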
Futarchy
Futarchy was originally proposed by Robin Hanson under the slogan "vote on values, but bet on beliefs". A voting mechanism selects a set of goals (which can be anything, as long as they are measurable) and combines them into a single metric M. When you need to make a decision (let's assume it is YES/NO for simplicity), you set up conditional markets: you ask people to bet on (i) whether YES or NO will be chosen, (ii) the value of M if YES is chosen, otherwise zero, and (iii) the value of M if NO is chosen, otherwise zero. Given these three variables, you can work out whether the market thinks YES or NO is more favorable for the value of M.
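A minimal sketch of this decision rule (my own illustration with made-up prices; the article itself gives no code): dividing each conditional asset's price by the probability of its condition gives the market's conditional expectation of M, and the option with the higher conditional expectation wins.

```python
# Futarchy decision rule from the three market prices described above.
# Prices and their scale are illustrative assumptions.

def futarchy_decision(p_yes: float, price_m_if_yes: float, price_m_if_no: float) -> str:
    """Pick the option the market expects to produce a higher metric M.

    p_yes:          price of a share paying $1 if YES is chosen (market i)
    price_m_if_yes: price of a share paying M if YES is chosen, else 0 (market ii)
    price_m_if_no:  price of a share paying M if NO is chosen, else 0 (market iii)
    """
    m_given_yes = price_m_if_yes / p_yes        # E[M | YES]
    m_given_no = price_m_if_no / (1.0 - p_yes)  # E[M | NO]
    return "YES" if m_given_yes > m_given_no else "NO"

# Hypothetical prices: 60% chance of YES; conditional-M shares trade at 7.2 and 4.0.
print(futarchy_decision(0.6, 7.2, 4.0))  # E[M|YES] = 12.0 > E[M|NO] = 10.0 -> "YES"
```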
"Company stock price" (or for cryptocurrencies, token price) is the most commonly referenced indicator because it is easy to understand and measure, but this mechanism can support multiple indicators: monthly active users, median self-reported happiness of certain groups, some quantifiable decentralized indicators, etc.
Futarchy was originally invented before the era of artificial intelligence. However, Futarchy naturally fits the paradigm of 'complex solver, simple validator' described in the previous section, and traders in Futarchy can also be artificial intelligence (or a combination of human and artificial intelligence). The role of the 'solver' (prediction market traders) is to determine how each proposed plan will affect the value of future indicators. This is difficult. If the solver is correct, they will make money; if the solver is wrong, they will lose money. Validators (people who vote on indicators, and if they notice the indicators being 'manipulated' or becoming outdated, will adjust the indicators and determine the actual value of the indicators at some future time) only need to answer a simpler question: 'What is the current value of the indicator?'
Distilling human judgment
Distilled human judgment is a class of mechanisms that works as follows. There is a very large number (think: 1 million) of questions that need to be answered. Natural examples include:
How much credit should each person in this list receive for their contribution to a project or task?
Which of these comments violate the rules of the social media platform (or sub-community)?
Which of these given Ethereum addresses represent real and unique individuals?
Which of these physical objects contribute positively or negatively to the aesthetics of their environment?
You have a jury that can answer these questions, but only at the cost of spending a lot of effort on each answer. You ask the jury to answer only a small fraction of the questions (for example, if the full list has 1 million items, the jury might answer only 100 of them). You can even ask the jury indirect questions: instead of "What percentage of the total credit should Alice receive?", ask "Should Alice or Bob receive more credit, and by how much?" In designing the jury mechanism, you can reuse battle-tested real-world mechanisms such as grant committees, courts (determining the value of a judgment), appraisals, and so on. Jurors can, of course, also use novel AI research tools to help them find answers.
Then you allow anyone to submit a numerical list of answers to the entire set of questions (for example, an estimate of how much credit each participant in the whole list should receive). Submitters are encouraged to use AI, but they can use any technique: AI, human-machine hybrids, AIs that can access internet search and autonomously hire other human or AI workers, control-theory-enhanced monkeys, and so on.
Once both the full-list providers and the jurors have submitted their answers, the full lists are checked against the jury's answers, and some combination of the full lists that is most compatible with the jury's answers is taken as the final answer.
The distilled human judgment mechanism is different from futarchy, but there are some important similarities:
In futarchy, the "solvers" make predictions, and the "ground truth" their predictions are judged against (and used to reward or punish solvers) is the value of the metric, operated by the jury.
In distilled human judgment, the "solvers" provide answers to a huge number of questions, and the "ground truth" their predictions are judged against is the high-quality answers to a small fraction of those questions, provided by the jury.
For a toy example of distilled human judgment for credit assignment, see the Python code here. The script asks you to act as the jury, and includes several AI-generated (and human-generated) full lists pre-baked into the code. The mechanism identifies the linear combination of the full lists that best fits the jury's answers. In this case, the winning combination is 0.199 * Claude's answers + 0.801 * Deepseek's answers; this combination matches the jury's answers better than any single model does. These coefficients are also the rewards given to the submitters.
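The core fitting step can be sketched as follows (my own reconstruction with illustrative data, not the linked script): restrict each submitted full list to the items the jury spot-checked, then find the non-negative, sum-to-one combination of the lists that minimizes squared error against the jury's answers.

```python
# Fit a non-negative, sum-to-one combination of submitted full lists
# to the jury's spot-check answers. All data values are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

# Rows: one submitter's full list, restricted to the jury-sampled items.
full_lists = np.array([
    [0.30, 0.10, 0.25, 0.35],  # submitter A (e.g. one model's answers)
    [0.20, 0.25, 0.30, 0.25],  # submitter B
    [0.50, 0.05, 0.15, 0.30],  # submitter C
])
jury = np.array([0.22, 0.21, 0.29, 0.28])  # jury's answers on the same items

def loss(w: np.ndarray) -> float:
    """Squared error between the weighted combination and the jury's answers."""
    return float(np.sum((w @ full_lists - jury) ** 2))

n = len(full_lists)
result = minimize(
    loss,
    x0=np.full(n, 1.0 / n),   # start from equal weights
    bounds=[(0.0, 1.0)] * n,  # coefficients must be non-negative
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # and sum to 1
)
weights = result.x  # these coefficients double as the submitters' rewards
print(np.round(weights, 3))
```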
In the "defeat Sauron" example (a hypothetical credit-assignment exercise for a project whose overall goal is defeating Sauron), the "humans as the steering wheel" aspect shows up in two places. First, high-quality human judgment is applied to each individual question, though this still treats the jury as "technocratic" performance evaluators. Second, there is an implicit voting mechanism that decides whether "defeating Sauron" is even the right goal (as opposed to, say, trying to ally with Sauron, or ceding him all the territory east of some critical river as a peace concession). There are other distilled human judgment use cases where the jury's task is more directly value-laden: for example, imagine a decentralized social media platform (or sub-community) where the jury's job is to label randomly selected forum posts as compliant or non-compliant with the community's rules.
Within the distilled human judgment paradigm, there are some open questions:
How do you sample? The role of the full-list submitters is to provide large numbers of answers; the role of the jurors is to provide high-quality answers. We need to select jurors, and select questions for jurors, in such a way that a model's ability to match the jurors' answers is maximally indicative of its overall performance. Some considerations include:
The balance between expertise and bias: skilled jurors are typically specialized in their professional field, so letting them choose what to rate gets you higher-quality input. On the other hand, too much choice can lead to bias (jurors favoring content connected to themselves) or to weaknesses in sampling (some content systematically going unrated).
Anti-Goodharting: there will be content that tries to "game" the AI mechanisms, for example contributors generating large volumes of code that looks impressive but is useless. The jury can detect this, but static AI models will not unless they try hard. One possible way to catch such behavior is to add a challenge mechanism through which individuals can flag such attempts, guaranteeing that a jury will judge them (and thereby incentivizing AI developers to make sure they catch them correctly). If the jury agrees, the challenger is rewarded; if the jury disagrees, the challenger pays a fine.
What scoring function do you use? One idea used in the current deep funding pilot is to ask jurors "Should A or B get more credit, and by how much?" The scoring function is score(x) = sum((log(x[B]) - log(x[A]) - log(juror_ratio)) ** 2 for (A, B, juror_ratio) in jury_answers): that is, for each jury answer, it asks how far the ratio in the full list is from the ratio given by the jury, and adds a penalty proportional to the square of the distance (in log space). This is to point out that the design space of scoring functions is rich, and the choice of scoring function is tied to the choice of questions you ask the jurors; a runnable version of this rule is sketched after this list.
How do you reward full-list submitters? Ideally you want to give non-zero rewards to multiple participants regularly, to avoid a monopolized mechanism, but you also want the following property: participants cannot increase their rewards by submitting the same (or a slightly modified) answer set multiple times. One promising approach is to directly compute the linear combination of the full lists that best fits the jury's answers (with non-negative coefficients summing to 1), and to use those same coefficients to apportion rewards. There may be other approaches as well.
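Here is the log-ratio scoring rule quoted above as runnable code (the contributor names and numbers are my illustrative assumptions, not the pilot's actual data):

```python
# Log-ratio scoring rule: lower scores mean the full list agrees
# more closely with the jury's pairwise ratio judgments.
from math import log

def score(x: dict[str, float], jury_answers: list[tuple[str, str, float]]) -> float:
    """x maps contributor -> credit share; jury_answers holds (A, B, ratio B/A) triples."""
    return sum(
        (log(x[B]) - log(x[A]) - log(juror_ratio)) ** 2
        for (A, B, juror_ratio) in jury_answers
    )

# Hypothetical full list and jury spot checks.
full_list = {"alice": 0.5, "bob": 0.3, "carol": 0.2}
jury_answers = [
    ("alice", "bob", 0.5),  # jury: Bob deserves 0.5x Alice's credit
    ("bob", "carol", 0.7),  # jury: Carol deserves 0.7x Bob's credit
]
print(round(score(full_list, jury_answers), 4))  # ~0.0356: a fairly close fit
```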
In general, the goal is to take human judgment mechanisms that are known to work, that minimize bias, and that have stood the test of time (for example, consider how the adversarial structure of the court system includes the two disputing parties, who have lots of information but are biased, and a judge, who has little information but may be unbiased), and to use an open AI market as a reasonably high-fidelity and very low-cost predictor of those mechanisms (this is similar to how "distillation" of large language models works).
Deep funding
Deep funding is the application of distilled human judgment to filling in the weights on a graph whose edges ask: "What percentage of X's credit belongs to Y?"
It is easiest to illustrate directly with an example:
Output of a two-level deep funding example: the origins of Ethereum's philosophy. See the Python code here.
The goal here is to distribute credit for philosophical contributions to Ethereum. Let's walk through the example:
The simulated deep funding round shown here attributes 20.5% of the credit to the cypherpunk movement and 9.2% to technological progressivism.
At each node you ask the question: to what extent is this an original contribution (and therefore deserving of credit in itself), and to what extent is it a recombination of upstream influences? For the cypherpunk movement, it is 40% new and 60% dependent.
You can then look at the influences upstream of those nodes: libertarian minarchism and anarchism gets 17.3% of the credit for the cypherpunk movement, while direct democracy in Switzerland gets only 5%.
Note, however, that libertarian minarchism and anarchism also inspired Bitcoin's monetary philosophy, so it influences Ethereum's philosophy through two paths.
To compute the total share of credit that libertarian minarchism and anarchism contributes to Ethereum, you multiply the edge weights along each path and then add the paths together: 0.205 * 0.6 * 0.173 + 0.195 * 0.648 * 0.201 ≈ 0.0466. So if you donated $100 to reward everyone who contributed to Ethereum's philosophy, libertarian minarchists and anarchists would receive $4.66 in this simulated deep funding round.
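This path-product computation is easy to express in code. The sketch below uses the numbers from the worked example above; the node names and the restriction to just these four nodes are my own simplifications:

```python
# Deep funding credit propagation: a node's total share equals the sum,
# over all paths from the root, of the products of edge weights along each path.
# Effective edge weight = (parent's share of credit flowing upstream)
# * (the split among the parent's influences), per the example above.
edges = {
    "ethereum": {"cypherpunks": 0.205, "btc_monetary": 0.195},  # other edges omitted
    "cypherpunks": {"minarchism": 0.6 * 0.173},   # 60% dependent, 17.3% to minarchism
    "btc_monetary": {"minarchism": 0.648 * 0.201},
    "minarchism": {},
}

def credit_share(node: str, target: str) -> float:
    """Total credit flowing from `node` to `target` over all paths."""
    if node == target:
        return 1.0
    return sum(w * credit_share(child, target) for child, w in edges[node].items())

share = credit_share("ethereum", "minarchism")
print(round(share, 4))        # 0.0467 (the article rounds this down to 0.0466)
print(f"${100 * share:.2f}")  # the slice of a $100 donation
```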
This approach is designed for domains where work builds on prior work in a highly structured way. Academia (think: citation graphs) and open-source software (think: library dependencies and forks) are two natural examples.
The goal of a well-functioning deep funding system is to create and maintain a global graph, so that any funder who wants to support a particular project can send funds to the address representing its node, and the funds will automatically propagate to its dependencies according to the weights on the graph's edges (and recursively to their dependencies, and so on).
You could imagine a decentralized protocol issuing its tokens using a built-in deep funding gadget: decentralized governance within the protocol would select a jury, the jury would run the deep funding mechanism, and the protocol would automatically issue tokens and deposit them into the node corresponding to itself. In this way, the protocol programmatically rewards all of its direct and indirect contributors, reminiscent of how Bitcoin or Ethereum block rewards reward one specific kind of contributor (miners). By influencing the weights on the edges, the jury can continuously define what kinds of contributions it values. This mechanism could serve as a decentralized, long-term-sustainable alternative to mining, token sales, or one-time airdrops.
Adding privacy
Often, making correct judgments on questions like those in the examples above requires access to private information: an organization's internal chat logs, information submitted confidentially by community members, and so on. One advantage of "just using a single AI", especially in smaller settings, is that it is far more acceptable for one AI to have access to such information than for it to be made public to everyone.
For distilled human judgment or deep funding to work in these cases, we can try to use cryptography to securely give AIs access to private information. The idea is to use multi-party computation (MPC), fully homomorphic encryption (FHE), trusted execution environments (TEEs), or similar mechanisms to make the private information available, but only to mechanisms whose sole output is a "full list submission" that goes directly into the mechanism.
If you do this, then you have to restrict the set of mechanisms to AI models (not humans or AI + human combinations, since you cannot let humans see the data), and specifically to models running on some particular substrate (such as MPC, FHE, or trusted hardware). A major research direction is finding practical versions that are efficient enough and meaningful enough in the near term.
Advantages of engine + steering wheel design
Designs like these have many promising benefits. By far the most important is that they make it possible to build DAOs in which human voters control the direction without being bogged down in an excessive number of decisions. They strike a middle ground where each person does not have to make N decisions, yet their power amounts to more than just making one decision (the way delegation typically works), and they can elicit rich, nuanced preferences that are hard to express directly.
In addition, mechanisms like these seem to have an incentive-smoothing property. What I mean by "incentive smoothing" here is the combination of two factors:
Diffusion: no single action taken by the voting mechanism has an outsized impact on the interests of any single actor.
Confusion: the connection between voting decisions and how they affect actors' interests is more complex and harder to compute.
Confusion and diffusion here are terms taken from cryptography, where they are key properties underlying the security of ciphers and hash functions.
A good example of incentive smoothing in today's real world is the rule of law: instead of the top of government regularly taking actions of the form "give $200 million to Alice's company" or "give $100 million to Bob's company", it passes rules designed to apply evenly across a large number of actors, which are then interpreted by a different set of actors. When this approach works, the benefit is that it greatly reduces the returns to bribery and other forms of corruption. When it is violated, which happens often in practice, these problems quickly become greatly magnified.
AI is clearly going to be a hugely important part of the future, and it will inevitably become an important part of future governance. But involving AI in governance carries obvious risks: AI is biased, it can be deliberately corrupted during training, and AI technology is advancing so fast that "putting AI in charge" may in practice mean "putting whoever upgrades the AI in charge". Distilled human judgment offers an alternative path forward, letting us harness the power of AI in an open, free-market way while preserving human-controlled democracy.
Special thanks to Devansh Mehta, Davide Crapis, and Julian Zawistowski for their feedback and review, as well as Tina Zhen, Shaw Walters, and others for their discussions.