The purpose of this primer is to provide the reader with a basic understanding of the concepts underlying Proof of Stake networks and also of the unique properties of Solana's added Proof of History technology.
There are actually two thresholds of interest in such a network:
However, this desire for more nodes is often at odds with efficient functioning of the network, since the costs of transmitting data between large numbers of nodes can delay consensus considerably. It is great to have a million nodes in the network, but if that means that it takes four hours for consensus to be reached on a transaction, then the network isn't of much use to anyone.
Proof of Stake adds a few wrinkles to the general design of a consensus based network. It ensures that the participants have "skin in the game" by requiring that they provide their own "stake" which can be forfeited by nodes who provably misbehave. This then allows the network to be designed to work faster even if it gives opportunities for nodes to misbehave, with the penalties for misbehaving being assumed to prevent such misbehavior. Note that this is not used as a substitute for security -- the requirements of 1/3 dishonest nodes to halt the network and 2/3 dishonest nodes to reach consensus on bad transactions remain and are fundamentally what ensure that only valid transactions are accepted by the network. The "skin in the game" of stake simply allows for consequences for bad behavior by nodes that would temporarily disrupt the network or reduce its efficiency.
Proof of Stake also gives different levels of power to different nodes in the network, because nodes' votes are not all equal: they are weighted by stake. Therefore if node A with twice as much stake as nodes B and C votes for consensus on a group of transactions, this is equivalent to both B and C having voted for consensus. In other words, votes are weighted by stake and a total of dishonest stake-weight is required to cause problems on the network, not a total of number of dishonest nodes.
As an example, consider a network with four nodes and 100 total stake.
Node A has 10 stake.
Node B has 20 stake.
Node C has 30 stake.
Node D has 40 stake.
What can each grouping of dishonest nodes accomplish?
Dishonest Nodes | Honest Nodes | Dishonest Stake Weight | Honest Stake Weight | Can Halt the Network | Can Approve Bad Transactions |
---|---|---|---|---|---|
A, B, C, D | 100 | 0 | Yes | Yes | |
A, B, C | D | 60 | 40 | Yes | No |
A, B, D | C | 70 | 30 | Yes | Yes |
B, C, D | A | 90 | 10 | Yes | Yes |
A, B | C, D | 30 | 70 | No | No |
A, C | B, D | 40 | 60 | Yes | No |
A, D | B, C | 50 | 50 | Yes | No |
B, C | A, D | 50 | 50 | Yes | No |
B, D | A, C | 60 | 40 | Yes | No |
C, D | A, B | 70 | 30 | Yes | Yes |
A | B, C, D | 10 | 90 | No | No |
B | A, C, D | 20 | 80 | No | No |
C | A, B, D | 30 | 70 | No | No |
D | A, B, C | 40 | 60 | Yes | No |
A, B, C, D | 0 | 100 | No | No |
As we can see, the set of nodes required to halt the network is as small as 1 -- node D -- even though 1 node is only 25% of the total number of nodes, and the set of nodes required to write bad transactions is as small as 2 -- nodes C and D combined -- even though 2 nodes is only 50% of the total number of nodes. Both of these factors are due to the large stake-weight of several of the validators.
A final note about those 1/3 and 2/3 thresholds: the 1/3 stake-weight threshold of dishonest nodes that can halt the network can be achieved in a combination of dishonest nodes + compromised nodes + blocked nodes. The 2/3 threshold can only be achieved by dishonest nodes + compromised nodes. What this means is that the 1/3 halt threshold is easier to achieve because the network can be halted by 'blocking' some honest nodes, thus eliminating their ability to vote, in addition to simply running dishonest nodes that refuse to vote and/or compromising ("hacking") honest nodes and forcing them not to vote.
The 2/3 stake-weight needed to write arbitrary transactions cannot rely on blocking honest nodes, because a full 2/3 of stake weight is needed to actively vote for bad transactions. The only option for an attacker who wants to write bad transactions is to run dishonest nodes and/or compromise enough nodes to accumulate a total of 2/3 stake-weight. This is a much harder proposition because blocking honest nodes, which is often easier than hacking them, does not help.
An example of behavior that is suboptimal is for a node to emit more than one block at a time. If it does this, then it creates two forks within the network and the consensus mechanism then has to work to choose one fork over the other to permanently commit to the blockchain. Although the functioning of the honest nodes will eventually commit one of the two blocks to the chain (assuming that the block is otherwise valid), and thus the network will progress despite the fork, the extra cost of fork resolution will slow down the network. Because it should be easy for a node to avoid emitting two different blocks at the same time, doing so can only be interpreted as deliberate attempt to slow the network down. Such an act would be punished by slashing.
Another purpose of stake is that it gives an opportunity for stakeholders to choose which validators are the most trustworthy and/or are providing the best service to the network. If a validator has great performance, never gets slashed (i.e. is always honest), and has a good relationship with its stakers, it should expect to receive more stake than a validator without one or more of these traits. As a result, the staking mechanism tends to promote the authority of well behaved nodes over that of poorly behaved nodes, further strengthening the performance and safety of the overall network.
In Solana and many other Proof of Stake networks, validators themselves earn rewards proportional to their stake-weight, which means that it is in their interest to attract stake, and thus behave in the ways that are most likely to attract and retain stake - i.e., with good performance, good returns, good relationships with stakers, transparency, and honesty. Thus the financial rewards of validators are in line with the interests of the network as expressed by the choices of stakers who delegate their stake.
Note that this can only be carried so far: even the best validator should not receive an overly large portion of stake, because that would put too much authority in one node that could be hacked or could unexpectedly change its behavior in a bad way ("go rogue"), so there is always a fine balance between rewarding a good validator with enough stake to magnify their positive effect on the network, without rewarding them with so much as to create a single point of failure for the network. How much is too much stake in one validator? There is no specific answer to this question and ultimately it is up to the collective set of stakers to decide by virtue of how they place their stake. But stakers should definitely be aware of these factors so that they can truly make the best choices for themselves and the network overall.
Proof of Stake by itself can guarantee consensus as long as most nodes are honest, but it can't necessarily guarantee that it happens quickly or with high throughput. Proof of History is a mechanism that is used to enhance Solana's Proof of Stake implementation to allow it to be the fastest Proof of Stake network while still retaining the safety needed to ensure that the network is secure.
Because Solana is a network based on a blockchain, transactions are grouped into blocks and the blocks chain together into a cryptographically verifiable sequence. The consensus mechanism of votes is what allows any block to "prove" that it is part of the Solana blockchain by virtue of having received 66% of stake-weighted votes. But someone has to create the blocks that are proposed to be voted on and potentially make it into the blockchain.
There must be an orderly sequence of blocks submitted, otherwise multiple nodes would submit blocks at the same time, and the process of deciding which block should be the next block would be a costly and lengthy sequence based on fork resolution algorithms. Therefore the network can only be fast if only one node at a time submits the next block to be considered for inclusion in the blockchain.
Each block is produced during a unit of time called a "slot" and one validator is assigned to each slot in the leader schedule. The leader schedule thus gives an ordered sequence of validators, and the only validator which may propose a block during a particular slot in the leader schedule is the validator that was assigned that slot. In this way, the potential blocks are already ordered by validator ahead of time, which makes it easy to detect and reject a block that was proposed out-of-order.
For example, if there are four validators, A, B, C, and D, then a leader schedule might look like:
B - D - A - D - C - C - A - B - D
This means that the only allowed ordering of blocks are those submitted by validators in that order, with the caveat that a slot might be "skipped" if the validator in that slot doesn't generate a block at all, or generates a bad block that the rest of the validators do not vote on, or produces a block too late for others to see it before the next leader's slot.
The slots are organized into groups of 432,000, and such a group of 432,000 slots is called an "epoch". The leader schedule for the current epoch is determined by data from the epoch two epochs prior to this epoch. Therefore, the leader schedule for the current epoch has already been known for one full epoch before the current epoch even started. This gives plenty of time for all validators to come to know the leader schedule and thus no validator is ever unsure about who the current leader is. The current leader is known to all validators at all times because the leader schedule is always known well in advance.
"Every slot takes half a second, so every validator simply takes half a second to gather up its transactions and then propose its block right before that half of a second is up. Then the next validator will see that block, and build on it during its own slot, similarly taking half a second to do so." |
But the problem here is that there is no universally accepted clock that all validators can run on so that they are all perfectly synchronized on those half-second boundaries. It is true that there are methods for establishing synchronized time between distant computers, but those methods require cooperation between nodes, and a dishonest node could provide false time information, or just go out of turn and claim that it is everyone else whose clock is wrong while their own clock is the correct one. A given node that sees a block emitted by some leader that went "too early" now has to decide if it's that other node whose clock is correct, or their own clock.
These problems can be worked around to a large degree through a variety of techniques that establish trustless synchronization of time-based clocks throughout the network, but the more precise they attempt to be, the slower they become. For example imagine a solution that simply makes slots one year in length. Now if a node sees a block that is from the "wrong year" it can be pretty sure that the problem is with the leader that proposed it since it's very unlikely that the local node has a clock that is a year off. But a one year slot would not be workable because no one wants to wait a year for their transaction to be verified. The shorter the time interval in a slot, and thus the shorter the time that users have to wait for their transactions to be verified, the more likely it is that a node cannot tell whether its clock is off by that little bit or the other leader's clock is off by that little bit. And without agreement, nodes cannot properly sequence the proposal of blocks so that they do not overlap or collide.
Solana thus measures time not on human scales, but in terms of number of iterative SHA-256 hashes per second (in actuality, it measures in units of called "ticks", which are a number of SHA-256 hash iterations approximating a small fraction of a second). All validators run this "clock" on their CPU constantly, by running the iterative SHA-256 hash function as quickly as they can. Points in time are thus represented by values in the iterative SHA-256 hash sequence. The value produced after, say, 1 million iterations might represent "1 second", and the value produced after 2 million iterations might represent "2 seconds", etc. Therefore, by providing an initial value, and the value computed after 1 million SHA-256 iterations starting from that initial value, a validator can prove that it has spent about 1 second of time running the iterative SHA-256 hash function.
It is important to remember that while this description describes time frames measured in human seconds, the Solana network itself does not measure time in seconds. It measures in ticks, which are designed to closely approximate the desired human time interval, but may be slower or faster than intended depending on the speed of SHA-256 hashing being achieved on average by nodes within the network. Faster nodes will make ticks shorter in duration from a human perspective, and slower nodes will make ticks longer, but from the network's perspective the same number of ticks have passed, and thus the same amount of time within the network, regardless of how much human scale time has been taken.
Validators sample their iterative SHA-256 hash function "clock" once per tick which provides a sequence of numbers for each slot that all together represent the total duration of time spent within the slot; this allows the sequence to be more quickly invalidated should it prove to be wrong. For example, sampling 100 values per slot and providing all of them means that a validator that is checking that a one slot duration has really been observed, can accept or reject sequences much more quickly since each number can be checked in parallel in 1/100th the time that it took the original validator to compute the sequence. This is an important property because it means that legitimate validators cannot easily be swamped by dishonest validators; if it took just as long to verify a block as it did to make it, then a bad actor could just emit a lot of bad blocks and waste legitimate validators' time in verifying them. But when blocks can be checked in 1/100th the time it takes to emit them, then the attacker has to spend 100x the resources of those it is attacking, which is a much less appealing and often impractical proposition.
On Solana, blocks are "streamed" by the leader as the leader verifies transactions. In this way, the time spent transmitting the proposed block of transactions overlaps with the time spent actually verifying transactions, which is a powerful form of pipelining that contributes to Solana's low latency. The Proof of History iterative SHA-256 sequence starts with the last SHA-256 value computed from the previous block's Proof of History, and the validator includes the Proof of History sequence values that it is computing along with the transactions in the proposed block as it streams them. The validator finishes the block when its slot is up, and the sequence of SHA-256 hash values that has included as its proof of having run for its entire slot is verified by other validators as being complete before they will vote on the block. The next leader then does the same thing using the just-emitted block as its starting point, and the process repeats. In this way the Proof of History values provide an ongoing, verifiable proof that each leader has started where the previous leader left off and has taken the full slot's time to generate its block.
This Proof of History data included with the block thus shows that the block was emitted during its slot and not at any other time, which is quickly and easily verifiable by any validator. Also the block of course contains the signature of the leader who emitted it, allowing other validators to quickly prove that the block was emitted by the proper leader for that slot. And so blocks can easily be checked to ensure that they were emitted by the correct leader and at the correct time. This feature is what allows Solana leaders to follow in leader sequence with minimal overhead, which is the main factor that makes the Solana network so fast. And because the leader schedule must be respected according to these rules, it is censorship resistant, whch means that a validator cannot exclude another validator by emitting a competing block in the legitimate leader's slot.
A - B - C
Meaning that first A is leader, then B is leader, then C.
Once A has emitted its block, it is B's turn. But what if C decide to try to "cheat" and emit a block of its own while it is B's turn?
If C wants to emit its block during B's slot, then that means it must directly chain off of A's block. But the leader schedule says that B goes after A, and if C emits a block that has the same Proof of History duration that B would have, then it will be obvious to any other validator that C is cheating since it's emitted its block during B's slot. So C must make it appear that B never emitted a block at all. In other words, it would have to make the sequence of blocks look like:
A - (missing block B) - C
If B actually were missing, then an honest C would have spent all of B's slot producing a Proof of History sequence proving that it waited for the full duration of B's slot before beginning its own slot, and then it would start emitting its block after that, including the continuation of the Proof of History sequence into and then through its slot. Other validators will accept C's block as valid since it is the correct leader (C) in the correct slot (two slots after A's slot).
Therefore, C must compute enough Proof of History hashes to show that it waited all of B's slot before starting its own block. But in the meantime, B is already submitting its own block that chains off of A, and so from the network's perspective the following occurs:
There is another important aspect of C's attempt to censor B: if C loses, then it is not allowed to emit any block for its own slot at all. This is because once C emits a block that shows it chaining directly off of A (trying to pretend that B never produced a block and thus was skipped), if it loses (and with high probability, it will lose), then any attempt to submit a different block for its slot, chained directly off of B's block, means producing two blocks for C's slot (an A - C block and a B - C block). That is against the rules and would result in slashing.
Therefore, C's attempt to censor B by competing in B's slot would almost certainly fail, and for every failed attempt, C loses out on any chance to emit a block during its own slot. On balance, it's a losing strategy for C the vast majority of the time, so C is very unlikely to even attempt it.