Shinobi Systems'
Solana Proof of Stake + Proof of History Primer


Introduction

Solana is a Proof of Stake network. This short phrase - "Proof of Stake" - represents a much larger concept with considerable complexity behind it, and even more so for Solana, which adds the unique properties of Proof of History to the mix to enable fast, low-latency transactions while still maintaining censorship resistance.

The purpose of this primer is to provide the reader with a basic understanding of the concepts underlying Proof of Stake networks and also of the unique properties of Solana's added Proof of History technology.

What is a Proof of Stake Network?

Generally speaking, a Proof of Stake network is one in which a set of independent entities that do not necessarily trust one another can achieve consensus (i.e. agree) about the validity of a sequence of transactions. Fundamentally this is based on techniques for achieving consensus in a representative problem known as the Byzantine Generals Problem. The conclusion of research on this topic is that consensus can safely be reached in such a network as long as less than 1/3 of the participants (hereafter called "nodes" or "validators") are dishonest.

There are actually two thresholds of interest in such a network:

  1. If the number of dishonest nodes comprises 1/3 or more of the total nodes then the network may be "halted" -- those dishonest nodes may simply refuse to participate, and the remaining nodes will never be able to achieve the 2/3 supermajority required for consensus. In this situation, the network does not produce invalid transactions, it simply produces no transactions at all.
  2. If the number of dishonest nodes comprises 2/3 or more of the total nodes then those dishonest nodes may collude to produce arbitrary transactions of any kind. This is the worst case scenario as the network is no longer functioning to produce accurate transactions but instead is producing whatever transactions the 2/3 dishonest supermajority wants.

It is ideal for the collection of nodes to be as large as possible so that the total number of nodes that must be corrupted to achieve either of these thresholds is impractically large.

However, this desire for more nodes is often at odds with efficient functioning of the network, since the costs of transmitting data between large numbers of nodes can delay consensus considerably. It is great to have a million nodes in the network, but if that means that it takes four hours for consensus to be reached on a transaction, then the network isn't of much use to anyone.

Proof of Stake adds a few wrinkles to the general design of a consensus-based network. It ensures that the participants have "skin in the game" by requiring that they provide their own "stake", which can be forfeited by nodes that provably misbehave. This allows the network to be designed to work faster even if that design gives nodes opportunities to misbehave, with the penalties for misbehaving assumed to deter such misbehavior. Note that this is not used as a substitute for security -- the requirements of 1/3 dishonest nodes to halt the network and 2/3 dishonest nodes to reach consensus on bad transactions remain and are fundamentally what ensure that only valid transactions are accepted by the network. The "skin in the game" of stake simply allows for consequences for bad behavior by nodes that would temporarily disrupt the network or reduce its efficiency.

Proof of Stake also gives different levels of power to different nodes in the network, because nodes' votes are not all equal: they are weighted by stake. For example, if node A holds as much stake as nodes B and C combined, then a vote by A for consensus on a group of transactions carries the same weight as votes by both B and C. In other words, votes are weighted by stake, and what is required to cause problems on the network is a total of dishonest stake-weight, not a total number of dishonest nodes.

As an example, consider a network with four nodes and 100 total stake.

Node A has 10 stake.
Node B has 20 stake.
Node C has 30 stake.
Node D has 40 stake.

What can each grouping of dishonest nodes accomplish?

Dishonest Nodes | Honest Nodes | Dishonest Stake Weight | Honest Stake Weight | Can Halt the Network | Can Approve Bad Transactions
A, B, C, D      | (none)       | 100                    | 0                   | Yes                  | Yes
A, B, C         | D            | 60                     | 40                  | Yes                  | No
A, B, D         | C            | 70                     | 30                  | Yes                  | Yes
A, C, D         | B            | 80                     | 20                  | Yes                  | Yes
B, C, D         | A            | 90                     | 10                  | Yes                  | Yes
A, B            | C, D         | 30                     | 70                  | No                   | No
A, C            | B, D         | 40                     | 60                  | Yes                  | No
A, D            | B, C         | 50                     | 50                  | Yes                  | No
B, C            | A, D         | 50                     | 50                  | Yes                  | No
B, D            | A, C         | 60                     | 40                  | Yes                  | No
C, D            | A, B         | 70                     | 30                  | Yes                  | Yes
A               | B, C, D      | 10                     | 90                  | No                   | No
B               | A, C, D      | 20                     | 80                  | No                   | No
C               | A, B, D      | 30                     | 70                  | No                   | No
D               | A, B, C      | 40                     | 60                  | Yes                  | No
(none)          | A, B, C, D   | 0                      | 100                 | No                   | No

As we can see, the set of nodes required to halt the network is as small as 1 -- node D -- even though 1 node is only 25% of the total number of nodes, and the set of nodes required to write bad transactions is as small as 2 -- nodes C and D combined -- even though 2 nodes is only 50% of the total number of nodes. Both of these factors are due to the large stake-weight of several of the validators.
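
For readers who want to check these numbers themselves, the short Python sketch below (illustrative only, not Solana code) enumerates every possible set of dishonest nodes and applies the 1/3 and 2/3 stake-weight thresholds, reproducing the table above.

    from fractions import Fraction
    from itertools import combinations

    stakes = {"A": 10, "B": 20, "C": 30, "D": 40}
    total = sum(stakes.values())

    # Enumerate every possible set of dishonest nodes, largest first.
    for size in range(len(stakes), -1, -1):
        for dishonest in combinations(stakes, size):
            honest = [n for n in stakes if n not in dishonest]
            weight = sum(stakes[n] for n in dishonest)
            can_halt = Fraction(weight, total) >= Fraction(1, 3)    # 1/3 of stake can halt the network
            can_forge = Fraction(weight, total) >= Fraction(2, 3)   # 2/3 can approve bad transactions
            print(f"dishonest={','.join(dishonest) or '-':<8} "
                  f"honest={','.join(honest) or '-':<8} "
                  f"weight={weight:>3}  "
                  f"halt: {'Yes' if can_halt else 'No':<4} "
                  f"bad transactions: {'Yes' if can_forge else 'No'}")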

A final note about those 1/3 and 2/3 thresholds: the 1/3 stake-weight needed to halt the network can be reached by a combination of dishonest nodes, compromised nodes, and blocked nodes, whereas the 2/3 threshold can only be reached by dishonest and compromised nodes. What this means is that the 1/3 halt threshold is easier to achieve, because the network can be halted by "blocking" some honest nodes and thus eliminating their ability to vote, in addition to simply running dishonest nodes that refuse to vote and/or compromising ("hacking") honest nodes and forcing them not to vote.

The 2/3 stake-weight needed to write arbitrary transactions cannot rely on blocking honest nodes, because a full 2/3 of stake weight is needed to actively vote for bad transactions. The only option for an attacker who wants to write bad transactions is to run dishonest nodes and/or compromise enough nodes to accumulate a total of 2/3 stake-weight. This is a much harder proposition because blocking honest nodes, which is often easier than hacking them, does not help.

What is the Purpose of Stake in a Proof of Stake Network?

As mentioned in the previous section, one purpose of stake is to ensure that nodes have "skin in the game": behavior that would reduce the performance of the network is discouraged by the threat of forfeiting stake if such behavior is detected. The loss of stake when this behavior is detected is called "slashing", and it has the effect of reducing the validator's influence on the network by reducing its stake-weight, in addition to encouraging stakers to find a more honest node to stake with.

An example of suboptimal behavior is for a node to emit more than one block at a time. If it does this, then it creates two forks within the network, and the consensus mechanism then has to work to choose one fork over the other to permanently commit to the blockchain. Although the functioning of the honest nodes will eventually commit one of the two blocks to the chain (assuming that the block is otherwise valid), and thus the network will progress despite the fork, the extra cost of fork resolution will slow down the network. Because it should be easy for a node to avoid emitting two different blocks at the same time, doing so can only be interpreted as a deliberate attempt to slow the network down. Such an act would be punished by slashing.
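
As a toy illustration of that rule, the sketch below flags a leader that signs two different blocks for the same slot. The data structures here are invented for the example; real slashing evidence on Solana is more involved than a simple lookup table.

    # Maps (slot, leader) to the hash of the first block seen from that leader in that slot.
    seen_blocks = {}

    def observe_block(slot, leader, block_hash):
        """Return True if this block is evidence of slashable behavior: a second,
        different block signed by the same leader for the same slot."""
        first_seen = seen_blocks.setdefault((slot, leader), block_hash)
        return first_seen != block_hash

    assert observe_block(7, "validator_C", "hash_1") is False   # first block for the slot: fine
    assert observe_block(7, "validator_C", "hash_2") is True    # a second, different block: slashable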

Another purpose of stake is that it gives an opportunity for stakers to choose which validators are the most trustworthy and/or are providing the best service to the network. If a validator has great performance, never gets slashed (i.e. is always honest), and has a good relationship with its stakers, it should expect to receive more stake than a validator without one or more of these traits. As a result, the staking mechanism tends to promote the authority of well behaved nodes over that of poorly behaved nodes, further strengthening the performance and safety of the overall network.

In Solana and many other Proof of Stake networks, validators themselves earn rewards proportional to their stake-weight, which means that it is in their interest to attract stake, and thus behave in the ways that are most likely to attract and retain stake - i.e., with good performance, good returns, good relationships with stakers, transparency, and honesty. Thus the financial rewards of validators are in line with the interests of the network as expressed by the choices of stakers who delegate their stake.

Note that this can only be carried so far: even the best validator should not receive an overly large portion of stake, because that would put too much authority in one node that could be hacked or could unexpectedly change its behavior in a bad way ("go rogue"). There is therefore always a fine balance between rewarding a good validator with enough stake to magnify its positive effect on the network and rewarding it with so much that it becomes a single point of failure for the network. How much is too much stake in one validator? There is no specific answer to this question, and ultimately it is up to the collective set of stakers to decide by virtue of how they place their stake. But stakers should definitely be aware of these factors so that they can truly make the best choices for themselves and the network overall.

How does Proof of History in Solana Enhance its Proof of Stake Implementation?

Solana aims to be fast, both in terms of total throughput, which is the number of transactions that can be processed in a unit of time, and latency, which is how long it takes from the time a transaction is first submitted to the network until it is confirmed by the network and incorporated into the network's permanent state.

Proof of Stake by itself can guarantee consensus as long as enough of the stake-weight is honest, but it can't necessarily guarantee that consensus happens quickly or with high throughput. Proof of History is a mechanism that enhances Solana's Proof of Stake implementation to allow it to be the fastest Proof of Stake network while still retaining the safety needed to ensure that the network is secure.

Because Solana is a network based on a blockchain, transactions are grouped into blocks and the blocks chain together into a cryptographically verifiable sequence. The consensus mechanism of votes is what allows any block to "prove" that it is part of the Solana blockchain by virtue of having received at least 2/3 of stake-weighted votes. But someone has to create the blocks that are proposed for voting and that potentially make it into the blockchain.

Blocks must be submitted in an orderly sequence; otherwise multiple nodes would submit blocks at the same time, and deciding which block should come next would be a costly and lengthy process based on fork resolution algorithms. Therefore the network can only be fast if only one node at a time submits the next block to be considered for inclusion in the blockchain.

Introducing the Leader Schedule

This is where the "leader schedule" comes in. In Solana, the set of validators that could vote is known: validators "register" themselves as voting nodes and stakers "delegate" stake to the voting nodes which then allows those nodes to contribute stake-weighted votes to the system and help select the blocks that end up in the blockchain. Because the set of validators that could submit blocks is known, it is possible to organize the network such that only one specifically identified validator is allowed to submit a block at one time. (As an aside, and to allay any privacy concerns, validators are not required to identify themselves in any way except by using their anonymous public key; the humans who run the validators remain as anonymous as they want to be.)

Each block is produced during a unit of time called a "slot" and one validator is assigned to each slot in the leader schedule. The leader schedule thus gives an ordered sequence of validators, and the only validator which may propose a block during a particular slot in the leader schedule is the validator that was assigned that slot. In this way, the potential blocks are already ordered by validator ahead of time, which makes it easy to detect and reject a block that was proposed out-of-order.

For example, if there are four validators, A, B, C, and D, then a leader schedule might look like:

B - D - A - D - C - C - A - B - D

This means that the only allowed ordering of blocks is the one submitted by validators in that order, with the caveat that a slot might be "skipped" if the validator assigned to that slot doesn't generate a block at all, generates a bad block that the rest of the validators do not vote on, or produces a block too late for others to see it before the next leader's slot.

Slots are organized into groups of 432,000, and such a group is called an "epoch". The leader schedule for the current epoch is determined by data from the epoch two epochs prior, so it has already been known for one full epoch before the current epoch even started. This gives every validator plenty of time to learn the leader schedule, so no validator is ever unsure about who the current leader is at any given moment.
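
As a rough illustration of this arithmetic (assuming only the 432,000-slot epoch length stated above; the slot number below is made up), a validator can map any slot to its epoch and to the earlier epoch whose data determined that epoch's leader schedule:

    SLOTS_PER_EPOCH = 432_000

    def epoch_of(slot):
        return slot // SLOTS_PER_EPOCH

    def schedule_source_epoch(epoch):
        # Per the description above, an epoch's leader schedule is derived from
        # data two epochs earlier, so it is known a full epoch in advance.
        return max(epoch - 2, 0)

    slot = 1_000_000
    print(epoch_of(slot))                          # 2
    print(schedule_source_epoch(epoch_of(slot)))   # 0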

How the Leader Schedule is Selected

The leader schedule itself is not fixed. It is determined by a random number generator that selects validators according to their stake-weight, which means that while every validator will get a number of slots roughly in proportion to its stake-weight, which slots a validator gets is random and thus not predictable ahead of time. To compute the leader schedule, the random number generator is seeded by an accumulation of values extracted from the state of the network at the end of every epoch. These values are not predictable or controllable by any one validator, but come from the combined actions of many validators that cannot be coordinated or forced. Because the leader schedule is always computable from data that is available to every validator as part of normal network operation, and because it is computed with fixed, unchanging algorithms known to all validators, every validator can independently compute the leader schedule without any additional messages between validators. They all just look at the same data already available to them at the end of every epoch, make the same calculations, and end up knowing the same leader schedule for the upcoming epoch.
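
The Python sketch below illustrates the general idea: a seed derived from prior-epoch data feeds a deterministic random number generator that picks leaders in proportion to stake, so every validator that runs the same computation on the same data arrives at the same schedule. The seed derivation and sampling method shown here are illustrative, not Solana's exact algorithm.

    import hashlib
    import random

    def leader_schedule(stakes, seed_bytes, num_slots):
        # Deterministic seed that every validator can derive from prior-epoch state.
        seed = int.from_bytes(hashlib.sha256(seed_bytes).digest(), "big")
        rng = random.Random(seed)
        validators = sorted(stakes)                  # fixed ordering, identical on every machine
        weights = [stakes[v] for v in validators]
        # Stake-weighted random choice of one leader per slot.
        return rng.choices(validators, weights=weights, k=num_slots)

    stakes = {"A": 10, "B": 20, "C": 30, "D": 40}
    schedule = leader_schedule(stakes, b"end-of-epoch state (illustrative)", 9)
    print(" - ".join(schedule))   # same output on every machine that uses the same inputs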

The Problem of Enforcing the Leader Schedule

Now that a leader schedule has been established, it is up to every validator to follow that schedule and propose blocks when it is "their turn" as dictated by the schedule, and vote on blocks proposed by others. However, a problem remains: what prevents validators from emitting blocks out of order? The naive implementation would go something like:

"Every slot takes half a second, so every validator simply takes half a second to gather up its transactions and then propose its block right before that half of a second is up. Then the next validator will see that block, and build on it during its own slot, similarly taking half a second to do so."

But the problem here is that there is no universally accepted clock that all validators can run on so that they are all perfectly synchronized on those half-second boundaries. It is true that there are methods for establishing synchronized time between distant computers, but those methods require cooperation between nodes, and a dishonest node could provide false time information, or simply go out of turn and claim that it is everyone else whose clock is wrong while its own is correct. A node that sees a block emitted by a leader that went "too early" has to decide whether it is the other node's clock or its own that is correct.

These problems can be worked around to a large degree through a variety of techniques that establish trustless synchronization of time-based clocks throughout the network, but the more precise they attempt to be, the slower they become. For example, imagine a solution that simply makes slots one year in length. Now if a node sees a block from the "wrong year", it can be pretty sure that the problem is with the leader that proposed it, since it's very unlikely that the local node's clock is a year off. But a one-year slot would not be workable because no one wants to wait a year for their transaction to be verified. The shorter the slot, and thus the shorter the time that users have to wait for their transactions to be verified, the more likely it is that a node cannot tell whether its own clock or the leader's clock is off by that little bit. And without agreement, nodes cannot properly sequence the proposal of blocks so that they do not overlap or collide.

The Proof of History Solution

Proof of History is Solana's solution to this problem. Instead of using a clock based on human timescales, or "wall clock time", it uses a "clock" based on the amount of time it takes to run an iterative SHA-256 hash function a certain number of times. Because computers have already converged on a commonly attainable "fastest SHA-256 hash implementation", all computers which use the fastest generally available processors will be able to compute only a certain number of SHA-256 hashes per second. It is true that there is some variability there, with some CPUs being able to compute SHA-256 faster than others. But the differences among the fastest computers are small, and a common upper bound has already been reached, despite the tremendous amount of time and effort poured into making exactly this function as fast as possible thanks to Bitcoin's reliance on it. Therefore, we can bound the number of iterative SHA-256 hash invocations per second on the lower end by a number known to be achievable by fast modern CPUs, and on the upper end by the fastest CPU implementation known. This is already known to be a surprisingly tight range, and therefore the iterative SHA-256 hash function run a certain number of times can be said to closely approximate a second's duration.

Solana thus measures time not on human scales, but in terms of the number of iterative SHA-256 hashes per second (in actuality, it measures in units called "ticks", which are a number of SHA-256 hash iterations approximating a small fraction of a second). All validators run this "clock" on their CPU constantly, by running the iterative SHA-256 hash function as quickly as they can. Points in time are thus represented by values in the iterative SHA-256 hash sequence. The value produced after, say, 1 million iterations might represent "1 second", and the value produced after 2 million iterations might represent "2 seconds", etc. Therefore, by providing an initial value, and the value computed after 1 million SHA-256 iterations starting from that initial value, a validator can prove that it has spent about 1 second of time running the iterative SHA-256 hash function.
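
A minimal sketch of this "clock" in Python: extend a SHA-256 hash chain for some number of iterations, and let anyone re-run the chain from the same starting value to check the claimed end value. The hashes-per-tick figure below is illustrative, not necessarily Solana's actual setting.

    import hashlib

    def poh_extend(start, iterations):
        # Each output is the SHA-256 of the previous one, so producing N iterations
        # takes roughly N times the cost of a single hash and cannot be parallelized.
        value = start
        for _ in range(iterations):
            value = hashlib.sha256(value).digest()
        return value

    HASHES_PER_TICK = 12_500   # illustrative figure

    start = hashlib.sha256(b"previous block's final PoH value").digest()
    tick_value = poh_extend(start, HASHES_PER_TICK)

    # Verification: re-run the chain from the same starting value and compare.
    assert poh_extend(start, HASHES_PER_TICK) == tick_value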

It is important to remember that while this description describes time frames measured in human seconds, the Solana network itself does not measure time in seconds. It measures in ticks, which are designed to closely approximate the desired human time interval, but may be slower or faster than intended depending on the speed of SHA-256 hashing being achieved on average by nodes within the network. Faster nodes will make ticks shorter in duration from a human perspective, and slower nodes will make ticks longer, but from the network's perspective the same number of ticks have passed, and thus the same amount of time within the network, regardless of how much human scale time has been taken.

Validators sample their iterative SHA-256 hash "clock" once per tick, which provides a sequence of numbers for each slot that together represent the total duration of time spent within the slot; this allows the sequence to be invalidated more quickly should it prove to be wrong. For example, sampling 100 values per slot and providing all of them means that a validator checking that a full slot's duration has really been observed can accept or reject sequences much more quickly, since each number can be checked in parallel in 1/100th the time that it took the original validator to compute the sequence. This is an important property because it means that legitimate validators cannot easily be swamped by dishonest validators; if it took just as long to verify a block as it did to make it, then a bad actor could simply emit a lot of bad blocks and waste legitimate validators' time verifying them. But when blocks can be checked in 1/100th the time it takes to emit them, the attacker has to spend 100x the resources of those it is attacking, which is a much less appealing and often impractical proposition.
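
The sketch below (again with small, illustrative figures) shows why those per-tick samples matter: each segment between two samples can be re-checked independently of the others, so a verifier can run the checks in parallel, each check costing only a fraction of the work the leader spent producing the whole slot.

    import hashlib

    def poh_extend(start, iterations):
        value = start
        for _ in range(iterations):
            value = hashlib.sha256(value).digest()
        return value

    HASHES_PER_TICK = 1_000   # small, illustrative figures so the example runs quickly
    TICKS_PER_SLOT = 8

    # Leader side: run the clock for a whole slot, recording one sample per tick.
    samples = [hashlib.sha256(b"start of slot").digest()]
    for _ in range(TICKS_PER_SLOT):
        samples.append(poh_extend(samples[-1], HASHES_PER_TICK))

    # Verifier side: each segment depends only on its own starting sample, so the
    # checks are independent and can run in parallel, each costing 1/TICKS_PER_SLOT
    # of the work the leader spent producing the slot.
    ok = all(poh_extend(samples[i], HASHES_PER_TICK) == samples[i + 1]
             for i in range(TICKS_PER_SLOT))
    print(ok)   # True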

On Solana, blocks are "streamed" by the leader as the leader verifies transactions. In this way, the time spent transmitting the proposed block of transactions overlaps with the time spent actually verifying transactions, which is a powerful form of pipelining that contributes to Solana's low latency. The Proof of History iterative SHA-256 sequence starts with the last SHA-256 value computed from the previous block's Proof of History, and the validator includes the Proof of History sequence values that it is computing along with the transactions in the proposed block as it streams them. The validator finishes the block when its slot is up, and the sequence of SHA-256 hash values that it has included as its proof of having run for its entire slot is verified by other validators as being complete before they will vote on the block. The next leader then does the same thing using the just-emitted block as its starting point, and the process repeats. In this way the Proof of History values provide an ongoing, verifiable proof that each leader has started where the previous leader left off and has taken the full slot's time to generate its block.

This Proof of History data included with the block thus shows that the block was emitted during its slot and not at any other time, which is quickly and easily verifiable by any validator. The block also contains the signature of the leader who emitted it, allowing other validators to quickly prove that the block was emitted by the proper leader for that slot. And so blocks can easily be checked to ensure that they were emitted by the correct leader and at the correct time. This feature is what allows Solana leaders to follow one another in the leader sequence with minimal overhead, which is the main factor that makes the Solana network so fast. And because the leader schedule must be respected according to these rules, the network is censorship resistant, which means that a validator cannot exclude another validator by emitting a competing block in the legitimate leader's slot.
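
Putting those two checks together, a validator's acceptance test for a proposed block boils down to something like the sketch below, with an invented Block type and an illustrative ticks-per-slot constant: the block must be signed by the leader that the schedule assigns to its slot, and its Proof of History must account for the full slot.

    from dataclasses import dataclass

    TICKS_PER_SLOT = 64   # illustrative constant

    @dataclass
    class Block:              # invented for this example, not Solana's real block layout
        slot: int
        leader: str           # public key of the validator that signed the block
        tick_count: int       # number of PoH ticks covered by the block's hash chain
        poh_chain_ok: bool    # result of re-running/verifying the block's PoH sequence

    def block_acceptable(block, leader_schedule):
        right_leader = leader_schedule[block.slot % len(leader_schedule)] == block.leader
        covers_full_slot = block.poh_chain_ok and block.tick_count >= TICKS_PER_SLOT
        return right_leader and covers_full_slot

    schedule = ["B", "D", "A", "D", "C", "C", "A", "B", "D"]
    print(block_acceptable(Block(2, "A", 64, True), schedule))   # True: A leads slot 2
    print(block_acceptable(Block(2, "C", 64, True), schedule))   # False: wrong leader for slot 2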

A Proof of History Example

To analyze this property in a little more detail, let's assume that there are three leaders and that the leader schedule is:

A - B - C

Meaning that first A is leader, then B is leader, then C.

Once A has emitted its block, it is B's turn. But what if C decides to try to "cheat" and emit a block of its own while it is B's turn?

If C wants to emit its block during B's slot, then that means it must directly chain off of A's block. But the leader schedule says that B goes after A, and if C emits a block that has the same Proof of History duration that B would have, then it will be obvious to any other validator that C is cheating since it emitted its block during B's slot. So C must make it appear that B never emitted a block at all. In other words, it would have to make the sequence of blocks look like:

A - (missing block B) - C

If B actually were missing, then an honest C would have spent all of B's slot producing a Proof of History sequence proving that it waited for the full duration of B's slot before beginning its own slot, and then it would start emitting its block after that, including the continuation of the Proof of History sequence into and then through its slot. Other validators would accept C's block as valid since it comes from the correct leader (C) in the correct slot (two slots after A's slot).

Therefore, C must compute enough Proof of History hashes to show that it waited all of B's slot before starting its own block. But in the meantime, B is already submitting its own block that chains off of A, and so from the network's perspective the following occurs:

  1. A completes a valid block with proper Proof of History.
  2. B immediately starts streaming its block and the network sees that B is active and producing a block properly chained off of A and with proper Proof of History.
Now one of two things can happen:
  1. If C is no faster at computing PoH than B, B will complete its block at about the same time that C starts emitting its own block chained directly off of A. The network has already seen B's block by this point, though, so it simply rejects C's block. Note that some validators, due to network latencies, may receive both blocks late and may not be able to tell whether to accept B's block or C's block; but 2/3 of the validators (by stake weight) would have to see C's complete block well before any of B's block in order to accept it. So this would be a rare occurrence that would require very poor overall network performance.
  2. If C is faster at computing PoH than B, then it can start sending its block sometime before B is done sending its own. However, it is very, very unlikely for C to be so much faster than B that it can begin its block before a significant fraction of B's block has already been sent and seen by the network. C is somewhat more likely to win than in the previous case, especially if network latencies make B's block late, but it is still unlikely to succeed.
Therefore, C's relative speed compared to B determines how likely it is to compete successfully with (i.e. censor) B, but C would have to be much, much faster than B to compete effectively and have any reasonable chance of censoring B. And because all validators use CPUs that are known to be at or close to the fastest available for computing iterative SHA-256, it is very, very unlikely for C to be so much faster than B that it can significantly compete with B for a block in B's slot.

There is another important aspect of C's attempt to censor B: if C loses, then it is not allowed to emit any block for its own slot at all. This is because once C has emitted a block that chains directly off of A (pretending that B never produced a block and was skipped), then if it loses (and with high probability it will lose), any attempt to submit a different block for its slot, this time chained off of B's block, would mean producing two blocks for C's slot (one chained off of A and one chained off of B). That is against the rules and would result in slashing.

Therefore, C's attempt to censor B by competing in B's slot would almost certainly fail, and for every failed attempt, C loses out on any chance to emit a block during its own slot. On balance, it's a losing strategy for C the vast majority of the time, so C is very unlikely to even attempt it.

Conclusion

In summary, Solana is a Proof of Stake network which uses the voting power of stake-weighted nodes to achieve consensus on transactions, a system proven to work reliably as long as more than 2/3 of the stake-weight is honest. The speed of the network is enhanced through the natural fork avoidance mechanisms built into the leader schedule, and further enhanced by the Proof of History mechanism that ensures that the leader schedule is followed while maintaining high speed and low latency.

About Shinobi Systems

Shinobi Systems runs one of the most reliable and highest-returns validators on the Solana network. We are Solana enthusiasts helping to build a solid foundation for Solana with an honest, trustworthy, performant, and reliable validator node. Read more at our website.