Analysis of the effects of a "timely vote credits" mechanism on Solana mainnet-beta

Introduction

The goal of this change is to award vote credits based partially on the latency with which votes are cast. Votes that "land" closer to the slot they are voting on would earn more credits; votes that land further from that slot would earn fewer credits. This incentivizes voting as quickly as possible and reduces the incentive to "lag" votes, whereby a validator intentionally holds back its vote in order to see which slots other validators are voting on before casting its own.

The Problem

The current vote credit mechanism (hereafter referred to as "normal" vote credits) awards 1 credit per successfully voted on slot. A successfully voted on slot is one which:

The problem with normal vote credits is that a validator who votes promptly earns exactly the same credits as a validator who votes after a long delay. It is known that there are voting strategies involving intentionally lagging votes which accrue more vote credits (due to voting on "the wrong fork" less often). Taken to the extreme, this strategy would lead to a cluster halt, as all validators would wait for other validators to vote before casting their own vote.

The Timely Credits Solution

The proposed solution to this problem is to award more credits to votes that land in a "more timely" fashion. This is hereafter referred to as "timely" vote credits.

The earliest a vote can possibly land is in the next slot after the slot being voted on. Votes that land at this slot are awarded maximum credits.

For every slot after that, the credits awarded are reduced by some amount. This means that votes cast sooner will earn more credits.

This will make intentional vote lagging unattractive: every vote that is intentionally lagged will "give up" some number of credits because it was not cast sooner.

It is important that validators in locations "distant" from the "network epicenter(s)", and thus subject to the greatest average network latencies, are not penalized if they vote in a timely manner with respect to their view of the network, even if their votes tend to arrive at leaders later because of increased latency due to network distance. For this reason, there is a "grace" period, so that all votes that land within a certain latency threshold are awarded full credit. Analysis of voting patterns on mainnet-beta shows that the vast majority of validators land votes with an average latency of less than 4 slots, with most of those having an average latency of less than 3 slots.

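The parameters which may be tuned to provide the desired reward structure for timely vote credits are the grace period (the latency, in slots, within which a vote earns full credit), the maximum credits (awarded to votes that land within the grace period), and the credit reduction factor (which controls how much the credits are reduced for each additional slot of latency).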

Numerous combinations of these parameters were studied for their effects on vote credits across all validators and data centers. The following parameters were eventually selected as having the best characteristics: Grace Period 4, Maximum Credits 16, Credit Reduction Factor 1.0.
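The reward schedule described above can be sketched as a small function. This is a minimal illustration and not the code used for the analysis: the function name, the linear form of the reduction, and the floor of one credit for very late votes are all assumptions. Only the three parameter values come from the text.

```python
GRACE_PERIOD = 4                # slots of latency that still earn full credit (from the text)
MAX_CREDITS = 16                # credits for a vote landing within the grace period (from the text)
CREDIT_REDUCTION_FACTOR = 1.0   # from the text; ASSUMED to mean credits lost per slot beyond the grace period
MIN_CREDITS = 1                 # ASSUMPTION: floor for very late votes, not stated in the text


def timely_credits(voted_slot: int, landed_slot: int) -> int:
    """Credits for a vote on `voted_slot` that lands in `landed_slot` (hypothetical helper)."""
    latency = landed_slot - voted_slot
    if latency < 1:
        # The earliest a vote can land is the slot after the one being voted on.
        raise ValueError("a vote cannot land before the slot after the one it votes on")
    slots_beyond_grace = max(0, latency - GRACE_PERIOD)
    credits = MAX_CREDITS - CREDIT_REDUCTION_FACTOR * slots_beyond_grace
    return max(MIN_CREDITS, int(credits))
```

Under these assumptions, a vote landing 1 to 4 slots after the voted slot earns 16 credits, a vote landing 5 slots after earns 15, and a vote landing 8 slots after earns 12.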

The analysis considered three major factors:

  1. Discouraging lagging
  2. Tempered impact on validators with "reasonable" performance
  3. Minimized impact on validators in Japan and Singapore (generally regarded as the "most network distant" of the common data center regions)

Comparing Normal Vote Credits to Timely Vote Credits on mainnet-beta

The voting history of all mainnet-beta validators from epochs 299 through 311 (inclusive) has been gathered and the results collated into tables that compare normal vote credit earnings to timely vote credit earnings, given the exact same voting patterns by those validators in the same epochs. In addition to tables showing the effects on individual validators, the average results for all validators in each data center are computed and provided in data center specific tables. Aggregating results by data center allows analysis of the effects of timely vote credits on data centers in different regions of the world, which should reveal any aspects of the proposed change that cause disparate results for data centers in more remote locations.
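The per-data-center aggregation can be illustrated with a short sketch. The row layout, the function name, and the sample values below are hypothetical and invented for illustration; they are not taken from the analysis or from the processed data files.

```python
from collections import defaultdict
from statistics import mean


def average_by_data_center(rows):
    """Average (normal_pct, timely_pct) over all validators in each data center.

    `rows` is an iterable of (validator, data_center, normal_pct, timely_pct),
    a hypothetical layout chosen for this example.
    """
    grouped = defaultdict(list)
    for _validator, data_center, normal_pct, timely_pct in rows:
        grouped[data_center].append((normal_pct, timely_pct))
    return {
        dc: (mean(n for n, _ in values), mean(t for _, t in values))
        for dc, values in grouped.items()
    }


# Invented sample rows, for illustration only.
sample_rows = [
    ("validator-a", "dc-1", 99.0, 98.0),
    ("validator-b", "dc-1", 97.0, 96.0),
    ("validator-c", "dc-2", 95.0, 90.0),
]
averages = average_by_data_center(sample_rows)
```

With these sample rows, "dc-1" averages to (98.0, 97.0) and "dc-2" to (95.0, 90.0).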

Each table consists of the following columns:

Normal Ranking | TR | Icon or Population | Name | Average Vote Latency | Normal Pct | Diff | Timely Pct | Name | Icon or Population | NR | Timely Ranking

It's important to realize that this table is essentially two ranked lists conjoined into one: the left half of each row shows the validator with the given Normal Ranking (which is the same as the NR value on the right half of the table); the right half of each row shows the validator with the given Timely Ranking.

Because of this conjoinment it is possible to see what the rankings look like for normal vote credits and compare them to what they look like for timely vote credits, since the table shows both rankings at the same time. In addition, it is easy for a given validator to see its change in ranking between normal and timely vote credits; for example, a validator with Normal Ranking (NR) of 7 and Timely Ranking (TR) of 18 would "lose" 11 places of rank if timely vote credits were used instead of normal vote credits. Similarly, a validator with Normal Ranking (NR) of 63 and Timely Ranking (TR) of 22 would "gain" 41 places of rank if timely vote credits were used instead of normal vote credits.
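The relationship between the two rankings can be sketched as follows. The function name, the input layout, and the sample credit totals are hypothetical; ties are ignored for simplicity, which is an assumption of this sketch rather than something stated about the tables.

```python
def rank_changes(credits):
    """Map name -> (NR, TR, places gained), where places gained = NR - TR.

    `credits` maps a validator name to (normal_credits, timely_credits).
    A positive change means the validator gains rank under timely vote credits.
    """
    by_normal = sorted(credits, key=lambda name: credits[name][0], reverse=True)
    by_timely = sorted(credits, key=lambda name: credits[name][1], reverse=True)
    nr = {name: position + 1 for position, name in enumerate(by_normal)}
    tr = {name: position + 1 for position, name in enumerate(by_timely)}
    return {name: (nr[name], tr[name], nr[name] - tr[name]) for name in credits}


# Invented credit totals, for illustration only.
sample = {"a": (100, 1500), "b": (99, 1580), "c": (98, 1400)}
changes = rank_changes(sample)
```

Here validator "a" drops from NR 1 to TR 2 (a change of -1), "b" rises from NR 2 to TR 1 (a change of +1), and "c" stays at rank 3. The examples in the text follow the same arithmetic: 7 - 18 = -11 and 63 - 22 = 41.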

It's very important to realize that Diff is the difference between timely and normal vote credits relative to the leader for the validator on the left half of the table. The Diff column is not relevant to the right half of the table.

Here are the tables showing the results averaged over all epochs, and also results per epoch, for validators and data centers:

Questions and Answers

Can this proposal "break consensus"?

This proposal would not alter anything about the consensus mechanism of Solana and cannot cause a chain halt by itself. It does change validator voting incentives slightly, which could lead validators to a different voting pattern that could have an effect on consensus; but it's hard to envision a voting strategy that would result from this change to the voting incentives that would cause validators to vote less often or more slowly -- this proposal incentivizes the opposite (voting more frequently and more quickly).

Does this proposal encourage "geographic centralization"?

This proposal includes a grace period of 4 slots; all votes that land within 4 slots of the slot being voted on earn full credits. The data shows that all regions of the world in which validators are currently present are capable of landing votes with average latency less than 4. Therefore, there is no reason that any region of the world should be incapable of hosting a validator with performance as good as any other. However, more distant regions do have less of a "buffer" for packet delay, and so it can be harder for validators in those regions to consistently get full vote credits for votes cast. The disadvantage of distance is very minor though: validators in Singapore, one of the "most distant" regions, are only very slightly affected by this proposal, with Singaporean data centers losing on average 1.075% credits relative to the leader. There are other "closer" regions of the world with similar losses: the "24961-DE-Europe/Berlin" data center loses 1.162% credits, for example. Therefore, the results for any particular validator and data center depend much more on validator and network quality than on geographic location.

Doesn't the "grace period" still allow vote lagging for a few slots?

Yes, it does - any validator that currently uses a lagging strategy can still do so but only has a maximum of 3 slots of "lag" before losing vote credits. And delaying a vote even one slot increases the chances of that vote landing beyond the grace period, because of natural latencies in the network. A lagging strategy would be very difficult to execute after this proposal, and the gains would be minimal.
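The remaining room for lagging can be made concrete with a small sketch. The function name is hypothetical, and the model assumes a vote's landing latency is simply its natural latency plus the number of slots it was intentionally held back, which ignores the network variability mentioned above.

```python
GRACE_PERIOD = 4  # slots, as selected in the proposal


def lag_without_penalty(natural_latency: int, grace_period: int = GRACE_PERIOD) -> int:
    """Slots a vote could be held back while still landing within the grace period."""
    if natural_latency < 1:
        # The earliest a vote can land is one slot after the slot voted on.
        raise ValueError("latency is at least 1 slot")
    return max(0, grace_period - natural_latency)
```

A validator whose votes would naturally land with latency 1 has at most 3 slots of lag available, matching the figure above; with a natural latency of 2 the budget is 2 slots, and at a natural latency of 4 or more there is none.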

Are laggers punished by this proposal?

The worst lagger, VymD, loses an average of 21.821% credits relative to the leader. That is huge and will absolutely eliminate this lagging. The second worst lagger, Private Validator, loses 6.598% on average, which also will completely discourage that lagging, as it will reduce Private Validator's vote credits to well below average (from 98.925% of the leader's vote credits to 92.398%).
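The figures quoted for Private Validator are consistent with the loss being computed as a relative change in the percentage of the leader's credits. That reading is an inference from the numbers above, not a definition given in the text, and the function name is hypothetical.

```python
def relative_change_pct(normal_pct: float, timely_pct: float) -> float:
    """Relative change, in percent, from normal_pct to timely_pct.

    Both inputs are a validator's credits as a percentage of the leader's credits.
    """
    return (timely_pct - normal_pct) / normal_pct * 100.0


# Private Validator: 98.925% of the leader under normal credits, 92.398% under timely credits.
private_validator_change = round(relative_change_pct(98.925, 92.398), 3)
```

This evaluates to -6.598, which matches the 6.598% loss quoted above.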

Are fast voters rewarded by this proposal?

Yes, but minimally. Ignoring all validators ranked near the bottom of the vote earners (whose results are heavily skewed by having such low vote totals), the "best" result is KTKD, who gains 1.593% credits relative to the leader. This is a noticeable, but very modest, increase. Other winners include GERvalidator, who gains 1.449%, and Block Logic, who gains 1.400%.

Are slow voters penalized by this proposal?

Yes, but there are very few validators in this category, because the grace period protects most validators from significant effects. Approximately 100 validators are likely to experience vote credit reductions, and the vast majority of those would experience very small reductions of less than 1%.

Where is the source code that was used to generate the data?

https://github.com/bji/solana_timely_vote_credits

Is the raw data available?

Yes - please read the README.txt in the github repo to understand the format of these files. The "raw data", as fetched from Google Bigtable via the fetch_data.sh script, is available (please note that these files are large, approximately 40 GB each):

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/299.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/300.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/301.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/302.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/303.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/304.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/305.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/306.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/307.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/308.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/309.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/310.gz

https://www.shinobi-systems.com/timely_voting_proposal/raw_data/311.gz

In addition, the "processed data" as produced by the process_data.sh script, is available (these files are only about 400 KB each):

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/299

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/300

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/301

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/302

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/303

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/304

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/305

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/306

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/307

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/308

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/309

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/310

https://www.shinobi-systems.com/timely_voting_proposal/data_processed/311
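Since the file names follow the epoch numbers, the two lists of URLs above can be generated rather than copied by hand. A minimal sketch (the variable names are arbitrary, and nothing is downloaded):

```python
BASE_URL = "https://www.shinobi-systems.com/timely_voting_proposal"
EPOCHS = range(299, 312)  # epochs 299 through 311, inclusive

# Raw data files are gzip-compressed and large (approximately 40 GB each).
raw_urls = [f"{BASE_URL}/raw_data/{epoch}.gz" for epoch in EPOCHS]

# Processed data files are small (about 400 KB each).
processed_urls = [f"{BASE_URL}/data_processed/{epoch}" for epoch in EPOCHS]
```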