Consider a chain-based PoS scheme where every block has a proposer, which is randomly sampled from the validator set, and SLOT_COUNT (eg. 12) collation slots. For each slot, a validator index is sampled, and a shard ID is randomly chosen; this random sampling happens LOOKAHEAD (eg. 5) blocks in the past (ie. the validator indices and shard IDs for block N are chosen during block N - LOOKAHEAD).
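As a rough sketch of the sampling step, the following assumes the randomness committed to by block N - LOOKAHEAD is available as a seed; the hash-based sampling, the SHARD_COUNT of 100, and the function names are all illustrative assumptions, not a specified mechanism:

```python
import hashlib

SLOT_COUNT = 12    # collation slots per block (example value from the text)
LOOKAHEAD = 5      # blocks in advance at which sampling happens
SHARD_COUNT = 100  # assumed number of shards

def sample_slots(seed: bytes, validator_count: int):
    """Pseudo-randomly assign a (validator index, shard ID) pair to each slot.

    `seed` stands in for whatever randomness block N - LOOKAHEAD commits to;
    sha256-based sampling here is purely for illustration.
    """
    assignments = []
    for slot in range(SLOT_COUNT):
        h = hashlib.sha256(seed + slot.to_bytes(4, "big")).digest()
        validator_index = int.from_bytes(h[:8], "big") % validator_count
        shard_id = int.from_bytes(h[8:16], "big") % SHARD_COUNT
        assignments.append((validator_index, shard_id))
    return assignments

# The assignments for block N are fixed once block N - LOOKAHEAD is known:
slots_for_block_n = sample_slots(b"randomness-from-block-N-minus-LOOKAHEAD", 2000)
```

The point of the LOOKAHEAD delay is that everyone can compute these assignments deterministically well before block N is produced.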

Each collation slot can be either empty, or taken up by a collation header created by the specified validator. A collation header simultaneously serves three functions:

1. Representing a collation in the specified shard.
2. “Soft-confirming” the parent block.
3. Carrying a Casper FFG vote for the most recent checkpoint.

(1) is the same as collations in the current sharding proposal, except it is tightly coupled; that is, a block cannot be valid unless all collations it contains are available, and the same is true for all ancestors of that block. Additionally, the sharding is now internally fork-free: the collation must be built on top of the previous collation for the same shard mentioned in the chain.
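The internal fork-free rule can be sketched as a simple chain-order check; the flat (shard_id, collation_hash, parent_hash) representation below is an assumption for illustration only:

```python
def check_fork_free(chain_collations):
    """Check the internal fork-free rule: within one chain, each collation on
    a shard must declare as parent the most recent prior collation on that
    shard (or no parent, if it is the first collation on that shard).

    `chain_collations` is a list of (shard_id, collation_hash, parent_hash)
    tuples in chain order.
    """
    head = {}  # shard_id -> hash of the latest collation seen on that shard
    for shard_id, collation_hash, parent_hash in chain_collations:
        if shard_id in head:
            if head[shard_id] != parent_hash:
                return False  # forks off from the canonical shard chain
        elif parent_hash is not None:
            return False      # claims a parent the chain has never seen
        head[shard_id] = collation_hash
    return True
```

Under tight coupling, a block including any collation that fails this check (or whose body is unavailable) is itself invalid.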

Regarding (2), we have two properties. First, each collation that confirms the parent block adds 1 point to the parent block’s score; hence, these collations are the dominant factor in determining which chain is the longest. Second, a single validator creating two collations with the same index at the same height is a slashable offense; hence, if you see a block with N > SLOT_COUNT/2 slots full, then you know that any competing chain will either require 2N - SLOT_COUNT slashed validators, or it would have to revert a full LOOKAHEAD blocks to get different validators; either condition is hard, so this gives a kind of “soft finality” that can be reached within a single block. This can essentially be Ethereum’s answer to the market demand for blockchains that offer confirmations within a few seconds.
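The 2N - SLOT_COUNT bound follows from simple counting: a competing chain at the same height needs N filled slots of its own to match the score, but only SLOT_COUNT - N of those can come from validators who did not already sign on the first chain, so the overlap (all slashable) is at least N - (SLOT_COUNT - N). A quick check of the arithmetic:

```python
SLOT_COUNT = 12  # example value from the text

def min_slashed_for_competing_chain(n_filled: int) -> int:
    """Minimum number of validators that must double-sign (and so be slashed)
    for a competing chain at the same height to match a block with n_filled
    collation slots filled, assuming the LOOKAHEAD-fixed validator sample is
    the same for both chains.
    """
    return max(0, 2 * n_filled - SLOT_COUNT)

# With 7 of 12 slots full, a competing chain of equal score needs at least
# 2*7 - 12 = 2 double-signing validators:
assert min_slashed_for_competing_chain(7) == 2
```

At N <= SLOT_COUNT/2 the bound is zero, which is why the soft-finality argument only kicks in past the halfway mark.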

And (3) is self-explanatory. Note that the numbers match up quite conveniently; for example, if there are 100 shards and an implied “period length” of 5, then this implies SLOT_COUNT = 20; with 20 votes per block and 2000 validators, this gives an equivalent “epoch length” of 100 blocks.
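The parameter arithmetic, spelled out (the derivation is just the two divisions implied by the text):

```python
SHARD_COUNT = 100
PERIOD_LENGTH = 5       # each shard gets one collation slot per 5 blocks
VALIDATOR_COUNT = 2000

# One slot per shard per period, spread evenly over blocks:
SLOT_COUNT = SHARD_COUNT // PERIOD_LENGTH        # -> 20 slots per block

# Each slot carries one FFG vote, so the number of blocks needed for
# every validator to vote once (the implied "epoch length"):
EPOCH_LENGTH = VALIDATOR_COUNT // SLOT_COUNT     # -> 100 blocks
```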

The following describes how “tightly coupled” sharding might work, in a way that simultaneously provides full Casper, along with fast semi-confirmations.

How are users going to submit pending messages to validators? If validators are going to be chosen randomly from a global pool one probably does not want to store all pending messages on all validators …

If a user submits a pending message to validator X, and the message is not included into a collation before this validator becomes inactive for the particular shard the message is destined to, then the user will need to resubmit the message to another validator, unless there is some kind of a gossiping scheme which passes pending messages from validator to validator …

There would be a separate gossip network per shard, between clients of that shard and proposers on that shard. Proposers and validators are separate (see Separating proposing and confirmation of collations); proposers and executors are shard-specific, while validators are chosen from a global pool.

Either data availability proofs, or that nodes simply directly check availability of collations a few blocks back

Let’s assume we do not have data availability proofs. By “directly check availability of collations” do you mean download the full collation bodies of the corresponding collation headers?

vbuterin:

a block cannot be valid unless all collations it contains are available

This means that for checking the validity of a block it is necessary for a fully validating node of the main shard to download the full collation bodies for SLOT_COUNT (in the worst case) collation headers. Is that right?

vbuterin:

if there are 100 shards, and we have an implied “period length” of 5, then this implies SLOT_COUNT = 20

If the period length is 5, and the child shards process roughly as much as the main shard, then we can expect collation bodies to be about 5 times the size (in bytes) of a block. So with SLOT_COUNT = 20, verifying the validity of a block will involve downloading the equivalent of about SLOT_COUNT * 5 = 100 blocks. It seems fully validating nodes of the main shard would need to have the downstream bandwidth for the collation bodies of all shards combined.
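The bandwidth estimate above is a one-line calculation, included here just to make the worst case concrete (it assumes, as the question does, that a collation body carries roughly a period’s worth of data):

```python
PERIOD_LENGTH = 5   # a collation body ~ PERIOD_LENGTH blocks' worth of data
SLOT_COUNT = 20

# Worst-case data a fully validating main-shard node downloads per block:
blocks_equivalent = SLOT_COUNT * PERIOD_LENGTH  # -> 100 block-equivalents
```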

Let’s assume we do not have data availability proofs. By “directly check availability of collations” do you mean download the full collation bodies of the corresponding collation headers?

Basically, if a node is considering whether or not to create a block, then it fully downloads and verifies the data availability of (i) all collations it wants to include, (ii) all collations in ancestors going back up to WINDBACK / SLOT_COUNT blocks. If a node is considering whether or not to build a collation on top of block B, then it performs the same availability check for block B.

So if SLOT_COUNT = 20, then it might make sense to check three blocks back. The key insight here is that because collations now serve the double duty of also being claims on the availability of prior collations in all shards, the number of collations you need to look back over should not change, regardless of the degree of parallelization between collations (ie. SLOT_COUNT), except likely in the extreme case where SLOT_COUNT >= WINDBACK.
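The windback check described above can be sketched as follows; WINDBACK = 60 is an assumed value chosen to match “three blocks back” at SLOT_COUNT = 20, and the list-of-lists chain representation is illustrative:

```python
WINDBACK = 60    # total recent collations whose availability to verify (assumed)
SLOT_COUNT = 20

def availability_ok(chain, get_body, windback=WINDBACK, slot_count=SLOT_COUNT):
    """Before building on the chain tip, download the bodies of all collations
    in the tip and in enough ancestors to cover `windback` collations.

    `chain` is a list of blocks (oldest first), each block a list of collation
    headers; `get_body` returns a header's body, or None if unavailable.
    """
    depth = max(1, -(-windback // slot_count))  # ceil(windback / slot_count)
    for block in chain[-depth:]:
        for header in block:
            if get_body(header) is None:
                return False  # an unavailable body makes the chain unbuildable
    return True
```

With WINDBACK = 60 and SLOT_COUNT = 20 the depth comes out to 3 blocks; doubling SLOT_COUNT halves the depth while the total of WINDBACK collations checked stays fixed, which is the scale-invariance claimed above.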

It seems fully validating nodes of the main shard would need to have the downstream bandwidth for the collation bodies of all shards combined.

Yes, but being a fully validating node of the main shard in this model is equivalent to being a super-full node in the old model; the system doesn’t require any such nodes to actually exist. I expect the preferred configuration to be not verifying collation availability in the normal case, and only verifying it WINDBACK periods back when you are called on to create a block or a collation.