1 node that created the blockchain and a stream; through this node I added 2M items to the stream, with random off-chain content (images). The commands are sketched just after this list.

2 nodes that join the blockchain after all the data has been imported through the master node, so on startup they begin to catch up with the master node at their own pace.
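Roughly, the master-node side of the setup looked like the following; the chain name chain1, stream name stream1, and the hex payload are placeholders, not my actual values:

    # create and start the chain, then create an open stream
    multichain-util create chain1
    multichaind chain1 -daemon
    multichain-cli chain1 create stream stream1 true

    # publish one item whose payload is stored off-chain (repeated for the 2M items)
    multichain-cli chain1 publish stream1 key1 deadbeef offchain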

At the very beginning, while the 2 newly joined nodes are initially catching up, I can issue commands to any of the 3 nodes using the multichain-cli tool. For instance, I send this command to the master node and get the reply back immediately:
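(A typical command of this kind, with the chain name assumed, would be:)

    multichain-cli chain1 getinfo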

However, at some point, as the volume of already-synced data grows, the multichain-cli tool stops responding on all the nodes, and as the synced volume continues to grow, the nodes degrade dramatically until they simply stop responding to commands.

Both nodes join the blockchain through the DNS name of the master node, and they simply subscribe to the stream so they can find data by key; that's all.
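On each of the 2 additional nodes, the join-and-subscribe steps were roughly the following (the master's DNS name, the port, and the stream name are placeholders):

    # connect to the existing chain through the master's DNS name
    multichaind chain1@master.example.com:7447 -daemon

    # subscribe so this node indexes the stream and can be queried by key
    multichain-cli chain1 subscribe stream1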

Then I stop everything (without deleting the data) and restart the master node first. It starts responding to multichain-cli commands again. But when I restart the first of the 2 additional nodes, I start getting this from it:

1 Answer


Best answer

Thanks for more details in the comment. We've had previous reports of memory issues with Kubernetes – would you be able to re-run your experiment directly on host operating systems, so we can determine if that's the cause of the problem?

And then, just after subscribing, I send a query by key and it replies immediately. Then I leave it running for a while, come back, issue a new query, and it takes several minutes to reply (while the master still replies immediately).
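The query by key is a lookup of this kind (stream and key names are placeholders):

    multichain-cli chain1 liststreamkeyitems stream1 key1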

Just a comment for any others following this discussion – the performance issue was caused by the stream having ~100 different off-chain data payloads, each of which appears 1000+ times in the stream. These heavily duplicated (and therefore identically hashed) stream items are an unusual usage pattern, but we've now resolved the performance issue in this case. The fix will be in the 2.0 production release.
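For anyone wanting to check whether their own usage matches this pattern, a hypothetical shell loop along these lines (all names and counts here are assumptions) would reproduce it:

    # ~100 distinct off-chain payloads, each published 1000+ times
    for p in $(seq 1 100); do
      payload=$(printf 'payload-%03d' "$p" | xxd -p)   # same hex payload for every repeat of p
      for i in $(seq 1 1000); do
        multichain-cli chain1 publish stream1 "key-$p-$i" "$payload" offchain
      done
    done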