Modelling distributed systems on nature

I’ve developed several distributed systems that run across multiple heterogeneous domains and networks. A few basic problems usually need to be solved, in particular:

Component lifecycle model – how is the lifecycle of each component governed? For example, who starts and stops these components? Do they start themselves? Where do they get their configuration data from?

Communication model – how does each component talk to the others? What types of signals / messages are needed? What is the messaging protocol? What is the means of transport for these messages?

Social behavior – is there any smart social behavior that each component needs to embody? This is also relevant to building distributed systems with error-recovery characteristics, where components stay in standby mode while the group is incomplete, and become active once the group is complete and they can collectively perform their task as a unit.

The talk

First, go and watch the video. Keep your invisible nerd goggles on, and when she talks about bacteria, think of components and messaging mechanisms in distributed systems. Enjoy!

Bonnie Bassler discovered that bacteria “talk” to each other, using a chemical language that lets them coordinate defense and mount attacks. The find has stunning implications for medicine, industry — and our understanding of ourselves.

Thoughts

This is a picture of a simplistic network management system that is distributed in nature. Distributed here means that each ‘yellow box’ can run on a separate machine from the other yellow boxes, and that they communicate with each other through some sort of messaging system.

The conventional model is to build these components one by one, each with its own objective in mind. That is, the data collector component does exactly that: it collects data. When it has done its duty on the data, it passes the data on to the next component, and so on.

But drawing pictures like this is easy. In reality, there are other questions that need to be answered, such as:

configuration – who configures these yellow boxes? How does the data collector box know that it needs to pass the data on to the ETL box?

social behavior in case of fault – when the ETL component fails for some unforeseen reason, can the data collector and data warehouse components know about it? Otherwise, the data collectors will continue to fetch data and pass it on to an ETL component that is supposed to be functioning but has either gone missing or is continuously crashing. The data warehouse will also expect the next batch of data to be loaded from the ETL, but that data obviously won’t be coming anytime soon.

Now, on to the concept of Quorum Sensing:

Quorum sensing is a type of decision-making process used by decentralized groups to coordinate behavior…

… Quorum sensing can function as a decision-making process in any decentralized system, as long as individual components have (a) a means of assessing the number of other components they interact with and (b) a standard response once a threshold number of components is detected.

… Bacteria that use quorum sensing constantly produce and secrete certain signaling molecules (called autoinducers or pheromones). These bacteria also have a receptor that can specifically detect the signaling molecule (inducer). When the inducer binds the receptor, it activates transcription of certain genes, including those for inducer synthesis. There is a low likelihood of a bacterium detecting its own secreted inducer. Thus, in order for gene transcription to be activated, the cell must encounter signaling molecules secreted by other cells in its environment.

What if we introduce the concept of Quorum Sensing, that is, a means for each component to query the status of the network as a whole and decide what to do? For example, each component could send a broadcast to the network that simply says “identify yourself”, and upon receiving this message, each component could reply “aye, I am component X”.

If we do this over a broadcast channel (e.g. a topic on a message bus), then each component will receive a list of the other components attached to that messaging backbone. We can then build basic behavior around this model, such as: within the data collectors, pause execution and send an alert if you notice that the ETL is out of action or not responding (or responding with a bad status). I’ve yet to run experiments on this, but I suspect we could build some serious error-recovery mechanisms if we adopt this model.
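To make the roll call a bit more concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption made for illustration: the in-memory Bus class stands in for a real message bus, the "rollcall" topic, the message fields and the component names are made up, and delivery is synchronous, whereas a real system would have to wait for replies with a timeout.

```python
from collections import defaultdict


class Bus:
    """Hypothetical in-memory stand-in for a topic on a message bus."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver synchronously to every subscriber of the topic.
        for handler in list(self.subscribers[topic]):
            handler(message)


class Component:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.alive = True
        self.peers = {}  # component name -> last reported status
        bus.subscribe("rollcall", self.on_message)

    def on_message(self, message):
        if not self.alive:
            return  # a crashed component stays silent
        if message["type"] == "identify":
            # "aye, I am component X"
            reply = {"type": "present", "name": self.name, "status": "ok"}
            self.bus.publish("rollcall", reply)
        elif message["type"] == "present":
            self.peers[message["name"]] = message["status"]

    def roll_call(self):
        # Broadcast "identify yourself" and collect whoever answers.
        self.peers = {}
        self.bus.publish("rollcall", {"type": "identify"})
        return dict(self.peers)


class DataCollector(Component):
    def __init__(self, name, bus):
        super().__init__(name, bus)
        self.paused = False
        self.alerts = []

    def check_downstream(self):
        # Pause and alert if the ETL is absent or reports a bad status.
        etl_ok = self.roll_call().get("etl") == "ok"
        self.paused = not etl_ok
        if self.paused:
            self.alerts.append("ETL missing or unhealthy; collection paused")
        return etl_ok


bus = Bus()
collector = DataCollector("collector", bus)
etl = Component("etl", bus)
warehouse = Component("warehouse", bus)

collector.check_downstream()  # the ETL answers, so the collector keeps running
etl.alive = False             # simulate an ETL crash
collector.check_downstream()  # no answer from the ETL, so the collector pauses
```

The same check could live in the data warehouse component, so that it stops waiting for a batch that will never arrive.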

In short, this model could be seen as “each component watching every other component’s back”.

Another use of Quorum Sensing would be simultaneous start without an explicit trigger. That is, the start of a processing event is not triggered by an administration component, but rather by each component checking whether everyone else is there. But this concept introduces another layer of complexity, such as which business rule / use case to trigger, which workflow, and so on.
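A trigger-free start along these lines might look like the following Python sketch. It is only an illustration under assumptions: the QuorumMember class, the shared list used as a broadcast channel, and the quorum of three are all hypothetical, and the question of which business rule or workflow to run once active is left out entirely.

```python
class QuorumMember:
    """A component that activates itself once enough peers have been heard."""

    def __init__(self, name, quorum, channel):
        self.name = name
        self.quorum = quorum    # threshold number of distinct components
        self.seen = set()       # names heard so far, including our own
        self.state = "standby"
        self.channel = channel  # hypothetical broadcast channel: a shared list
        channel.append(self)

    def announce(self):
        # Periodic "I am here" signal, delivered to everyone on the channel.
        for member in list(self.channel):
            member.hear(self.name)

    def hear(self, name):
        self.seen.add(name)
        if self.state == "standby" and len(self.seen) >= self.quorum:
            self.state = "active"  # standard response once the threshold is met


channel = []
members = []
for name in ("collector", "etl", "warehouse"):
    member = QuorumMember(name, quorum=3, channel=channel)
    members.append(member)
    member.announce()  # late joiners miss the earlier announcements

states_after_join = [m.state for m in members]

for member in members:
    member.announce()  # the next periodic round fills the gaps

states_after_round = [m.state for m in members]
```

Note that no administration component appears anywhere: each member decides on its own, based only on what it has heard, and the repeated announcements are what let late joiners catch up.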

Anyway, enough speculation for now. When I have experimented with this concept further, I’ll post an update.