By monitoring every service dependency pair, Flowmill answers questions such as “which of these 30 services is the likely cause of this incident?” in seconds, making it possible to direct escalations to fewer, more relevant engineers. This expedites triage, focuses mitigation efforts, and dramatically reduces war room staffing and engineer burnout. Flowmill monitoring has negligible overhead, uses no sampling, requires no per-service configuration or code changes, and can be deployed through configuration management in less than 20 minutes.

The PhD research revolved around enabling fast detection of, and reaction to, undesirable incidents in datacenter and cloud networks by designing extremely fine-granularity, low-overhead, low-latency monitoring, processing, and control of service interactions. The resulting systems mostly controlled network transfers, a use case that poses extreme challenges for the technology. Fastpass aims for high utilization with zero queueing: a logically centralized arbiter controls and orchestrates all network transfers. Flowtune assigns shares of network throughput to pairs of applications according to organizational policy, maximizing the organization’s utility.