
Exclusive: For all their orderly aisles of flickering servers and the reassuring hum of tens of thousands of spinning drives, data centers are terrifically rowdy places.

The trick is to hide the constant bar fight going on between applications and the underlying IT resources with layers and layers of software, and one of the best ways to do this is with an efficient job scheduler and resource assignment system. Now, a pair of Stanford researchers have unveiled a system that deals with this problem, unleashing even greater levels of resource utilization within data centers.

Quasar is designed to schedule and assign resources to applications, and is meant to plug into mammoth cluster management technologies, ranging from Google's proprietary Omega/Borg and Microsoft's Autopilot systems, to the Twitter-backed Apache open source Mesos project, to parts of the OpenStack management system.

"The whole approach is inspired in some sense from work I did at Microsoft and later at Google analyzing resource usage," Christos Kozyrakis, a Stanford professor and co-author of the research, tells El Reg.

Quasar takes a new approach to earmarking resources for specific tasks, one that promises to be far more efficient and to avoid the standard bit barn problem of millions upon millions of dollars of computer equipment standing to attention, waiting for jobs that will never arrive.

Quasar tackles this problem by borrowing workload classification techniques from collaborative filtering, the approach Netflix uses for its personalized movie recommendations.

"How can I learn everything I need about the program without having to run it 1,000 times? We try to use machine learning technique to be able to learn everything about the program," explains Kozyrakis.

This means the system can take a few measurements from an application, combine them with a large pool of performance data from previously run applications, and work out roughly what resources need to be set aside for it to attain optimum performance.

The technology "uses fast classification techniques to determine the impact of different resource allocations and assignments on workload performance", they write.

In practice, Quasar takes a few samples of a new application, stops the application, compares the samples against those from previously run applications, and estimates resource assignments from this combined representation. It samples along four axes (scale-up, scale-out, interference, and heterogeneity) and passes the output to a scheduler.

"You can imagine we need 1000 numbers to describe the behavior of the program and only get one or two. The way we get the other 998 is using classification. We use the model in the same way to be able to recommend books or movies," Kozyrakis tells us.
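Kozyrakis's "get the other 998" idea is essentially matrix completion: past workloads form the rows of a matrix, resource configurations form the columns, and a new workload arrives with almost every entry missing. The sketch below shows one textbook collaborative-filtering way to fill those gaps, a truncated SVD of the dense history followed by a least-squares fit of the new row. It is not Quasar's own implementation; the function name `complete_profile`, the toy data and the chosen rank are all hypothetical.

```python
import numpy as np


def complete_profile(known: np.ndarray, sparse_row: np.ndarray, rank: int) -> np.ndarray:
    """Estimate the unmeasured (NaN) entries of a new workload's row.

    known      -- dense matrix, one row per previously profiled workload and
                  one column per resource configuration (hypothetical data)
    sparse_row -- the new workload's row; NaN marks configurations that were
                  never measured
    rank       -- number of latent factors kept from the SVD
    """
    # Right singular vectors capture how configurations co-vary across past
    # workloads; keep the strongest `rank` of them as configuration factors.
    _, _, vt = np.linalg.svd(known, full_matrices=False)
    factors = vt[:rank].T  # shape: (n_configs, rank)

    # Fit the new workload's latent vector to the few entries actually measured.
    observed = ~np.isnan(sparse_row)
    latent, *_ = np.linalg.lstsq(factors[observed], sparse_row[observed], rcond=None)

    # Reconstruct every configuration, keeping the real measurements untouched.
    predicted = factors @ latent
    predicted[observed] = sparse_row[observed]
    return predicted


# Hypothetical toy data: performance of 4 past workloads on 6 configurations,
# generated from 2 hidden factors so the matrix has rank 2.
workload_factors = np.array([[1.0, 0.2], [0.5, 1.0], [0.8, 0.8], [0.3, 0.6]])
config_factors = np.array(
    [[1.0, 0.0], [0.9, 0.3], [0.7, 0.6], [0.5, 0.9], [0.2, 1.0], [0.0, 1.1]]
)
known = workload_factors @ config_factors.T

# A new workload, profiled on only 2 of the 6 configurations.
new_truth = config_factors @ np.array([0.6, 0.4])
sparse = np.full(6, np.nan)
sparse[[0, 5]] = new_truth[[0, 5]]

prediction = complete_profile(known, sparse, rank=2)
print(np.round(prediction, 3))
```

Because the toy matrix is exactly rank 2, two measurements are enough to recover the whole row. Real profiling data is noisy and only approximately low-rank, so the rank becomes a tuning knob rather than a known quantity.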

The overall system can deal with dynamic workloads, adding resources as an application grows and reclaiming them as it shrinks. "The scheduler's objective is to allocate the least amount of resources needed to satisfy a workload's performance target," they write.
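One simple way to chase "the least amount of resources needed to satisfy a workload's performance target" is a greedy pass over the servers, best first. The sketch below assumes the classifier has already produced a predicted per-core throughput for the workload on each server (folding in interference and hardware type) and that throughput adds up linearly across cores; the `Server` type, the `allocate` function and all numbers are hypothetical, and a greedy heuristic is not guaranteed to find the true minimum.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    free_cores: int
    # Hypothetical per-core throughput predicted for this workload on this
    # server, assumed to already account for interference and hardware type.
    predicted_rate_per_core: float


def allocate(servers: list[Server], target: float) -> list[tuple[str, int]]:
    """Greedily take cores from the best-performing servers until the target is met.

    Returns (server name, cores taken) pairs. Raises RuntimeError if the free
    capacity cannot meet the target. Assumes throughput adds up linearly
    across cores and servers, which real workloads only approximate.
    """
    plan: list[tuple[str, int]] = []
    remaining = target
    # Best servers first, so fewer cores are needed overall.
    for server in sorted(servers, key=lambda s: s.predicted_rate_per_core, reverse=True):
        if remaining <= 0:
            break
        if server.predicted_rate_per_core <= 0 or server.free_cores == 0:
            continue
        # Smallest core count on this server that covers what is left,
        # capped by the cores the server actually has free.
        needed = math.ceil(remaining / server.predicted_rate_per_core)
        cores = min(needed, server.free_cores)
        plan.append((server.name, cores))
        remaining -= cores * server.predicted_rate_per_core
    if remaining > 0:
        raise RuntimeError("target cannot be met with the free capacity")
    return plan


servers = [Server("a", 8, 100.0), Server("b", 16, 60.0), Server("c", 4, 120.0)]
plan = allocate(servers, target=1000.0)
print(plan)
```

Here the fastest server "c" is exhausted first and "a" supplies the rest, leaving "b" untouched. When the application grows or shrinks, rerunning the same pass with a new target gives the adjusted allocation.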

The approach "improves server utilization at steady state by 47 per cent on average at high load in the 200-server cluster, while also improving performance of individual workloads compared to the alternative schemes," they write.

In the future, the researchers plan to merge the Quasar classification and scheduling algorithms into cluster management frameworks such as OpenStack or Mesos, they write.

Quasar is yet another example of the feverish pace of development in large-scale cluster management, and likely bears similarities to as-yet secretive schemes underway at Google, Amazon, and Microsoft. The rise of virtualization gave the IT industry a spending reprieve by increasing hardware utilization, and now systems like Quasar look set to let admins save even more bucks. ®