Abstract

Scientific applications are becoming more complex and more I/O-demanding than ever. For such applications, systems with dedicated I/O nodes do not scale well enough, and a serverless approach is a viable alternative. With the serverless approach, however, a job's execution time depends largely on whether the job is co-located with the file blocks it needs. Gang scheduling (GS), which supercomputing centers widely use to schedule parallel jobs, is entirely unaware of an application's spatial preferences.

In this paper, we show that gang scheduling performs poorly on I/O-intensive applications. We extend gang scheduling with different levels of I/O awareness and propose three schemes. All three new schemes outperform gang scheduling for I/O-intensive jobs. One of them, which uses migration, significantly outperforms the other two on every workload we examined.