Job Scheduling in Java

On some projects, you need to execute certain jobs and tasks at a precisely specified time or at regular intervals. In this article we will see how Java developers can implement such a requirement using the standard Java Timer API, and then we will focus on Quartz, an open source library for those who need some extra features in their scheduling system.

Let's start by going through a few common use cases that will help you recognize situations in which you might need this kind of behavior. Then we will see how to find the best solution for your functional requirements.

Almost every business application has some reports and statistics that are needed by its users. You can hardly imagine such a system without these reports, because the common purpose for someone to invest in such a system is the ability to collect a large amount of data and see it in a manner that can help with further business planning. A problem that can exist in creating these reports is the large amount of data that needs to be processed, which would generally put a heavy load on the database system. This heavy load decreases overall application performance and affects users that only use the system for data collecting, making it practically useless while it's generating the reports. Also, if we think about being user-friendly, a report that takes ten minutes to generate is not an example of a good response time.

We will now focus on the type of reports that can be cached, meaning those that are not needed in real time. The good news is that most reports fall into this category -- statistics on the sales of some product last June, or company income in January. For these cacheable reports, a simple solution is possible: schedule report creation for times when the system is idle, or at least when the load on the database system is minimal. Let's say that you are creating an application for a local book reseller that has many offices (all in the same time zone) and that you need to generate a (possibly large) report on weekly income. You could schedule the task of generating the necessary data for a time when the system isn't being used, such as every Sunday at midnight, and cache the result in the database. This way, no sales operators will notice any performance problem in your application, and company management will have all the necessary data immediately available.

A second example that I will mention here is sending all kinds of notifications to application users, such as account expirations. This could be done using date fields in the user's data row, and creating a thread to check users on that condition, but using a scheduler in this case is surely a more elegant solution, and better from the aspect of overall system architecture (and that's important, right?). In a complex system you would have a lot of these notifications, and could find that you need a scheduler-like behavior in many other cases, so building a specific solution for every case makes a system harder to change and maintain. Instead, you should use one API to handle all of your application scheduling. That is the topic of the rest of this article.

In-House Solution

To implement a basic scheduler in your Java application, you don't need any external library. As of J2SE 1.3, Java contains the java.util.Timer and java.util.TimerTask classes that can be used for this purpose. Let's first make a little example that will help us describe all of the possibilities of this API.

All example code for this article can be downloaded from links at the end of the article.
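A minimal sketch of such an example might look like the following. Only ReportGenerator is a name taken from the text; ReportScheduler, nextSundayMidnight(), and ONE_WEEK_MS are assumed names used for illustration, and the report generation itself is reduced to a placeholder.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;

// Worker class: the job that the Timer runs.
class ReportGenerator extends TimerTask {
    @Override
    public void run() {
        // Placeholder for the real report generation code.
        System.out.println("Generating weekly report");
    }
}

// Hypothetical helper class that wires the worker to a Timer.
class ReportScheduler {
    // One week in milliseconds (ignores daylight saving shifts).
    static final long ONE_WEEK_MS = 7L * 24 * 60 * 60 * 1000;

    // Returns the first Sunday 00:00 strictly after 'from',
    // computed in the JVM's default time zone.
    static Date nextSundayMidnight(Date from) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(from);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        // Step forward one day at a time until we land on a Sunday.
        do {
            cal.add(Calendar.DAY_OF_MONTH, 1);
        } while (cal.get(Calendar.DAY_OF_WEEK) != Calendar.SUNDAY);
        return cal.getTime();
    }

    public static void main(String[] args) {
        Timer timer = new Timer();
        // First run at the next Sunday midnight, then once a week.
        // The Timer's thread is not a daemon, so the JVM keeps running.
        timer.schedule(new ReportGenerator(),
                       nextSundayMidnight(new Date()),
                       ONE_WEEK_MS);
    }
}
```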

The code above implements the example from the introduction, scheduling report generation for a time when the system is idle (in this case, Sunday at midnight).

First, we should implement a "worker" class that will actually do the scheduled job. In our example this is ReportGenerator. This class must extend java.util.TimerTask, which implements java.lang.Runnable. All that you need to do is override the run() method with the report generation code.

Then, we schedule this object's execution using one of the Timer's scheduling methods. In this case, we use the schedule() method that accepts the date of the first execution and the period of subsequent executions in milliseconds (since we repeat this report generation every week).

When we use scheduling features, we must be aware of the real-time guarantees that the scheduling API provides. Unfortunately, because of the nature of Java and its implementations on various platforms, thread scheduling is not consistent across JVMs. Thus, the Timer cannot guarantee that our task will be executed at exactly the specified moment. Our tasks are implemented as Runnable objects, and the Timer's background thread waits (using wait()) until the next task is due and then runs it. The exact moment of execution therefore depends on the JVM's scheduling policy and on how many threads are currently waiting for the processor. There are two common cases that can cause our tasks to be executed with a delay. First, a large number of threads might be waiting to be executed; second, garbage collection activity could cause a delay. These influences can be reduced using different techniques (such as running the scheduler in a separate JVM or tuning the garbage collector options), but that is beyond the scope of this article.
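To get a feeling for how large this delay is on a given machine, a task can compare the current time with the value returned by TimerTask.scheduledExecutionTime(). The following is only a rough sketch: LatenessProbe and measure() are assumed names, and the measured value will vary from run to run.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical one-shot task that records how late it was started.
class LatenessProbe extends TimerTask {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile long latenessMs = -1;

    @Override
    public void run() {
        // Difference between the actual start and the scheduled start.
        latenessMs = System.currentTimeMillis() - scheduledExecutionTime();
        done.countDown();
    }

    // Schedules a probe to run once after delayMs and returns
    // how many milliseconds late it actually started.
    static long measure(long delayMs) {
        Timer timer = new Timer(true); // daemon thread, will not block JVM exit
        LatenessProbe probe = new LatenessProbe();
        try {
            timer.schedule(probe, delayMs);
            if (!probe.done.await(delayMs + 5000, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("probe did not run in time");
            }
            return probe.latenessMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("Task started " + measure(200) + " ms late");
    }
}
```

On an idle machine the reported value is usually small; under heavy load or during garbage collection it can grow noticeably.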

In this new light, we can explain the existence of two different groups of scheduling methods in the Timer class: scheduling with fixed delay (schedule()) and scheduling with fixed rate (scheduleAtFixedRate()). When you use methods from the first group, every delay in a task execution is propagated to the subsequent executions. In the latter case, all subsequent executions are scheduled relative to the time of the initial task execution, which keeps such delays from accumulating. Which methods you use depends on which parameter is more important in your system.
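The difference between the two groups is easier to see with numbers. The sketch below is a simplified model of the two policies, not the Timer implementation itself: SchedulingModel and its methods are assumed names, the task is treated as instantaneous, and each run's lateness is assumed to be smaller than the period.

```java
import java.util.Arrays;

// Simplified model of fixed-delay and fixed-rate scheduling.
// lateness[i] is how many milliseconds late run i starts.
class SchedulingModel {

    // Fixed delay: each run is due one period after the ACTUAL
    // start of the previous run, so lateness accumulates.
    static long[] fixedDelayStarts(long period, long[] lateness) {
        long[] starts = new long[lateness.length];
        long due = 0;
        for (int i = 0; i < lateness.length; i++) {
            starts[i] = due + lateness[i];
            due = starts[i] + period;
        }
        return starts;
    }

    // Fixed rate: run i is due i periods after the first SCHEDULED
    // time, so one late run does not shift the following ones.
    static long[] fixedRateStarts(long period, long[] lateness) {
        long[] starts = new long[lateness.length];
        for (int i = 0; i < lateness.length; i++) {
            starts[i] = i * period + lateness[i];
        }
        return starts;
    }

    public static void main(String[] args) {
        long[] lateness = {0, 30, 0, 0}; // only the second run is late
        // prints [0, 130, 230, 330]
        System.out.println(Arrays.toString(fixedDelayStarts(100, lateness)));
        // prints [0, 130, 200, 300]
        System.out.println(Arrays.toString(fixedRateStarts(100, lateness)));
    }
}
```

With the real API, the choice comes down to calling either timer.schedule(task, firstTime, period) or timer.scheduleAtFixedRate(task, firstTime, period) with the same arguments.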

One more important point: every Timer object starts a thread in the background. This behavior is not desirable in a managed environment, such as a J2EE application server, because these threads are outside the control of the container.
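The background thread is easy to observe. The following sketch gives the Timer's thread a name and then looks it up among the JVM's live threads; TimerThreadDemo, threadAlive(), and the thread name "report-timer" are assumptions made for this illustration.

```java
import java.util.Timer;

// Hypothetical demo showing that creating a Timer starts a thread.
class TimerThreadDemo {

    // Returns true if a live thread with the given name exists in this JVM.
    static boolean threadAlive(String name) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Second argument 'true' marks the timer thread as a daemon,
        // so it will not keep the JVM alive on its own.
        Timer timer = new Timer("report-timer", true);
        System.out.println("Timer thread running: " + threadAlive("report-timer"));

        // cancel() discards any scheduled tasks and lets the thread end.
        timer.cancel();
    }
}
```

If you do use a Timer outside a container, calling cancel() when the application shuts down is a simple way to make sure its thread does not linger.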