The length of the heartbeat cycle should be configurable.

Details

Introduced a configuration parameter, mapred.heartbeats.in.second, as an expert option, that defines how many heartbeats a jobtracker can process in a second. Administrators can set this to an appropriate value based on cluster size and expected processing time on the jobtracker to achieve a balance between jobtracker scalability and latency of jobs.

Description

Currently, the heartbeat cycle is set to (# nodes / 100) seconds. This can be too long for clusters that need to run low-latency jobs. We should make the number of heartbeats that should arrive in a second configurable.

Amareshwari Sriramadasu
added a comment - 21/May/09 05:17 The current heartbeat interval is set to clusterSize / 100, with a minimum interval of 3 seconds.
This assumes that the JT can process 100 heartbeats in a second. See http://issues.apache.org/jira/browse/HADOOP-1900?focusedCommentId=12542530&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12542530
Now, if we make the number of heartbeats that should arrive in a second configurable (with a default value of 100), the heartbeat interval can be calculated as
heartbeatInterval = max((clusterSize / #heartbeats in a second), HEARTBEAT_INTERVAL_MIN) ;
Thoughts?
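
For illustration, here is a minimal sketch of the calculation proposed above. It is not the patch itself: the class name, method name, and the choice to round up to whole seconds are assumptions made for this example. Only the formula, the default of 100 heartbeats per second, and the 3-second minimum come from the comment.

```java
// Hypothetical sketch; names below are illustrative and not taken from the patch.
class HeartbeatIntervalSketch {
    // Minimum interval in seconds, as stated in the comment above.
    static final int HEARTBEAT_INTERVAL_MIN = 3;

    // Computes max(clusterSize / heartbeatsInSecond, HEARTBEAT_INTERVAL_MIN).
    // Assumption: the division is rounded up to a whole number of seconds.
    static int heartbeatIntervalSeconds(int clusterSize, int heartbeatsInSecond) {
        if (heartbeatsInSecond <= 0) {
            throw new IllegalArgumentException("heartbeatsInSecond must be positive");
        }
        int interval = (int) Math.ceil((double) clusterSize / heartbeatsInSecond);
        return Math.max(interval, HEARTBEAT_INTERVAL_MIN);
    }

    public static void main(String[] args) {
        // 1000 nodes at the default of 100 heartbeats per second
        System.out.println(heartbeatIntervalSeconds(1000, 100)); // prints 10
        // Same cluster with the rate doubled to 200
        System.out.println(heartbeatIntervalSeconds(1000, 200)); // prints 5
        // Small cluster: the 3-second minimum applies
        System.out.println(heartbeatIntervalSeconds(100, 100));  // prints 3
    }
}
```

Under these assumptions, raising the configured rate shortens the interval only until the minimum is reached, so small clusters would see no change.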

Amareshwari Sriramadasu
added a comment - 25/May/09 10:58 test-patch result:
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
It is difficult to write a unit test for this.
Tested the patch by running sort on 500 nodes with mapred.heartbeats.in.second=200.

Amar Kamat
added a comment - 27/May/09 06:17 Wondering which one is more intuitive, number-of-heartbeats-per-sec or heartbeat-interval. The title says heartbeat-interval should be configurable, whereas the description states number-of-heartbeats-per-sec should be configurable. I personally think heartbeat-interval is easier to set and play around with. Thoughts?
Regarding the test case, can't we spoof the tasktracker status and invoke JobTracker.heartbeat()? This way we can increment the tracker count and query the jobtracker for the current heartbeat interval. Thoughts?

Owen O'Malley
added a comment - 27/May/09 06:24 This looks good, but I wish there was a good way to set up a test case. I guess the best way would be to create a JobTracker and call the heartbeat method and observe the requested heartbeat interval.