{"url":"https://api.github.com/repos/apache/spark/pulls/24497","id":274744743,"node_id":"MDExOlB1bGxSZXF1ZXN0Mjc0NzQ0NzQz","html_url":"https://github.com/apache/spark/pull/24497","diff_url":"https://github.com/apache/spark/pull/24497.diff","patch_url":"https://github.com/apache/spark/pull/24497.patch","issue_url":"https://api.github.com/repos/apache/spark/issues/24497","number":24497,"state":"closed","locked":false,"title":"[SPARK-27630][CORE] Properly handle task end events from completed stages","user":{"login":"cxzl25","id":3898450,"node_id":"MDQ6VXNlcjM4OTg0NTA=","avatar_url":"https://avatars0.githubusercontent.com/u/3898450?v=4","gravatar_id":"","url":"https://api.github.com/users/cxzl25","html_url":"https://github.com/cxzl25","followers_url":"https://api.github.com/users/cxzl25/followers","following_url":"https://api.github.com/users/cxzl25/following{/other_user}","gists_url":"https://api.github.com/users/cxzl25/gists{/gist_id}","starred_url":"https://api.github.com/users/cxzl25/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/cxzl25/subscriptions","organizations_url":"https://api.github.com/users/cxzl25/orgs","repos_url":"https://api.github.com/users/cxzl25/repos","events_url":"https://api.github.com/users/cxzl25/events{/privacy}","received_events_url":"https://api.github.com/users/cxzl25/received_events","type":"User","site_admin":false},"body":"## What changes were proposed in this pull request?\r\nTrack tasks separately for each stage attempt (instead of tracking by stage), and do NOT reset the numRunningTasks to 0 on StageCompleted.\r\n\r\nIn the case of stage retry, the `taskEnd` event from the zombie stage sometimes makes the number of `totalRunningTasks` negative, which causes the job to get stuck.\r\nA similar problem also exists with `stageIdToTaskIndices` & `stageIdToSpeculativeTaskIndices`.\r\nIf it is a failed `taskEnd` event of the zombie stage, this will cause `stageIdToTaskIndices` or 
`stageIdToSpeculativeTaskIndices` to remove the task index of the active stage, and the number of `totalPendingTasks` will increase unexpectedly.\r\n## How was this patch tested?\r\nunit test properly handle task end events from completed stages\r\n","created_at":"2019-04-30T14:02:56Z","updated_at":"2019-06-25T19:34:34Z","closed_at":"2019-06-25T19:34:34Z","merged_at":null,"merge_commit_sha":"cb3303afd49399d4164940ac1e2c277da70af44e","assignee":null,"assignees":[],"requested_reviewers":[],"requested_teams":[],"labels":[{"id":1405801482,"node_id":"MDU6TGFiZWwxNDA1ODAxNDgy","url":"https://api.github.com/repos/apache/spark/labels/SPARK%20CORE","name":"SPARK CORE","color":"ededed","default":false,"description":null}],"milestone":null,"draft":false,"commits_url":"https://api.github.com/repos/apache/spark/pulls/24497/commits","review_comments_url":"https://api.github.com/repos/apache/spark/pulls/24497/comments","review_comment_url":"https://api.github.com/repos/apache/spark/pulls/comments{/number}","comments_url":"https://api.github.com/repos/apache/spark/issues/24497/comments","statuses_url":"https://api.github.com/repos/apache/spark/statuses/b91965098c3c2b71d5adda9d501ee46cf14831ed","head":{"label":"cxzl25:fix_stuck_job_follow_up","ref":"fix_stuck_job_follow_up","sha":"b91965098c3c2b71d5adda9d501ee46cf14831ed","user":{"login":"cxzl25","id":3898450,"node_id":"MDQ6VXNlcjM4OTg0NTA=","avatar_url":"https://avatars0.githubusercontent.com/u/3898450?v=4","gravatar_id":"","url":"https://api.github.com/users/cxzl25","html_url":"https://github.com/cxzl25","followers_url":"https://api.github.com/users/cxzl25/followers","following_url":"https://api.github.com/users/cxzl25/following{/other_user}","gists_url":"https://api.github.com/users/cxzl25/gists{/gist_id}","starred_url":"https://api.github.com/users/cxzl25/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/cxzl25/subscriptions","organizations_url":"https://api.github.com/users/cxzl25/orgs","repos_url":"http
s://api.github.com/users/cxzl25/repos","events_url":"https://api.github.com/users/cxzl25/events{/privacy}","received_events_url":"https://api.github.com/users/cxzl25/received_events","type":"User","site_admin":false},"repo":{"id":119043888,"node_id":"MDEwOlJlcG9zaXRvcnkxMTkwNDM4ODg=","name":"spark","full_name":"cxzl25/spark","private":false,"owner":{"login":"cxzl25","id":3898450,"node_id":"MDQ6VXNlcjM4OTg0NTA=","avatar_url":"https://avatars0.githubusercontent.com/u/3898450?v=4","gravatar_id":"","url":"https://api.github.com/users/cxzl25","html_url":"https://github.com/cxzl25","followers_url":"https://api.github.com/users/cxzl25/followers","following_url":"https://api.github.com/users/cxzl25/following{/other_user}","gists_url":"https://api.github.com/users/cxzl25/gists{/gist_id}","starred_url":"https://api.github.com/users/cxzl25/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/cxzl25/subscriptions","organizations_url":"https://api.github.com/users/cxzl25/orgs","repos_url":"https://api.github.com/users/cxzl25/repos","events_url":"https://api.github.com/users/cxzl25/events{/privacy}","received_events_url":"https://api.github.com/users/cxzl25/received_events","type":"User","site_admin":false},"html_url":"https://github.com/cxzl25/spark","description":"Mirror of Apache 
Spark","fork":true,"url":"https://api.github.com/repos/cxzl25/spark","forks_url":"https://api.github.com/repos/cxzl25/spark/forks","keys_url":"https://api.github.com/repos/cxzl25/spark/keys{/key_id}","collaborators_url":"https://api.github.com/repos/cxzl25/spark/collaborators{/collaborator}","teams_url":"https://api.github.com/repos/cxzl25/spark/teams","hooks_url":"https://api.github.com/repos/cxzl25/spark/hooks","issue_events_url":"https://api.github.com/repos/cxzl25/spark/issues/events{/number}","events_url":"https://api.github.com/repos/cxzl25/spark/events","assignees_url":"https://api.github.com/repos/cxzl25/spark/assignees{/user}","branches_url":"https://api.github.com/repos/cxzl25/spark/branches{/branch}","tags_url":"https://api.github.com/repos/cxzl25/spark/tags","blobs_url":"https://api.github.com/repos/cxzl25/spark/git/blobs{/sha}","git_tags_url":"https://api.github.com/repos/cxzl25/spark/git/tags{/sha}","git_refs_url":"https://api.github.com/repos/cxzl25/spark/git/refs{/sha}","trees_url":"https://api.github.com/repos/cxzl25/spark/git/trees{/sha}","statuses_url":"https://api.github.com/repos/cxzl25/spark/statuses/{sha}","languages_url":"https://api.github.com/repos/cxzl25/spark/languages","stargazers_url":"https://api.github.com/repos/cxzl25/spark/stargazers","contributors_url":"https://api.github.com/repos/cxzl25/spark/contributors","subscribers_url":"https://api.github.com/repos/cxzl25/spark/subscribers","subscription_url":"https://api.github.com/repos/cxzl25/spark/subscription","commits_url":"https://api.github.com/repos/cxzl25/spark/commits{/sha}","git_commits_url":"https://api.github.com/repos/cxzl25/spark/git/commits{/sha}","comments_url":"https://api.github.com/repos/cxzl25/spark/comments{/number}","issue_comment_url":"https://api.github.com/repos/cxzl25/spark/issues/comments{/number}","contents_url":"https://api.github.com/repos/cxzl25/spark/contents/{+path}","compare_url":"https://api.github.com/repos/cxzl25/spark/compare/{base}...{head}","merges_u
rl":"https://api.github.com/repos/cxzl25/spark/merges","archive_url":"https://api.github.com/repos/cxzl25/spark/{archive_format}{/ref}","downloads_url":"https://api.github.com/repos/cxzl25/spark/downloads","issues_url":"https://api.github.com/repos/cxzl25/spark/issues{/number}","pulls_url":"https://api.github.com/repos/cxzl25/spark/pulls{/number}","milestones_url":"https://api.github.com/repos/cxzl25/spark/milestones{/number}","notifications_url":"https://api.github.com/repos/cxzl25/spark/notifications{?since,all,participating}","labels_url":"https://api.github.com/repos/cxzl25/spark/labels{/name}","releases_url":"https://api.github.com/repos/cxzl25/spark/releases{/id}","deployments_url":"https://api.github.com/repos/cxzl25/spark/deployments","created_at":"2018-01-26T11:26:14Z","updated_at":"2019-06-17T14:50:48Z","pushed_at":"2020-05-04T07:47:34Z","git_url":"git://github.com/cxzl25/spark.git","ssh_url":"git@github.com:cxzl25/spark.git","clone_url":"https://github.com/cxzl25/spark.git","svn_url":"https://github.com/cxzl25/spark","homepage":null,"size":318919,"stargazers_count":0,"watchers_count":0,"language":"Scala","has_issues":false,"has_projects":true,"has_downloads":true,"has_wiki":false,"has_pages":false,"forks_count":0,"mirror_url":null,"archived":false,"disabled":false,"open_issues_count":0,"license":{"key":"apache-2.0","name":"Apache License 
2.0","spdx_id":"Apache-2.0","url":"https://api.github.com/licenses/apache-2.0","node_id":"MDc6TGljZW5zZTI="},"forks":0,"open_issues":0,"watchers":0,"default_branch":"master"}},"base":{"label":"apache:master","ref":"master","sha":"b7b445255370e29d6b420b02389b022a1c65942e","user":{"login":"apache","id":47359,"node_id":"MDEyOk9yZ2FuaXphdGlvbjQ3MzU5","avatar_url":"https://avatars0.githubusercontent.com/u/47359?v=4","gravatar_id":"","url":"https://api.github.com/users/apache","html_url":"https://github.com/apache","followers_url":"https://api.github.com/users/apache/followers","following_url":"https://api.github.com/users/apache/following{/other_user}","gists_url":"https://api.github.com/users/apache/gists{/gist_id}","starred_url":"https://api.github.com/users/apache/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/apache/subscriptions","organizations_url":"https://api.github.com/users/apache/orgs","repos_url":"https://api.github.com/users/apache/repos","events_url":"https://api.github.com/users/apache/events{/privacy}","received_events_url":"https://api.github.com/users/apache/received_events","type":"Organization","site_admin":false},"repo":{"id":17165658,"node_id":"MDEwOlJlcG9zaXRvcnkxNzE2NTY1OA==","name":"spark","full_name":"apache/spark","private":false,"owner":{"login":"apache","id":47359,"node_id":"MDEyOk9yZ2FuaXphdGlvbjQ3MzU5","avatar_url":"https://avatars0.githubusercontent.com/u/47359?v=4","gravatar_id":"","url":"https://api.github.com/users/apache","html_url":"https://github.com/apache","followers_url":"https://api.github.com/users/apache/followers","following_url":"https://api.github.com/users/apache/following{/other_user}","gists_url":"https://api.github.com/users/apache/gists{/gist_id}","starred_url":"https://api.github.com/users/apache/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/apache/subscriptions","organizations_url":"https://api.github.com/users/apache/orgs","repos_url":"https://api.github.com/user
s/apache/repos","events_url":"https://api.github.com/users/apache/events{/privacy}","received_events_url":"https://api.github.com/users/apache/received_events","type":"Organization","site_admin":false},"html_url":"https://github.com/apache/spark","description":"Apache Spark - A unified analytics engine for large-scale data processing","fork":false,"url":"https://api.github.com/repos/apache/spark","forks_url":"https://api.github.com/repos/apache/spark/forks","keys_url":"https://api.github.com/repos/apache/spark/keys{/key_id}","collaborators_url":"https://api.github.com/repos/apache/spark/collaborators{/collaborator}","teams_url":"https://api.github.com/repos/apache/spark/teams","hooks_url":"https://api.github.com/repos/apache/spark/hooks","issue_events_url":"https://api.github.com/repos/apache/spark/issues/events{/number}","events_url":"https://api.github.com/repos/apache/spark/events","assignees_url":"https://api.github.com/repos/apache/spark/assignees{/user}","branches_url":"https://api.github.com/repos/apache/spark/branches{/branch}","tags_url":"https://api.github.com/repos/apache/spark/tags","blobs_url":"https://api.github.com/repos/apache/spark/git/blobs{/sha}","git_tags_url":"https://api.github.com/repos/apache/spark/git/tags{/sha}","git_refs_url":"https://api.github.com/repos/apache/spark/git/refs{/sha}","trees_url":"https://api.github.com/repos/apache/spark/git/trees{/sha}","statuses_url":"https://api.github.com/repos/apache/spark/statuses/{sha}","languages_url":"https://api.github.com/repos/apache/spark/languages","stargazers_url":"https://api.github.com/repos/apache/spark/stargazers","contributors_url":"https://api.github.com/repos/apache/spark/contributors","subscribers_url":"https://api.github.com/repos/apache/spark/subscribers","subscription_url":"https://api.github.com/repos/apache/spark/subscription","commits_url":"https://api.github.com/repos/apache/spark/commits{/sha}","git_commits_url":"https://api.github.com/repos/apache/spark/git/commits{/sha}","c
omments_url":"https://api.github.com/repos/apache/spark/comments{/number}","issue_comment_url":"https://api.github.com/repos/apache/spark/issues/comments{/number}","contents_url":"https://api.github.com/repos/apache/spark/contents/{+path}","compare_url":"https://api.github.com/repos/apache/spark/compare/{base}...{head}","merges_url":"https://api.github.com/repos/apache/spark/merges","archive_url":"https://api.github.com/repos/apache/spark/{archive_format}{/ref}","downloads_url":"https://api.github.com/repos/apache/spark/downloads","issues_url":"https://api.github.com/repos/apache/spark/issues{/number}","pulls_url":"https://api.github.com/repos/apache/spark/pulls{/number}","milestones_url":"https://api.github.com/repos/apache/spark/milestones{/number}","notifications_url":"https://api.github.com/repos/apache/spark/notifications{?since,all,participating}","labels_url":"https://api.github.com/repos/apache/spark/labels{/name}","releases_url":"https://api.github.com/repos/apache/spark/releases{/id}","deployments_url":"https://api.github.com/repos/apache/spark/deployments","created_at":"2014-02-25T08:00:08Z","updated_at":"2020-06-07T08:54:04Z","pushed_at":"2020-06-07T08:02:48Z","git_url":"git://github.com/apache/spark.git","ssh_url":"git@github.com:apache/spark.git","clone_url":"https://github.com/apache/spark.git","svn_url":"https://github.com/apache/spark","homepage":"https://spark.apache.org/","size":331789,"stargazers_count":26298,"watchers_count":26298,"language":"Scala","has_issues":false,"has_projects":true,"has_downloads":true,"has_wiki":false,"has_pages":false,"forks_count":21876,"mirror_url":null,"archived":false,"disabled":false,"open_issues_count":212,"license":{"key":"apache-2.0","name":"Apache License 
2.0","spdx_id":"Apache-2.0","url":"https://api.github.com/licenses/apache-2.0","node_id":"MDc6TGljZW5zZTI="},"forks":21876,"open_issues":212,"watchers":26298,"default_branch":"master"}},"_links":{"self":{"href":"https://api.github.com/repos/apache/spark/pulls/24497"},"html":{"href":"https://github.com/apache/spark/pull/24497"},"issue":{"href":"https://api.github.com/repos/apache/spark/issues/24497"},"comments":{"href":"https://api.github.com/repos/apache/spark/issues/24497/comments"},"review_comments":{"href":"https://api.github.com/repos/apache/spark/pulls/24497/comments"},"review_comment":{"href":"https://api.github.com/repos/apache/spark/pulls/comments{/number}"},"commits":{"href":"https://api.github.com/repos/apache/spark/pulls/24497/commits"},"statuses":{"href":"https://api.github.com/repos/apache/spark/statuses/b91965098c3c2b71d5adda9d501ee46cf14831ed"}},"author_association":"CONTRIBUTOR","merged":false,"mergeable":null,"rebaseable":null,"mergeable_state":"unknown","merged_by":null,"comments":41,"review_comments":32,"maintainer_can_modify":false,"commits":19,"additions":104,"deletions":55,"changed_files":5}