[ https://issues.apache.org/jira/browse/FLINK-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669775#comment-15669775 ]
ASF GitHub Bot commented on FLINK-5073:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/2815
[FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
Use a dedicated Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore instead
of running them in the ZooKeeper client's thread. The callback can block because it
discards state, which might entail deleting files from disk.
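A minimal, JDK-only sketch of the idea, assuming the ZooKeeper client's event thread can be modeled as a single-threaded executor. It is not the code from the pull request: `OffloadedCallbackSketch`, `onNodeDeleted`, `deliverDeleteCallback` and `clientWork` are hypothetical names, and the real store goes through the ZooKeeper client library rather than plain executors. The callback only hands the blocking discard to a dedicated executor, so the client thread stays available even while a discard is stuck.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class OffloadedCallbackSketch {

    // Hypothetical stand-in for the ZooKeeper client's single event thread.
    private final ExecutorService clientThread = Executors.newSingleThreadExecutor();

    // Dedicated executor for potentially blocking work such as discarding state.
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);

    // Hypothetical callback run on the client thread after the node is deleted:
    // it does not discard inline, it only schedules the discard and returns.
    void onNodeDeleted(Runnable discardState) {
        ioExecutor.execute(discardState);
    }

    // Simulates the client delivering a delete-completion callback.
    Future<?> deliverDeleteCallback(Runnable discardState) {
        return clientThread.submit(() -> onNodeDeleted(discardState));
    }

    // Other work the client thread must keep serving.
    Future<String> clientWork() {
        return clientThread.submit(() -> "zk-work-done");
    }

    void shutdown() {
        clientThread.shutdownNow();
        ioExecutor.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        OffloadedCallbackSketch sketch = new OffloadedCallbackSketch();
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch discarded = new CountDownLatch(1);

        // A slow discard (e.g. deleting files) that blocks until released.
        sketch.deliverDeleteCallback(() -> {
            try {
                release.await();
                discarded.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The client thread is still free, so this completes while the discard is blocked.
        String result = sketch.clientWork().get(1, TimeUnit.SECONDS);
        System.out.println(result + ", discard pending: " + (discarded.getCount() == 1));

        release.countDown();
        System.out.println("discard finished: " + discarded.await(1, TimeUnit.SECONDS));
        sketch.shutdown();
    }
}
```

Running `main` should print `zk-work-done, discard pending: true` followed by `discard finished: true`. If `onNodeDeleted` ran `discardState` inline instead, the `clientWork()` call would time out, because the single client thread would be stuck inside the discard.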
Review @uce.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink fixZooKeeperDelete
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2815.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2815
----
commit 63ec894de907f38ffff362572c87aef1808062d0
Author: Till Rohrmann <trohrmann@apache.org>
Date: 2016-11-15T21:45:04Z
[FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
Use dedicated Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore instead
of running it in the ZooKeeper client's thread. The callback can be blocking because it
discards state which might entail deleting files from disk.
----
> ZooKeeperCompleteCheckpointStore executes blocking delete operation in ZooKeeper client thread
> ----------------------------------------------------------------------------------------------
>
> Key: FLINK-5073
> URL: https://issues.apache.org/jira/browse/FLINK-5073
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.2.0, 1.1.3
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Fix For: 1.2.0, 1.1.4
>
>
> When deleting completed checkpoints from the {{ZooKeeperCompletedCheckpointStore}}, one
> first tries to delete the meta state handle from ZooKeeper and then deletes the actual checkpoint
> in a callback of the delete operation. This callback is executed by the ZooKeeper client's
> main thread, which is problematic because it blocks the ZooKeeper client. If a delete operation
> takes longer than completing a checkpoint, delete operations of outdated checkpoints can even
> pile up, because they are effectively executed sequentially.
> I propose to execute the delete operations in a dedicated {{Executor}} so that we keep
> the client's main thread free for ZooKeeper-related work.
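The sequential pile-up described in the issue can be illustrated with a small queueing model. This is a hypothetical sketch, not Flink code: it assumes that each completed checkpoint makes exactly one outdated checkpoint deletable at a fixed interval, and that every delete blocks the single client thread for a fixed time. `SequentialDeleteBacklog` and `completionLags` are made-up names.

```java
import java.util.Arrays;

class SequentialDeleteBacklog {

    // Models deletes that all run on one thread (like callbacks on the client's
    // event thread). The k-th delete is requested at k * intervalMs and blocks
    // for deleteMs. Returns how long after its request each delete finished.
    static long[] completionLags(int count, long intervalMs, long deleteMs) {
        long[] lags = new long[count];
        long threadFreeAt = 0; // time at which the single thread becomes idle
        for (int k = 0; k < count; k++) {
            long requestedAt = k * intervalMs;
            // A delete cannot start before the previous one has finished.
            long startedAt = Math.max(requestedAt, threadFreeAt);
            threadFreeAt = startedAt + deleteMs;
            lags[k] = threadFreeAt - requestedAt;
        }
        return lags;
    }

    public static void main(String[] args) {
        // Checkpoints every 100 ms, each delete blocks for 150 ms.
        System.out.println(Arrays.toString(completionLags(5, 100, 150))); // prints [150, 200, 250, 300, 350]
        // Deletes faster than the checkpoint interval do not pile up.
        System.out.println(Arrays.toString(completionLags(5, 100, 50)));  // prints [50, 50, 50, 50, 50]
    }
}
```

Under these assumptions, whenever a delete takes longer than the checkpoint interval, the lag grows by the difference (here 50 ms) with every further checkpoint, so the backlog grows without bound.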
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)