Description

Gets an instance of a DurablePositionTracker and is not calling close() on it before then creating another instance... thus not calling close on the underlying DataFileWriter. This left the file locked in Windows and hence any delete attempts were failing...causing the Agent to spin around and around trying to delete this file and creating endless numbers of the temporary tracker files.

Phil Scala
added a comment - 13/Apr/13 00:34 Attaching my implementation of a minor change to close the DurablePositionTracker, before creating a new instance so that the underlying streams are closed.

Hari Shreedharan
added a comment - 13/Apr/13 00:44 Hi Phil,
Could you please submit a patch, rather than the file itself? You can get more info here: https://cwiki.apache.org/FLUME/how-to-contribute.html

Israel Ekpo
added a comment - 13/Apr/13 01:02 - edited Phil, in addition to the wiki link from Hari above, you can follow the steps below to create a review for the patch.
Check first to make sure that it follows the formatting guidelines and recommendations specified in the wiki page.
I will be updating the wiki page soon with these steps but in the time being, here are the steps for submitting the review for this patch:
(1) Get the latest code from GIT
git clone http://git-wip-us.apache.org/repos/asf/flume.git flume
cd flume
git checkout flume-1.4
The current branch for the next release is flume-1.4; this is why we are using "flume-1.4".
(2) Apply your changes to the code you have checked out
(3) Try to compile the code changes, create a tarball and run the unit tests to make sure that all is well with your changes
mvn clean install
It is very important to test the changes to make sure the work satisfactorily before submitting the patch
So you can configure the generated tarball and run it to see if it works for you first
(4) If all is well, then create the patches:
git diff --no-prefix > /path/to/ FLUME-0011 .patch
Use the JIRA issue ID for the patch files you generate.
If you have more than one pending patch (for different issues) to submit, you can specify the path to the file or directory where the changes are
For example:
git diff --no-prefix flume-ng-core/src/main/java/org/apache/flume/serialization/LineDeserializer.java > /path/to/ FLUME-0001 .patch
git diff --no-prefix flume-ng-doc/sphinx/Flume* > /path/to/ FLUME-0002 .patch
git diff --no-prefix flume-ng-core/src/main/java/org/apache/flume/sink/ > /path/to/ FLUME-0003 .patch
(5) Upload the patche file(s) to the JIRA issue and set the status to PATCH AVAILABLE.
(6) Create a review for your patch files here:
https://reviews.apache.org/
Pick "flume-git" as the repository and "Flume" as the group. The current branch is "flume-1.4"
(7) You can take a look at other code reviews here for some examples:
https://reviews.apache.org/groups/Flume/
Thank you for your contribution

Attached alternate fix. Instead of explicitly closing the file don't create the new instance unless we deleted the old one. Also check for file existence before attempting delete when getting next instance to appease Windows.

Note: tried to follow the instructions for creating a code review, but the review web app would not let me attach the patch "The selected file does not appear to be a diff."

Paul Chavez
added a comment - 15/Apr/13 18:18 - edited Attached alternate fix. Instead of explicitly closing the file don't create the new instance unless we deleted the old one. Also check for file existence before attempting delete when getting next instance to appease Windows.
Note: tried to follow the instructions for creating a code review, but the review web app would not let me attach the patch "The selected file does not appear to be a diff."

Hi Paul Chavez, I think you got to the bottom of this one. Yep, it's a bug and I agree with your change to ReliableSpoolingFileEventReader.java. However, I am unclear on why you made the change you did to DurablePositionTracker.java. As far as I can tell, we already check at the top of getInstance() whether the file exists, so an additional check should be unnecessary.

By the way, I had trouble applying your patch as well. I think you have a UTF BOM at the beginning of the file or something because it's messing with all my tools. I was able to manually remove those bytes and eventually apply your patch, but I bet that's why Review Board didn't like it.

Mike Percy
added a comment - 18/Apr/13 00:29 Hi Paul Chavez , I think you got to the bottom of this one. Yep, it's a bug and I agree with your change to ReliableSpoolingFileEventReader.java. However, I am unclear on why you made the change you did to DurablePositionTracker.java. As far as I can tell, we already check at the top of getInstance() whether the file exists, so an additional check should be unnecessary.
By the way, I had trouble applying your patch as well. I think you have a UTF BOM at the beginning of the file or something because it's messing with all my tools. I was able to manually remove those bytes and eventually apply your patch, but I bet that's why Review Board didn't like it.

Mike, I agree the change to DurablePositionTracker.java is not required. I put that in there thinking a code path called delete on an already deleted file and then later a co-worker found the real issue. I should have backed it out before generating the diff. Thanks for the feedback on the file attach issue, now I think I can suppress the BOM with the windows tools I use.

Paul Chavez
added a comment - 18/Apr/13 03:28 Mike, I agree the change to DurablePositionTracker.java is not required. I put that in there thinking a code path called delete on an already deleted file and then later a co-worker found the real issue. I should have backed it out before generating the diff. Thanks for the feedback on the file attach issue, now I think I can suppress the BOM with the windows tools I use.

Paul's fix is cleaner than mine, I have tested it locally and confirm that the spool source is working on my windows 7 machine. My run of the unit tests failed, though I don't believe they ever 100% passed, if Paul Chavez has confirmed the unit tests work, then I am happy.

Phil Scala
added a comment - 18/Apr/13 16:16 Paul's fix is cleaner than mine, I have tested it locally and confirm that the spool source is working on my windows 7 machine. My run of the unit tests failed, though I don't believe they ever 100% passed, if Paul Chavez has confirmed the unit tests work, then I am happy.

Update patch is attached. I have not attempted to run the unit tests, only did functional testing for this specific issue on Server 2008 R2. Review request submitted: https://reviews.apache.org/r/10607/

Paul Chavez
added a comment - 18/Apr/13 16:37 Update patch is attached. I have not attempted to run the unit tests, only did functional testing for this specific issue on Server 2008 R2. Review request submitted: https://reviews.apache.org/r/10607/

Mike Percy
added a comment - 18/Apr/13 20:03 Paul Chavez , +1 on your latest patch.
Phil Scala , thanks for reporting this issue as well as reviewing and testing this patch.
I think there are a few unit tests (particularly for Exec source) that fail on Windows, so for now since you guys have tested this manually let's go ahead with it.
Since you worked together on this I'll attribute you both with in the commit. I'm also adding you both as Flume contributors in JIRA, so you can assign JIRA tickets to yourselves now.
Paul, since we settled on your version of the change, I'm assigning this JIRA to you.

Mike Percy
added a comment - 18/Apr/13 22:49 Ah, you deleted the old one and replaced it. Looks good.
I just pushed this change to trunk and flume-1.4 branches. Thanks for your contributions Paul and Phil!