Will McQueen
added a comment - 18/Aug/12 01:33

Hi Denny,
I uploaded a patch that contains the test. The plan is to send a fixed number of events in a known sequence, and then confirm that the sink has received all of them and in the same order. The rolling file sink can be used for this. I'm not sure I understand your second question: "Can the cope of integration overlay Source output and Sink input?". Could you please clarify?
Cheers,
Will
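Will's verification step — a fixed, known sequence in, the same sequence out — can be sketched as a small self-contained check. This is plain Java, not Flume code; the class and method names (`OrderCheck`, `makeSequence`, `sameSequence`) are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class OrderCheck {
    // Generate a known, ordered sequence of payloads to send through the agent.
    static List<String> makeSequence(int n) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < n; i++) events.add("event-" + i);
        return events;
    }

    // True iff 'received' is exactly 'sent': same count, same payloads, same order.
    static boolean sameSequence(List<String> sent, List<String> received) {
        return sent.equals(received);
    }

    public static void main(String[] args) {
        List<String> sent = makeSequence(10);
        // In the real test, 'received' would be the lines read back from the
        // rolling file sink's output file.
        List<String> received = new ArrayList<>(sent);
        if (!sameSequence(sent, received)) throw new AssertionError("order or count mismatch");
        System.out.println("ok"); // prints "ok"
    }
}
```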

Denny Ye
added a comment - 18/Aug/12 06:16

Sorry for my spelling mistake. The second question is "Can the scope of integration overlay Source output and Sink input?". I have reviewed your patch, so I know the answer is yes.
You simulate the regular event flow. What I'm interested in is failure of both the file channel and the sink during a batch operation (writing a batch of events to the file channel, or consuming a batch of events with the HDFS sink). I would like to confirm one basic point: a batch operation runs as a single transaction. Either all events are written to the file channel successfully, or all of them are rolled back on failure, with any events already written being deleted from the file channel as part of that transaction.

Will McQueen
added a comment - 20/Aug/12 19:31

Hi Denny,
If I understand correctly, you would like to see an integration test that confirms the transactional nature of batches used by a source or sink, in the context of a file channel. Is that right? So some test cases might be:
Part 1: Testing the transactional nature of source and file channel
******
Part 1-1
========
1. Configure the file channel with a capacity of 10 events, a source, and no sink (we're staging the file channel with events here).
2. The source sends 9 events to the file channel.
3. The source then sends a batch request to put 2 events into the FC.
The expectations are:
1. The batch request sent by the source (containing the 2 events) should fail. For this test, ensure that the source does not attempt to retry the request.
2. The channel at this point should contain only 9 events.
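The all-or-nothing behavior that Part 1-1 expects can be illustrated with a self-contained toy model. This is not Flume's FileChannel implementation — the class `BatchPutModel` and its methods are made up for illustration — but it captures the semantics under test: a capacity-bounded channel must reject a whole batch if it cannot hold every event in it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class BatchPutModel {
    private final Deque<String> events = new ArrayDeque<>();
    private final int capacity;

    BatchPutModel(int capacity) { this.capacity = capacity; }

    // Either all events in the batch are committed, or none are.
    boolean putBatch(List<String> batch) {
        if (events.size() + batch.size() > capacity) return false; // whole transaction fails
        events.addAll(batch);
        return true;
    }

    int size() { return events.size(); }

    public static void main(String[] args) {
        BatchPutModel ch = new BatchPutModel(10);
        for (int i = 0; i < 9; i++) ch.putBatch(List.of("event-" + i)); // stage 9 events
        boolean ok = ch.putBatch(List.of("event-9", "event-10"));       // 2-event batch overflows
        if (ok || ch.size() != 9) throw new AssertionError();
        // prints: batch rejected, channel still holds 9
        System.out.println("batch rejected, channel still holds " + ch.size());
    }
}
```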
Part 1-2:
========
1. Reconfigure the agent so that the source is removed, and a sink is added.
2. The sink takes all events from the file channel.
The expectations for the sink are:
1. The sink receives only 9 events.
2. Those events are the same events that were sent by the source (same payload, same headers).
3. Those events arrive in the same order in which the source sent them to the file channel.
Part 2: Testing the transactional nature of file channel and sink
======
Briefly, the test could be to set the FC's capacity to 10 events, stage the file channel with 2 events, then have the sink attempt to send those 2 events in a batch but fail. This should result in the file channel still containing those same 2 events, in the same order. One way to verify this might be to set up 2 sinks in a failover group, where the 2 events are first sent to sink#1 (higher priority), which should fail and cause sink#2 to receive those same events and put them into a file (e.g., using the FILE_ROLL sink), so that the number of events, event payload, and event ordering can be verified in the file written out by sink#2.
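A hypothetical agent configuration for the failover setup described above might look like the following. The agent, channel, and sink names (a1, c1, k1, k2) and the output directory are placeholders, and the real test would also need a way to force sink#1 (k1) to fail, e.g., pointing it at an unreachable HDFS path:

```properties
a1.channels = c1
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5

# k1 (higher priority) is deliberately misconfigured so its batch fails
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://unreachable-host/flume
a1.sinks.k1.channel = c1

# k2 writes the events to a local file for verification
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /tmp/flume-verify
a1.sinks.k2.channel = c1
```

With this layout, the file written by k2 can be diffed against the staged events to confirm count, payload, and ordering.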
Please let me know if this is what you had in mind. If so, I can open a separate ticket for these tests.
Cheers,
Will

Denny Ye
added a comment - 21/Aug/12 04:17

Hi Will, the test plan mentioned above might look like this:
Part 1-1
========
The source attempts to write 10 events to the FC. A file failure occurs in the FC after the source has already written 5 events. A rollback should happen.
Expected result:
The 5 events that were already recorded to file should be deleted.
Part 1-2
========
The sink consumes events downstream (e.g., to HDFS) asynchronously. All events retrieved from the FC are put into a 'takeList'. A sink failure from downstream should trigger a rollback from the takeList.
Expected result:
No lost events, no duplicated events.
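Denny's two rollback scenarios can be modeled together in one small self-contained sketch. This is not Flume's implementation — `RollbackModel` and its `putList`/`takeList` fields only mirror the concepts discussed here: staged puts are discarded on rollback, and taken events are restored to the channel in their original order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RollbackModel {
    private final Deque<String> channel = new ArrayDeque<>();
    private final List<String> putList = new ArrayList<>();  // staged puts, not yet committed
    private final List<String> takeList = new ArrayList<>(); // taken events, not yet committed

    void put(String e) { putList.add(e); } // staged only; visible after commit
    String take() {
        String e = channel.pollFirst();
        if (e != null) takeList.add(e);
        return e;
    }
    void commit() { channel.addAll(putList); putList.clear(); takeList.clear(); }
    void rollback() {
        putList.clear();                              // scenario 1: discard partial puts
        for (int i = takeList.size() - 1; i >= 0; i--)
            channel.addFirst(takeList.get(i));        // scenario 2: restore takes in order
        takeList.clear();
    }

    public static void main(String[] args) {
        RollbackModel ch = new RollbackModel();
        // Scenario 1: failure after 5 of 10 puts -> rollback leaves the channel empty.
        for (int i = 0; i < 5; i++) ch.put("event-" + i);
        ch.rollback();
        if (ch.channel.size() != 0) throw new AssertionError();
        // Scenario 2: stage 3 events, take 2, downstream fails -> rollback restores order.
        for (int i = 0; i < 3; i++) ch.put("event-" + i);
        ch.commit();
        ch.take(); ch.take();
        ch.rollback();
        if (!ch.channel.peekFirst().equals("event-0")) throw new AssertionError();
        // prints: no loss, no duplicates: [event-0, event-1, event-2]
        System.out.println("no loss, no duplicates: " + ch.channel);
    }
}
```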

Mike Percy
added a comment - 23/Aug/12 09:20

Let's take these tests one at a time. This initial patch provides a foundation for more coverage going forward. Certainly we want additional coverage on the File Channel integration tests.
Denny, I don't want to lose your thoughts here - I agree that we need coverage on the mid-transaction failure case if we don't already have it. So let's take that discussion to a new JIRA with that scope.