HTTP source handler doesn't allow for responses

Details

Type: Improvement

Status:Patch Available

Priority: Major

Resolution:
Unresolved

Affects Version/s:
None

Fix Version/s:
None

Component/s:
None

Labels:

None

Description

Existing HTTP source handlers recieve events via a HTTPServletRequest. This works, but because the handler doesn't have access to the HTTPServletResponse, there is no ability to return a response. This makes it unsuitable for some sort of protocol that relies on bidirectional communication.

My solution: In addition to the existing HTTPSource interface, I've added a BidirectionalHTTPSource interface that is provided the servlet response as a parameter. I've made some changes in the HTTP source allow for both types to co-exist, and my changes shouldn't affect anyone who is already using the existing interface.

Jeremy Karlson
added a comment - 24/Mar/14 21:30 Hari Shreedharan :
Is there anything else I need to do to this? I'd like to get it into 1.5 if possible, but if it needs further review or work, I'd like to keep that process moving along anyway.
Thanks!

Gabriel Commeau
added a comment - 13/Mar/14 06:24 I’m done with this patch: I modified the unit test (and added one test case), as well as the documentation.
Please feel free to modify it further, but as far as I’m concerned, this is ready to go.

Jeremy Karlson
added a comment - 12/Mar/14 05:15 Hey, sorry for the delay. Unfortunately, I can't figure out the second patch... I think it's missing a file.
If we can get this sorted out by Monday, do you think it will make 1.5?

Gabriel Commeau
added a comment - 01/Mar/14 04:51 So here is my contribution. I updated the javadoc, but not the documentation (yet). Also, it’s worth a few additional unit tests, but again, let’s worry about that after we agree on the changes.

As discussed on the mailing list, here are a few additional requirements for this ticket:

Add in the BidirectionalHTTPSourceHandler interface methods to handle the result of the servlet committing the batch of events to the channel: whether the commit is successful or throws an exception, the HTTP response comes from the BidirectionalHTTPSourceHandler.

Allow through configuration for many handlers (HTTPSourceHandler and the BidirectionalHTTPSourceHandler) to be added to different paths. In order not to break the backward compatibility, keep the “.handler” configuration parameter, and add to it “.handlers”, which basically takes a list of handlers.

Gabriel Commeau
added a comment - 01/Mar/14 04:26 As discussed on the mailing list, here are a few additional requirements for this ticket:
Add in the BidirectionalHTTPSourceHandler interface methods to handle the result of the servlet committing the batch of events to the channel: whether the commit is successful or throws an exception, the HTTP response comes from the BidirectionalHTTPSourceHandler.
Allow through configuration for many handlers (HTTPSourceHandler and the BidirectionalHTTPSourceHandler) to be added to different paths. In order not to break the backward compatibility, keep the “.handler” configuration parameter, and add to it “.handlers”, which basically takes a list of handlers.

In this case, I am not talking about handlers in terms of servlet handlers, I am talking about HTTP Source Handler instances. The problem with the HttpResponse going into the HTTP Source Handler and that writing stuff before the source calls sendError() is that the data could already have been pushed out from the HttpResponse buffer (I believe that this is a buffered stream and the response can get pushed out, even without calling flush). I think the Flume source should call sendError() anyway, but should also allow the handler to be able to send a response. Also, nothing really stops the Source handler from calling flush (unless we wrap the response object in a FlumeHttpResponse class object that ignores flush calls).

I am not a web developer, so my knowledge about HTTP servlets is pretty limited.

Hari Shreedharan
added a comment - 28/Feb/14 20:50 In this case, I am not talking about handlers in terms of servlet handlers, I am talking about HTTP Source Handler instances. The problem with the HttpResponse going into the HTTP Source Handler and that writing stuff before the source calls sendError() is that the data could already have been pushed out from the HttpResponse buffer (I believe that this is a buffered stream and the response can get pushed out, even without calling flush). I think the Flume source should call sendError() anyway, but should also allow the handler to be able to send a response. Also, nothing really stops the Source handler from calling flush (unless we wrap the response object in a FlumeHttpResponse class object that ignores flush calls).
I am not a web developer, so my knowledge about HTTP servlets is pretty limited.

My understanding of the servlet spec is that the handler can write anything it wants to the response stream. In the event sendError() is called, that response is discarded. (As long as flush() was not previously called on the stream, which I don't think they should be doing anyway.) So the transactional error cases end up doing this:

passes the input and output to the handler

assuming the events are well formed, handler generates a list of events and writes a success response to the output

source attempts to write events to the channel and fails

source calls sendError() on the output, which aborts any response the handler may have made and returns a HTTP error code

In my mind, this is actually a feature - clearer separation of concerns. The handler is only concerned with receiving and generating events. It does't know, and doesn't have to know, that something went wrong in the channel writing. That is returned as a 500 (server error) or 503 (server unavailable), which I think actually makes sense... It's not an error internal to the handler protocol, it's something in the server (Flume).

I suppose I should have mentioned the "don't call flush()" thing in the doc update.

Jeremy Karlson
added a comment - 28/Feb/14 20:34 I understand your concerns. I did think about the error conditions and considered something like what you proposed. In the end I submitted it as-is because of this:
http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletResponse.html#sendError%28int%29
My understanding of the servlet spec is that the handler can write anything it wants to the response stream. In the event sendError() is called, that response is discarded. (As long as flush() was not previously called on the stream, which I don't think they should be doing anyway.) So the transactional error cases end up doing this:
passes the input and output to the handler
assuming the events are well formed, handler generates a list of events and writes a success response to the output
source attempts to write events to the channel and fails
source calls sendError() on the output, which aborts any response the handler may have made and returns a HTTP error code
In my mind, this is actually a feature - clearer separation of concerns. The handler is only concerned with receiving and generating events. It does't know, and doesn't have to know, that something went wrong in the channel writing. That is returned as a 500 (server error) or 503 (server unavailable), which I think actually makes sense... It's not an error internal to the handler protocol, it's something in the server (Flume).
I suppose I should have mentioned the "don't call flush()" thing in the doc update.
What do you think?

I like the idea of this jira, but I am not entirely sure of having a custom handler. I'd prefer not move the transactional semantics out of the source - I don't want broken handlers causing data loss etc. Flexibility is good until custom code starts breaking guarantees.

One of the reasons I did not make the handler bidrectional was that there was no way to tell the handler that the channel writes may have failed, and so the last message which the handler may have sent to the client may have been incorrect. Even now, you'd have to call the handler a 2nd time saying that things failed and the handler must send out an "amendment" to the previous message - which is why the handler is limited currently.

One way around this would be to add a 2nd Servlet for bidirectional handlers, that:

passes the input to the handler, and gets a list of events

writes events to the channel

successful commit: calls a success method in the handler (which gets the HttpResponse object as a parameter)

failed commit: calls a failure method in the handler (also gets the HttpResponse object).
This ensures that the handler can send the correct message based on successful or failed commit.

Hari Shreedharan
added a comment - 28/Feb/14 19:59 I like the idea of this jira, but I am not entirely sure of having a custom handler. I'd prefer not move the transactional semantics out of the source - I don't want broken handlers causing data loss etc. Flexibility is good until custom code starts breaking guarantees.
One of the reasons I did not make the handler bidrectional was that there was no way to tell the handler that the channel writes may have failed, and so the last message which the handler may have sent to the client may have been incorrect. Even now, you'd have to call the handler a 2nd time saying that things failed and the handler must send out an "amendment" to the previous message - which is why the handler is limited currently.
One way around this would be to add a 2nd Servlet for bidirectional handlers, that:
passes the input to the handler, and gets a list of events
writes events to the channel
successful commit: calls a success method in the handler (which gets the HttpResponse object as a parameter)
failed commit: calls a failure method in the handler (also gets the HttpResponse object).
This ensures that the handler can send the correct message based on successful or failed commit.
Does that make sense?