Bryan Duxbury
added a comment - 20/Jun/11 17:57 Actually, readFrame() is free to read the frame in chunks of any size, avoiding the need for upfront allocation
This is true for FramedTransport, but not for the nonblocking servers themselves. We make an important distinction between the IO part of the request and the deserialization part of the request.
this can be solved for all transports by having the deserializer ask for chunks instead of walking a whole frame
I don't think you should underestimate the amount of work this would be. It's a completely different scheme for deserialization.

Actually, readFrame() is free to read the frame in chunks of any size, avoiding the need for upfront allocation, though when (if) fully read they would need to be assembled into a single byte[] given the current deserializer interface.

It seems to me that this can be solved for all transports by having the deserializer ask for chunks instead of walking a whole frame: it's trivial to pretend a complete buffer is a nonblocking stream but pretending a blocking stream is a buffer leads to difficulties like those discussed here.

Toby Thain
added a comment - 18/Jun/11 15:41 Actually, readFrame() is free to read the frame in chunks of any size, avoiding the need for upfront allocation, though when (if) fully read they would need to be assembled into a single byte[] given the current deserializer interface.
It seems to me that this can be solved for all transports by having the deserializer ask for chunks instead of walking a whole frame: it's trivial to pretend a complete buffer is a nonblocking stream but pretending a blocking stream is a buffer leads to difficulties like those discussed here.

Bryan Duxbury
added a comment - 18/Jun/11 01:45 I think 1MB is too small. This is a really tricky issue, because the bottom line is that people are definitely going to bump their heads into it before they go looking for the option, if they ever do.

@Ryan - I'd recommend less, maybe 1MB, and make this limit (and how to lift it) clear in the doc, along with the recommendation that Thrift not be used public-facing, due to these DoS risks. Or, better, float the proposal on the mailing list; I bet that even 1MB covers the vast majority of use cases.

Toby Thain
added a comment - 18/Jun/11 00:52 @Ryan - I'd recommend less, maybe 1MB, and make this limit (and how to lift it) clear in the doc, along with the recommendation that Thrift not be used public-facing, due to these DoS risks. Or, better, float the proposal on the mailing list; I bet that even 1MB covers the vast majority of use cases.

The default maxLength is now 2GB. I propose we set it to something like 64MB (or even less). I think that's still significantly bigger than the common use case. We should make the normal case (relatively small frames) safer and put the burden of customization on the rare cases.

Ryan King
added a comment - 17/Jun/11 19:13 Bryan-
The default maxLength is now 2GB. I propose we set it to something like 64MB (or even less). I think that's still significantly bigger than the common use case. We should make the normal case (relatively small frames) safer and put the burden of customization on the rare cases.

The "partial allocation" approach actually isn't workable. The Thrift deserialization code depends on the call stack for keeping state, so if it runs out of data, there's no nonblocking way for it to give up control and wait for more data to arrive. The whole premise of the nonblocking server is that we can know when all the bytes for a message have arrived without doing any actual deserialization, hence the requirement for framed transport.

Bryan Duxbury
added a comment - 17/Jun/11 16:03 The "partial allocation" approach actually isn't workable. The Thrift deserialization code depends on the call stack for keeping state, so if it runs out of data, there's no nonblocking way for it to give up control and wait for more data to arrive. The whole premise of the nonblocking server is that we can know when all the bytes for a message have arrived without doing any actual deserialization, hence the requirement for framed transport.

I agree that (apart from a default limit, which should be a decent workaround), solving this is tricky. I haven't looked at the code yet, but it might be possible to guard against this by simply not allocating the entire buffer immediately, but rather a smaller interim buffer, and waiting to see if a valid message is forthcoming. This would work best if the rest of the unmarshalling code had thorough validity checks. This should avoid the attempt to allocate a meaninglessly large buffer in the case that the message can be rejected on other grounds. Of course, an attacker who's willing to craft a valid, extremely large Thrift message can still eat up server resources

Toby Thain
added a comment - 17/Jun/11 13:09 @Bryan - yes, I inferred that from the nature of the crash.
I agree that (apart from a default limit, which should be a decent workaround), solving this is tricky. I haven't looked at the code yet, but it might be possible to guard against this by simply not allocating the entire buffer immediately, but rather a smaller interim buffer, and waiting to see if a valid message is forthcoming. This would work best if the rest of the unmarshalling code had thorough validity checks. This should avoid the attempt to allocate a meaninglessly large buffer in the case that the message can be rejected on other grounds. Of course, an attacker who's willing to craft a valid, extremely large Thrift message can still eat up server resources
I can look at this approach if you think it has merit.

@Toby - The problem is that "typing a bunch of characters" is indistinguishable from the first few bytes of a really long message. I'd really like to have a good solution to this problem so I can point to that in the future, but so far no one has managed to propose a workable solution.

@Ryan - I'd accept a patch for decreasing the default size to something reasonably large, if you spun a good argument for it.

Bryan Duxbury
added a comment - 17/Jun/11 04:33 @Toby - The problem is that "typing a bunch of characters" is indistinguishable from the first few bytes of a really long message. I'd really like to have a good solution to this problem so I can point to that in the future, but so far no one has managed to propose a workable solution.
@Ryan - I'd accept a patch for decreasing the default size to something reasonably large, if you spun a good argument for it.

Toby Thain
added a comment - 17/Jun/11 01:07 Fair enough, but:
it will inevitably be used in this way, unless (even if) there is a big disclaimer somewhere (did I miss it?)
doesn't Facebook use it for public facing services? What about their puzzle server?

So you're saying that it's acceptable for the server to crash if anyone merely telnets into a public facing Thrift server (0.6.1) and types a couple of characters? And you don't consider that a security bug?

Toby Thain
added a comment - 17/Jun/11 00:49 So you're saying that it's acceptable for the server to crash if anyone merely telnets into a public facing Thrift server (0.6.1) and types a couple of characters? And you don't consider that a security bug?