[beast] Supporting ICY 200 OK

[beast] Supporting ICY 200 OK

Bjorn brought up the point that there are some niche cases where
applications do not strictly follow the HTTP message syntax for
responses. Specifically, for Shoutcast(?) / Icecast mp3 servers the
status-line in the response looks something like this:

ICY 200 OK\r\n

I have thought about ways to support this directly in Beast but I have
decided against it. "ICY 200" is not HTTP according to the RFC, and
trying to accommodate this use case places needless technical debt in
public interfaces.

However, I came up with a very simple technique to make this work with
Beast by writing just a tiny bit of extra code. The solution is to
implement a stream wrapper which meets the requirements of
SyncReadStream[1] and/or AsyncReadStream[2] and forwards the calls to
read to the "next layer" object, which will be the actual socket or
ssl::stream.

The implementation of the icy_stream wrapper can check to see if the
first 3 bytes received from the next layer match "ICY" and if so,
return a buffer to the caller which replaces those characters with
"HTTP/1.1" (or "HTTP/1.0"). With this wrapper, calls to
beast::http::read or beast::http::async_read can parse the custom
response as a regular HTTP message. The wrapper can embed state
information to inform the caller if the "ICY" string was not received.

Re: [beast] Supporting ICY 200 OK

On 10/02/2017 12:06 AM, Vinnie Falco wrote:

> However, I came up with a very simple technique to make this work with
> Beast by writing just a tiny bit of extra code. The solution is to
> implement a stream wrapper which meets the requirements of
> SyncReadStream[1] and/or AsyncReadStream[2] and forwards the calls to
> read to the "next layer" object, which will be the actual socket or
> ssl::stream.

Re: [beast] Supporting ICY 200 OK

> However, I came up with a very simple technique to make this work with
> Beast by writing just a tiny bit of extra code. The solution is to
> implement a stream wrapper which meets the requirements of
> SyncReadStream[1] and/or AsyncReadStream[2] and forwards the calls to
> read to the "next layer" object, which will be the actual socket or
> ssl::stream.
>
> The implementation of the icy_stream wrapper can check to see if the
> first 3 bytes received from the next layer match "ICY" and if so,
> return a buffer to the caller which replaces those characters with
> "HTTP/1.1" (or "HTTP/1.0"). With this wrapper, calls to
> beast::http::read or beast::http::async_read can parse the custom
> response as a regular HTTP message. The wrapper can embed state
> information to inform the caller if the "ICY" string was not received.
>
> The declaration for such a wrapper might look like this:
>
> template<class NextLayer>
> class icy_stream;
>
> Usage for the wrapper:
>
> asio::ip::tcp::socket sock{ios};
> icy_stream<asio::ip::tcp::socket&> stream{sock};
> beast::http::response<beast::http::empty_body> res;
> beast::multi_buffer buffer;
> beast::http::read(stream, buffer, res);
>
> This technique may of course be generalized to support any desired
> similar but non-standard HTTP requests or responses in a fashion that
> is completely transparent to Beast.
>

You could also try to standardize HTTP/1.1 and HTTP/2.0 under the same
interface.

The version attribute in the message model[1] might not make sense to all
HTTP backends.

This discussion reminds me of the difference between inductive reasoning
and deductive reasoning. Only the immediate problem is pursued and no
attention is given trying to solve the general problem at hand.

Also, if beast::http::read is what I'll call, then this implies I need to
throw my abstractions in `beast::http` namespace (not a problem, just an
observation).

However, a problem is the fact that this translation in the stream needs to
happen to actually use the Beast message framework. Why a “virtual HTTP
stream” needs to exist beyond the virtual field? Why does it need to go to
memory? Why isn't just limited to the API?

Boost.Http managed to expose this amount of feature without exposing a
parser at all.

Re: [beast] Supporting ICY 200 OK

On Tue, Oct 3, 2017 at 6:06 PM, Vinícius dos Santos Oliveira
<[hidden email]> wrote:
> You could also try to standardize HTTP/1.1 and HTTP/2.0 under the same
> interface.

That is incorrect. All I'm talking about is a simple wrapper which
meets the requirements of SyncReadStream and AsyncReadStream which
checks the first incoming 3 bytes of data on the stream against the
string "ICY" and translates it into "HTTP/1.1" to allow the existing
beast parser to recognize a *slightly* non-standard response format.
Note the emphasis on the term *slightly*. I have designed a reasonable
solution to enable this niche use-case to work without incurring
significant technical debt for something that is clearly outside the
scope of rfc7230.

And you are suggesting that the same technique should be used to make
HTTP/2 appear like HTTP/1.1? Or am I not understanding your statement?

> The version attribute in the message model[1] might not make sense to all
> HTTP backends.

I'm not sure what an "HTTP backend" means, but for practical purposes
there are three versions of HTTP:

HTTP/1.0
HTTP/1.1
HTTP/2

While the content of the messages is the same (start line, zero or
more fields, optional body) the semantics of the status-line and the
fields differ between versions. For example, the absence of the
Connection field in HTTP/1.1 implies keep-alive, while the absence of
the Connection field in HTTP/1.0 implies closure. While in HTTP/2 the
Connection field is illegal and may not appear at all.

The beast message container was designed from the ground up to work
with the three versions of HTTP that we have today. In order to
capture all of the information available in a message, the container
must reflect the HTTP version that is implicitly or explicitly stated
in the message. In order to empower authors to write libraries that
work with any message (a goal of Beast), the version field must be
present.

> This discussion reminds me of the difference between inductive reasoning and
> deductive reasoning. Only the immediate problem is pursued and no attention
> is given trying to solve the general problem at hand.

That is incorrect. The beast message container is designed to be HTTP/2-ready.

> Also, if beast::http::read is what I'll call, then this implies I need to
> throw my abstractions in `beast::http` namespace

That is incorrect. Beast HTTP stream operations use abstractions for
the following objects:

Beast provides several general implementations for most of these
concepts, in the beast namespace, but objects may be passed from types
in any namespace. For example, beast stream operations work with
streams of type `boost::asio::ip::tcp::socket`, which is clearly not
in the `beast::http` namespace. This obvious counterexample disproves
your claim.

> However, a problem is the fact that this translation in the stream needs to
> happen to actually use the Beast message framework.

That is incorrect. The translation for converting the three character
sequence "ICY" at the beginning of a read stream, into "HTTP/1.1" is
not necessary to use the beast message container. Rather, it is
necessary in order to use the HTTP/1 parsers which come with beast.
The parser implementation strictly adheres to rfc7230. As the text
"ICY 200 OK\r\n" is not a valid status-line according to rfc7230, I'm
sure that you can understand why a strict parser would reject it.

Since the vast majority of consumers of HTTP libraries care about
reading actual valid HTTP and not SHOUTcast or Icecast MP3 streaming
server-specific modifications of the HTTP protocol, my solution for
developing a stream adapter to support the non-standard status-line is
elegant and pragmatic. In particular, it does not require modifying
beast library code which could introduce new security vulnerabilities.

Re: [beast] Supporting ICY 200 OK

> On Tue, Oct 3, 2017 at 6:06 PM, Vinícius dos Santos Oliveira
> <[hidden email]> wrote:
> > You could also try to standardize HTTP/1.1 and HTTP/2.0 under the same
> > interface.
>
> That is incorrect. All I'm talking about is a simple wrapper which
> meets the requirements of SyncReadStream and AsyncReadStream which
> checks the first incoming 3 bytes of data on the stream against the
> string "ICY" and translates it into "HTTP/1.1" to allow the existing
> beast parser to recognize a *slightly* non-standard response format.
> Note the emphasis on the term *slightly*. I have designed a reasonable
> solution to enable this niche use-case to work without incurring
> significant technical debt for something that is clearly outside the
> scope of rfc7230.
>
> And you are suggesting that the same technique should be used to make
> HTTP/2 appear like HTTP/1.1? Or am I not understanding your statement?
>

I thought you were trying to generalize `beast::http::read`. Now it's clear
that you're focusing on the parser. It was a misunderstanding of your
intent on my behalf.

> The version attribute in the message model[1] might not make sense to all
> > HTTP backends.
>
> I'm not sure what an "HTTP backend" means, but for practical purposes
> there are three versions of HTTP:
>
> HTTP/1.0
> HTTP/1.1
> HTTP/2
>

Add ZeroMQ to the list. The point is making applications that answer HTTP
requests (if you're writing a server). For some, this meant writing plugins
for existing HTTP servers. From my point of view, there is no need restrict
to this set of HTTP versions. You should abstract the meaning/behaviour
beyond that. Just like I wrote above:

This discussion reminds me of the difference between inductive reasoning
> and deductive reasoning. Only the immediate problem is pursued and no
> attention is given trying to solve the general problem at hand.
>

The complaint remains valid. A lack of abstract thinking.

While the content of the messages is the same (start line, zero or
> more fields, optional body) the semantics of the status-line and the
> fields differ between versions. For example, the absence of the
> Connection field in HTTP/1.1 implies keep-alive, while the absence of
> the Connection field in HTTP/1.0 implies closure. While in HTTP/2 the
> Connection field is illegal and may not appear at all.
>

I'm aware of connection differences.

The beast message container was designed from the ground up to work
> with the three versions of HTTP that we have today. In order to
> capture all of the information available in a message, the container
> must reflect the HTTP version that is implicitly or explicitly stated
> in the message. In order to empower authors to write libraries that
> work with any message (a goal of Beast), the version field must be
> present.
>

The “In order to [...] in a message, the container must reflect the HTTP
version” sentence is very interesting. It means I need to expose exactly
the same message received from the wire to the user... like you need
different API to support HTTP/2.0. Is my understanding of your sentence
correct?

Because the container doesn't need to reflect the HTTP version. In my view,
there are capabilities, like `native_stream`. Even the HTTP/2.0
multiplexing behaviour can be supported under the same simple API with no
additional complexities in the message container.

> This discussion reminds me of the difference between inductive reasoning
> and
> > deductive reasoning. Only the immediate problem is pursued and no
> attention
> > is given trying to solve the general problem at hand.
>
> That is incorrect. The beast message container is designed to be
> HTTP/2-ready.
>

“that is incorrect” + “message container is designed to be HTTP/2-ready”
seems to me that you didn't understand my complaint.

> However, a problem is the fact that this translation in the stream needs
> to
> > happen to actually use the Beast message framework.
>
> That is incorrect. The translation for converting the three character
> sequence "ICY" at the beginning of a read stream, into "HTTP/1.1" is
> not necessary to use the beast message container. Rather, it is
> necessary in order to use the HTTP/1 parsers which come with beast.
> The parser implementation strictly adheres to rfc7230. As the text
> "ICY 200 OK\r\n" is not a valid status-line according to rfc7230, I'm
> sure that you can understand why a strict parser would reject it.
>

I got the wrong impression. I thought your first email was more about the
message model than the parser.

My apologies but I am unfamiliar with these new terms "virtual HTTP
> stream" and "virtual field" that you have introduced. Could you
> perhaps define them so that I can discuss them on equal footing?
>

virtual, adj.:
Being or pertaining to a tangible, nonexistent object.
"I can see it, but it's not there."
-- Lady Macbeth.

Virtual field is not to be confused with HTTP field. There is the objective
reality and the virtual reality, a model you create from it, in your brain.
There is the _field_ — not to be confused with the word/semantic “field”
from HTTP messages — of real things and virtual things.

A good example to understand this separation, closer to the examples you're
used to is the `-o loop` option from the `mount` UNIX command. Your
application thinks it's a real file, but it's not, it's “emulated”/virtual.

Getting back to the thread, “ICY 200 OK\r\n” is real, “HTTP/1.1 200 OK\r\n”
is fake/virtual. “HTTP/1.1 200 OK\r\n” is what the application thinks it
has happened.

> Why does it need to go to memory?
>
> My apologies but I do not understand this question. What is the "it"
> that you think "needs" to "go to memory?"
>
> > Why isn't just limited to the API?
>
> My apologies but I do not understand this question at all. Perhaps you
> could provide a small code example which illustrates your question or
> counter-example?
>

To be limited to the virtual field, it'd mean that no
memory/RAM/array<char> translation from “ICY” to “HTTP” would happen. The
“HTTP interaction” of what happened would be limited to the function
signatures/contracts. No “HTTP” char-array in memory would need to exist
(i.e. it'd be virtual, not real).

However, like I stated in the beginning of the email, I thought you were
talking about the Beast message model. However, you're talking about the
parser. In this case (reusing the parser), I'll limit myself to Bjørn
words: “Sounds like a good solution.”

> Boost.Http managed to expose this amount of feature without exposing a
> > parser at all.
>
> Does Boost.Http expose a feature which allows it to interpret "ICY 200
> OK\r\n" as a valid HTTP status-line according to rfc7230?
>

Once again: I thought you were talking about Beast message model, not the
parser.

I think the solution for Boost.Http message model would be the same within
Boost.Beast message model. And the message model is not the solution
mentioned in the beginning of this thread.

As per the parser, I'm writing lots of documentations, so better translate
whatever answer I'd write here in the documentation of the parser.

Hmm...I'll restate it. The beast::http::read algorithm is a generic
algorithm which operates on any object meeting the SyncReadStream
requirements. A typical setup for un-encrypted connections might look
like this:

beast::http::read <-- boost::asio::ip::tcp::socket

The read algorithm receives data directly from the socket and provides
it to the parser. My solution to recognizing "ICY 200 OK\r\n" is
simply to add a stream adapter into the middle of this pipeline:

beast::http::read <-- icy_stream <-- boost::asio::ip::tcp::socket

The stream adapter replaces "ICY" with "HTTP/1.1" if it appears as the
first the characters of the input. This adapter would work with any
algorithm which operates on the SyncReadStream or AsyncReadStream
concept.

>> I'm not sure what an "HTTP backend" means, but for practical purposes
>> there are three versions of HTTP:
>>
>> HTTP/1.0
>> HTTP/1.1
>> HTTP/2
>
> Add ZeroMQ to the list.

Are you saying that HTTP has four versions 1.0, 1.1, 2, and ZeroMQ? A
casual search
of rfc2616[1], rfc7230[2], and rfc7540[3] does not turn up any matches
for "ZeroMQ."
Could you please provide a link to the document which describes the
ZeroMQ version
of HTTP?

> The point is making applications that answer HTTP requests (if you're writing a server).

Okay, so I think by this statement you are defining "HTTP backend" thusly:

An algorithm which calculates the HTTP response for a given HTTP
request and optional, additional state information associated with the
connection or application.

The style I am promoting for these types of algorithms, is evidenced
in the example HTTP
servers which come with Beast [4]. If you look at those servers, you
will notice that although
each server offers different features (plain, SSL, synchronous,
asynchronous, support for
websocket) they all contain an identical copy of a function with this signature:

// This function produces an HTTP response for the given
// request. The type of the response object depends on the
// contents of the request, so the interface requires the
// caller to pass a generic lambda for receiving the response.
template<
class Body, class Allocator,
class Send>
void
handle_request(
boost::beast::string_view doc_root,
http::request<Body, http::basic_fields<Allocator>>&& req,
Send&& send);

More formally, the style of requesting handling is expressed as a pure function:

{m', s'} = f(m, s)

where

f is the function
m is the HTTP request
s is the initial state
m' is the HTTP response
s' is the final state

Authors can write pure functions such as `handle_request` above, and since they
are using beast::message as a common container for HTTP messages, those pure
functions can then be composed. This allows higher level libraries to
be composed
from lower level ones to arbitrary degree.

That all the different Beast example servers work using the same logic
for processing
requests is evidence that the design achieves its goal.

> For some, this meant writing plugins for existing HTTP servers.

Of course, the `handle_request` signature I provided above can only be
called directly
for code that executes within the same process. In order to delegate
request processing
to another process, it is necessary to serialize the HTTP message,
deliver it to the other
process, and then deserialize it back into the message container. That
is why Beast stream
algorithms operate on concepts. So you can implement "ostream_socket" which
meets the requirements of SyncWriteStream, and then use that stream with
beast::http::write to deliver an HTTP message to another process connected via
an output file descriptor. Still, the design remains the same, but a
layer to deliver
the message to another process is required.

> From my point of view, there is no need restrict to this set of HTTP versions.
> You should abstract the meaning/behaviour beyond that, Just like I wrote above:

I must disagree. The semantics of an HTTP message can change depending on
the version. I also disagree that a representation of HTTP-version needs to be
broad enough to include ZeroMQ. What does that even mean? ZeroMQ is not
an HTTP version.

I will restate: There are currently three meaningful HTTP versions:

HTTP/1.0
HTTP/1.1
HTTP/2

These are documented in RFCs and well defined.

> A lack of abstract thinking.

No, my design does not represent a lack of abstract thinking. Quite
the opposite.
I have used abstractions *where appropriate*. Examples:

I think your preoccupation with backends has good intentions. And it should be
clear to you now given my explanation and example code that Beast was
specifically
designed to allow for unlimited flexibility in how users choose to consume HTTP
messages. If you still disagree I would kindly ask that you provide a
counter-example
in the form of pseudo-code that demonstrates your case.

> The “In order to [...] in a message, the container must reflect the HTTP
> version” sentence is very interesting. It means I need to expose exactly the
> same message received from the wire to the user... like you need different
> API to support HTTP/2.0. Is my understanding of your sentence correct?

"support HTTP/2.0" is a vague question, so I will decompose it into
two questions:

* Does Beast require a different container to represent HTTP/2 messages?

* Does Beast require different interfaces to serialize and deserialize
HTTP/2 messages on streams?

Note that I've answered these questions already both on the list and
in the Beast documentation but for the sake being fully informed the
answers to those questions are "No" and "Yes."

> Because the container doesn't need to reflect the HTTP version.

Again I have to disagree. The interpretation of HTTP field values can
be different depending on the HTTP-version (which can be 1.0, 1.1, or
2).

> In my view, there are capabilities, like `native_stream`.

Okay. I don't know what that is. Can you provide a link to where the
"native_stream"
HTTP feature is explained?

> Even the HTTP/2.0 multiplexing behaviour can be supported under the same
> simple API with no additional complexities in the message container.

Are you changing the subject from message containers to stream
algorithms? When you
refer to "simple API" are you talking about a stream algorithm? A
serialization/deserialization
algorithm?

I agree that it is possible to design an interface to stream
operations which can
be agnostic to whether the underlying connection uses HTTP/1 versus HTTP/2.
However, such a library would by definition not be a low-level library. It could
be decomposed into four parts:

Beast provides 1 and 2 above. And I have plans to provide 3. I have no plans to
provide 4, although with 1, 2, and 3 it could certainly be implemented
and it would
be much easier than having to write everything.

My intuition is that users who say they want a unified interface, and
get a unified
interface, will later realize that they didn't need it at all. But the
pitch of a unified
stream operations interface that works with HTTP/1 and HTTP/2 is of course
quite appealing because it creates the illusion of getting HTTP/2
support "for free."
But as I said I think there are problems with it which will only become apparent
after someone tries to offer such an interface, and users actually try
to use it.

Regardless, a unified interface can be decomposed. And any library which CAN
be decomposed, SHOULD be decomposed strictly on the principle of separation
of concerns.

> “that is incorrect” + “message container is designed to be HTTP/2-ready”
> seems to me that you didn't understand my complaint.

Perhaps I did not understand your complaint. If you could repeat it in
clear terms that could help.

It looks like an implementation detail. My experience with Beast is that the
vast majority of users don't care about how the parser is implemented other
than that it works and that it is reasonably fast or at least does not produce
a visible slowdown in the application. Users want these operations:

parse_header() // parse just the header
parse() // parse whatever is left

Thus far no one has expressed a desire to interact with HTTP message
tokens as they appear. However, should someone wish to do so the Beast
parser abstracts the post-processing of tokens by delegating them to a
derived class using CRTP. So for example, when the day comes that someone
wants to discard fields they don't recognize or care about, they can do so.
And they will find ample documentation on this subject along with examples:

> However, like I stated in the beginning of the email, I thought you were
> talking about the Beast message model. However, you're talking about the
> parser. In this case (reusing the parser), I'll limit myself to Bjørn words:
> “Sounds like a good solution.”

Oh.. yeah. The conversion from ICY to HTTP/1.1 is just a rewriting of
the buffer as an object inserted into the pipeline between the socket
and the read algorithm.

I don't really think much of it. Seems like a case of over-engineering to
me. HTTP parsing is relatively straightforward and in the case of Beast
it is a solved problem. Beast's parser works, has extensive tests and
code coverage, has been through a gauntlet of fuzzed inputs, and is
currently undergoing a security audit by a third party company.

Note that HTTP grammar does not require backtracking so I am not
seeing an immediate benefit from implementing a parser combinator.
This might not be true for some of the fields though, so perhaps a
parser combinator might be useful there. However, parsing the
values of fields (other than Connection, Proxy-Connection,
Upgrade, Transfer-Encoding, and Content-Length) is strictly
out-of-scope for Beast and would be the subject of a different library.

URIs require backtracking for authority elements missing the slash
but Beast doesn't parse the request-target, it just presents it as a
string_view to the caller. So the same rationale about being out of
scope applies.

I always welcome these discussions since they offer the possibility
of improvements. Or in this case they further cement the justifications
for the design decisions made in Beast.