I usually don’t blog about updates on the Boost.Http project because I want as much info as possible in code and documentation (or even git history), not here. However, I got a stimulus to change this habit. A new parser I’ve written replaced the NodeJS parser in Boost.Http and this is the most appropriate place to announce the change. This info will also be useful if you’re interested in using the NodeJS parser, any HTTP parser, or even in designing a parser with a stable API unrelated to HTTP.

EDIT (2016/08/07): I tried to clarify the text. Now I try to make it clear whether I’m referring to the new parser (the one I wrote) or the old parser (the NodeJS parser) everywhere in the text. I’ll also refer to Boost.Http with the new parser as the new Boost.Http and to Boost.Http with the old parser as the old Boost.Http.

What’s wrong with NodeJS parser?

I started developing a new parser because several users wanted a header-only library and the parser was the main barrier for that in my library. I took the opportunity to also provide a better interface, one that isn’t limited to the C language (inconvenient and full of unsafe type casts) and uses a style that doesn’t own the control flow (easier to deal with HTTP upgrade and no need for lots of jumping back and forth among callbacks).

The old parser is so hard to use that I wouldn’t dare to apply, in the old Boost.Http, the same tricks I’ve used in the new Boost.Http to avoid allocations. So the NodeJS parser itself doesn’t allocate, but it is so hard to deal with that in the old Boost.Http you don’t want to reuse the buffer to keep incomplete tokens at all (forcing an allocation or a big-enough secondary buffer to hold them).

HTTP upgrade is also very tricky and the lack of documentation for the NodeJS parser is depressing. So I only trust my own code as a usage reference for the NodeJS parser.

However, I’ve hidden all this complexity from my users. My users wanted a different parser because they wanted a header-only library. I personally only wanted to change the parser because the NodeJS parser only accepts a limited set of HTTP methods and it was tricky to properly avoid any allocation. The new parser even makes it easier to reject an HTTP element before decoding it (e.g. a too-long URL will exhaust the buffer and then the new Boost.Http can just check the `expected_token` function to know it should reply with a 414 status code, instead of concatenating a lot of URL pieces until it detects that the limit was reached).

If you aren’t familiar enough with HTTP details, you cannot assume the NodeJS parser will abstract HTTP framing for you. Your code will get the wrong result and the bug will stay silent for a long time before you notice it.

The new parser

EDIT(2016/08/09): The new parser is almost ready. It can be used to parse request messages (it’ll be able to parse response messages soon). It’s written in C++03. It’s header-only. It only depends on boost::string_ref, boost::asio::const_buffer and a few others that I may be missing from memory right now. The new parser doesn’t allocate and returns control to the user as soon as one token is ready or an error is reached. You can mutate the buffer while the parser maintains a reference to it. And the parser will decode the tokens, so you don’t need the ugly hacks the NodeJS parser requires (like removing OWS from the end of header field values).

I want to tell you that the new parser was NOT designed around Boost.Http’s needs. I wanted to make a general parser, and that’s how the design started. Then I wanted to replace the NodeJS parser within Boost.Http and the parts fit nicely. The only part that didn’t fit perfectly when it was time to integrate the pieces was a missing end_of_body token, which was easy to add to the new parser code. This was the only time that I, as the author of Boost.Http and as a user of the new parser, used my power as the author of the parser itself to push my needs on everybody else. And this token was a nice addition anyway (using the NodeJS API you’d use http_body_is_final).

There is also the “parser combinators” part of this project (still not ready) that I only understood once I watched a talk by Scott Wlaschin. Initially I was having a lot of trouble because I wanted stateful mini-parsers to avoid “reparsing” certain parts, but you rarely read 1-byte chunks and I was only complicating things. The combinators part is tricky to deliver, because the next expected token depends on the value (semantics, not syntax) of the current token, and this is hard to represent using expressions like the Boost.Spirit abstractions. Therefore, I’m only going to deliver the mini-parsers, not the combinators. Feel free to give me feedback/ideas if you want to.

Needless to say, the new parser retains the same great features of the NodeJS parser, like no allocations or syscalls behind the scenes. But it was actually easier to avoid and decrease allocations in Boost.Http thanks to the parser’s design of not forcing the user to accumulate values in separate buffers and of making offsets easy to obtain.

I probably could achieve the same effect of decreased buffers in Boost.Http with the NodeJS parser, but it was quite hard to work with (read the section above). And you should know that the parser-related code in the old Boost.Http was almost 3 times bigger than the parser-related code in the new Boost.Http (it’d be almost 4 times bigger, but I had to add code to detect the keep-alive property, because the new parser only cares about message framing).

On the topic of performance, the new Boost.Http tests consume 7% more time to finish (using a CMake Release build with GCC on my machine). I haven’t spent time trying to improve performance, and I think I’ll only try to improve memory usage anyway (the size of the parser structure).

A drawback (is it?) is that the new parser only cares about structuring the HTTP stream. It doesn’t care about connection state (exception: receiving an HTTP 1.0 response body/connection close event). Therefore, you need to implement keep-alive yourself (which the Boost.Http higher-level layers do).

I want to emphasize that the authors of the NodeJS parser have done a wonderful job with what they had in hands: C!

Migrating code to use the new parser

First, I haven’t added the code to parse the status line yet, so the parser is limited to HTTP requests. It shouldn’t take long (a few weeks until I finish this and several other tasks).

The Tufão project has been using the NodeJS parser improperly for ages and it’d be hard to fix that. Therefore, Tufão 1.4.0 has been refactored to use this new parser. It finally received support for HTTP pipelining, and plenty of bugfixes that nobody noticed landed. Unfortunately I got the semantics for HTTP upgrade within Tufão wrong and it kind of has a “forced HTTP upgrade” (this is something I got right in Boost.Http thanks to RFC7230’s clarification).

Next steps

I may have convinced you to prefer Boost.Http parser over NodeJS parser when it comes to C++ projects. However, I hope to land a few improvements before calling it ready.

Test-wise, I can already tell you that more than 80% of all the code written for this parser is tests (like 4 lines of test for each 1 line of implementation). However, I haven’t run the tests in combination with sanitizers (yet!) and there are a few more areas where the tests can be improved (coverage, allocating buffer chunks separately so sanitizers can detect invalid access attempts, fuzzers…) and I’ll work on them as well.

I can add some code to deduce the optimum size for the indexes and return a parser with a little less memory overhead.


Since Tufão 0.4, I’ve been using CMake as the Tufão build system, but occasionally I see some users reimplementing the qmake-based project files and I thought it’d be a good idea to explain/document why such rewrite is a bad idea. This is the post.

Simple and clear (1 reason)

It means *nothing* to your qmake-based project.

What your qmake-based project needs is a *.pri file for you to include in your *.pro file. And such a *.pri file *is* generated (and properly included in your Qt installation) by Tufão. You’ll just write the usual “CONFIG += TUFAO” without *any* pain.

Why I won’t use qmake in Tufão ever again (long answer)

Two reasons why it is a bad idea:

You define only one target per file. You need subdirs. It’s hard.

The Tufão unit testing, based on Qt Test, requires the definition of a separate executable per test, and the “src/tests/CMakeLists.txt” CMake file beautifully defines 13 tests. With the CMake-based system, all you need to do to add a new test is add a single line to the previously mentioned file. QMake makes this so hard that I’d rather define a dumb test system that only works after you install the lib, just to free myself from the qmake pain.
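The CMake pattern is roughly the following (target and file names here are made up for illustration; see “src/tests/CMakeLists.txt” in Tufão for the real thing):

```cmake
# One executable per Qt Test case, registered with CTest.
add_executable(headers_test headers_test.cpp)
target_link_libraries(headers_test tufao ${QT_QTTEST_LIBRARY})
add_test(NAME headers_test COMMAND headers_test)
```

With a small helper macro wrapping these three commands, adding a test really is a one-line change.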

There is no easy way to preprocess files.

And if you use external commands that you don’t provide yourself, like grep, sed or whatever, then your build system will be less portable than autotools. Not every Windows developer likes autotools, and your approach (external commands) won’t be any better.

All in all, it becomes hard to write a build system that will install the files required by projects that use QMake, CMake or PKG-CONFIG (Tufão supports all three).

The reasons above are the important ones, but there are others, like the fact that the documentation is almost as bad as the documentation for creating QtCreator plugins.

The ever growing distance from QMake

As Tufão grows, the build system sometimes becomes more complicated/demanding, and when that happens, I bet the QMake-based approach will become even more difficult to maintain. The most recent case of libtufao.pro that I’m aware of had to include some underdocumented black magic just to meet the Qt4/5 demand of Tufão 0.x.

You like a CMake-based Tufão

The current CMake-based build system of Tufão provides features that you certainly enjoy. At the very least, the greater stability of the new unit testing framework requires CMake, and you certainly want a stable library.

QMake hate

In the beginning, QMake met Tufão’s requirements for a build system, but I wouldn’t use it again for demanding projects.

But I don’t hate QMake and I’d use it again in a Qt-based project if, and only if, I *know* it won’t have demanding needs. Of course I’ll hate QMake if people start to overuse it (and create trouble for me).

And if you still want to maintain a QMake-based Tufão project file, at least you’ve been warned about the *pain* and the inferior solution you’ll end up with.


Today I’ve spent some minutes of my time fixing the metadata of the Tufão git repo. The issue made it difficult to gather which author owns which lines of code, which can be a problem if you want to give merit to the real coders (maybe this is related to ethics) and if you want to contact the authors later for actions that can only be done by the copyright owner (e.g. changing the license). This incorrect info was introduced by misuse of the git tool.

First, I must admit that I was wrong and the issue was entirely caused by my ignorance of git at the time. The issue was not created on purpose. You can see an example here, where I mentioned the original author in the commit message to give the appropriate credit (my intent). But there are also cases where I failed to even make such a mention.

Every commit you make using the git tool has an associated author, and if you don’t specify the author explicitly, git will use the global config. This metadata is used by commands such as git shortlog -s and git blame. The solution is simple: just set the author explicitly using the --author argument.

Now the git repo history mentions authors such as Paul Maseberg and Marco Molteni.

Last, but not least, I want to let you know that I believe the problem is solved. If you find something that I missed, just file a bug on GitHub and I’ll fix it.


After a long time developing Tufão, it finally reached version 1.0 some hours ago. I’ve spent a lot of time cleaning up the API and exploring the features provided by C++11 and Qt5 to release this version.

This is the first Tufão release that:

… breaks config files, so you’ll need to update your config files to use them with the new version

… breaks ABI, so you’ll need to recompile your projects to use the new version

… breaks API, so you’ll need to change your source code to be able to recompile your previous code and use the new version

… breaks previously documented behaviour, so you’ll need to change the way you use features that were available before. But don’t worry, because the list of these changes is really small and they are well documented below.

Porting to Tufão 1.0

From now on, you should link against tufao1 instead of tufao. The pkg-config, qmake and CMake files were also renamed, so you can have different Tufão libraries on the same system if their major versions differ.

The list of behavioural changes is:

Headers are now stored using a hash table, so you can’t (and shouldn’t) easily predict the order of the headers anymore. I hope this change will improve performance.

The HttpServerRequest::ready signal auto-disconnects all slots connected to the HttpServerRequest::data and HttpServerRequest::end signals before being emitted.

HttpFileServer can automatically detect the mime type of the served files, so if you had your own logic to handle mimes, you can throw it away and enjoy a smaller code base to maintain.