MQTT definitely has a smaller size on the wire. It’s also simpler to parse (let’s face it, Huffman isn’t that easy to implement) and provides guaranteed delivery to cater to shaky wireless networks. On the other hand, it’s also not terribly extensible. There aren’t a whole lot of headers and options available, and there’s no way to make custom ones without touching the payload of the message.

It seems that HTTP/2 could definitely serve as a reasonable replacement for MQTT. It’s reasonably small, supports multiple paradigms (pub/sub & request/response) and is extensible. Its also supported by the IETF (whereas MQTT is hosted by OASIS). From conversations I’ve had with industry leaders in the embedded software and chip manufacturing, they only want to support standards from the IETF. Many of them are still planning to support MQTT, but they’re not happy about it.

I think MQTT is better at many of the things it was designed for, but I’m interested to see over time if those advantages are enough to outweigh the benefits of HTTP. Regardless, MQTT has been gaining a lot of traction in the past year or two, so you may be forced into using it while HTTP/2 catches up.

Vaurien is basically a Chaos Monkey for your TCP connections. Vaurien acts as a proxy between your application and any backend. You can use it in your functional tests or even on a real deployment through the command-line.

Vaurien is a TCP proxy that simply reads data sent to it and pass it to a backend, and vice-versa. It has built-in protocols: TCP, HTTP, Redis & Memcache. The TCP protocol is the default one and just sucks data on both sides and pass it along.

Having higher-level protocols is mandatory in some cases, when Vaurien needs to read a specific amount of data in the sockets, or when you need to be aware of the kind of response you’re waiting for, and so on.

Vaurien also has behaviors. A behavior is a class that’s going to be invoked everytime Vaurien proxies a request. That’s how you can impact the behavior of the proxy. For instance, adding a delay or degrading the response can be implemented in a behavior.

Both protocols and behaviors are plugins, allowing you to extend Vaurien by adding new ones.

Last (but not least), Vaurien provides a couple of APIs you can use to change the behavior of the proxy live. That’s handy when you are doing functional tests against your server: you can for instance start to add big delays and see how your web application reacts.

'A constant throughput, correct latency-recording variant of wrk. This is a must-have when measuring network service latency -- corrects for Coordinated Omission error:

wrk's model, which is similar to the model found in many current load generators, computes the latency for a given request as the time from the sending of the first byte of the request to the time the complete response was received. While this model correctly measures the actual completion time of individual requests, it exhibits a strong Coordinated Omission effect, through which most of the high latency artifacts exhibited by the measured server will be ignored. Since each connection will only begin to send a request after receiving a response, high latency responses result in the load generator coordinating with the server to avoid measurement during high latency periods.

We are excited to announce the release of Proxygen, a collection of C++ HTTP libraries, including an easy-to-use HTTP server. In addition to HTTP/1.1, Proxygen (rhymes with "oxygen") supports SPDY/3 and SPDY/3.1. We are also iterating and developing support for HTTP/2.

Proxygen is not designed to replace Apache or nginx — those projects focus on building extremely flexible HTTP servers written in C that offer good performance but almost overwhelming amounts of configurability. Instead, we focused on building a high performance C++ HTTP framework with sensible defaults that includes both server and client code and that's easy to integrate into existing applications. We want to help more people build and deploy high performance C++ HTTP services, and we believe that Proxygen is a great framework to do so.

'Verizon Wireless is monitoring users' mobile internet traffic, using a token slapped onto web requests, to facilitate targeted advertising even if a user has opted out.

The unique identifier token header (UIDH) was launched two years ago, and has caused an uproar in tech circles after it was re-discovered Thursday by Electronic Frontier Foundation staffer Jacob Hoffman-Andrews.

The Relevant Mobile Advertising program, under which the UIDH was used, allowed a restaurant to advertised to locals only or for retail websites to promote to previous visitors, according to Verizon Wireless.'

This Java library can route paths to targets and create paths from targets and params (reverse routing). This library is tiny, without additional dependencies, and is intended for use together with an HTTP server side library. If you want to use with Netty, see netty-router.

web service API for Dublin Bikes data (and other similar bikesharing services run by JCD):

Two kinds of data are delivered by the platform:

Static data provides stable information like station position, number of bike stands, payment terminal availability, etc.
Dynamic data provides station state, number of available bikes, number of free bike stands, etc.
Static data can be downloaded manually in file format or accessed through the API. Dynamic data are refreshed every minute and can be accessed only through the API.

Comcast is adding data into the broadband packet stream. In 2007, it was packets serving up disconnection commands. Today, Comcast is inserting JavaScript that is serving up advertisements, according to [Robb] Topolski, who reviewed Singel's data. "It's the duty of the service provider to pull packets without treating them or modifying them or injecting stuff or forging packets. None of that should be in the province of the service provider," he said. "Imagine every Web page with a Comcast bug in the lower righthand corner. It's the antithesis of what a service provider is supposed to do. We want Internet access, not another version of cable TV."

The Police Intellectual Property Crime Unit has arrested a 20-year-old man in Nottingham on suspicion of copyright infringement for running a proxy server providing access to other sites subject to legal blocking orders.

Is operating a proxy server illegal? Interesting. Seems unlikely that this will go to court though.

To prove how easy [MITM attacking Mavencentral JARs] is to do, I wrote dilettante, a man-in-the-middle proxy that intercepts JARs from maven central and injects malicious code into them. Proxying HTTP traffic through dilettante will backdoor any JARs downloaded from maven central. The backdoored version will retain their functionality, but display a nice message to the user when they use the library.

We dynamically monitor and manage a large and rapidly growing number of web servers deployed on our infrastructure and systems. However, existing tools present major challenges when making REST/SOAP calls with server-specific requests to a large number of web servers, and then performing aggregated analysis on the responses. We therefore developed REST Commander, a parallel asynchronous HTTP client as a service to monitor and manage web servers. REST Commander on a single server can send requests to thousands of servers with response aggregation in a matter of seconds. And yes, it is open-sourced at http://www.restcommander.com.

Feature highlights:

Click-to-run with zero installation;
Generic HTTP request template supporting variable-based replacement for sending server-specific requests;
Ability to send the same request to different servers, different requests to different servers, and different requests to the same server;
Maximum concurrency control (throttling) to accommodate server capacity;
Commander itself is also “as a service”: with its powerful REST API, you can define ad-hoc target servers, an HTTP request template, variable replacement, and a regular expression all in a single call. In addition, intuitive step-by-step wizards help you achieve the same functionality through a GUI.

I like the integration of Eureka and Hystrix in particular, although I would really like to read more about Eureka's approach to availability during network partitions and CAP.

https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/-5nElGl1OQ0J has some interesting discussion on the topic. It actually sounds like the Eureka approach is more correct than using ZK: 'Eureka is available. ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessary consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.'

I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed that the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away” dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads). [...]

To me this seems like a concern, as network partitions should be considered an event that should be survived. In this case (with this specific configuration of zookeeper) no new clients in that availability zone would be able to register themselves with consumers within the same availability zone. Adding more zookeeper instances to the ensemble wouldn’t help considering a balanced deployment as in this case the availability would always be majority (66%) and non-majority (33%).

BorderPatrol is an nginx module to perform authentication and session management at the border of your network. BorderPatrol makes the assumption that you have some set of services that require authentication and a service that hands out tokens to clients to access that service. You may not want those tokens to be sent across the internet, even over SSL, for a variety of reasons. To this end, BorderPatrol maintains a lookup table of session-id to auth token in memcached.

If this kind of bullshit -- a HTTP GET of an XML file from www.samsung.com -- is how the Samsung Smart TV firmware decides if the internet is working or not, I dread to think how crappy the rest of the code is. (At least in Netnote we performed a bunch of bigco-domain DNS lookups before giving up...)

kellabyte's hack in progress -- 'an asynchronous HTTP server framework written in C. The goal of Haywire is to learn how to create a server with a minimal feature set that can handle a high rate of requests and connections with as low of latency and resource usage as possible. Haywire uses the event loop based libuv platform layer that node.js is built on top of (also written in C). libuv abstracts IOCP on Windows and epoll/kqueue/event ports/etc. on Unix systems to provide efficient asynchronous I/O on all supported platforms.'

* includes explicit support for long-distance WAN operation as well as on LANs.

It all looks very practical and usable. MPL-licensed.

The only potential risk I can see is that expecting to receive config updates from a blocking poll of the HTTP interface needs some good "best practice" docs, to ensure that people don't mishandle the scenario where there is a network partition between your calling code and the Consul server/agent. Without any heartbeating protocol behind the scenes, HTTP is vulnerable to "hung connections" which would result in a config change being silently missed by the client until the connection eventually is timed out, either by the calling code or the client-side kernel. This could potentially take minutes to occur, which in some usage scenarios could be a big, unforeseen problem.

Poul-Henning Kemp details why Varnish doesn't do SSL -- basically due to the quality and complexity of open-source SSL implementations:

There is no other way we can guarantee that secret krypto-bits do not leak anywhere they should not, than by fencing in the code that deals with them in a child process, so the bulk of varnish never gets anywhere near the certificates, not even during a core-dump.

We shouldn’t think of “the web” as only what renders in web browsers. We should think of the web as anything transmitted using HTTP and HTTPS. Apps and websites are peers, not competitors. They’re all just clients to the same services.

"Microservices" seems to be yet another term for SOA; small, decoupled, independently-deployed services, with well-defined public HTTP APIs. Pretty much all the services I've worked on over the past few years have been built in this style. Still, let's keep an eye on this concept anyway.

a utility to perform parallel, pipelined execution of a single HTTP GET. htcat is intended for the purpose of incantations like: htcat https://host.net/file.tar.gz | tar -zx

It is tuned (and only really useful) for faster interconnects: [....] 109MB/s on a gigabit network, between an AWS EC2 instance and S3. This represents 91% use of the theoretical maximum of gigabit (119.2 MiB/s).

Google paper describing the infrastructure they've built for cross-service request tracing (ie. "tracer requests"). Features: low code changes required (since they've built it into the internal protobuf libs), low performance impact, sampling, deployment across the ~entire production fleet, output visibility in minutes, and has been live in production for over 2 years. Excellent read

On-the-fly video transcoding during live streaming. They've done a great job of this!

At the beginning of the development of this feature, we entertained the idea to simply pre-transcode all the videos in Dropbox to all possible target devices. Soon enough we realized that this simple approach would be too expensive at our scale, so we decided to build a system that allows us to trigger a transcoding process only upon user request and cache the results for subsequent fetches. This on-demand approach: adapts to heterogeneous devices and network conditions, is relatively cheap (everything is relative at our scale), guarantees low latency startup time.

IPKat says 'this morning the Court of Justice of the European Union issued its keenly awaited decision in Case C-466/12 Svensson [...]: The owner of a website may, without the authorisation of the copyright holders, redirect internet users, via hyperlinks, to protected works available on a freely accessible basis on another site. This is so even if the internet users who click on the link have the impression that the work is appearing on the site that contains the link.'

This is potentially big news. Not so much for the torrent-site scenario, but for the NNI/NLI linking-to-newspaper-stories scenario.

Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs. Rest.li fills a niche for building RESTful service architectures at scale, offering a developer workflow for defining data and REST APIs that promotes uniform interfaces, consistent data modeling, type-safety, and compatibility checked API evolution.

An 11 hour outage caused by a false positive in Sky's anti-phishing filter; all sites using the code.jquery.com CDN for JQuery would have seen errors.

Sky still appears to be blocking code.jquery.com and all files served via the site, and more worryingly is that if you try to report the incorrect category, once signing in on the Sky website you an error page. We suspect the site was blocked due to being linked to by a properly malicious website, i.e. code.jquery.com and some javascript files were being used on a dodgy website and every domain mentioned was subsequently added to a block list.

Staggeringly inept. The UK national porn filter blocks based on a regexp match of the URL against /.*sex.*/i -- the good old "Scunthorpe problem". Better, it returns a 404 response. This is also a good demonstration of how web filtering has unintended side effects, breaking third-party software updates with its false positives.

The update to online strategy game League of Legends was disrupted by the internet filter because the software attempted to access files that accidentally include the word “sex” in the middle of their file names. The block resulted in the update failing with “file not found” errors, which are usually created by missing files or broken updates on the part of the developers.

'like inetd, but for WebSockets' -- 'a small command line tool that will wrap an existing command line interface program, and allow it to be accessed via a WebSocket. It provides a quick mechanism for allowing web-applications to interact with existing command line tools.'

a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue. An optional LuaJIT script can perform HTTP request generation, response processing, and custom reporting.

Need a flexible format to record, export, and analyze network performance data? Well, that's exactly what the HTTP Archive format (HAR) is designed to do! Even better, did you know that Chrome DevTools supports it? In this episode we'll take a deep dive into the format (as you'll see, its very simple), and explore the many different ways it can help you capture and analyze your sites performance. Join Ilya Grigorik and Peter Lubbers to find out how to capture HAR network traces in Chrome, visualize the data via an online tool, share the reports with your clients and coworkers, automate the logging and capture of HAR data for your build scripts, and even adapt it to server-side analysis use cases

This looks really nice -- it's quite similar to something I was hacking on a while back. Only problem is that it's AGPL-licensed...

'Pushpin makes it easy to create HTTP long-polling and streaming services using any web stack as the backend. It’s compatible with any framework, whether Django, Rails, ASP, or even PHP. Pushpin works as a reverse proxy, sitting in front of your server application and managing all of the open client connections.'

'This article will use NettoSphere, a framework build on top of the popular Netty Framework and Atmosphere with support of WebSockets, Server Side Events and Long-Polling. NettoSphere allows [async JVM framework] Atmosphere's applications to run on top of the Netty Framework.'

Wow, this looks excellent. A must-read for people working on systems with high-volume, low-latency phone-to-server communications -- and free!

How prepared are you to build fast and efficient web applications? This eloquent book provides what every web developer should know about the network, from fundamental limitations that affect performance to major innovations for building even more powerful browser applications—including HTTP 2.0 and XHR improvements, Server-Sent Events (SSE), WebSocket, and WebRTC.

As part of the Turmoil system, the NSA places secret servers, codenamed Quantum, at key places on the internet backbone. This placement ensures that they can react faster than other websites can. By exploiting that speed difference, these servers can impersonate a visited website to the target before the legitimate website can respond, thereby tricking the target's browser to visit a Foxacid server.

At GE, we were more interested in the uncoordinated aspects of Snowflake than its throughput requirements, so HTTP was fine for our needs. We also exposed the core of Snowflake as an embeddable module so it can be directly integrated into our applications. We don't have the guarantees that the Snowflake-Zookeeper integration was providing, but that was also acceptable to us. In places where we really needed high throughput, we leveraged the snowizard-core embeddable module directly.

The award for least-intrusive and entirely painless mitigation proposal goes to Paul Querna who, on the httpd-dev mailing list, proposed to use the HTTP chunked encoding to randomize response length. Chunked encoding is a HTTP feature that is typically used when the size of the response body is not known in advance; only the size of the next chunk is known. Because chunks carry some additional information, they affect the size of the response, but not the content. By forcing more chunks than necessary, for example, you can increase the length of the response. To the attacker, who can see only the size of the response body, but not anything else, the chunks are invisible. (Assuming they're not sent in individual TCP packets or TLS records, of course.) This mitigation technique is very easy to implement at the web server level, which makes it the least expensive option. There is only a question about its effectiveness. No one has done the maths yet, but most seem to agree that response length randomization slows down the attacker, but does not prevent the attack entirely. But, if the attack can be slowed down significantly, perhaps it will be as good as prevented.

This is a neat hack. Since SSL/TLS connection establishment requires lots of consecutive round trips before the connection is ready, by performing that closer to the user and reusing an existing region-to-region connection behind the scenes, the overall latency is greatly improved. Works for HTTP as well

Jim Gettys on modern web design, HTTP, buffering, and FIFO queues in the network.

Web surfing is putting impulses of packets, without congestion avoidance, into FIFO queues where they do severe collateral damage to anything sharing the link (including itself!). So today’s web behavior incurs huge collateral damage on itself, data centers, the edge of the network, and in particular any application that hopes to have real time behavior. How do we solve this problem?

on the mechanical-sympathy mailing list. Some really interesting discussion on handling insane quantities of TCP connections using low volumes of hardware:

This talk has some good points and I think the subject is really interesting. I would take the suggested approach with serious caution. For starters the Linux kernel is nowhere near as bad as it made out. Last year I worked with a client and we scaled a single server to 1 million concurrent connections with async programming in Java and some sensible kernel tuning. I've heard they have since taken this to over 5 million concurrent connections.

BTW Open Onload is an open source implementation. Writing a network stack is a serious undertaking. In a previous life I wrote a network probe and had to reassemble TCP streams and kept getting tripped up by edge cases. It is a great exercise in data structures and lock-free programming. If you need very high-end performance I'd talk to the Solarflare or Mellanox guys before writing my own.

There are some errors and omissions in this talk. For example, his range of ephemeral ports is not quite right, and atomic operations are only 15 cycles on Sandy Bridge when hitting local cache. A big issue for me is when he defined C10M he did not mention the TIME_WAIT issue with closing connections. Creating and destroying 1 million connections per second is a major issue. A protocol like HTTP is very broken in that the server closes the socket and therefore has to retain the TCB until the specified timeout occurs to ensure no older packet is delivered to a new socket connection.

Written by Google, this library is a flexible, efficient, and powerful Java client library for accessing any resource on the web via HTTP. It features a pluggable HTTP transport abstraction that allows any low-level library to be used, such as java.net.HttpURLConnection, Apache HTTP Client, or URL Fetch on Google App Engine. It also features efficient JSON and XML data models for parsing and serialization of HTTP response and request content. The JSON and XML libraries are also fully pluggable, including support for Jackson and Android's GSON libraries for JSON.

Not quite as simple an API as Python's requests, sadly, but still an improvement on the verbose Apache HttpComponent API. Good support for unit testing via a built-in mock-response class. Still in beta

At Railscamp X it became clear there is a gap in the current HTTP specification. There are many ways for a developer to screw up their implementation, but no code to share the nature of the error with the end user. We humbly suggest the following status codes are included in the HTTP spec in the 7XX range.

Includes such useful status codes as "724 - This line should be unreachable".

'an elegant and simple HTTP library for Python, built for human beings.' 'Requests is an Apache2 Licensed HTTP library, written in Python, for human beings. Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks. Requests takes all of the work out of Python HTTP/1.1 — making your integration with web services seamless. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is embedded within Requests.'

Troy Hunt, an Aussie software architect working on a .Net security product called ASafaWeb, does a great job extensively deconstructing Tesco's appalling website security on their shopping site. In the process, he gets this wonderful tweet from their customer-care account:

"@troyhunt Let me assure you that all customer passwords are stored securely & in line with industry standards across online retailers."

As he says, this is a clear demonstration that Tesco is in the first stage of the four stages of competence -- "unconscious incompetence": "The individual does not understand or know how to do something and does not necessarily recognise the deficit." ( http://en.wikipedia.org/wiki/Four_stages_of_competence )

by Erik Onnen of Urban Airship. very good presentation on the current state of the art in large-scale low-latency service operation using the JVM on Linux. Lots of good details on async vs sync, HTTPS/TLS/TCP tuning, etc.

'a Java framework for developing ops-friendly, high-performance, RESTful web services. Developed by Yammer to power their JVM-based backend services, Dropwizard pulls together stable, mature libraries from the Java ecosystem into a simple, lightweight package that lets you focus on getting things done. Dropwizard has out-of-the-box support for sophisticated configuration, application metrics, logging, operational tools, and much more, allowing you and your team to ship a production-quality HTTP+JSON web service in the shortest time possible.' From Coda Hale/Yammer; includes Guava, Jetty, Jersey, Jackson, Metrics, slf4j. Pretty good baseline to start any new Java service with....

'Starting with Chrome 13, we'll have HTTPS pins for most Google properties. This means that certificate chains for, say, https://www.google.com, must include a whitelisted public key. It's a fatal error otherwise.' good anti-MITM protection

'a reverse proxy, load balancer and HTTPS front-end for Web server(s). Pound was developed to enable distributing the load among several Web-servers and to allow for a convenient SSL wrapper for those Web servers that do not offer it natively. Pound is distributed under the GPL'

'allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads.' actively maintained, Windows and UNIX

'a web service of some sort (or any other way) to pull a current time zone settings for a (US) city. For the parts of the country that don't follow the Daylight Saving Time and basically jump timezones when everyone else is switching summer/winter time... I don't fancy creating own database of the places that don't follow DST. Is there a way to pull this data on demand?' earthtools.org seems the closest thing