Feross Aboukhadijeh on WebRTC, PeerCDN, WebTorrent

Bio

Feross Aboukhadijeh is a programmer, designer, teacher, and mad scientist. He is currently building WebTorrent, a streaming BitTorrent client for the browser, powered by WebRTC. Before that, he built PeerCDN, a peer-to-peer content delivery network that dramatically reduces bandwidth costs. Feross is a graduate of Stanford University and has worked at Quora, Facebook, and Intel.

About the conference

CRAFT is about software craftsmanship: which tools, methods, and practices should be part of the toolbox of a modern developer and company. It is also a compass for new technologies and trends.

So I like to think of myself as a mad scientist, because I like to make things that make people say: “Wow, I didn’t realize that was possible!” But I’m a software developer; I write a lot of JavaScript and I like to build things!

Sure, so WebRTC is a web standard that lets you do peer-to-peer in the browser. What that means is that one browser can open a connection directly to another browser, so two visitors on your website can talk to each other, either with video or audio chat (think of a Skype kind of thing) or with arbitrary data, so you can come up with your own protocol and use it for really anything you can imagine. It’s low-level enough that you can build any kind of application on top of it. What’s exciting about it, I think, is that it’s the very first time peer-to-peer has been possible in the browser, and that makes the web more powerful because it’s a new primitive. The web keeps getting better and becoming more like an operating system, and this is one of the few remaining things that the web couldn’t do that a general-purpose operating system could, you know what I mean?

That’s a good question. It’s up to your application how that happens; the spec doesn’t specify it, and that is a good thing because every application is different. To give you an example, if you are building a chat application like Gmail or Facebook Chat, where one user chats with another, the way peers find each other might be that every peer reports to the server that they are online and what their name or username is, and then the server makes that information available to all the people who should know: your friends, your email contacts, or for some use cases just anyone who wants to know.

And then, now that the information is available, the user has the chance to say: “I want to talk to so-and-so.” They select that person in the UI, the application sends a message to the server saying “I want to talk to this person now”, and then there is a little bit of an exchange that needs to happen, and that is the WebRTC part. One peer needs to send some information to the other peer, like its IP address and things like that, the other peer sends a response back, and then the peers are able to connect directly to each other. So the central server is needed for two things: it helps the peers find each other, and it facilitates the exchange of information so that they can eventually connect directly to each other.
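To make the server’s two jobs concrete (presence, then relaying the connection exchange), here is a minimal in-memory sketch. Everything in it (`SignalingServer`, `SignalMessage`, the method names) is hypothetical and not part of any WebRTC API; the `sdp` strings are placeholders for what a browser would produce with `RTCPeerConnection.createOffer()` / `createAnswer()`, and a real application would also relay ICE candidates and deliver messages over a socket rather than by polling.

```typescript
// Hypothetical in-memory signaling server: tracks who is online and relays
// offer/answer messages between peers. Not a real library API.

type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string };

class SignalingServer {
  // username -> queue of messages waiting to be delivered to that user
  private inboxes = new Map<string, SignalMessage[]>();

  // Job 1 (presence): a peer reports that it is online under a username.
  register(username: string): void {
    if (!this.inboxes.has(username)) this.inboxes.set(username, []);
  }

  // Who is online; a real app would filter this down to friends or contacts.
  onlineUsers(): string[] {
    return Array.from(this.inboxes.keys());
  }

  // Job 2 (exchange): relay an offer or answer to the named peer.
  relay(msg: SignalMessage): boolean {
    const inbox = this.inboxes.get(msg.to);
    if (!inbox) return false; // recipient is not online
    inbox.push(msg);
    return true;
  }

  // The recipient drains its pending messages.
  receive(username: string): SignalMessage[] {
    const pending = this.inboxes.get(username);
    if (!pending) return [];
    this.inboxes.set(username, []);
    return pending;
  }
}
```

Once the offer and answer have been delivered, the server is out of the data path; the browsers connect to each other directly.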

Yes, so there is a collection of hacks that people have come up with over time called STUN, which stands for something like Session Traversal Utilities for NAT. What’s cool about WebRTC is that it has built-in support for STUN, so you can specify a STUN server when you set up your peer connection object and it will automatically try to traverse the NAT: it talks to the STUN server to learn its public address and punch a hole in your NAT, the other person does the same thing, and most of the time they’ll be able to talk to each other. There are some cases where NAT traversal doesn’t work, like if both users are behind extremely restrictive NATs such as symmetric NATs; then you may have two peers that are completely unable to talk to each other. Depending on your application that may or may not be a big deal. For a lot of things it probably is, because the two users can’t even chat; for a BitTorrent kind of application maybe it doesn’t matter. The workaround is that you can specify a relay server, called a TURN server, which steps in when two peers can’t connect, and all the data is relayed through that server. That’s obviously no longer peer-to-peer, but at least the users will be able to talk to each other; you don’t want your users to think your application is completely broken.
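As a rough illustration of “specify a STUN server and a TURN server when you set up your peer connection”, the sketch below builds a configuration object whose shape mirrors the `iceServers` entries the browser’s `RTCPeerConnection` constructor accepts (`urls`, `username`, `credential`). The hostnames and credentials are hypothetical placeholders, and the local interfaces and `hasRelay` helper are illustrative stand-ins so the snippet runs outside a browser.

```typescript
// Local stand-ins mirroring the browser's RTCIceServer / RTCConfiguration shape.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// Hypothetical hosts and credentials; substitute your own deployment.
const config: PeerConfig = {
  iceServers: [
    // STUN: lets each peer discover its public address for NAT traversal.
    { urls: "stun:stun.example.org:3478" },
    // TURN: relay used only when a direct connection cannot be established.
    {
      urls: "turn:turn.example.org:3478",
      username: "demo-user",
      credential: "demo-secret",
    },
  ],
};

// Normalize the `urls` field, which may be a single string or a list.
function urlsOf(server: IceServer): string[] {
  return typeof server.urls === "string" ? [server.urls] : server.urls;
}

// True if the configuration includes a TURN relay as a last resort.
function hasRelay(cfg: PeerConfig): boolean {
  return cfg.iceServers.some((s) =>
    urlsOf(s).some((u) => u.startsWith("turn:") || u.startsWith("turns:")),
  );
}

// In a browser, this object would be passed as: new RTCPeerConnection(config)
```

A configuration without a TURN entry still works for most peers; `hasRelay` just makes it easy to check that the fallback described above is actually configured.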

Werner: That fixes a lot of those problems and it feels like peer to peer although in the worst case it'll have to go via some server.

And then sometimes even the TURN server doesn’t work; people have been finding this out in practice. I think it’s around five percent of connections that don’t work even with the TURN server, and that’s because the firewalls they are behind are so restrictive that they don’t even allow UDP packets through, and the TURN server uses UDP, which is unfortunate. So in that case you have to fall back to something even more reliable, like plain XHR requests or WebSockets. It’s unfortunate, but it’s a new technology, so I think it will take a little while until the rough edges are smoothed out.
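The fallback order described here (direct peer-to-peer, then a TURN relay, then a WebSocket or XHR channel through your own server) can be sketched as a simple ordered chain. This is a hypothetical illustration: each attempt is simulated as a synchronous function returning `true` on success, whereas a real application would use asynchronous connection attempts with timeouts.

```typescript
// Hypothetical transport fallback chain; connection attempts are simulated.

type TransportName = "direct-p2p" | "turn-relay" | "websocket";

interface TransportAttempt {
  name: TransportName;
  tryConnect: () => boolean; // stand-in for a real (async) connection attempt
}

// Try transports from most to least preferred; return the first that works.
function connectWithFallback(attempts: TransportAttempt[]): TransportName | null {
  for (const attempt of attempts) {
    if (attempt.tryConnect()) return attempt.name;
  }
  return null; // nothing worked at all
}
```

For example, a user behind a firewall that blocks UDP would fail the first two attempts and end up on `"websocket"`, which routes data through your own server.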
I agree, very messy.

Sure, yes. I have been playing around with really crazy ways of using WebRTC that maybe the creators of the standard didn’t anticipate. I’ve done a few things; one of them is called PeerCDN. The basic idea is to use the visitors who are on a website at the same time to host that site as a CDN, a Content Delivery Network. If I come to a site and want to access some resource on it, like a picture, a video, an audio file, or any kind of static asset, then before I actually request the file from a normal CDN like Akamai or from the site’s server, I first ask PeerCDN: “Are there any other peers who have this resource in their browser cache and who are on the same site right now?” If it says “Yes, these ten people are on the same page as you and they already have resources that you need to load the page”, then it gives you the information you need to connect to them, and this happens transparently: your browser connects to them and fetches the resources.
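The “ask first, then fall back to the CDN” lookup can be pictured as an index on a coordinating server that maps each resource URL to the visitors currently holding it. This is a hypothetical sketch of that idea, not PeerCDN’s actual code; `PeerIndex`, `announce`, `leave`, and `resolve` are illustrative names, and the cap of ten peers simply echoes the example in the answer.

```typescript
// Hypothetical peer index: which online visitors hold which resource URLs.

type Source =
  | { from: "peers"; peerIds: string[] }
  | { from: "origin"; url: string };

class PeerIndex {
  // resource URL -> IDs of peers that have it cached and are still on the site
  private holders = new Map<string, Set<string>>();

  // A visitor reports that it now has a resource in its cache.
  announce(peerId: string, url: string): void {
    let set = this.holders.get(url);
    if (!set) {
      set = new Set<string>();
      this.holders.set(url, set);
    }
    set.add(peerId);
  }

  // A visitor left the site, so it can no longer serve anything.
  leave(peerId: string): void {
    this.holders.forEach((set) => set.delete(peerId));
  }

  // "Who else has this?" If nobody does, fall back to the CDN/origin URL.
  resolve(url: string, requester: string, maxPeers = 10): Source {
    const set = this.holders.get(url);
    const peerIds = set
      ? Array.from(set).filter((id) => id !== requester).slice(0, maxPeers)
      : [];
    return peerIds.length > 0 ? { from: "peers", peerIds } : { from: "origin", url };
  }
}
```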

It also verifies that the resources haven’t been tampered with. There is a central, trusted PeerCDN server that sends you the hashes of the content; the hashes are extremely short, like a fingerprint, and when you get the full file from the peer you can verify that they didn’t mess with it, that they didn’t replace the picture you wanted with another picture. So the site owner can be certain of integrity, and it lowers the bandwidth cost for the site owner significantly, because any resource that you fetch from a peer is free for them; it’s a request their server did not have to serve. So yes, it’s kind of a crazy idea, and we started a company for it. We worked on it for about eight months, and then we were approached by Yahoo, who wanted to acquire the company, so we sold it to them, and now the other two cofounders and I are working at Yahoo.
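The integrity check works because the short fingerprint comes from the trusted server while the bulky bytes come from an untrusted peer. A minimal sketch, assuming SHA-256 as the hash (the interview only says “hashes”) and using Node’s built-in `node:crypto` module; in a browser the equivalent primitive is `crypto.subtle.digest("SHA-256", data)`. The function names are illustrative.

```typescript
import { createHash } from "node:crypto";

// SHA-256 is an assumption; any collision-resistant hash serves the same role.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// trustedHashHex comes from the trusted server; data comes from a peer.
// Accept the peer's bytes only if they match the fingerprint.
function verifyFromPeer(data: Uint8Array | string, trustedHashHex: string): boolean {
  return sha256Hex(data) === trustedHashHex;
}
```

If verification fails, the client would simply discard the bytes and fetch the resource from the regular CDN instead.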

Yes, it’s a good question. The original story is kind of silly, I think. Like I was saying before, I like to build things that surprise people, where they say: “I didn’t realize you could do that, that’s crazy.” So I was trying to find ways to make people’s browsers do things they didn’t expect, and one of the things I thought would be good to try was using a visitor’s CPU for computation while they are on your site. I’m not the first person to think of this, there are a lot of people who’ve tried it, but I thought I’d see if it was feasible, so I built a quick prototype to crack passwords using the visitors to my site. A visitor would come to the site, I had a fake password that I already knew, and I had their computer try hashing every possible password, with the server distributing the load among the visitors. Then I just measured how fast it was. It was way too slow, completely infeasible: about fifty times slower than an equivalent C program, and the thing with password cracking is that you can even do it on GPUs, which are way faster than even a C program.
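“The server distributes the load among the visitors” usually means splitting the keyspace of candidate passwords into ranges. The sketch below is a hypothetical reconstruction, not the prototype from the interview: candidates are numbered in base-N over an alphabet, the server hands each visitor a contiguous index range, and each visitor scans its range. The `isMatch` callback stands in for “hash the candidate and compare it with the target hash”.

```typescript
// Hypothetical keyspace partitioning for distributed brute force.

interface WorkRange {
  start: number; // inclusive candidate index
  end: number; // exclusive candidate index
}

// Number of candidates of exactly `length` characters over `alphabet`.
function keyspaceSize(alphabet: string, length: number): number {
  return Math.pow(alphabet.length, length);
}

// Split [0, total) into `workers` nearly equal contiguous ranges.
function partition(total: number, workers: number): WorkRange[] {
  const ranges: WorkRange[] = [];
  const base = Math.floor(total / workers);
  let remainder = total % workers;
  let start = 0;
  for (let i = 0; i < workers; i++) {
    const size = base + (remainder > 0 ? 1 : 0);
    if (remainder > 0) remainder--;
    ranges.push({ start, end: start + size });
    start += size;
  }
  return ranges;
}

// Interpret the index as base-N digits over the alphabet.
function indexToCandidate(index: number, alphabet: string, length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out = alphabet.charAt(index % alphabet.length) + out;
    index = Math.floor(index / alphabet.length);
  }
  return out;
}

// Each visitor scans its own range; isMatch stands in for "hash and compare".
function searchRange(
  range: WorkRange,
  alphabet: string,
  length: number,
  isMatch: (candidate: string) => boolean,
): string | null {
  for (let i = range.start; i < range.end; i++) {
    const candidate = indexToCandidate(i, alphabet, length);
    if (isMatch(candidate)) return candidate;
  }
  return null;
}
```

The partitioning itself is cheap; as the answer explains, the bottleneck was the per-candidate hashing speed in JavaScript.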

So the JavaScript was actually 8000 times slower than the GPU version of the password cracker. Then I thought: what if I could use WebGL to write a shader, put the shader on their graphics card, and render it into a one-pixel image that is actually cracking passwords? It didn’t work because WebGL doesn’t really have proper integer support; OpenGL does, but WebGL doesn’t, and that is very important because you need integer arithmetic for password cracking. So anyway, that was one idea, and of course other people have tried Bitcoin mining on their visitors’ computers and things like that; I never tried anything like that. Eventually I came to the question of what resources users have on their machines that we could use. Bandwidth is one, and they may not even mind giving up a little of it in exchange for helping the site owner reduce their costs. Maybe the site owner doesn’t need to show so many ads because their site is a lot cheaper to run, and I even thought maybe you could sell the bandwidth so the site owner could make some profit. There were a lot of these ideas, but in the end it wasn’t the worst idea; it sort of worked.

WebTorrent is another project I started, but this one is open source. PeerCDN was a company; WebTorrent is just an open source project, and the idea is that I want to make BitTorrent work on the web. Users can visit a site and, if they have a URL to a torrent file, just paste it in and start watching the video or looking at the images or the PDF, whatever the content is, in their browser without ever having to install anything. I think if we can make the user experience of BitTorrent better, then a lot more people will use it, the network will be stronger and more resilient, and we’ll get a lot of people using it who’ve never used it before. My mom, for example, is not going to use BitTorrent, but if it were as easy to use as YouTube then maybe she would, and I think that would be good for the world.

Good question. There are things that make it not a trivial project; it’s quite hard, and I’m doing it mostly by myself at this point. There are a lot of other people in the JavaScript community who’ve written modules that I’ve used, so it’s not completely starting from scratch; other good people have done good work. But the plan is to do it in stages. The first stage is to build a BitTorrent client in JavaScript, not necessarily in the browser but in Node.js. If you have a BitTorrent client that works in Node and you’ve split the different pieces of the code into their own reusable modules, then that is a good starting point. Once that works, the plan is to ship it as a native application that users can install on OS X, Windows, and Linux, put a good GUI on it, and make it a client people want to use as their primary client because it’s just really good and has features no other client has, like streaming video: I want to be able to push play and have a video start playing before the entire torrent has finished downloading.

That is a feature that a few clients have tried, but none of the mainstream clients have really gotten it right, I think, so if we can get that right, that will be good. The client is currently basically working as a command-line app: you can download stuff on the command line. I made a whole bunch of modules, probably at least ten, that do ten different pieces of the client, and some of those are being used by other projects, which is cool. Then we made a Chrome app, and the Chrome app is also usable, but I have yet to make the actual native app that users will install. So that is the status of that part, but then there are two more steps. Let’s say we now have the client that everyone’s using because it’s awesome; it’s doing BitTorrent, great.
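Streaming before the download finishes mostly comes down to which piece the client requests next. A hypothetical sketch, not WebTorrent’s actual code: map the playback position to a piece index, request the first missing piece in a small window ahead of it, and otherwise fall back to rarest-first (the standard BitTorrent strategy, used here as an assumption). The window size and array layout are illustrative choices.

```typescript
// Hypothetical streaming-oriented piece selection.

// Which piece contains a given byte of the file.
function pieceForByte(byteOffset: number, pieceLength: number): number {
  return Math.floor(byteOffset / pieceLength);
}

// Choose the next piece to request, or null if nothing is requestable.
function nextPiece(
  have: boolean[], // have[i] is true once piece i is downloaded
  availability: number[], // availability[i] = number of peers that have piece i
  playheadByte: number,
  pieceLength: number,
  windowSize: number,
): number | null {
  // 1. Prioritize missing pieces just ahead of the playback position.
  const start = pieceForByte(playheadByte, pieceLength);
  const end = Math.min(have.length, start + windowSize);
  for (let i = start; i < end; i++) {
    if (!have[i] && availability[i] > 0) return i;
  }
  // 2. Otherwise, rarest-first among the remaining pieces some peer can serve.
  let best: number | null = null;
  for (let i = 0; i < have.length; i++) {
    if (have[i] || availability[i] === 0) continue;
    if (best === null || availability[i] < availability[best]) best = i;
  }
  return best;
}
```

The window keeps playback fed, while rarest-first for everything else keeps the client a useful member of the swarm.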

The second step is to make it run in the browser. Fortunately there is a great tool called Browserify that takes a Node package and bundles it into a single file you can put on your website, and it makes all the require statements (basically the way you load modules in Node) work in the browser. So that’s pretty cool. The last step is to replace the parts of BitTorrent that use features like TCP and UDP, which don’t exist in the browser, with WebRTC. That has not been done yet. I think it’s possible; I’ve thought about it and drawn a lot of sketches, and I think it could work, but it doesn’t exist yet, so I’m on step one of the three-step plan, and we’ll see how it goes. I should mention one really good reason why it’s good that we started with a BitTorrent client that people install and like. Once the WebRTC version exists, we are going to have clients that are just a single JavaScript tag added to a site, and they will only be able to talk to other browsers, because browser clients can only talk to other browser clients; that’s how WebRTC works. But there is a lot of content on the existing BitTorrent network that those clients won’t have access to, so we need some people who can seed to the web users to get everything going initially and to bring content into the web network. The way we are going to do that is to update the JavaScript BitTorrent client from step one so that it also speaks WebRTC and can talk to the web users.

The people using those clients, which I’m going to call hybrid clients, will be able to talk to both the web network and the normal network, so when they download something and start to seed it, they are seeding it to both networks, which means web users can now download from them, which is pretty cool. Hopefully a lot of people will use those hybrid clients because they are really good, and the side benefit is that they’re helping the web users out.
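One way to picture a hybrid client is swarm logic that talks to every peer through a single connection interface, so TCP/UDP peers and WebRTC peers can sit side by side. This is a hypothetical sketch, not WebTorrent’s design: `Wire`, `InMemoryWire`, and `HybridSwarm` are illustrative names, the transports are in-memory stand-ins rather than real sockets, and `sendHave` is modeled on BitTorrent’s “have” message, which tells a peer that a piece is now available.

```typescript
// Hypothetical hybrid swarm: one interface over both kinds of peer connection.

type TransportKind = "tcp" | "webrtc";

interface Wire {
  transport: TransportKind;
  peerId: string;
  sendHave(pieceIndex: number): void;
}

// In-memory stand-in for a real TCP socket or WebRTC data channel.
class InMemoryWire implements Wire {
  readonly transport: TransportKind;
  readonly peerId: string;
  readonly received: number[] = [];

  constructor(transport: TransportKind, peerId: string) {
    this.transport = transport;
    this.peerId = peerId;
  }

  sendHave(pieceIndex: number): void {
    this.received.push(pieceIndex);
  }
}

class HybridSwarm {
  private wires: Wire[] = [];

  addWire(wire: Wire): void {
    this.wires.push(wire);
  }

  // After finishing a piece, announce it to peers on both networks,
  // so browser-only peers can fetch it from this client too.
  onPieceComplete(pieceIndex: number): void {
    for (const wire of this.wires) wire.sendHave(pieceIndex);
  }

  peersOn(transport: TransportKind): string[] {
    return this.wires.filter((w) => w.transport === transport).map((w) => w.peerId);
  }
}
```

Because the swarm only sees the `Wire` interface, adding WebRTC support is a matter of adding another implementation rather than changing the seeding logic.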

I think there is a lot of interesting stuff going on in JavaScript these days, from WebCrypto in the browser to people using JavaScript for hardware, like powering drones and Arduinos, and getting JavaScript to do all sorts of things people never thought were possible. The JavaScript community is awesome; it’s full of mad scientists, people who say “Let’s see, can we make JavaScript do this?”, and I love that “can do” attitude. People aren’t afraid to hack on stuff, and I think one thing that has really helped that whole way of doing things is NPM.

So NPM is a package manager for Node, and it’s really nice because people feel like they can publish anything to it really easily. People publish lots of code, and a lot of it is very small modules that do one thing well, so they are small, robust, and composable with other modules. It’s really vibrant and moving really fast right now; I think the rate at which it is growing is incredible. Node is not very many years old, but there are already 70,000 modules on NPM; it’s growing faster than any other module ecosystem ever. So it’s really exciting. Just watching the new stuff that comes out every day is incredibly fun, the people writing all the code are really cool people, and I encourage anyone who has any passing interest to take a look at what’s going on in JavaScript these days; there is really cool stuff happening.

Yes, so WebTorrent is on GitHub. The short URL you can use is www.WebTorrent.io, which currently redirects to the GitHub repo; I’ll make a site for it later, but you can check up on the project there and give it a try. My blog is www.Feross.org, so if you want to follow what I do later, you can go there. You can also check out my GitHub page for all the other BitTorrent-related modules I’ve been writing, if anyone is interested in that.