Rethinking Web Development: WebRTC

Jan 30, 2014 at 9:05PM

Caleb Doxsey

Fundamentally, web applications are client-server applications. Web developers write code that runs on a server, and end users (clients) connect to that server (via a browser using HTTP) to perform tasks. In recent years this rather standard definition of the web application has come under fire. Increasingly, code no longer runs on a server (but rather as JavaScript on the client), and HTTP is no longer necessarily the protocol used to communicate between the two machines (SPDY being the rather obvious alternative, but also things like WebSockets, which don't really fit in the HTTP bucket).

However, even more dramatic than those two shifts has been the relatively recent introduction of WebRTC. If you're not familiar with WebRTC (the RTC meaning Real Time Communication), it's a technology that allows for peer-to-peer communication. That is to say, end users can communicate with one another directly, without the need for an intermediate server.

It seems to me, at least at this moment, that this is a technology that is generally not well understood and its potential has not been fully realized. WebRTC ought to be a seismic shift in the way we build web applications. It's not yet, but I suspect it will be in a few years.

In this post, the penultimate post in the series, I will give a brief overview of how to use WebRTC and then discuss some of the possible implications for web development.

How to Use WebRTC

Getting started with WebRTC is not easy. Most of the documentation is very confusing (good luck understanding the spec), there aren't a ton of examples out there yet, and, up to this point at least, the technology has been in a constant state of flux. Nevertheless, WebRTC is not merely experimental: it's a fully functional technology available in both Firefox and Chrome today.

To underline that point, if you've not seen it, you should see the AppRTC project. This is a video chat app, similar to Skype, implemented almost entirely in the browser (with a small bit of server code) and using peer-to-peer transfer of data. For a mere demonstration, it's surprisingly useful and calls into question all of the applications out there that attempt to implement this functionality using custom Java, Flash or similar installed applications.

But back to the question at hand: how does one use WebRTC?

For the purposes of this tutorial I built a small chat application which consists of two components: a server-side Go application which facilitates the initial signaling process between the two peers, and a client-side JavaScript application which implements the actual WebRTC workflow. First let's take a look at a high-level description of the process.

For this example, suppose we have two end users who want to talk to each other: Joe and Anna.

Joe arrives first, connects to a server, subscribes to a topic and waits for someone else to show up.

Anna arrives next, connects to the same server, subscribes to the same topic and at this point the server tells Joe that someone else connected.

Joe sends an "offer" to Anna (via the server) indicating that he wants to establish a peer to peer connection.

Anna receives the "offer" and sends an "answer" to Joe.

Both Joe and Anna trade ICE candidates. ICE stands for Interactive Connectivity Establishment (described here) and basically represents the various ways the two parties can reach each other.

Finally the connection is made, one of the ICE candidates is agreed upon and Joe & Anna can communicate directly with one another.
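To make the ordering concrete, here is a minimal sketch of the steps above with the network stripped out. It is not the code from the gist: createSignalingServer and createPeer are hypothetical names, the "lobby" topic and the string IDs are made up, the offer and answer payloads are placeholder strings rather than real session descriptions, and ICE candidates are left out entirely. The message types are the ones used later in this post.

```javascript
// Hypothetical in-memory stand-in for the signaling server. It only relays
// messages between peers subscribed to the same topic.
function createSignalingServer() {
  const topics = new Map(); // topic -> Map(peerId -> deliver callback)
  const log = [];           // message types in the order the server saw them

  return {
    log,
    // Register a peer, then tell everyone already on the topic about it.
    subscribe(topic, peerId, deliver) {
      log.push("SUBSCRIBE");
      if (!topics.has(topic)) topics.set(topic, new Map());
      const peers = topics.get(topic);
      peers.set(peerId, deliver);
      for (const [otherId, otherDeliver] of peers) {
        if (otherId === peerId) continue;
        log.push("SUBSCRIBED");
        otherDeliver({ topic, type: "SUBSCRIBED", from: peerId, to: otherId });
      }
    },
    // Forward a message to the peer named in `to`, stamping the sender.
    relay(topic, from, msg) {
      log.push(msg.type);
      const peers = topics.get(topic);
      const deliver = peers && peers.get(msg.to);
      if (deliver) deliver(Object.assign({}, msg, { topic, from }));
    },
  };
}

// A peer that reacts to signaling messages in the order described above.
function createPeer(id, server, topic) {
  const peer = { id, connectedTo: null };
  server.subscribe(topic, id, function (msg) {
    switch (msg.type) {
      case "SUBSCRIBED": // someone else arrived: send them an offer
        server.relay(topic, id, { type: "OFFER", to: msg.from, data: "offer-placeholder" });
        break;
      case "OFFER": // reply to the offer with an answer
        peer.connectedTo = msg.from;
        server.relay(topic, id, { type: "ANSWER", to: msg.from, data: "answer-placeholder" });
        break;
      case "ANSWER": // the handshake is complete on the offering side
        peer.connectedTo = msg.from;
        break;
    }
  });
  return peer;
}

const server = createSignalingServer();
const joe = createPeer("joe", server, "lobby");
const anna = createPeer("anna", server, "lobby");
console.log(server.log.join(" -> "));
// prints "SUBSCRIBE -> SUBSCRIBE -> SUBSCRIBED -> OFFER -> ANSWER"
```

Note that the peer is registered before the others are notified, so the offer that comes straight back has somewhere to go.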

Code

The (mostly) complete source code for this example can be found in this gist. The server is implemented in Go, the client in JavaScript, and communication between the server and the client occurs over a WebSocket. WebSockets are actually fairly straightforward to implement using Go, and they make it so I don't have to worry too much about storing things on the server (as a long-polling comet server would require).

When a user first connects, the application will ask them to enter a topic. This topic is how the two peers are connected on the server (they have to enter the same thing). You can think of it like an agreed-upon virtual meeting location. That topic is sent to the WebSocket like so:

function startWebSocket() {
  ws = new WebSocket("ws://api.badgerodon.com:9000/channel");
  ws.onopen = function() {
    ws.send(JSON.stringify({
      topic: topic,
      type: "SUBSCRIBE"
    }));
  };
}

The server receives this connection, sends back the user's ID and subscribes them to the topic (IDs are generated using a full-cycle PRNG):

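// Hand the new connection an ID and tell the client what it is.
id := <-idGenerator
out <- Message{
  Type: "ID",
  To: id,
}
// Read messages until the socket closes or sends something unreadable.
for {
  var msg Message
  if err := websocket.JSON.Receive(ws, &msg); err != nil {
    break
  }
  // Attach the outbound channel so subscribe can register this user.
  if msg.Type == "SUBSCRIBE" {
    msg.Data = out
  }
  in <- msg
}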

// in subscribe
// Register the new user, then tell everyone else on the topic about them.
topics[topicId][userId] = out
for id, c := range topics[topicId] {
  if id != userId {
    send(c, Message{
      Topic: topicId,
      From: userId,
      To: id,
      Type: "SUBSCRIBED",
    })
  }
}
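The gist has the real ID generator, which isn't shown here, so the following is only a sketch of what "full-cycle" means. A full-cycle generator visits every value in its range exactly once before repeating, which gives you IDs that are unique without being sequential. The server is written in Go; I've used JavaScript here only to keep the extra examples in one language. createIdGenerator and the constants are assumptions, chosen to satisfy the Hull-Dobell conditions for a linear congruential generator.

```javascript
// Full-cycle linear congruential generator: x -> (a * x + c) mod m.
// With m a power of two, c odd, and (a - 1) divisible by 4, the
// Hull-Dobell theorem guarantees every value in [0, m) appears exactly
// once before the sequence repeats.
function createIdGenerator(seed) {
  const m = 65536; // size of the ID space (assumed: 2^16)
  const a = 4093;  // multiplier; a - 1 = 4092 is divisible by 4
  const c = 12345; // increment; odd, so it shares no factor with m
  let x = seed % m; // seed is expected to be a non-negative integer
  return function nextId() {
    x = (a * x + c) % m;
    return x;
  };
}

const nextId = createIdGenerator(1);
console.log(nextId(), nextId(), nextId());
```

The obvious limit is that the IDs start repeating after m connections, so a real server would want a larger range or a way to retire old IDs.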

And now the first user waits for someone else to show up. When that happens the same process is repeated, except this time the first user is informed that the second user subscribed, so he proceeds to begin the process of establishing a peer-to-peer connection and sends the second user an offer (via the server):

case "SUBSCRIBED":
  to = msg.from;
  startPeerConnection();
  sendOffer();
  break;

pc.createOffer(function(description) {
  pc.setLocalDescription(description);
  ws.send(JSON.stringify({
    topic: topic,
    type: "OFFER",
    to: to,
    data: description
  }));
}, ...);

The second user sees this offer and sends an answer in reply:

case "OFFER":
  to = msg.from;
  startPeerConnection();
  setRemoteDescription(msg.data);
  sendAnswer();
  break;

pc.createAnswer(function(description) {
  pc.setLocalDescription(description);
  ws.send(JSON.stringify({
    topic: topic,
    type: "ANSWER",
    to: to,
    data: description
  }));
});

In addition to the offer and the answer both users have also been forwarding along ICE candidates:

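pc.onicecandidate = function(evt) {
  // evt.candidate is null once gathering has finished; nothing to forward
  if (!evt.candidate) {
    return;
  }
  ws.send(JSON.stringify({
    topic: topic,
    type: "CANDIDATE",
    to: to,
    data: evt.candidate
  }));
};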
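The excerpts in this post show the sending side of the answer and the candidates, but not what happens when they arrive. Here is a rough sketch of the receiving side. handleSignal is a hypothetical name, and it assumes the promise-based form of setRemoteDescription and addIceCandidate that accepts plain objects; browsers at the time of writing may need the data wrapped in RTCSessionDescription and RTCIceCandidate first. The fake connection at the bottom is only there so the sketch runs outside a browser.

```javascript
// Hypothetical handler for signaling messages arriving over the WebSocket.
// `pc` is assumed to be an RTCPeerConnection, or anything with the same
// setRemoteDescription / addIceCandidate methods.
function handleSignal(pc, msg) {
  switch (msg.type) {
    case "ANSWER":
      // The other peer accepted our offer: record its session description.
      return pc.setRemoteDescription(msg.data);
    case "CANDIDATE":
      // A null candidate only marks the end of gathering; skip it.
      if (!msg.data) return Promise.resolve();
      return pc.addIceCandidate(msg.data);
    default:
      // Other message types are handled elsewhere.
      return Promise.resolve();
  }
}

// Stand-in connection that just records what was called.
const seen = [];
const fakePc = {
  setRemoteDescription: function (d) {
    seen.push("remote:" + d.type);
    return Promise.resolve();
  },
  addIceCandidate: function (c) {
    seen.push("candidate:" + c.candidate);
    return Promise.resolve();
  },
};
handleSignal(fakePc, { type: "ANSWER", data: { type: "answer", sdp: "placeholder" } });
handleSignal(fakePc, { type: "CANDIDATE", data: { candidate: "placeholder" } });
handleSignal(fakePc, { type: "CANDIDATE", data: null });
console.log(seen); // [ 'remote:answer', 'candidate:placeholder' ]
```

Candidates can arrive before the offer or answer does, so a fuller version would probably hold them in a queue until the remote description has been set.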

Finally, once the first user receives an answer and an ICE candidate is agreed upon, the two peers are connected. WebRTC has the ability to stream audio and video, but for this example I used an RTCDataChannel:

pc = new RTCPeerConnection({
  iceServers: [{
    // stun allows NAT traversal
    url: "stun:stun.l.google.com:19302"
  }]
}, {
  // we are going to communicate over a data channel
  optional: [{
    RtpDataChannels: true
  }]
});
dc = pc.createDataChannel("RTCDataChannel", {
  reliable: true
});

Actually sending messages is trivial:

function onSubmit(evt) {
  evt.preventDefault();
  var text = chatInput.value;
  chatInput.value = "";
  var msg = {
    type: "MESSAGE",
    data: text,
    from: from,
    to: to
  };
  onMessage(msg);
  dc.send(JSON.stringify(msg));
}
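Receiving is the mirror image. The sketch below is an assumption about how the other end could be wired up rather than a copy of the gist: attachChatChannel is a hypothetical helper, and the fake channel at the bottom stands in for a real RTCDataChannel so the code runs outside a browser.

```javascript
// Hypothetical helper: route incoming data channel messages to the same
// onMessage function the sender calls for its own messages.
function attachChatChannel(channel, onMessage) {
  channel.onmessage = function (evt) {
    let msg;
    try {
      msg = JSON.parse(evt.data);
    } catch (err) {
      return; // ignore anything that is not valid JSON
    }
    if (msg && msg.type === "MESSAGE") {
      onMessage(msg);
    }
  };
}

// Stand-in channel: calling onmessage by hand simulates an arriving message.
const received = [];
const fakeChannel = {};
attachChatChannel(fakeChannel, function (msg) {
  received.push(msg.data);
});
fakeChannel.onmessage({
  data: JSON.stringify({ type: "MESSAGE", data: "hi", from: 1, to: 2 })
});
fakeChannel.onmessage({ data: "not json" });
console.log(received); // [ 'hi' ]
```

In a browser the channel would be the dc created above or, on a peer that did not create it, the one delivered as evt.channel to pc.ondatachannel.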

Implications

There have been two large and completely divergent trends when it comes to networked applications: the cloud and decentralized, distributed networks. On the one hand, companies like Google, Apple and Facebook have been moving all of their users' information and applications to their data centers. Your email, pictures, games, etc. are no longer on your computer; they are accessed over the internet and stored in the cloud. This comes with a whole host of advantages, but it also comes with a steep cost: Google reads all your emails, Facebook knows everything about your personal life and Apple can track all your movements. Although these companies certainly provide a great deal of value with their products, in the end their goal is not to make you a happy customer (because most of the products are free), but to use the information you freely provide them for their own ends (mostly just to serve you ads).

At the same time, we've also seen the rise of decentralized, distributed networks, from file-sharing applications (like BitTorrent) and streaming content providers (like Spotify) to instant messaging applications (like Skype) and even digital currencies (like Bitcoin). Decentralized networks have no single governing authority. Data is spread across the network and communication is peer-to-peer.

And though the architecture of the web has always, fundamentally, been client-server, in some ways a peer-to-peer architecture is a more faithful expression of one of the original goals of the web:

The project started with the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups.

Now perhaps at one time, the Googles of this world had intended to organize the world's information. But increasingly, the fundamental goal of companies like Google is not merely to organize the world's information, but to own it. You can see this when they killed Google Reader. Google doesn't want you to read blog posts on other servers, they want everyone to use Google Plus as their blog. With peer-to-peer networks it may be possible to undermine this trend. People can own their own content again.

This example demonstrates one of the most obvious use cases for this technology: video, audio or textual chat among peers. But there are other possibilities:

The chat application has a far broader usage than most people realize. Of course there are the typical chat applications in the style of Google and Facebook chat, but there are also feedback libraries (like Olark), support applications (like you might see on Comcast's website) and a whole host of multiplayer games.

Furthermore, WebRTC is secure by default. In this day and age of NSA skepticism, where Google reads all your email and Facebook knows your entire life history, WebRTC is a breath of fresh air. You can finally have your privacy back and still get the robust accessibility and flexibility of a modern web application.

BitTorrent has demonstrated the power of a distributed file-sharing network. With HTML5 technologies it is possible to build such a network directly in the browser (ShareFest, for example). Could someone implement an in-browser Dropbox (perhaps similar to SpaceMonkey)? Or Mega? One of the downsides of WebRTC is that users must be connected at all times: once their browser closes they are unreachable. But that issue doesn't seem insurmountable, and maybe with a few helper nodes such a system could be sustained.

A distributed social network (à la Diaspora) may have a better chance of succeeding if it's just as easy to use as Facebook or Twitter. Storing all that data is challenging (particularly with large media like pictures and video), but the relationships and textual updates are much more realistic storage-wise. Store that data among your (actual) peers and perhaps it could even be fairly reliable.

With the introduction of things like WebGL, WebAudio, PNaCl, asm.js and a myriad of other HTML5 technologies, it's possible to build real games that exist solely in the browser. WebRTC makes it so those games can be multiplayer. This isn't a new idea - consider Artillery - but, as far as I know, it's not something which has really been realized yet.

One of the reasons internet companies are so successful is that it's very difficult to build something like Facebook. It requires a tremendous amount of capital and large, robust, highly reliable and fast systems can't be put together by just anyone. How many projects have been sunk by their inability to scale?

And yet peer-to-peer networks have the potential to scale in a way the cloud never could. A signaling server can handle an enormous amount of load with no issues. If the vast majority of your application can be re-written to run purely on the client, most of your server-side concerns evaporate.

Of course, most web applications can't be moved entirely to the front-end. Nevertheless, this isn't an all-or-nothing game. The more work you can push to your end users, the less work you have to do on your own machines. Is it possible to do some of that background processing in a browser? It might be more possible than you imagine: modern browsers have threads, typed arrays, full-blown databases, offline capabilities, file-system access, etc.

And those are just a few of the things that came to mind in the last week. It's an exciting time to be a web developer and it'll be interesting to see just what turns up in the next few years.

If you managed to make it this far: thanks for sticking with it. Stay tuned for my final post in this series, where I will propose an even more radical change to how we build web applications.