RTCTunnel: Building a WebRTC Proxy with Go

Nov 15, 2018 at 10:00PM
Caleb Doxsey

This blog post is based on a talk I gave at Bread in collaboration with the GolangNYC Meetup. It's about RTCTunnel, an open-source application I built to tunnel TCP traffic over WebRTC using the pions/webrtc library. The talk is embedded below along with the notes I used. The slides are available here.

This is not a transcription of the talk; it's the notes I used to prepare it.

I’m here to talk to you today about RTCTunnel, a proof-of-concept application I built which proxies TCP traffic over a WebRTC connection. That’s kind of a mouthful, so let’s see a demonstration.

## [A] Start Client
```bash
docker run -it rtctunnel-demonstration
rtctunnel init && rtctunnel info
```

## [B] Start Server

```bash
docker run -it rtctunnel-demonstration
rtctunnel init && rtctunnel info
```

## [+] Add Routes

```bash
export CLIENT_KEY=
export SERVER_KEY=
rtctunnel add-route \
    --local-peer=$CLIENT_KEY \
    --local-port=6379 \
    --remote-peer=$SERVER_KEY \
    --remote-port=6379
```

## [B] Start Redis Server

```bash
redis-server &
```

## [+] Start RTCTunnel (both)

```bash
rtctunnel run &
```

## [A] Start Redis Client

```bash
redis-cli INFO
```

This talk will be broken into two sections. First we will look at how RTCTunnel works, and then I will try to answer the question of why build this at all.

But before we dive in, a couple of things to keep in mind:

## How RTCTunnel Works

RTCTunnel is a TCP proxy over WebRTC. Let’s take each of these ideas in turn:

## TCP Networking

A TCP connection is a full-duplex, reliable, in-order stream of bytes. Full-duplex means you can send and receive at the same time, reliable means if packets are lost or corrupted along the way the protocol will fix it, and in-order means data will come out in the order it was sent in.

TCP connections involve a client, a server, an address, and a port. In Go the client uses net.Dial to make the connection.
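Here's a minimal sketch of the client side (the address, port, and message are illustrative):

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Dial the server at an illustrative address and port.
	conn, err := net.Dial("tcp", "127.0.0.1:8001")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Write sends bytes over the connection to the server.
	if _, err := conn.Write([]byte("hello")); err != nil {
		log.Fatal(err)
	}

	// Read fills a slice with bytes received from the server.
	buf := make([]byte, 1024)
	n, err := conn.Read(buf)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("received: %s", buf[:n])
}
```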

That returns a net.Conn, which implements the io.Reader and io.Writer interfaces, making it similar to a file or stdin/stdout (a very powerful abstraction we will return to later).

Write takes in a slice of bytes and sends them over the connection via the TCP protocol to the server. Read takes bytes off of the connection and fills a slice of bytes. Data written on the client comes out of Reads on the server, and vice versa.
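The server side mirrors this with net.Listen and Accept. A minimal sketch, with an illustrative port, that simply echoes bytes back to the client:

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Listen for incoming TCP connections on an illustrative port.
	li, err := net.Listen("tcp", ":8001")
	if err != nil {
		log.Fatal(err)
	}
	defer li.Close()

	for {
		// Accept blocks until a client connects, returning a net.Conn.
		conn, err := li.Accept()
		if err != nil {
			log.Fatal(err)
		}
		// Handle each connection concurrently.
		go func(conn net.Conn) {
			defer conn.Close()
			buf := make([]byte, 1024)
			for {
				// Read fills buf with whatever bytes the client wrote.
				n, err := conn.Read(buf)
				if err != nil {
					return
				}
				// Write the same bytes back to the client.
				if _, err := conn.Write(buf[:n]); err != nil {
					return
				}
			}
		}(conn)
	}
}
```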

## Multiplexing

TCP forms a one-to-many relationship, where a server handles many clients. It is sometimes useful to share a single connection with multiple sub-connections or streams. We call that multiplexing.

There are many multiplexing libraries, but the one I used was xtaci/smux, a simple multiplexing library that comes from the KCP project, a custom networking protocol built on top of UDP.

SMUX works by taking an existing connection and creating a session. Once again there is a client role and a server role. Either side of the connection can act as server or client, but there should only be one of each.

The server is created with the Server method, the client with the Client method. Both of these return a Session, which has an OpenStream and an AcceptStream method. The stream returned by these methods implements the net.Conn interface and so can be used anywhere a net.Conn can be used.
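To give a feel for the API, here is a rough sketch of wrapping an existing connection with smux (the package layout, function names, and the per-stream handler are illustrative, not RTCTunnel's actual code):

```go
package tunnel

import (
	"net"

	"github.com/xtaci/smux"
)

// serveMux wraps an existing connection in a smux session on the
// server side and accepts incoming streams.
func serveMux(conn net.Conn) error {
	session, err := smux.Server(conn, nil) // nil uses smux's default config
	if err != nil {
		return err
	}
	for {
		stream, err := session.AcceptStream()
		if err != nil {
			return err
		}
		go handleStream(stream)
	}
}

// handleStream is a placeholder for per-stream work. A stream satisfies
// net.Conn, so anything that accepts a net.Conn works here.
func handleStream(stream net.Conn) {
	defer stream.Close()
	// ...
}

// openMux wraps the other end of the connection on the client side and
// opens a new stream over the session.
func openMux(conn net.Conn) (net.Conn, error) {
	session, err := smux.Client(conn, nil)
	if err != nil {
		return nil, err
	}
	return session.OpenStream()
}
```

Because streams behave like ordinary connections, the rest of the code doesn't need to care whether it's talking over a raw TCP connection or a multiplexed stream.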

The neat thing about a multiplexing library like this is either side can Open and Accept streams. That means we can invert the relationship between client and server, and once a connection is established our “client” can act like a server via SMUX.

As it turns out multiplexing is done all over the place. It’s how cell phones can share the same spectrum, it’s how a single cable can carry multiple channels, and, with the advent of HTTP/2, multiplexed streams can now be used to allow multiple HTTP requests to be sent in parallel.

## Proxies

The third major component to RTCTunnel is a proxy. A proxy sits between a client and a server and acts as an intermediary or broker between the two. The client connects to the proxy, which then connects to the server, and any data the client sends to the proxy is forwarded to the server, and vice-versa, any data the server sends to the proxy is sent back to the client.

When everything goes well, the proxy is basically transparent: anything a client can do when talking directly to the server, it can do through the proxy.

As before we have a client and a server, with the client connecting to port 8000 in this example, but now the real server is listening on port 8001. We create a new application, our proxy, which, like our server, also listens, except on port 8000. When it receives the connection from the client, it dials out and makes another connection to the real server on port 8001.

And then we simply copy the data between the two connections. No really. We just use io.Copy on both ends. It’s a great example of how a well thought out abstraction can make programming easy.
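A minimal sketch of that proxy, assuming the ports from the example above (8000 for the proxy, 8001 for the real server):

```go
package main

import (
	"io"
	"log"
	"net"
)

func main() {
	// The proxy listens where the client expects the server to be.
	li, err := net.Listen("tcp", ":8000")
	if err != nil {
		log.Fatal(err)
	}
	for {
		clientConn, err := li.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go proxy(clientConn)
	}
}

func proxy(clientConn net.Conn) {
	defer clientConn.Close()

	// Dial out to the real server.
	serverConn, err := net.Dial("tcp", "127.0.0.1:8001")
	if err != nil {
		log.Print(err)
		return
	}
	defer serverConn.Close()

	// Copy data in both directions until one side closes.
	go io.Copy(serverConn, clientConn)
	io.Copy(clientConn, serverConn)
}
```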

## Why Proxies?

So you might be asking: why use proxies? You see them everywhere these days, from load balancers for web services to convoluted service meshes like Istio, where all traffic is sent over local proxies on both the client and the server. On the face of it this would seem to be monumentally inefficient and unnecessarily complex -- maybe a great example of over-engineering gone haywire.

Well, perhaps the best way I can tackle this is by offering a couple of stories. I should mention that these stories are not true. I've changed major details. But hopefully they're not too far off from the kinds of problems you can run into when building distributed systems.

So imagine we have set up a system of microservices, perhaps a dozen or so, and they each communicate using an RPC protocol like gRPC. For service discovery we're using Hashicorp's Consul.

In Consul, services are registered to a node (a machine) and are associated with health checks. For example, we might have a redis server with a health check to make sure redis is running on port 6379, but most services will use the serf health check, a check that uses a gossip network protocol to determine whether a node is reachable on the network.

Consul stands up a DNS server (along with an HTTP API) that allows clients to discover services. For example our redis server might be available at redis.service.consul.

When the serf health check fails, the node is evicted from the cluster, all of its services are considered unhealthy, and they are no longer returned by service lookups. (redis.service.consul won't return that IP anymore)

Now suppose we have a telemetry service which is solely used for debugging. It receives small UDP packets from all the other services in our datacenter, which we use for tracking data as it proceeds through our pipelines. We use it to find unusual traffic patterns and anomalies in the data, and for root cause analysis. (if you don't have something like this already you should… it can be a real lifesaver during outages)

Suppose we have 4 of these servers, each of them designed to handle ¼ of the traffic using a consistent hashing scheme based on an id in the payload. That way, as the payload travels through various systems, all of the associated data ends up on the right server. We end up with 4 DNS entries in Consul, one per server.

And that’s how clients figure out where to send the data.

So that’s our setup. Now suppose we’re doing some maintenance and need to replace or restart all the machines.

In the first case we do that and everything looks fine. But we notice a marked reduction in the telemetry data. Whole nodes just appear to be missing even though there's just as much network traffic flowing through. After some debugging we realize that clients are hitting the wrong server. For example, the traffic for A is going to node B a lot of the time. Restarting the clients fixes the issue.

As it turns out, it's all because the DNS lookup for the service only ever happens the first time a connection is made. The clients aren't picking up the fact that A has now taken on a new role. And we see this happen again and again for each separate client: Cassandra, ZooKeeper, PostgreSQL, etc. A dozen patches later and maybe we've finally fixed the problem.

So that’s frustrating, but the second case is much worse.

After using Consul for a while, we realize that there ends up being a lot of DNS traffic to Consul, so we decide to use a local dnsmasq server running on every node to take some of the load off. We set up an application listening on Consul's service watcher endpoint, and it writes all the services to a hosts file. If you're not familiar with the format of that file, it's just one DNS name and one IP address per line.

This is a huge win: 99% fewer requests to Consul at basically no overhead, since the caching is done on each node.

We run our maintenance again and we notice that DNS traffic starts to increase on Consul.

Obviously that wasn’t supposed to happen.

As Consul becomes overloaded it's not able to keep up with the request load, and all kinds of things begin to suffer: sessions time out, locks are released, and, crucially, serf health checks start to fail. Nodes are randomly being evicted from the cluster everywhere, which only causes the DNS traffic to increase, which leads to more nodes getting evicted, and the whole thing spirals out of control.

Now as it turns out the reason for this behavior is obvious in retrospect. Our hosts file contains DNS names along with IP addresses and anything not in the file is forwarded upstream. So as long as there is at least one server for each service, everything is fine. Consul will only see 1% of the traffic. But as soon as there are no servers registered for a service every single request will be sent upstream. Consul is now seeing 100% of the traffic, it can’t handle it and falls over, causing everything else to fall over.

This kind of catastrophic, unpredictable failure happens all too often with distributed systems. It's like the proverbial Zeno who comes to the conclusion that motion must be impossible. After having been through the outage wringer a few times, the question isn't how things fail, it's how anything works at all.

Proxies, in the form of a service mesh, can actually help with this problem. Connection management, service discovery, TLS encryption, distributed tracing and telemetry: all of these things can be done by proxies instead of clients and servers. By making the network smart you can make clients and servers dumb. At least that's the theory.

Anyway, back to RTCTunnel.

## WebRTC

The final component to RTCTunnel is, of course, WebRTC. WebRTC is a suite of protocols for Real Time Communication: ICE and STUN for NAT traversal, TURN for relaying, SDP for describing sessions, DTLS for encryption, SCTP for data channels, and more.

I could keep going, but knowing this alphabet soup of protocols isn’t super important for using WebRTC.

From the developer’s perspective, WebRTC involves 3 components:

  1. A signaling server used as part of a handshake process to establish a connection
  2. A peer-to-peer connection which supports different capabilities like video or audio chat, and, what we care about
  3. A datachannel, which supports a reliable, in-order stream of bytes

So to use WebRTC we follow these steps:

  1. First the client creates an offer, which contains things like UDP addresses over which the client can be reached
  2. That offer is sent to a signaling server, which acts as a middleman and can forward messages between two parties. Think of it like your matchmaking friend setting you up on a date.
  3. The server receives this offer from the signaling server and creates an answer, which is very similar to the offer, and sends it to the signaling server as well
  4. The client receives the answer from the signaling server, and both the client and server establish the peer-to-peer connection

With the peer-to-peer connection established, we then create a datachannel.
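In Go this looks roughly like the sketch below, shown with the pion/webrtc package (the library has since been renamed from pions/webrtc; the STUN server, the datachannel label, and the omitted signaling step are illustrative assumptions, not RTCTunnel's actual code):

```go
package main

import (
	"log"

	"github.com/pion/webrtc/v3"
)

func main() {
	// Create a peer connection, using a public STUN server so each side
	// can discover addresses it is reachable at.
	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{
		ICEServers: []webrtc.ICEServer{{URLs: []string{"stun:stun.l.google.com:19302"}}},
	})
	if err != nil {
		log.Fatal(err)
	}

	// The offering side creates the datachannel up front.
	dc, err := pc.CreateDataChannel("rtctunnel", nil)
	if err != nil {
		log.Fatal(err)
	}
	dc.OnOpen(func() {
		log.Println("datachannel open")
	})

	// Create an offer and set it as the local description. In a real
	// program the offer would be delivered to the other peer via the
	// signaling server, and the answer that comes back would be passed
	// to pc.SetRemoteDescription.
	offer, err := pc.CreateOffer(nil)
	if err != nil {
		log.Fatal(err)
	}
	if err := pc.SetLocalDescription(offer); err != nil {
		log.Fatal(err)
	}

	// ... exchange the offer and answer over the signaling server ...
	select {}
}
```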

## RTCTunnel

And now we have everything we need to build RTCTunnel. It works like this:

  1. We create identifiers for the client and server using ed25519 keys
  2. The client and server establish a peer-to-peer connection using WebRTC and create a datachannel
  3. We run SMUX on the datachannel so that we can multiplex many connections over a single datachannel
  4. The client runs a listener on the local port. When it receives a connection, it creates a new stream on the datachannel session, and writes the destination port. From there it proxies data in both directions.
  5. Meanwhile the server accepts the stream on the datachannel from the client, reads the destination port, and dials out to the server running locally. It then also proxies data in both directions.

And that’s how you build a TCP proxy in Go with WebRTC.
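Putting steps 4 and 5 together, here is a rough sketch of both sides (the function names and the 2-byte port encoding are illustrative, not RTCTunnel's actual wire format):

```go
package tunnel

import (
	"encoding/binary"
	"fmt"
	"io"
	"log"
	"net"

	"github.com/xtaci/smux"
)

// runClientSide listens on the local port. For each TCP connection it
// accepts, it opens a stream over the smux session running on the
// datachannel, writes the destination port, and proxies both directions.
func runClientSide(session *smux.Session, localPort, remotePort uint16) error {
	li, err := net.Listen("tcp", fmt.Sprintf(":%d", localPort))
	if err != nil {
		return err
	}
	for {
		conn, err := li.Accept()
		if err != nil {
			return err
		}
		go func(conn net.Conn) {
			defer conn.Close()

			stream, err := session.OpenStream()
			if err != nil {
				log.Print(err)
				return
			}
			defer stream.Close()

			// Tell the other side which port to dial.
			if err := binary.Write(stream, binary.BigEndian, remotePort); err != nil {
				log.Print(err)
				return
			}

			// Proxy in both directions until one side closes.
			go io.Copy(stream, conn)
			io.Copy(conn, stream)
		}(conn)
	}
}

// runServerSide accepts streams from the session, reads the destination
// port, dials the server running locally, and proxies both directions.
func runServerSide(session *smux.Session) error {
	for {
		stream, err := session.AcceptStream()
		if err != nil {
			return err
		}
		go func(stream *smux.Stream) {
			defer stream.Close()

			var port uint16
			if err := binary.Read(stream, binary.BigEndian, &port); err != nil {
				log.Print(err)
				return
			}

			conn, err := net.Dial("tcp", fmt.Sprintf("127.0.0.1:%d", port))
			if err != nil {
				log.Print(err)
				return
			}
			defer conn.Close()

			go io.Copy(conn, stream)
			io.Copy(stream, conn)
		}(stream)
	}
}
```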

## Why WebRTC?

So why build this?

Well, there are a couple of reasons. Perhaps you've used ngrok.io, a service that allows you to expose a local HTTP or TCP server over the internet, working around firewalls and NAT gateways. RTCTunnel is basically doing the same thing - an ngrok without the need for a server.

But more broadly, WebRTC is an example of a technology that runs contrary to the trend where all services are being centralized through SaaS providers, a trend that leads to all kinds of massive security and privacy concerns.

WebRTC allows direct peer-to-peer communication, without the need for a datacenter's worth of servers. You've got a supercomputer sitting in your pocket; why not use it?

Or as Tim Berners-Lee put it, we need to re-decentralize the web, and WebRTC can help.

## The Browser

The second reason is something I haven't even mentioned. WebRTC works in the browser. It's why we're even talking about the technology. And it provides a capability to the browser that isn't possible any other way.

Browsers are not allowed to create TCP connections. They can’t listen on a port. The best they can do is make AJAX HTTP requests, and even that is highly limited by the same-origin policy.

You can work around these limitations using WebSockets and simulate something a lot like a TCP connection, but you'll need centralized servers to pull it off, and a lot of custom code along with it.

WebRTC changes all that. You can now make peer-to-peer connections which are pretty similar to a TCP connection.

And as it turns out, RTCTunnel works in the browser too. Using GopherJS and some abstractions, we can compile Go programs whose networking runs over WebRTC, and they can run natively or in the browser.

Here's an example. This code is running a Go HTTP server and a Go HTTP client, and we're using the RTCTunnel library to establish the connection between the two. When we clicked submit on both, a WebRTC connection was established between the two tabs in my browser, and the server responded to the client with "Hello World".

## A Browser Development Environment

In my opinion it's this idea which holds real promise. We live in a world of constrained computing. The phones you're carrying are sandboxed environments which go out of their way to make it difficult to program. You can't even build an iOS app without Mac hardware - a prohibitively expensive barrier to entry for many.

And perhaps the worst example of this is the Chromebook, a machine which has seemingly taken over the low-cost landscape. It's in schools everywhere, given to children as the first, and maybe only, computer they'll ever see. And it's extremely difficult to program. It doesn't even have a text editor.

Imagine a browser-based editing environment, perhaps using the VS Code Monaco editor, where files could be edited locally or saved to GitHub, and code could be compiled to JavaScript. And not just toy, HTML-generating programs, but, with WebRTC in the mix, the ability to write distributed systems software: databases, HTTP servers, message passing systems…

And with a set of network abstractions like those available in Go, all of the techniques learned to build those systems would be immediately applicable to real software intended to run on servers.