Rethinking Web Development: Non-RESTful APIs

Jan 27, 2014 at 8:30AM
Caleb Doxsey

Software development is a strange industry. Applications are hard to build: they take months of work, have lots of moving parts and they're extremely risky - often being built under tight deadlines in competitive markets. And they're usually built by surprisingly small teams of developers; developers who probably learned a very large chunk of the knowledge needed to build the application as they built it. It's kind of amazing that anything we build works at all.

And so I'm constantly surprised that we obsess over things which don't really matter. Developers will have vigorous arguments over everything from arbitrary stylistic choices (tabs vs. spaces, where to put braces, ...) to tool choice (like which editor to use) and naming conventions. Sometimes it seems like the amount of energy we expend discussing these things is precisely correlated with their level of arbitrariness.

Actually it's worse than that. Have you ever met an Architecture Astronaut? They take your crude, simple but working project and transform it into a more "correct" version. It's not actually functionally any different (one hopes), but it now has umpteen levels of indirection, factories, interfaces, powerful abstractions, design patterns and a myriad of other features you never really knew you needed. It also has the quality of being basically unmaintainable because you don't actually understand it anymore.

A remarkable demonstration of this can be seen in the Fizz Buzz Enterprise Edition. It's funny because it's not all that different from reality.

Actually, this tendency to transform arbitrary decisions into moral categories and then hang them as a millstone around the necks of others is a much broader phenomenon. At its root is perhaps the need for markers that indicate who is in the group and who is out of it, and then, once those markers are established, the need to assert our mastery (a game of one-upmanship). This tendency is only exacerbated by the challenges we're presented with: we aren't actually confident that we know what we're doing, and the systems we build are often incredibly difficult to get working. Rather than address the hard problems, we isolate the easy things and focus on them. (Maybe I can't build a distributed database, but I can tell you that your function should start with a capital letter.)

So I thought I might tackle one of these software shibboleths: the RESTful API.

REST

REST is a complicated architectural style, originally described in Roy Fielding's doctoral dissertation. I'm not particularly interested in tackling the academic meaning of the term, but rather in what it has become as a popular buzzword implementation. I have in mind the focus on proper HTTP verbs (GET, POST, PUT, DELETE), resource URIs, and thinking almost entirely in terms of the representation and transfer of resources.

For example, suppose we are building a RESTful API for email. We might have a URL structure like this:

{GET|POST|DELETE} /emails/{EMAIL_ID}
{GET|PUT} /emails

GET /emails lists the most recent emails, PUT /emails creates a new one, GET /emails/{EMAIL_ID} gets a particular email, POST /emails/{EMAIL_ID} updates an email, DELETE /emails/{EMAIL_ID} deletes an email.
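Put in code, that URL structure is essentially a routing table. The following is a minimal sketch, not any particular framework's router; the handler names are hypothetical, and the verb mapping simply follows the example above (PUT on the collection creates, POST on an item updates).

```python
import re

# Hypothetical handler names. The verb mapping follows the article's example:
# PUT on the collection creates, POST on an item updates.
ROUTES = [
    ("GET", r"/emails", "list_emails"),
    ("PUT", r"/emails", "create_email"),
    ("GET", r"/emails/(?P<email_id>[^/]+)", "get_email"),
    ("POST", r"/emails/(?P<email_id>[^/]+)", "update_email"),
    ("DELETE", r"/emails/(?P<email_id>[^/]+)", "delete_email"),
]


def dispatch(method: str, path: str):
    """Return (handler_name, path_params) for a request, or None if no route matches."""
    for route_method, pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path)
        if match and route_method == method:
            return handler, match.groupdict()
    return None


print(dispatch("GET", "/emails"))            # ('list_emails', {})
print(dispatch("DELETE", "/emails/1234"))    # ('delete_email', {'email_id': '1234'})
print(dispatch("PUT", "/emails/1234/send"))  # None: no route covers an action like "send"
```

Every operation has to be phrased as one of these five verb/URL pairs; anything that isn't create, read, update or delete has no obvious home in the table.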

So here's why I don't like this approach:

State

REST focuses on state, but state is not the primary building block of an application. What does it mean to "create" an email? Does that mean it sends it? How can you "update" or "delete" an email? Suppose you have 1,000,000 emails... listing them all doesn't really work anymore, does it? Consider this approach to sending emails:

PUT /emails/{EMAIL_ID}/send

A URL like this doesn't make sense. It would be read as "Create a 'send' record for email {EMAIL_ID}". What exactly would you send to this endpoint? An empty JSON object ({})? It's an uneasy fit.

Suppose instead you add a "state" field to your email:

{"id":1234, "subject": "whatever", ... , "state": "unsent"}

You would then update that record and POST the update. This is a better approach from a URL perspective, but it's much messier from an implementation perspective. On the server I have to detect changes to this object and act accordingly (i.e., the state changed, therefore I will send the email). Do I merely update the record in my database, schedule the email to be sent later, and respond that everything is fine? Or do I wait for the action to be completed in its entirety? Maybe I add additional types of state:

{ ... "state": "sending" }, { ... "state": "sent" }, { ... "state": "bounced" }, ...

But "state" in this sense is not really a property of an email itself, rather it's more a property of our system: I'm currently sending your email, I sent your email, I tried to send your email but it was blocked, ...
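Here is a minimal sketch of the server-side bookkeeping described above, assuming an in-memory store, a send queue, and a made-up rule about which state changes a client may request. None of these come from a real framework; they are hypothetical stand-ins to show where the complexity lands.

```python
from dataclasses import dataclass, field

# Hypothetical rule: the only state change a client may request is
# "unsent" -> "sending"; "sent" and "bounced" would be set by a delivery worker.
CLIENT_TRANSITIONS = {("unsent", "sending")}


@dataclass
class EmailStore:
    """Hypothetical in-memory stand-in for a database plus a mail-delivery queue."""

    records: dict = field(default_factory=dict)
    send_queue: list = field(default_factory=list)

    def update(self, email_id: int, changes: dict) -> dict:
        """Handle POST /emails/{email_id} by diffing the incoming fields against the stored record."""
        old = self.records.get(email_id)
        if old is None:
            return {"status": 404}

        old_state = old["state"]
        new_state = changes.get("state", old_state)
        status = 200
        if new_state != old_state:
            if (old_state, new_state) not in CLIENT_TRANSITIONS:
                return {"status": 409, "error": f"cannot change state from {old_state!r} to {new_state!r}"}
            # The "update" really means "send": queue it and reply before delivery happens.
            self.send_queue.append(email_id)
            status = 202

        self.records[email_id] = {**old, **changes, "id": email_id}
        return {"status": status, "record": self.records[email_id]}


store = EmailStore(records={1234: {"id": 1234, "subject": "whatever", "state": "unsent"}})
print(store.update(1234, {"subject": "hello"})["status"])  # 200: plain edit, nothing queued
print(store.update(1234, {"state": "sending"})["status"])  # 202: email 1234 queued for delivery
print(store.update(1234, {"state": "sent"})["status"])     # 409: a client can't report delivery
```

Even this toy version has to invent answers to the questions above: which transitions are legal, and whether to reply before the email is actually sent (here it answers 202 Accepted and leaves delivery to a queue).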

Problems like this aren't unusual - they're typical. Modern web applications aren't glorified Wikipedias; they're the desktop applications of yesteryear, described in terms of user workflows and actions, not in terms of the mere transfer of resources.

Caching is Broken

One of the supposed advantages of a RESTful architecture is that it lends itself to caching. Unfortunately those caching mechanisms are notoriously difficult to use properly.

Consider a web application with a lot of JavaScript (which is basically all of them). Somewhere in the site's HTML the JavaScript has to be included:

<script src="/assets/js/site.js"></script>

That's web dev 101, and it's wrong, for two reasons:

  1. Most web applications change frequently. When you change site.js, there's no guarantee that your end users will get the latest version the next time they visit your site unless you explicitly configure your web server to send headers that invalidate the cache.
  2. If you send headers that invalidate the cache every time a user comes to your site, they end up downloading hundreds of KB of script every time they reload (which can be devastating to performance).

The solution is to put a hash of the file's contents in the script's name and add aggressive caching headers:

<script src="/assets/js/site-d131dd02c5e6eec4.js"></script>
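A build step that produces such names can be quite small. Below is a sketch assuming SHA-256 truncated to 16 hex characters (the article doesn't say which hash produced its example name) and a one-year max-age, which is a common convention rather than a requirement.

```python
import hashlib
from pathlib import PurePosixPath

ONE_YEAR_SECONDS = 60 * 60 * 24 * 365


def fingerprinted_name(filename: str, content: bytes, length: int = 16) -> str:
    """Embed a digest of the file's bytes in its name: site.js -> site-<digest>.js."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}-{digest}{path.suffix}"))


def cache_headers(fingerprinted: bool) -> dict:
    """Pick Cache-Control headers for a response."""
    if fingerprinted:
        # New content means a new URL, so the old URL can safely be cached "forever".
        return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable"}
    # The HTML page that references the script must be revalidated so it sees new names.
    return {"Cache-Control": "no-cache"}


script = fingerprinted_name("/assets/js/site.js", b"console.log('hello');")
print(f'<script src="{script}"></script>')
print(cache_headers(fingerprinted=True))
```

Note that every reference to site.js in the HTML now has to be rewritten to the generated name on each deploy, which is part of why this tends to live in an asset pipeline rather than in hand-written markup.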

Let's just call a spade a spade here: that's a hack. The modern web developer spends an inordinate amount of time optimizing the performance of their applications to work around issues like this, because the architecture to which they're beholden is fundamentally flawed. A well-designed system makes the typical case easy, not hard.

For more guidance on caching, read Google's article on the subject. Speaking of Google, they got so fed up with the slowness of HTTP that they silently replaced it with SPDY on their servers.

Clean URLs

Perhaps you've read the blog post "URLs are for People". Well, I disagree. URLs are not for people. Nobody enters them manually, and people rarely even bother to look at them. Your domain matters, but beyond that, if you're spending more than a few minutes thinking about how to lay out your URLs, you are wasting your time on a part of your system that doesn't actually matter. (And one thing I love about that article is that his two primary examples of bad URLs come from two of the most popular sites on the internet: Google and Amazon...)

HTTP Verbs

If you look at a framework like Ruby on Rails, you'll see that it places a great deal of emphasis on using the correct HTTP verbs. What's bizarre about this is that, going back to the days of simple CGI-based forms, the full set of HTTP verbs was never widely supported. GET and POST were widespread, but their lesser-known cousins PUT and DELETE were unreliable (HTML forms themselves can only submit GET and POST). This leads Rails to add code like this:

<input name="_method" type="hidden" value="delete" />

So why advocate for a system which doesn't actually work out of the box?
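On the server side, Rails handles that hidden field in middleware (Rack::MethodOverride, part of Rails' default middleware stack), which rewrites the request method before routing. Below is a rough Python WSGI analogue of the idea, written for illustration rather than ported from Rack; the set of methods it allows is an assumption.

```python
import io
from urllib.parse import parse_qs

# Assumption: methods a hidden _method field may select; anything else stays POST.
OVERRIDABLE = {"PUT", "PATCH", "DELETE"}


class MethodOverride:
    """WSGI middleware: treat a form POST carrying _method=delete as a DELETE request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        content_type = environ.get("CONTENT_TYPE", "")
        if environ.get("REQUEST_METHOD") == "POST" and content_type.startswith(
            "application/x-www-form-urlencoded"
        ):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = environ["wsgi.input"].read(length)
            # Put the body back so the wrapped app can still read the form.
            environ["wsgi.input"] = io.BytesIO(body)
            fields = parse_qs(body.decode("utf-8", errors="replace"))
            requested = fields.get("_method", [""])[0].upper()
            if requested in OVERRIDABLE:
                environ["REQUEST_METHOD"] = requested
        return self.app(environ, start_response)
```

The application behind the middleware sees a DELETE, but only because every framework in the chain agrees to pretend.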

Bi-Directional Communication

RESTful architectures are client-server architectures. All requests originate from the client, and all responses originate from the server. Therefore it is impossible for the server to initiate communication with the client. Sadly, almost every application needs server-initiated communication.

For example, it'd be great if our email application could tell the client when a new email arrives. The RESTful solution to this problem is to poll periodically: a clumsy, inefficient and unreliable process.
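To make the cost of polling concrete, here is a minimal client-side loop. `fetch_since` is a hypothetical stand-in for something like GET /emails?since={cursor}, a query parameter the API above doesn't define, and the sketch assumes email ids increase over time.

```python
import time
from typing import Callable, List, Tuple


def poll_for_new_emails(
    fetch_since: Callable[[int], List[dict]],
    cursor: int,
    rounds: int,
    interval_s: float,
) -> Tuple[List[dict], int]:
    """Ask "anything newer than cursor?" every interval_s seconds for a fixed number of rounds.

    Returns (new_emails, empty_polls). Every empty poll is a round trip that carried no news,
    and an email that lands just after a poll waits up to interval_s seconds to be noticed.
    """
    new_emails: List[dict] = []
    empty_polls = 0
    for _ in range(rounds):
        batch = fetch_since(cursor)  # hypothetical: GET /emails?since={cursor}
        if batch:
            new_emails.extend(batch)
            # Assumes ids grow monotonically, so the largest id seen is the new cursor.
            cursor = max(email["id"] for email in batch)
        else:
            empty_polls += 1
        time.sleep(interval_s)
    return new_emails, empty_polls
```

Shortening `interval_s` reduces latency only by multiplying empty round trips. A persistent connection such as a WebSocket lets the server push the email as soon as it arrives instead.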

Alternatives

Perhaps the main alternative to REST when it comes to APIs is RPC, which could be pulled off with any number of mechanisms (AJAX, WebSockets, ...). But RPC is not a magic bullet; it has its own set of issues (which probably led to the creation of REST in the first place). My point is not to offer an architecture that is superior to REST in every way; rather, I think the application ought to drive the discussion about architecture. If REST works for your application, then by all means use it, but if it doesn't, don't be afraid to use something else.
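For contrast with the resource-oriented URLs at the start, here is what an action-oriented call could look like: a single endpoint that dispatches on a procedure name. This sketch borrows the JSON-RPC 2.0 request/response envelope and error codes, but it covers only a subset (named params, no notifications or batches), and `send_email` and its queue are hypothetical.

```python
import json
from typing import Callable, Dict

# Hypothetical stand-in for a mail-delivery queue.
SEND_QUEUE: list = []


def send_email(email_id: int) -> dict:
    """Hypothetical procedure, named after the user's action rather than a resource."""
    SEND_QUEUE.append(email_id)
    return {"email_id": email_id, "status": "queued"}


PROCEDURES: Dict[str, Callable[..., dict]] = {"send_email": send_email}


def _error(code: int, message: str, request_id=None) -> str:
    return json.dumps({"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": request_id})


def handle_rpc(raw_body: str) -> str:
    """Handle one call sent to a single endpoint (e.g. POST /rpc) using JSON-RPC 2.0 envelopes."""
    try:
        request = json.loads(raw_body)
    except json.JSONDecodeError:
        return _error(-32700, "Parse error")
    if not isinstance(request, dict):
        return _error(-32600, "Invalid Request")

    request_id = request.get("id")
    method = request.get("method")
    procedure = PROCEDURES.get(method) if isinstance(method, str) else None
    if procedure is None:
        return _error(-32601, "Method not found", request_id)

    params = request.get("params", {})
    if not isinstance(params, dict):
        return _error(-32602, "Invalid params", request_id)
    try:
        # Simplification: a TypeError here is treated as "the arguments didn't fit".
        result = procedure(**params)
    except TypeError:
        return _error(-32602, "Invalid params", request_id)
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": request_id})
```

A client sends {"jsonrpc": "2.0", "method": "send_email", "params": {"email_id": 1234}, "id": 1} and gets back a result, without having to decide whether sending is a PUT or a POST, or which record it creates. The trade-off is that generic HTTP caches and tools no longer understand what each call does.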

Too often we measure the quality of an application by its conformity to a set of predefined rules - the Thou Shalt Nots of web development - when we should really be treating those rules as suggestions: conventions that someone once found useful in building their own application.