Whatfettle

Square Peg, Web hole

William Vambenepe set an interesting challenge, Square Peg, REST Hole, which, unusually, I decided to pick up, and in a timely fashion. It's worth noting that I'm not a REST purist but rather a Web practitioner, which will hopefully colour my approach differently to the other fazillion answers this will no doubt generate:

Long-lived operations. You can’t just hang on for a synchronous response. Tim Bray best described the situation, which he called Slow REST. Do you create an “action in progress” resource?

Consider a phone call: an HTML form POSTs two phone numbers to make the call. The phones ring and you're either returned a link to, or redirected to, a page for the call-id. Refreshing that page (using GET) tells you the status of the call: how long they've been talking, how much it's cost so far, etc.
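A minimal sketch of that call resource, assuming a hypothetical in-memory store, made-up paths (/calls and /calls/{id}) and a made-up per-minute rate; a real service would sit behind a web server, with the telephony system updating the state. The point is that the POST returns immediately with a Location for the new resource, and a GET on that resource reports progress:

```python
import itertools
import time

# Hypothetical in-memory store of call resources, keyed by call-id.
_calls = {}
_ids = itertools.count(1)


def post_call(caller, callee, now=None):
    """Handle POST /calls: start a call and return (status, headers).

    Answers straight away with 201 Created and a Location header for
    the new call resource, rather than blocking until the call ends.
    """
    call_id = str(next(_ids))
    _calls[call_id] = {
        "caller": caller,
        "callee": callee,
        "started": time.time() if now is None else now,
        "state": "ringing",  # a telephony system (not shown) would update this
    }
    return 201, {"Location": f"/calls/{call_id}"}


def get_call(call_id, rate_per_minute=0.10, now=None):
    """Handle GET /calls/{id}: report the call's current status."""
    call = _calls.get(call_id)
    if call is None:
        return 404, None
    now = time.time() if now is None else now
    elapsed = now - call["started"]
    return 200, {
        "state": call["state"],
        "seconds": round(elapsed),
        "cost": round(elapsed / 60 * rate_per_minute, 2),
    }
```

A client simply follows the Location and refreshes it with GET for as long as it cares; a 303 See Other pointing at the same URI would serve a browser just as well.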

If you really want to avoid polling to find out when the call has ended, then there's the Comet hack. Until they release the WebSocket doomsday virus.

Query: how do you query for “all the instances of app foo deployed in a container that has patch 1234 installed” in a to-each-resource-its-own-URL world? I’ve seen proposals that create a “query” resource and build it up incrementally by POSTing constraints to it. Very RESTful. Very impractical too.

Um, I don't see the impracticality, given how common this is on the Web: a GET form which searches for a patch, with two fields, "container URI" and "patch-number", returns a list of URIs, one for each application containing the patch. Asked to do the search today on a large dataset I'd probably use something like CouchDB map/reduce, but that's an implementation detail. For power users you could offer an advanced options form, or even a SPARQL query form, like http://data.gov.uk. This seems so trivial that I'm starting to worry I'm falling into a trap!
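To show how little is involved, here's a sketch of that search form as a plain GET, assuming a hypothetical list of deployment records in place of a CouchDB view or any other index; the /search path, the records and the "container" parameter name are made up, while "patch-number" follows the field name above:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical deployment records; in practice an index such as a CouchDB view.
DEPLOYMENTS = [
    {"app": "http://example.com/apps/foo/1",
     "container": "http://example.com/containers/a", "patches": {1234}},
    {"app": "http://example.com/apps/foo/2",
     "container": "http://example.com/containers/b", "patches": {1234}},
    {"app": "http://example.com/apps/bar/1",
     "container": "http://example.com/containers/a", "patches": {999}},
]


def search_url(base, container, patch):
    """Build the URL a browser would GET when submitting the search form."""
    return base + "?" + urlencode({"container": container, "patch-number": patch})


def handle_search(url):
    """Answer GET /search?container=...&patch-number=... with matching app URIs."""
    params = parse_qs(urlparse(url).query)
    container = params["container"][0]
    patch = int(params["patch-number"][0])
    return [d["app"] for d in DEPLOYMENTS
            if d["container"] == container and patch in d["patches"]]
```

The query is itself a URI, so it can be bookmarked, cached, linked to and tried out in a browser, with no query resource to build up incrementally.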

Events: the process of creating and managing subscriptions maps well to the resource-oriented RESTful approach. It’s when you consider event delivery mechanisms that things get nasty. You quickly end up worrying a lot more about firewalls and the cost of keeping HTTP connections open than about RESTful purity.

I'll ignore the firewall issue, given that's the same for WS-*, but "event delivery" is a matter of either polling or webhooks; these days PubSubHubbub has traction, and both are certainly easy to understand, implement and scale. If you need a way to aggregate fragmented message flows, then Salmon is worth a look. Really, webhooks, PubSubHubbub and Salmon are just trendy names for patterns observed working on the Web. A long time ago I built a system using two RSS feeds as a message queue: one said "here's a list of data items for you", the other, on the subscriber, said "here's a list of data items I've secured".
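The two-feed queue can be sketched along these lines. This isn't the original system, whose details aren't described here: each feed is modelled as a hypothetical in-memory list of entries rather than real RSS, and both sides are assumed to poll the other's feed:

```python
from dataclasses import dataclass, field


@dataclass
class Feed:
    """A stand-in for an RSS feed: an ordered list of entries with ids."""
    entries: list = field(default_factory=list)

    def add(self, entry_id, payload=None):
        self.entries.append({"id": entry_id, "payload": payload})


def to_fetch(outbox: Feed, acks: Feed):
    """Subscriber side: items offered by the publisher but not yet secured."""
    secured = {e["id"] for e in acks.entries}
    return [e for e in outbox.entries if e["id"] not in secured]


def prune(outbox: Feed, acks: Feed):
    """Publisher side: stop offering items the subscriber reports as secured."""
    secured = {e["id"] for e in acks.entries}
    outbox.entries = [e for e in outbox.entries if e["id"] not in secured]
```

The subscriber stores each item from to_fetch() and then lists its id in its own feed; the publisher reads that feed and prunes. Nothing is lost if either side misses a poll, because both feeds keep saying the same thing until the other side catches up.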

Enumeration: what if your resource state is a very long document and you’d rather retrieve it in increments? A basic GET is not going to cut it. You either have to improve on GET or, once again, create a specifically crafted resource (an enumeration context) to serve as a crutch for your protocol.

You have quite a few options: offer the ability to address a portion of the resource using, say, a query string, e.g. http://example.com/video?start=1:20&end=2:20, or use an HTTP Range request header, which the server answers with Content-Range. I prefer bookmarkable URIs you can easily try out in a browser, so I would suggest serving the entire document and identifying the portion using a fragment identifier until that really hurts.
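For the header route, the client sends Range and the server replies 206 Partial Content with a Content-Range header. A sketch of the server side, assuming only a single byte range is supported (multipart ranges and other units are simply ignored, which HTTP permits by falling back to a full 200 response):

```python
import re

# Matches a single byte range such as "bytes=2-5", "bytes=7-" or "bytes=-3".
_SINGLE_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def serve_range(body: bytes, range_header=None):
    """Return (status, headers, payload) for a GET, honouring one byte range."""
    total = len(body)
    full = (200, {"Content-Length": str(total)}, body)
    if not range_header:
        return full
    m = _SINGLE_RANGE.match(range_header.strip())
    if not m or m.group(1) == m.group(2) == "":
        return full  # unsupported or malformed: ignore it, send everything
    first, last = m.groups()
    if first == "":
        # Suffix range: the final N bytes of the representation.
        start, end = max(total - int(last), 0), total - 1
        if int(last) == 0:
            start = total  # a zero-length suffix can't be satisfied
    else:
        start = int(first)
        if last and int(last) < start:
            return full  # invalid range spec: ignore it
        end = min(int(last), total - 1) if last else total - 1
    if start >= total:
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    payload = body[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(payload)),
    }
    return 206, headers, payload
```

Byte ranges suit opaque media, but they aren't bookmarkable, which is why a query string or fragment identifier is still friendlier when people want to link to the piece.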

Hmm, this seems so trivial I guess I've missed the point, again. Are we talking about long documents, or about paging through results? The latter is a very common pattern on the Web; you've used Google, right? The trick to making it programmable is not to say ..?page=2 but to put something stable in the URI, such as ?item=1024&nitems=100.
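A small sketch of why the offset form is the stable one, assuming a hypothetical /results path: the item parameter names the first entry in the slice, so a client can change the page size without the meaning of a link shifting, and each response carries links to its neighbours:

```python
from urllib.parse import urlencode


def page(items, item=0, nitems=100, base="/results"):
    """Return one slice of `items` plus links to the previous and next slices."""
    item = max(item, 0)
    nitems = max(nitems, 1)
    chunk = items[item:item + nitems]
    links = {}
    if item + nitems < len(items):
        links["next"] = base + "?" + urlencode({"item": item + nitems, "nitems": nitems})
    if item > 0:
        links["prev"] = base + "?" + urlencode({"item": max(item - nitems, 0), "nitems": nitems})
    return chunk, links
```

With ?page=2, "page 2" means something different for every page size; ?item=1024 always starts at the same entry.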

Filtering: take that same resource with a very long representation. Say you just want a small piece of it (e.g. one XML element). How do you retrieve just that piece?

Ah, maybe I've mixed this up with the last question. Or maybe they're the same question. I'll give the same answer as above.
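Applied to the XML case, "address a portion using a query string" might look like the following sketch. The ?id= parameter and the sample document are hypothetical; without the parameter, the whole document is served:

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

# Hypothetical long document.
DOC = """<report>
  <section id="summary"><p>All good.</p></section>
  <section id="details"><p>Lots of detail.</p></section>
</report>"""


def get_filtered(url, document=DOC):
    """Serve the whole document, or just the element named by ?id=..."""
    params = parse_qs(urlparse(url).query)
    if "id" not in params:
        return 200, document
    wanted = params["id"][0]
    root = ET.fromstring(document)  # parsed per request, so editing it is safe
    for elem in root.iter():
        if elem.get("id") == wanted:
            elem.tail = None  # don't serialize the whitespace after the element
            return 200, ET.tostring(elem, encoding="unicode")
    return 404, ""
```

Matching on an attribute value, rather than accepting arbitrary XPath from the URI, keeps the interface small; a richer query form could come later for power users.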

Collections: it’s hard to manage many resources as one when they each have their own control endpoint. It’s especially infuriating when the URLs look like http://myCloud.com/resources/XXX where XXX, the only variable part, is a resource Id and you know – you just know – that there is one application processing all your messages and yet you can’t send it a unique message and tell it to apply the same request to a list of resources.

Write a form which POSTs or PUTs a series of IDs to be changed. Alternatively, send a value to modify a collection in one step, e.g. POST status="paused" to http://myCloud.com/resources/status/thrashing. You can write an "endpoint"^W CGI^W resource handler^W thing to do anything to anything. I'd consider exposing operations on a set of tags, a set of search results, whatever, so the collection can be in the eye of the consumer.
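A sketch of those collection handlers, assuming a hypothetical in-memory store of resources with a status and some tags. Each handler picks out a collection (by explicit IDs, by current status or by tag) and applies the same change to every member, reporting a per-resource result:

```python
# Hypothetical resource store: id -> state.
RESOURCES = {
    "1": {"status": "running", "tags": {"web"}},
    "2": {"status": "thrashing", "tags": {"web"}},
    "3": {"status": "thrashing", "tags": {"db"}},
}


def post_status_by_ids(ids, status):
    """POST a list of IDs and one new status: apply it to each resource."""
    results = {}
    for rid in ids:
        if rid in RESOURCES:
            RESOURCES[rid]["status"] = status
            results[rid] = 200
        else:
            results[rid] = 404
    return results


def post_status_by_current_status(current, status):
    """POST status=... to /resources/status/{current}: change every match."""
    matched = [rid for rid, r in RESOURCES.items() if r["status"] == current]
    return post_status_by_ids(matched, status)


def post_status_by_tag(tag, status):
    """POST status=... to a hypothetical /resources/tags/{tag} collection."""
    matched = [rid for rid, r in RESOURCES.items() if tag in r["tags"]]
    return post_status_by_ids(matched, status)
```

Returning a status per ID keeps a partial failure visible to the client instead of hiding it behind a single response code.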

The afterlife: how do you retrieve data about a resource once it’s gone? Which is what a DELETE does to it. Except just because it’s been removed operationally doesn’t mean you have no interest in retrieving data about it.

I avoid DELETE precisely for this reason, or at least reserve it for the nuclear option. As with the phone-call example, hanging up isn't a DELETE but rather a POST or a PUT to change the status of the call to "terminated".
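Here's a sketch of hanging up as a state change, assuming a hypothetical in-memory store and a made-up /calls/{id}/state resource. The call record survives, so a GET after the call has ended still tells you how it went:

```python
# Hypothetical call records, keyed by call-id.
calls = {"42": {"state": "connected", "seconds": 95}}

# Which state changes a client may request from each state.
ALLOWED = {
    "ringing": {"connected", "terminated"},
    "connected": {"terminated"},
    "terminated": set(),  # a finished call stays finished
}


def put_call_state(call_id, new_state):
    """PUT /calls/{id}/state: hang up by changing state, not by deleting."""
    call = calls.get(call_id)
    if call is None:
        return 404
    if new_state not in ALLOWED[call["state"]]:
        return 409  # Conflict: not allowed from the current state
    call["state"] = new_state
    return 200


def get_call(call_id):
    """GET /calls/{id}: still answers after the call has ended."""
    call = calls.get(call_id)
    return (200, dict(call)) if call is not None else (404, None)
```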

So, given William is significantly brighter than me, I'm sure I've just set off all the booby-traps and now have pie all over my face. Hopefully I'll learn something as a consequence.