Up with which I will not PUT

Submitted by Larry on 17 October 2012 - 10:35pm

For Drupal 8, we want to bake REST support directly into the core system. It's unclear if we'll be able to go full-on hypermedia by the time we ship, but it should be possible to add via contributed modules. For the base system, though, we want to at least follow REST/HTTP semantics properly.

One area we have questions about is PUT, in particular the details of its idempotence requirements. For that reason, I'm reaching out to the Interwebs to see what the consensus is. Details below.

For now, we're confining ourselves to RESTful access to entities, Drupal's main data object. Every entity has a "native" URI at http://www.example.com/$entity_type/$entity_id, such as /node/5. We're currently looking at JSON-LD as our primary supported serialization format.

Naturally for creating a new entity, we cannot use PUT since the entity ID is auto-generated. For that, POST to a /node/add page of some sort, which returns the URI of the created node. But what of updates?

Idempotence

At first blush, PUT /node/5 seems like an obvious thing to do. Simply PUT a JSON-LD representation of a node to an existing URI and it gets overwritten with the new version, no muss no fuss. The problem is that Drupal, being a highly extensible system, cannot always guarantee no-side-effects when that happens.

My understanding is that idempotence in HTTP is not absolute. For instance, GET, HEAD, and PUT are idempotent, but "incidental" side effects such as logging or statistics gathering are OK and not a violation of their idempotence. RFC 2616 has this to say on idempotence:

Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. (RFC 2616 section 9.1.2)

I'm not clear on "the side effects are the same" qualifier. Does that mean it can be a repeat of the same side effect, or a net-0 effect?
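To make the two readings concrete, here is a minimal sketch of a toy in-memory store. The class `ToyStore` and its `put` method are hypothetical names for illustration, not Drupal or HTTP library APIs. The stored state after two identical PUTs matches the state after one, while an incidental access log keeps growing.

```python
class ToyStore:
    """Hypothetical in-memory resource store, for illustration only."""

    def __init__(self):
        self.resources = {}   # URI -> representation
        self.access_log = []  # incidental side effect (logging)

    def put(self, uri, body):
        # Replace the stored representation wholesale.
        self.resources[uri] = dict(body)
        # The log entry repeats on every request.
        self.access_log.append(("PUT", uri))


store = ToyStore()
store.put("/node/5", {"title": "Hello world"})
state_after_one = dict(store.resources)

store.put("/node/5", {"title": "Hello world"})
state_after_two = dict(store.resources)

same_state = state_after_one == state_after_two
log_entries = len(store.access_log)
```

Under the "net state" reading this PUT is idempotent; under a strict "no repeated side effect" reading the growing log would count against it. Which of the two the spec intends is the open question.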

There are two places where this becomes relevant for Drupal.

Versioning

Many types of entity in Drupal (hopefully all soon) support revisioning. That is, when saving the entity, instead of overwriting the existing one, a new draft is created which, sometimes but not always, becomes the new "default" version. Previous versions are available at their own URIs. That can change at any time, however, subject to user configuration. Also, more recently we've been allowing forward revisions, that is, creating a new version that is not yet the default version, but will be.

How does that play into idempotence and PUT? If a new revision is created, then repeating the PUT is not a no-op. Rather, it would create yet another revision. The spec says:

A single resource MAY be identified by many different URIs. For example, an article might have a URI for identifying "the current version" which is separate from the URI identifying each particular version. In this case, a PUT request on a general URI might result in several other URIs being defined by the origin server. (RFC 2616 section 9.6)

That seems to imply that a PUT to create a new revision is OK. However, what of forward revisions? If you create a new revision, but don't set it live, it means that a PUT followed by a GET on the same URI will not return the value that was PUT. It would return the previously existing value.

Put another way:

PUT /node/5

{title: "Hello world"}

Results in:

GET /node/5

{title: "Hello world"}

and

GET /node/5/revision/8

{title: "Hello world"}

And that's totally fine, by my read of the spec. However, what of:

PUT /node/5

{title: "Bonjour le monde"}

Results in:

GET /node/5

{title: "Hello world"}

GET /node/5/revision/8

{title: "Hello world"}

GET /node/5/revision/9

{title: "Bonjour le monde"}

Is that still spec-valid behavior? And if not, does that mean that any system that uses a Create-Read-Archive-Purge (CRAP) model instead of CRUD, or that supports forward revisioning, is inherently not-PUT-compatible? (That would be very sad, if so.)
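The scenario above can be modeled in a few lines. This is only a sketch: `RevisionedNode`, its methods, and the starting revision ID of 8 are assumptions chosen to mirror the example, not Drupal's entity API.

```python
class RevisionedNode:
    """Hypothetical model of one node with revisions; not Drupal's API."""

    def __init__(self, forward_revisions=False):
        self.forward_revisions = forward_revisions
        self.revisions = {}     # revision id -> representation
        self.default_id = None  # revision served at the node's own URI
        self._next_id = 8       # arbitrary start, matching the example

    def put(self, body):
        rev_id = self._next_id
        self._next_id += 1
        self.revisions[rev_id] = dict(body)
        # With forward revisions on, the new revision is stored but not
        # promoted, unless the node has no default revision yet.
        if not self.forward_revisions or self.default_id is None:
            self.default_id = rev_id
        return rev_id

    def get(self, rev_id=None):
        if rev_id is None:
            rev_id = self.default_id
        return self.revisions[rev_id]


node = RevisionedNode()
node.put({"title": "Hello world"})  # revision 8, becomes the default

node.forward_revisions = True       # site configuration changes
draft_id = node.put({"title": "Bonjour le monde"})  # revision 9, not default

current = node.get()        # what GET /node/5 would return
draft = node.get(draft_id)  # what GET /node/5/revision/9 would return
```

In this model every repeated PUT also adds one more revision, which is the other half of the idempotence concern.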

Hooks

The other concern is Drupal's extensibility. When an entity is saved, various hooks/events fire that allow other modules to respond to the fact that the node has been saved. Those hooks can do, well, anything. While in the vast majority of cases they will do the exact same thing every time a new update is made or a revision is saved, that's not a guarantee. They may take a different action depending on the values that were just saved for the entity. Or they may take a different action on different days. Or they may generate IO, such as sending an email or saving additional database information, or triggering a cache clear, or launching a nuclear warhead. (Unlikely, but the API allows for it!)

Since those hooks MAY do things that are not idempotent, does that mean that we MAY NOT use PUT, since it must be idempotent? Or does it mean that we simply document that hooks SHOULD NOT do non-idempotent things and call it a day?
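As a rough illustration of the difference, consider the sketch below. `HookedStore`, `on_save`, and both hook functions are hypothetical names, not Drupal's hook system. One hook is harmless to repeat; the other produces a new effect on every call.

```python
class HookedStore:
    """Hypothetical store that notifies registered hooks on every save."""

    def __init__(self):
        self.nodes = {}
        self.hooks = []

    def on_save(self, hook):
        self.hooks.append(hook)

    def put(self, uri, body):
        self.nodes[uri] = dict(body)
        for hook in self.hooks:
            hook(uri, body)


cache = {}
outbox = []


def refresh_cache(uri, body):
    # Idempotent: repeating it leaves the cache in the same state.
    cache[uri] = body["title"]


def notify_editor(uri, body):
    # Not idempotent: every call queues another message.
    outbox.append(f"{uri} was saved")


store = HookedStore()
store.on_save(refresh_cache)
store.on_save(notify_editor)

for _ in range(3):
    store.put("/node/5", {"title": "Hello world"})
```

After three identical PUTs the node and the cache look exactly as they would after one, but the outbox holds three messages. Whether that counts as a violation depends on how strictly "side effects" is read.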

In any event, that's our situation. We want to properly leverage the HTTP spec and REST principles here, but I fear that Drupal's very extensibility makes that semantically impossible. I am hoping I'm wrong, but in any event I turn the question out to the peanut gallery for consideration.

Can we PUT up with it?

Berend de Boer (not verified)

17 October 2012 - 11:37pm

Larry, "side effects" means "observable side effects" for the caller. There are always side effects, but it's irrelevant if you can't observe them (a PUT doesn't mean you have to write to the same block on the disk).

A PUT to node/5 should be allowed. Creating a new invisible revision is fine.

But here is when it should not be allowed: when you have some advanced editing workflow, and you need to approve the change. PUT to node/5 would overwrite the published one. In such a scenario you need PUT to node/5/draft or so.

On your last example: PUT is an overwrite, getting back something different than you wrote in that case is highly unexpected behaviour. Note that you don't need to get back an exact copy I would say. I.e. if you PUT to node/5 and do a GET, you get a formatted page if you request "Accept: text/html". To get back what you wrote, you probably want a GET with "Accept: application/json" or so.

I.e. what you get depends on the media type.

Yeah, I was deliberately ignoring media types for the moment, as those are just a representation of the resource.

However, your comment concerns me. It implies that forward revisions inherently break PUT, and since whether or not forward revisions happen is a user-configurable switch it means that a user-configurable switch would determine whether PUT is legal or not. That is very unstable.

Do you have a source for that, if so? I'd want firm grounding to say we cannot use PUT, especially since some other comments below say we can.

In all REST designs versioning is problematic. Not just for PUT either; GET has its fair share of problems if you follow its idempotence property. If the URI reflects the version information things get better, but only if every update creates a new version. That maintains something close to referential transparency, but becomes very hard to track very quickly, which in turn destroys the point of referential transparency.

For PUT it is OK to use if it always creates a new resource. Then it is always idempotent. For example, PUT /node/12345 should be fine, as long as 12345 is a fresh node ID. PUT /node/add is not, as we observe different HTTP locations in the response.

Going further, PUT /node/12345/revision/7886 will be fine, assuming the same logic as above. The common factor seems to be that we have to supply a unique name from the outside, which more often than not will cause more problems than it solves.

POST does not have these problems; it subsumes the PUT functionality but loses the idempotency. It allows clear and reasonable chaining of requests, as we can interpret the POST response, for example the Location header of node/add, and go from there.

Having said all that, it probably is not worth the effort to use PUT on any of the types of resources Drupal core maintains. I can't think of an instance where it will be justified. Maybe uploading files cuts it, but that's it.

Maybe the key lies in the explanation in section 9.1.2 of RFC 2616:

>However, it is possible that a sequence of several requests is non-idempotent, even if all of the methods executed in that sequence are idempotent.

The way I read that, PUT + PUT + GET doesn't necessarily have to be idempotent. Only the PUT + PUT part does, and you are not restricted to returning the last PUT resource. So that should put Drupal in the clear as far as revisioning and forward revisioning go?

As for the hooks, I think documenting and evangelizing it is the best bet.

I don't have an answer, but I will add a few more thoughts.

HTTP 1.1 is basically RFC 2616, but there is also RFC 5789 (PROPOSED STANDARD), which extends it with the PATCH verb, closely related to PUT. In this case, though, "PATCH is neither safe nor idempotent".

It has some interesting paragraphs that actually make the definition of PUT clearer:

The PUT method is already defined to overwrite a resource with a complete new body, and cannot be reused to do partial changes.

In a PUT request, the enclosed entity is considered to be a modified version of the resource stored on the origin server, and the client is requesting that the stored version be replaced. With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version.
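The contrast drawn in that passage can be sketched as two small functions. Both names are hypothetical, and the shallow key-by-key merge used for the PATCH case is an assumption of this sketch: RFC 5789 leaves the patch format to the media type.

```python
def apply_put(stored, body):
    """PUT semantics: the enclosed body replaces the stored version."""
    return dict(body)


def apply_patch(stored, changes):
    """PATCH-like semantics: changes are applied to the stored version.

    The shallow merge is an assumption made for this sketch only.
    """
    merged = dict(stored)
    merged.update(changes)
    return merged


stored = {"title": "Hello world", "status": "published"}

after_put = apply_put(stored, {"title": "Bonjour le monde"})
after_patch = apply_patch(stored, {"title": "Bonjour le monde"})
```

With PUT the `status` field disappears, because the body is taken as the complete new version; with the merge-style patch it survives.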

Also, for a real example: GitHub API v3 implements a REST API, with the PATCH verb included.
Looking at where they use PUT, it seems to be in places where they can guarantee idempotency, starring a project for example. Starring multiple times has no additional side effects.

So is it that bad if we discourage the use of PUT? Maybe leave to contrib the task of guaranteeing idempotency for those specific paths where they want it?
For example, a path for marking an article published is idempotent, so accept a PUT in that case.

agentrickard (not verified)

18 October 2012 - 7:20am

This might help, if my story is not unique.

In 15 years of professional development, I have used PUT once. The use case was to move files via REST. The data for those files was sent via POST. In the case of files, the overwrite of the body is required behavior.

The MediaMosa team has implemented a robust REST interface in Drupal 6 & 7, and that's their only support of PUT.

From RFC2616: "The PUT method requests that the enclosed entity be stored under the supplied Request-URI."

While it's true that a client may do "PUT /node/5" and subsequently not obtain the same representation from "GET /node/5" due to intervening requests from other clients, the server should not be built in a way that lacks the "modify in place" semantics for PUT. It's sensible to use "POST /node/5/revision" to publish a new revision.

I wonder why you guys don't try to leverage APP (http://www.ietf.org/rfc/rfc5023.txt) or WebDAV (http://tools.ietf.org/html/rfc4918) for most of your RESTful implementation? Either or both of them may not provide out of the box everything Drupal needs, but they support many useful idioms for authoring and publishing.

Shouldn't we just treat node/5/revision as its own resource?

I.e. if I PUT to node/5 it just updates the current revision in place, but if I POST to node/5/revision it creates a new revision. PUT to node/5/revision/8 should then update that revision in place. I think that removes some of the ambiguity around what different actions will do, and basically exposes revisions as their own resource, which would provide a more consistent API for working with revisions.

That implies that the decision of whether to make a new revision or edit in place is entirely up to the client. That's currently entirely up to the server. Switching to client-decision would be a big and potentially security-problem-causing change.

Also, that would imply that using a CRAP model instead of CRUD, which I firmly believe is the superior way to go, is fundamentally incompatible with PUT. That seems like a huge mismatch that I'd hope someone has figured out how to resolve.

I understand that Drupal currently tries to make that decision on the server, but it seems that it should be an option for the client as well. These types of problems in Drupal have historically been its biggest weakness, such as the crap in node_save. It's time to actually fix these problems instead of trying to bolt a REST API on top of them.

Decision is not up to the client: the server simply responds with 403 if you do a PUT to /node/5 and revisions are required, or you generate a revision on the fly.

It's still the server that determines what's allowed.

Owen Barton (not verified)

18 October 2012 - 5:50pm

I don't view forward revisions as a fundamental issue - I would suggest something like the following is quite within the REST and CRAP ethos:
- Forward revisions disabled: allow PUT /node/123, create a new revision and reference /node/123/revision/456 as an alternate URL (via rel in the HTML).
- Forward revisions enabled: if we get a PUT /node/123, respond with an access denied 403.3 (probably, although I could see a case for a redirect here instead), indicating that new revisions may be POSTed to /node/123/draft. On POSTing new revision, client is redirected to /node/123/revision/457 (or whatever the ID is of their revision).
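A minimal sketch of that two-mode behaviour follows. The function `handle`, the dictionary keys in its return value, and the choice of 200 and 303 as success and redirect codes are assumptions made for illustration; only the 403 and the URI layout come from the list above.

```python
def handle(method, path, forward_revisions, new_revision_id):
    """Hypothetical dispatcher for one node, /node/123."""
    node = "/node/123"

    if method == "PUT" and path == node:
        if forward_revisions:
            # Refuse the overwrite and point the client at the draft URI.
            return {"status": 403, "post_drafts_to": f"{node}/draft"}
        # Accept the PUT; the new revision is exposed as an alternate URL.
        return {
            "status": 200,
            "alternate": f"{node}/revision/{new_revision_id}",
        }

    if method == "POST" and path == f"{node}/draft" and forward_revisions:
        # Redirect the client to the revision it just created.
        return {
            "status": 303,
            "location": f"{node}/revision/{new_revision_id}",
        }

    # Anything else is out of scope for this sketch.
    return {"status": 404}


put_live = handle("PUT", "/node/123", False, 456)
put_refused = handle("PUT", "/node/123", True, 457)
post_draft = handle("POST", "/node/123/draft", True, 457)
```

The point of the sketch is that the server's configuration, not the client, selects which branch applies.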

With respect to idempotence, I think ultimately this is a dialog between the site owner and API users. I think most notifications and similar activities wouldn't be considered to break that rule - and there is very little anyone can do to ensure absolute global idempotence (/me sets up an IFTTT rule to delete the internet next time Eaton mentions ponies on twitter). I feel the essence is to avoid things like a PUT "foo" giving a GET result of "foobar" one time and "foobaz" the next. Presumably site owners will have the ability to disable the API (and construct their own more appropriate API as they see fit) if they know their site crosses the line, but ultimately I think most sites will be fine, and the final decision is not Drupal's to make.

Nodes are very much like composites of the core "node" and "node revision". How we break it up, and how we structure the URIs, will be important to clarify what is node, what is revision, and what is their combination for use in rendering the currently published content. Another scheme:

  • POST /node: Create a new node, and a new initial revision.
  • PUT /node/123: Change the "core node". If revisions are not enabled, this includes changing "revision" data. If they are enabled, I can't figure what we'd use this for, except perhaps adjusting the current revision.
  • POST /node/123/revision: Create a new revision.

Agreed on idempotence. Thinking of it in the context of caching the resource is a good shorthand: if you can effectively use the same cached representation after each of the identical PUT requests, you've got idempotence (sans perhaps an update date :)

Alan Dixon (not verified)

25 October 2012 - 8:15am

Fascinating discussion, especially after reading Mark Boulton's "Adaptive Content Management".

I think you are asking for a technical interpretation, but here's an answer considering it from a workflow and business process point of view. When you ask about PUT to an entity URL, we've become accustomed to thinking of that as either creating or modifying a node, but we really need to think of it as one part of a potentially long series of content edits, i.e. we're always creating a revision from a previous one, possibly empty.

In other words, if you follow this advice: when you PUT to an entity URL, you must always explicitly PUT to a previous revision (i.e. tell it what your original was).

Or put another non-technical way: if I'm working on a paper and I send it to you for revision, I'll be irritated if you ignore what I've written and send me back a completely new paper. I want you to revise it only if you've read what I've written, and I hope you're reading the most recent version to boot. If I've modified my original in the meantime, I'll have to take responsibility for sorting out the mess.
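The rule of always naming the revision you started from can be sketched as follows. `Document`, `StaleRevision`, and the `base_revision` parameter are hypothetical names for this illustration; in HTTP terms a similar effect is commonly achieved with ETag and If-Match, though that is an addition here and not something the comment proposes.

```python
class StaleRevision(Exception):
    """Raised when an edit is based on a revision that is no longer latest."""


class Document:
    """Hypothetical revisioned document; the list index is the revision."""

    def __init__(self, body):
        self.revisions = [dict(body)]

    @property
    def latest(self):
        return len(self.revisions) - 1

    def put(self, body, base_revision):
        # Accept the edit only if the client started from the latest revision.
        if base_revision != self.latest:
            raise StaleRevision(
                f"based on {base_revision}, latest is {self.latest}"
            )
        self.revisions.append(dict(body))
        return self.latest


doc = Document({"title": "Hello world"})
first = doc.put({"title": "Bonjour le monde"}, base_revision=0)

try:
    # A second edit, still based on revision 0, is now out of date.
    doc.put({"title": "Hallo Welt"}, base_revision=0)
    rejected = False
except StaleRevision:
    rejected = True
```

A side benefit in this model is that blindly repeating the same PUT does not pile up revisions: the retry names a base revision that is no longer the latest and is rejected.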

All of which supports Moshe's point, I think, that revisions should always be used and not configurable (but we might need better tools for throwing out or at least archiving the old ones more efficiently).