Let’s say you have a resource that you can do normal PUT/POST/GET operations on. It represents a BLOB of data and the methods retrieve representations of the data, be they metadata about the BLOB or the BLOB itself.
The resource is something that can be processed by the server on request. In this instance a file that can be parsed multiple times.
How do I initiate that processing? It’s a bit RPC-like. Is there a best practice around this?
I’ve read this: “RESTful: state changing actions”, but it doesn’t really answer the question.
(First time on programmers. This is the right place for this sort of question, right?)
The processor for this resource could be its own resource. Consider for a minute that the name of the BLOB resource is /data/. By making the processor its own resource, you allow yourself to create different processors. It also means you don’t have to modify the data resource to facilitate processing. It could work like this:
POST
A POST might take the Id of the data that you’re trying to process. This would return some sort of Id that represents the process or job.
GET
Given an Id, this would get the result of the processing. If it’s not done, you could return an empty response or a 204 NO CONTENT.
HEAD
Given an Id, this could return just the status without returning the result of the processing: the “job metadata,” if you will. Because a HEAD response has no body, this information would travel in response headers. If you really wanted to, you could also return the parameters, such as the Id of the data being processed, the start date, etc.
DELETE
Given an Id, this would cancel the request.
PUT
I’m not sure that PUT is applicable here.
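The method-by-method outline above can be sketched as a small in-memory class. This is only an illustration under stated assumptions: the /processor/ path, the X-Job-Status and X-Data-Id header names, the 202 status on POST, and the finish() helper that stands in for a background worker are all hypothetical and not part of the answer. Each method returns a (status, headers, body) tuple instead of talking to a real HTTP framework.

```python
import itertools


class ProcessorResource:
    """In-memory stand-in for a hypothetical /processor/ resource."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def post(self, data_id):
        # Start a job for the given data Id and hand back a job Id.
        job_id = str(next(self._ids))
        self._jobs[job_id] = {"data_id": data_id, "status": "running", "result": None}
        return 202, {"Location": f"/processor/{job_id}"}, {"id": job_id}

    def finish(self, job_id, result):
        # Hypothetical hook: a real worker would call this when parsing ends.
        job = self._jobs[job_id]
        job["status"] = "done"
        job["result"] = result

    def get(self, job_id):
        job = self._jobs.get(job_id)
        if job is None:
            return 404, {}, None
        headers = {"X-Job-Status": job["status"], "X-Data-Id": job["data_id"]}
        if job["status"] != "done":
            # Not finished yet: no content, as suggested for GET above.
            return 204, headers, None
        return 200, headers, job["result"]

    def head(self, job_id):
        # Same status and headers as GET, but never a body.
        status, headers, _ = self.get(job_id)
        return status, headers, None

    def delete(self, job_id):
        # Cancel (here: simply forget) the job.
        if self._jobs.pop(job_id, None) is None:
            return 404, {}, None
        return 204, {}, None
```

Deriving head() from get() keeps the two in step, so a client that only wants the job metadata sees the same headers it would get with the full result.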
The HTTP protocol has built-in support for the concept of background or batch operations in the form of status code 202:
10.2.3 202 Accepted
The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place. There is no facility for re-sending a status code from an asynchronous operation such as this.
The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent’s connection to the server persist until the process is completed. The entity returned with this response SHOULD include an indication of the request’s current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
Clients will appreciate it if your 202 response also includes a Location header where they may check the status. This is especially important for any automated end-to-end tests.
I’m not sure if this is really your question, because a large chunk of it actually seems to be referring to content negotiation, which is how you store or retrieve “different representations” of something. Of course you can just have different URLs, such as /widgets/1/info and /widgets/1/blob, but the preferred way of doing this is through content negotiation. There are two headers that are relevant here:
Accept, which specifies what the client wants to receive;
Content-Type, which specifies what the client is actually sending.
So, for example, let’s say you have a widgets API that is supposed to be able to support widgets in either the “ACME” or “OMNI” format. The client can send in either format via the Content-Type header:
PUT /widgets/123
Content-Type: application/vnd.foo.acme-widget+json

PUT /widgets/456
Content-Type: application/vnd.foo.omni-widget+json
The server should understand both of these requests and choose the correct parsing method based on the Content-Type header. On the receiving end, the client specifies the format it wants with Accept:
GET /widgets/123
Accept: application/vnd.foo.acme-widget+json

GET /widgets/123
Accept: application/vnd.foo.omni-widget+json
Both requests are for the same resource, and they use the same method (GET), but when the server sees the first one, it should provide the data using the ACME scheme, and when it sees the second one, it should provide the same data in OMNI scheme.
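On the server, that dispatch could be sketched along these lines. The two media types come from the requests above; everything else is an assumption for illustration, including the field layouts of the “ACME” and “OMNI” formats and the function names. The Accept handling is deliberately simplified: it takes the first listed type the server knows and ignores q-values and wildcards.

```python
import json

ACME = "application/vnd.foo.acme-widget+json"
OMNI = "application/vnd.foo.omni-widget+json"

# Hypothetical wire formats; the real ACME/OMNI layouts are not specified.
PARSERS = {
    ACME: lambda doc: {"name": doc["widgetName"], "size": doc["widgetSize"]},
    OMNI: lambda doc: {"name": doc["title"], "size": doc["dimensions"]["size"]},
}
RENDERERS = {
    ACME: lambda w: {"widgetName": w["name"], "widgetSize": w["size"]},
    OMNI: lambda w: {"title": w["name"], "dimensions": {"size": w["size"]}},
}


def _media_type(value):
    # Drop parameters such as ";charset=utf-8" or ";q=0.8".
    return value.split(";")[0].strip().lower()


def parse_widget(content_type, body):
    """Pick a parser from Content-Type; 415 if the format is unknown."""
    parser = PARSERS.get(_media_type(content_type))
    if parser is None:
        return 415, None  # Unsupported Media Type
    return 200, parser(json.loads(body))


def render_widget(accept, widget):
    """Pick a representation from Accept; 406 if none can be produced."""
    for item in accept.split(","):
        media_type = _media_type(item)
        renderer = RENDERERS.get(media_type)
        if renderer is not None:
            return 200, media_type, json.dumps(renderer(widget))
    return 406, None, None  # Not Acceptable
```

Both formats are converted to and from one internal widget dictionary, so the resource at /widgets/123 stays a single thing regardless of which representation a client asks for.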
One considerable benefit of this approach is that every resource still has a canonical URI which you can use in Location headers and so on. It’s not required, but it is generally ideal in REST for each resource to “live” at exactly one URI and no more.
If the difference is truly between data and metadata then you probably should have an actual metadata resource, like /widgets/123/info, but if you are dealing with different representations of the same data then you should use Accept and Content-Type.
I think that covers all aspects of your question, but if you think something is missing – please clarify.