If I want to run a job at the end of the month to calculate the commission for all the sales consultants based on the current transaction data (e.g. the commission rate for each person, the sales made in the month, and the payments received from their customers), that potentially becomes a complex batch operation.
My existing resources are something like:
/consultant <– Details of each sales consultant
/payment <– Transactions of moneys received from customers
/customer <– Details of customers in the file
/order <– Something that translates into an invoice; refers to a sales consultant and to a customer, as well as other things, e.g. what services and goods were sold on that order
What should my batch-initiating URI be? One option is
/consultant?filter=have-made-sales&action=generate-commission-report
But it doesn’t feel right, so the other way would be to treat “batchjobs” as a virtual resource, eg
/batchjobs?jobname=consultants-comission
/batchjobs?jobname=email-invoices
Which allows me to return a job-id, which I can use in a subsequent GET to find out whether the job is done, eg
/batchjobs?jobid=123
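To make the polling idea concrete, here is a minimal client-side sketch in Python. Everything in it is an assumption for illustration: wait_for_job is a hypothetical helper, fetch_status stands in for whatever performs the GET on /batchjobs?jobid=123 and extracts a status string, and the "pending" status value is made up.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], str],
    job_id: str,
    interval_seconds: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll a job's status until it leaves "pending" or the poll budget runs out.

    fetch_status is a hypothetical callable that performs the GET for the
    given job id and returns a status string such as "pending" or "done".
    """
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status != "pending":
            return status
        # Wait before asking again so the client does not hammer the service.
        time.sleep(interval_seconds)
    # Still pending after max_polls attempts; let the caller decide what to do.
    return "pending"
```

Passing the fetch function in keeps the sketch independent of any particular HTTP library.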
Another thing that isn’t really a resource is Reports. These are quite similar to batch jobs. When the client requests that a report be generated, the service should return 200 OK, and a few minutes later the PDF report should be sitting in the requester’s email inbox. In this case I might name my URIs as follows:
/reports?report=sales-for-the-month
/reports?report=accounts-outstanding
Or is there a better, more acceptable alternative?
It seems the problem I’m facing is about Resource vs Operation. The issue is that a batch operation affects many resources, so to be RESTful a client should GET the state of all the resources, perform the batch, then PUT the updated state for each and every resource. That raises a performance issue, particularly since the internet introduces unacceptable latency, and as a result a risk to data integrity (race conditions; PUT should be idempotent). There is little to prevent another client from PUTting something else to one of the affected resources while the batch is being processed, or even immediately after, thereby undoing the batch work for one specific resource. Locking resources seems to be unRESTful. So how does one do this?
You could POST to /salesreport with { "start": <START>, "end": <END> }. That would give you a 201 Created with Location: /salesreport/<ID>. When you first GET that location, the response will be { "start": <START>, "end": <END>, "status": "pending", "progress": <PROGRESS> } (consider using cache directives to limit the client polling rate). You can use DELETE to abort a report generation while it is underway.
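As a rough illustration of that lifecycle, here is an in-memory Python sketch with no web framework involved. The class name SalesReportJobs, the advance hook, the "done" status, the 404 and 204 responses, and the 30-second max-age are all assumptions of mine; only the POST, 201 plus Location, pending GET and DELETE steps come from the description above.

```python
import itertools
from typing import Any, Dict, Optional, Tuple

# (status code, response headers, JSON body or None)
Response = Tuple[int, Dict[str, str], Optional[Dict[str, Any]]]


class SalesReportJobs:
    """In-memory stand-in for the /salesreport collection (hypothetical)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def post(self, start: str, end: str) -> Response:
        # Register the job and point the client at its status resource.
        job_id = str(next(self._ids))
        self._jobs[job_id] = {
            "start": start, "end": end, "status": "pending", "progress": 0,
        }
        return 201, {"Location": f"/salesreport/{job_id}"}, None

    def get(self, job_id: str) -> Response:
        job = self._jobs.get(job_id)
        if job is None:
            return 404, {}, None
        headers: Dict[str, str] = {}
        if job["status"] == "pending":
            # Assumed policy: the pending state may be cached for 30 seconds,
            # which throttles how often a polling client reaches the service.
            headers["Cache-Control"] = "max-age=30"
        return 200, headers, dict(job)

    def delete(self, job_id: str) -> Response:
        # Removing the resource aborts the job (or discards a finished one).
        if self._jobs.pop(job_id, None) is None:
            return 404, {}, None
        return 204, {}, None

    def advance(self, job_id: str, progress: int) -> None:
        # Hypothetical hook a background worker would call as it makes progress.
        job = self._jobs[job_id]
        job["progress"] = progress
        if progress >= 100:
            job["status"] = "done"
```

A client would call post once, follow the Location header, and then repeat get until the status is no longer "pending", or call delete to give up early.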
Batch jobs related to a specific resource type could be nested accordingly, e.g. the URL could be /order/salesreport or /consultant/commission/ or whatever.
I don’t think it is wise to expose long-running operations through an API anyway, but that’s a whole different story.