Here’s an API concept that could be useful for performance optimisation. It’s key-based cache expiry applied in a broader, internet-wide context, rather than the internal Memcached-style scenario it seems to be mostly used for.
I want to make API calls almost as cacheable as static images. Taking as an example a news-feed subscription that we might poll hourly, the idea is to send a last-updated timestamp along with each topic (it could just as easily be a version number or checksum):
{
  username: "Wendy",
  topics: [
    { name: "tv",     updated: 1357647954355 },
    { name: "movies", updated: 1357648018817 },
    { name: "music",  updated: 1357648028264 }
  ]
}
To be clear, this resource itself comes directly from the server every time and is not cached on the edge or by the client. It’s our subsequent calls for topics that we can aggressively cache, thanks to the timestamp.
Assuming we want to sync all topics, a naive implementation would make "N" further calls (/topics/tv etc.). But thanks to the timestamp, we can construct a URL like /topics/tv/1357647954355.json. The client usually doesn’t need to make a call at all if it has already seen (and cached) that version of the resource. Furthermore, even if the version is new to the client, an edge cache (a reverse proxy like Squid or Varnish, or a service like Cloudflare) has probably seen it before, because some other user has likely already opened the latest version of the topic. Either way we bypass the application server: the server only generates the topic JSON once each time the underlying resource changes. So instead of N+1 calls to the server, the client probably makes far fewer calls, and those calls rarely reach the app server anyway.
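For concreteness, here’s a minimal client-side sketch in TypeScript of the flow just described. The /manifest.json path and the render() hook are placeholders of mine, not part of the proposal; the versioned-URL scheme is the one above.

interface Topic { name: string; updated: number; }
interface Manifest { username: string; topics: Topic[]; }

declare function render(topic: string, data: unknown): void; // hypothetical rendering hook

const lastSeen = new Map<string, number>(); // topic name -> last version we synced

async function syncTopics(): Promise<void> {
  // The manifest itself always comes fresh from the origin.
  const manifest: Manifest =
    await (await fetch("/manifest.json", { cache: "no-store" })).json();

  for (const topic of manifest.topics) {
    if (lastSeen.get(topic.name) === topic.updated) continue; // already have this version

    // Versioned URL: safe for the browser cache and any edge cache to keep indefinitely.
    const res = await fetch(`/topics/${topic.name}/${topic.updated}.json`);
    render(topic.name, await res.json());
    lastSeen.set(topic.name, topic.updated);
  }
}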
Now for my question: all of this seems feasible and worth doing, but is there any prior art for this kind of thing, and in particular, any HTTP standard to support it? I initially thought of conditional caching (ETags and modified dates), and I think they’d help to optimise this setup further, but I don’t believe they are the answer. They are subtly different, because they require calls to be passed through to the application server in order to check whether something has changed. The idea here is the client saying "I already know the latest version; please send that resource back to me." I don’t think there’s any HTTP standard for it, which is why I propose a URL scheme like /topics/tv/1357647954355.json instead of some ETag-like header. I believe some CDNs work this way, and it surprises me if there’s no real HTTP standard around it.
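To make the contrast with conditional caching concrete, here is a sketch of what the server side might look like, assuming an Express 4-style Node app (loadTopicVersion and buildManifest are hypothetical). An ETag/If-None-Match flow still costs a round-trip to the origin just to get the 304; the versioned URL needs no revalidation at all, so caches can serve it without asking.

import express from "express";

declare function loadTopicVersion(name: string, updated: string): unknown; // hypothetical
declare function buildManifest(): unknown;                                 // hypothetical

const app = express();

app.get("/manifest.json", (_req, res) => {
  res.set("Cache-Control", "no-store"); // the manifest must always come from the origin
  res.json(buildManifest());
});

app.get("/topics/:name/:updated.json", (req, res) => {
  // A given (name, updated) pair never changes, so caches may keep it for a year.
  // "immutable" (RFC 8246) additionally tells browsers not to revalidate on reload.
  res.set("Cache-Control", "public, max-age=31536000, immutable");
  res.json(loadTopicVersion(req.params.name, req.params.updated));
});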
Update:
On reflection, an important special case of this is what a web browser does when it fetches a new HTML page: we know it will immediately request the page’s CSS and JS, so the same versioning/timestamp trick can be used to make those static resources cache-friendly. That this trick has never been formalised in the spec gives me confidence that, unfortunately, there is no HTTP standard for it. http://www.particletree.com/notebook/automatically-version-your-css-and-javascript-files/
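The same trick applied to static assets might look like this (a sketch only; assetUrl is a hypothetical helper along the lines of the linked article, stamping each asset URL with the file’s mtime):

import { statSync } from "node:fs";

// Hypothetical helper: turn /css/main.css into /css/main.css?v=1357647954355
// so the HTML always links to a cache-busting, indefinitely cacheable URL.
function assetUrl(path: string): string {
  const mtime = statSync(`./public${path}`).mtimeMs; // file's last-modified time
  return `${path}?v=${Math.floor(mtime)}`;
}

// Used while rendering the page, e.g.:
//   <link rel="stylesheet" href="/css/main.css?v=1357647954355">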
As far as I can tell, you don’t really need any specific HTTP support for what you want to do. At its core, you want to be able to take a JSON fragment like
topics: [{
  name: "tv",
  updated: 1357647954355
},
and turn it into
GET /topics/tv/1357647954355.json
and have that returned from an edge cache or other store without hitting your server, presuming of course that a neighbour (for some Internet definition of "nearby") has already requested the same resource and the local edge cache therefore holds it (given good cache hints, etc.)?
So long as the JavaScript processing your original JSON can perform the second AJAX GET, you should get pretty much what you’re expecting: the number of requests hitting the central server for the individual tv/ URLs stays small, given their high cacheability.
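In other words, the whole client can be a two-step GET chain with no bookkeeping at all (a minimal sketch, with /manifest.json as an assumed path); the browser cache and any edge cache absorb the repeats, provided the versioned URLs are served with long-lived cache headers:

async function pollOnce(): Promise<unknown[]> {
  const manifest = await (await fetch("/manifest.json")).json();
  // Each versioned topic URL is either already in a cache or fetched once, ever.
  return Promise.all(
    manifest.topics.map((t: { name: string; updated: number }) =>
      fetch(`/topics/${t.name}/${t.updated}.json`).then(r => r.json())
    )
  );
}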
The solution is smart, but I would question whether there is any need to make this an uncached request that fetches the JSON for dynamic processing with JavaScript at all.
If you were to just emit the latest versions of the URLs directly in <script> and <link> elements as part of the HTML page that you serve up, that would save one resource request. It depends on how large that HTML is in the first place, but even the HTML could be served with an ETag based on a hash of the latest dependent-resource timestamps and, assuming they haven’t changed, that would result in a 304 Not Modified for the HTML itself.
This way you avoid any JavaScript magic at all (which adds bloat for the download logic too) and reduce requests by at least one.
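A sketch of that ETag idea, assuming an Express-style server (assetTimestamps and renderPage are hypothetical): hash the dependent resources’ timestamps into an ETag for the HTML, so an unchanged page costs only a 304.

import { createHash } from "node:crypto";
import express from "express";

declare function assetTimestamps(): number[]; // mtimes of the page's CSS/JS, hypothetical
declare function renderPage(): string;        // HTML with versioned asset URLs baked in

const app = express();

app.get("/", (req, res) => {
  // The ETag changes only when some dependent resource's timestamp changes.
  const etag = `"${createHash("sha1").update(assetTimestamps().join(",")).digest("hex")}"`;
  if (req.headers["if-none-match"] === etag) {
    res.status(304).end(); // client already has this exact page
    return;
  }
  res.set("ETag", etag).send(renderPage());
});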