The future of caching

Submitted by Larry on 7 October 2011 - 2:36am

This is not your father's Internet. When the Web was first emerging onto the scene, it was simple. Individual web pages were self-contained static blobs of text, with, if you were lucky, maybe an image or two. HTTP was designed to be "dumb". It knew nothing of the relationship between an HTML page and the images it contained. There was no need to. Every request for a URI (web page, image, download, etc.) was a completely separate request. That kept everything simple, and made it very fault tolerant. A server never sat around waiting for a browser to tell it "OK, I'm done!"
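To make that concrete, here is a rough sketch of what a browser sends for a page with two images. The host, the paths, and the build_get_request helper are all made up for illustration, and real browsers send many more headers than this. The point is only that each request stands entirely on its own.

```python
def build_get_request(host: str, path: str) -> str:
    """Return the text of a minimal, self-contained HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines)


# Hypothetical page and the two images it happens to embed.
HOST = "example.com"
PATHS = ["/index.html", "/logo.png", "/photo.jpg"]

# One independent request per URI. Nothing in the image requests
# refers back to the page that caused them to be fetched.
raw_requests = [build_get_request(HOST, path) for path in PATHS]

for raw in raw_requests:
    print(raw.replace("\r\n", "\n"))
```

Nothing in the second or third request tells the server that it came from index.html, which is exactly the "dumb" behavior described above.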

Much e-ink has been spilled (can you even do that?) already discussing the myriad of ways in which the web is different today, mostly in the context of either HTML5 or web applications (or both). Most of it is completely true, although there's plenty of hyperbole to go around. One area that has not gotten much attention at all, though, is HTTP.

Well, that's not entirely true. HTTP is actually a fairly large spec, with a lot of exciting moving parts that few people think about because browsers either offer no way to use them from HTML or implement them very, very badly. (Did you know that there is a PATCH method defined for HTTP? Really.) A good web services implementation (like we're trying to bake into Drupal 8 as part of the Web Services and Context Core Initiative </shamelessplug>) should leverage those lesser-known parts, certainly, but the modern web has more challenges than just using all of a decades-old spec.

Most significantly, HTTP still treats all URIs as separate, only coincidentally related resources.

Which brings us to an extremely important challenge of the modern web that is deceptively simple: Caching.