The goal of this post is simple: explain, in plain English, what a CDN does and how it fits into our development/version management process. Scouring the web, you find a lot of articles like this one that do a good job of selling and explaining the benefits of a CDN but say little about integrating it with your process. But before we dive into that, what exactly is a CDN?
A CDN (content delivery network) is quite simply a smarter static file server located near the users of your applications. Imagine a collection of script, CSS, image, HTML, and other files sitting on a server separate from your main web server/application. The CDN steps in for your web server and does the mundane work of sending files to your users. It is smarter than that, though, because you have one of these junior-level employees in every country your users work in. So imagine a senior-level CDN delegating the work to a pool of internationally distributed junior employees who each handle their own region; that is the primary benefit you get from this additional architecture layer. But it gets better: CDNs are typically built on fast hardware like SSDs and powerful CPUs. In addition, they sometimes serve your static files directly from RAM, akin to an in-memory NoSQL database (RAM is fast, period). So now your CDN is doing this part of your web server’s job better than your web server ever could (remember, your web server has a never-ending list of responsibilities; ain’t nobody got time for static file hosting).
Oh, but we are not done. CDNs typically let you purge their cached copy of a file when you need to replace it with new content, so browsers pick up the new version once their own local copy expires. In addition, HTTP cache headers can be set to whatever expiration interval you want, which controls how long a browser keeps a file before requesting it again. Last but not least, we gain some scalability: the web server doesn’t waste its time on static files, and the CDN can scale out with demand. Congratulations, you have just applied the single responsibility principle of SOLID at an architecture/hardware level. Who doesn’t love that?
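To make the cache header idea a little more concrete, here is a minimal sketch of choosing a Cache-Control value per file. The function name `cacheControlFor`, the file paths, and the choice of `no-cache` for index.html are my own illustrative assumptions, not the settings of any particular CDN; how you actually attach the header depends on your CDN or origin server.

```javascript
// Hypothetical helper (not part of any CDN SDK): pick a Cache-Control
// header value for a static file based on a desired expiration in days.
const SECONDS_PER_DAY = 24 * 60 * 60;

function cacheControlFor(filePath, maxAgeDays) {
  // Assumption: the entry page is always revalidated, so it can point
  // users at new script references as soon as they are deployed.
  if (filePath.endsWith('index.html')) {
    return 'no-cache';
  }
  // Everything else may be kept by the browser and CDN for the interval.
  return `public, max-age=${maxAgeDays * SECONDS_PER_DAY}`;
}

console.log(cacheControlFor('/index.html', 7));     // no-cache
console.log(cacheControlFor('/scripts/app.js', 7)); // public, max-age=604800
```

A 7 day interval works out to 604800 seconds, which is the number the browser sees in `max-age`.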
Now that we know what these junior workers do, let’s talk about including them in our development process. We will focus on managing versions of content rather than continuous integration (CI) processes, which are largely outside the scope of this post. So you have gone live with your first single-page application (SPA). Everything is working great. However, this peaceful existence never lasts. As users typically do (thankfully, as it keeps us in business), they request some changes. There are two scenarios you can run into here. In the first, they want a minor change that doesn’t break the application overall. Think of a cosmetic change, like hiding a div after a save action. Great, this is pretty easy for us. We follow the same process as normal development and deploy an updated script file. We then purge the old file from the CDN cache in favor of the new one and wait for the locally cached script to expire in the user’s browser. We are assuming here, of course, that index.html is never cached. So 7 days go by (the default for an Azure CDN) and the user gets the new functionality, or, if they are relatively savvy and know the change is there, they can flush their cache and get it sooner. Great, no issues, until our second scenario arrives on the scene.
There are now major changes needed in the application that will break functionality for users with cached scripts. Maybe an endpoint on the backend server has changed, for instance, and a particular page will need to be updated to call it appropriately. This is where versioning comes in. I know what you are thinking right now: can’t I just have the users refresh their cache manually, or worse, have them wait until it expires? Maybe you think you could deploy both versions to maintain backwards compatibility. Sure you can, but do you want to manage telling users across multiple countries how to get a new version of their app? The answer is a resounding no way; it should just happen. So what are we to do?
One could make the case that the problems in the second scenario can be resolved with a hash/checksum. Sure, but to do a validation check the browser must send this information to the server every time a user requests a file you are concerned about, thus defeating some of the benefit you get from caching. The only definitive solution to this problem is changing the URL of the file(s) retrieved from the CDN, as referenced from an index.html that is always exempt from caching. Enter versioning, in the form of a new directory perhaps, or even a new file name (e.g., /v1/app.js or app_v1.js). By creating a new URL, you force the browser to request the item from the CDN (the browser has no locally cached copy, because as far as it knows that file never existed). Not too hard, right? You ensure the CDN has the file cached and you update your script file references. I hear it coming through the Internet already: “but Shawn, that is really annoying and just another thing to remember during a deployment”. Yup, you are right, but I have seen the light and the light is lit with gulp build tasks. I am not going to go into a specific plugin; here is a quick search to start the hunt. Essentially, you can update your HTML to reference a new version of your script/file in your build/deployment process. Imagine changing a simple string variable in your gulpfile.js that holds the version number and concatenating it into the file path when the file is written to disk, or maybe using your build system to bring in the build number. Then another task steps in to update your script tags with the same version information. Now you ensure every user (provided the file is loaded to the CDN) gets the latest content you have available with no manual cache reset by end users. All in a day’s work for our junior employee of the week! Now our web server can focus on the interesting tasks and, who knows, maybe take on more work, as its job is never done!
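For the curious, here is a rough sketch of the two string transformations such build tasks would perform, written as plain Node functions rather than any specific gulp plugin. The names `VERSION`, `versionedFileName`, and `versionScriptTags` are hypothetical, and the sketch assumes your source index.html references unversioned file names with double-quoted src attributes. Treat it as a starting point, not a finished build step.

```javascript
const path = require('path');

// Hypothetical: bump this per release, or feed in the build number
// from your build system instead.
const VERSION = 'v2';

// Turn "scripts/app.js" into "scripts/app_v2.js".
function versionedFileName(fileName, version) {
  const ext = path.extname(fileName);
  const base = fileName.slice(0, fileName.length - ext.length);
  return `${base}_${version}${ext}`;
}

// Rewrite every <script src="...js"> reference in the HTML so it points
// at the versioned file name. Assumes double-quoted src attributes.
function versionScriptTags(html, version) {
  return html.replace(
    /(<script\b[^>]*\bsrc=")([^"]+\.js)(")/g,
    (match, before, src, after) =>
      `${before}${versionedFileName(src, version)}${after}`
  );
}

const html = '<script src="scripts/app.js"></script>';
console.log(versionScriptTags(html, VERSION));
// <script src="scripts/app_v2.js"></script>
```

In a gulpfile, one task would write the script to disk under the name returned by `versionedFileName`, and a second task would run index.html through `versionScriptTags` with the same version string, so the two never drift apart.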