Question

Our current deploy process goes something like this:

  1. Use grunt to create production assets.
  2. Create a datestamp and point files at our CDN (e.g. /scripts/20140324142354/app.min.js).

    Sidenote: I've heard this process called "versioning" before but I'm not sure if it's the proper term.

  3. Commit the build to GitHub.

  4. Run git pull on the web servers to retrieve the new code from GitHub (a stripped-down sketch of these steps follows this list).
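Concretely, a stripped-down version of steps 1–4 looks something like this (the grunt task, paths, and branch are simplified stand-ins for our real setup):

# 1. build production assets with grunt
grunt

# 2. stamp the build and copy it to the CDN-facing path
stamp=$(date +%Y%m%d%H%M%S)                # e.g. 20140324142354
mkdir -p public/scripts/$stamp
cp dist/app.min.js public/scripts/$stamp/app.min.js

# 3. commit the built assets
git add public/scripts/$stamp
git commit -m "build $stamp"
git push origin master

# 4. on each web server; forever -w sees the changed files and restarts
git pull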

This is a Node.js site and we are using forever -w to restart the app whenever files change.

We have a route set up in our app to serve the latest version of the app via /scripts/*/app.min.js.

The reason we version like this is that our CDN is set to cache JavaScript files indefinitely, and the new path purposely creates a cache miss so that the updated code is picked up by the CDN (and by our users' browsers).

This works fine most of the time, but it breaks down when one of the servers lags a bit in checking out the new code.

Sometimes a client hits the page while a deploy is in progress and tries to retrieve the new JavaScript from the CDN. The CDN tries to fetch it but hits a server that hasn't finished checking out the new code yet, and it caches an old or partially downloaded file, causing all sorts of problems.

This problem is exacerbated by the fact that our CDN has many edge locations and so the problem isn't always immediately visible to us from our office. Some edge locations may have pulled down old/bad code while others may have pulled down new/good code.

Is there a better way to do these deployments that will avoid this issue?


Solution 2

Step 4 in your procedure should be:

# export the new build from the repo into a fresh timestamped directory
git archive --remote $yourgithubrepo --prefix=$timestamp/ | tar -xf -
stop-server
# repoint the "current" symlink at the new build
ln -sf $timestamp current
start-server

Your server would use the current directory (well, a symlink) at all times. No matter how long the deploy takes, your application is in a consistent state.
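A variation on the same idea, if you want to avoid even the brief stop/start window: build the new symlink next to the old one and rename it into place, which is atomic on the same filesystem (mv -T is GNU coreutils; restart-server is a placeholder in the same spirit as stop-server/start-server above):

git archive --remote $yourgithubrepo --prefix=$timestamp/ | tar -xf -
ln -sfn $timestamp current.tmp    # new symlink created alongside the live one
mv -Tf current.tmp current        # atomic rename over the existing symlink
restart-server                    # pick up the new code (placeholder command)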

OTHER TIPS

As a general rule of thumb:

Don't do live upgrades (unless the language supports it, and even then think twice).

Pulling code with git pull and then waiting for the app to notice the changed files sounds a lot like the '90s: uploading PHP files to an Apache web server over FTP (or SFTP if you are cool) and waiting for Apache to notice they were updated. It can't happen atomically, so of course there is a race condition. Some users WILL get a half-built, broken site.

I recommend only upgrading your live and running application while no one is using it. Hopefully you have a pool of servers behind a load balancer of some sort, which will allow you to remove them one at a time and upgrade them.
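A minimal sketch of that one-at-a-time rotation, assuming the load balancer's health check is configured to drain a host while a maintenance flag exists (hostnames, the flag path, and deploy.sh are made-up placeholders):

for host in web1 web2 web3; do
    ssh $host touch /var/www/maintenance   # health check fails, LB drains this host
    sleep 30                               # let in-flight requests finish
    ssh $host ./deploy.sh                  # per-server upgrade (placeholder)
    ssh $host rm /var/www/maintenance      # host rejoins the pool
done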

This will mean that users will be able to use both the old and the new site at the same time, depending on how and when they access it, but that is much better than not being able to access it at all.

Ideally you would be able to spin up copies of each of the web servers you have running, with the new version of the site. Check that the new version works, and then atomically update the load balancer so that everyone gets bumped to the new site at the same time. Only once everything is verified to be working perfectly are the old machines shut down and decommissioned, or reused.
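As one concrete example, if the load balancer is nginx, the atomic cut-over can be a config swap plus a graceful reload (hostnames, port, and the config path are illustrative):

# rewrite the upstream so it points at the freshly provisioned, already-verified servers
cat > /etc/nginx/conf.d/app-upstream.conf <<'EOF'
upstream app {
    server web-new-1.internal:3000;
    server web-new-2.internal:3000;
}
EOF

nginx -t          # validate the new configuration
nginx -s reload   # graceful reload: old workers finish their in-flight requests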

I'll go ahead and post our far-from-ideal monkey-patch that we're using right now.

We deploy once, which may or may not go as planned. Once we're sure the code is deployed on all the servers, we do another build where the only thing that changes is the version number.

Then we deploy again server by server.

The race condition still exists, but because the application code is identical between the two versions, the issue is masked: no matter which server the CDN hits, it gets the "latest" code.
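In script form it's roughly this (deploy-all and bump-version are stand-ins for our actual tooling):

./deploy-all      # first pass: real code changes, may race with the CDN
# ...wait until every server is confirmed to be running the new build...
./bump-version    # second build: only the version stamp changes
./deploy-all      # second pass: servers differ only by version, so any CDN fetch is "correct"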

Licensed under: CC-BY-SA with attribution