Deployment by replacing EC2 instances

Question 1

Launching new machines for each deploy is a good idea. If you are a PCI-compliant company in particular, refreshing your machines this way every week eliminates many auditing criteria, which simplifies your infrastructure significantly. The easiest way to do this is to create a new Auto Scaling group for each deploy. The new machines launch in the new group, and the group can be attached to the ELBs of your choice. Then point the ELB health check at an endpoint that the ELB hits on each server, and have that endpoint respond with 200 only when the server is ready and able to handle requests. Once the new machines pass the health check, they start serving traffic automatically, and all you have to do is delete the Auto Scaling group created by the previous deploy, which terminates the old machines and removes them from the ELB. There are two serious issues to consider, both of which relate to keeping the machines stateless:

Graceful termination of machines: make sure the process serving web requests exits only after it has finished responding on every established client connection; otherwise users will see 5xx errors. If you forward your logs somewhere such as syslog, Splunk, or Sumo Logic, also make sure they have been forwarded before you terminate the instance.
No sticky sessions (session affinity): if you store user sessions for the web application locally on these servers, you probably have sticky sessions. That is generally a bad idea because each machine then holds state and the load may not be balanced evenly. Use a central, shared session store such as Redis or memcache instead, and disable sticky sessions.
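
To make the graceful-termination point concrete, here is a minimal sketch in Python of one way to track in-flight requests and drain them on SIGTERM. The `Drainer` class and its method names are made up for illustration, and your web server may already provide an equivalent graceful-stop option, so treat this as an outline rather than a recommended implementation.

```python
import signal
import threading


class Drainer:
    """Tracks in-flight requests so the process exits only once they finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def begin_request(self):
        """Return False if the server is draining and should refuse new work."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        """Mark one request as finished and wake up anyone waiting to exit."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def start_draining(self, *_signal_args):
        # Plain attribute write, so it is safe to call from a signal handler.
        self._draining = True

    def wait_until_idle(self, timeout=30.0):
        """Block until no requests are in flight; False means the timeout hit."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


def install_sigterm_handler(drainer):
    # SIGTERM is assumed to be the signal your shutdown script sends.
    signal.signal(signal.SIGTERM, drainer.start_draining)
```

The request handler would wrap its work in `begin_request()` / `end_request()`, and the main thread would call `wait_until_idle()` after the signal arrives. Flushing the log forwarder would go right after that call, before the process exits.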

Once both of these are in place, this is about the cleanest way to deploy.
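
For the health-check endpoint mentioned earlier in this answer, a rough sketch using only the Python standard library could look like the following. The `/health` path, the `ready` flag, and the port are assumptions for the example, not anything the ELB requires; the only real contract is that the endpoint returns 200 once the server can take traffic.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag: the application sets it once warm-up has finished.
ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":  # "/health" is an assumed path
            self.send_error(404)
            return
        # 200 lets the load balancer route traffic here; 503 keeps it away.
        status = 200 if ready.is_set() else 503
        body = b"ok" if status == 200 else b"starting"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_health_server(host="0.0.0.0", port=8080):
    """Serve the health endpoint on a background thread and return the server."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application would call `ready.set()` when it has finished loading. Until then the endpoint answers 503 and the load balancer leaves the machine out of rotation.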

Question 2

If the servers host web applications, what about users' sessions? A better approach, in my opinion, is to deploy to the existing instances with Capistrano or an equivalent tool, and to use memcache to store sessions and other short-lived data.
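
To illustrate the session point, here is a small Python sketch of what keeping sessions out of the web servers looks like. `SharedSessionStore` is a hypothetical stand-in backed by a dict so that the example runs on its own; in practice it would be a client for memcache or a similar cache shared by every server.

```python
import time
import uuid


class SharedSessionStore:
    """Stand-in for a networked cache such as memcache.

    A dict is used only so the sketch is self-contained. In a real
    deployment every web server would talk to the same external cache.
    """

    def __init__(self, ttl_seconds=1800, clock=time.monotonic):
        self._data = {}
        self._ttl = ttl_seconds
        self._clock = clock

    def create(self, payload):
        """Store a new session and return its id (sent to the user as a cookie)."""
        session_id = uuid.uuid4().hex
        self._data[session_id] = (self._clock() + self._ttl, dict(payload))
        return session_id

    def load(self, session_id):
        """Return the session payload, or None if it is unknown or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._data[session_id]  # short-lived data ages out
            return None
        return payload


def handle_request(store, session_id):
    """Any server can serve any user because no session lives on local disk."""
    session = store.load(session_id)
    return "anonymous" if session is None else session["user"]
```

Because the web servers hold nothing but the code, it no longer matters to the user whether a given server is updated in place or replaced.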

Question 3

I think your approach is fine. The key is to make sure the new instances are responding to requests before the old ones are removed. The nice thing about this approach is that you have a really easy rollback if needed.
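
The "new instances respond before old ones are removed" rule can be sketched as a small polling loop. This is only an outline in Python: `is_healthy` and `remove_old_fleet` are hypothetical callbacks you would implement against your load balancer and auto scaling setup, and the timeout values are arbitrary.

```python
import time


def cut_over(new_instances, is_healthy, remove_old_fleet,
             timeout=300.0, interval=5.0,
             sleep=time.sleep, clock=time.monotonic):
    """Remove the old fleet only after every new instance reports healthy.

    Returns True if the cut-over happened, False if the timeout expired.
    """
    deadline = clock() + timeout
    while True:
        # Guard against an empty list: all([]) is True and would drop the old fleet.
        if new_instances and all(is_healthy(i) for i in new_instances):
            remove_old_fleet()
            return True
        if clock() >= deadline:
            # Old fleet is untouched and still serving: this is the easy rollback.
            return False
        sleep(interval)
```

If the function returns False, nothing has been torn down yet, so rolling back just means deleting the new instances.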

There are lots of different ways to solve this problem. I might look at a hybrid between this and updating running instances, but that would depend on how large and complex your infrastructure is and how often you deploy.

For minor upgrades across a small fleet, an update is manageable, but you still need to figure out how to handle a rollback. Once you get beyond a few servers this gets more complicated.

If your fleet is large enough and you're using Auto Scaling, you can probably update your auto scaling rules so that new instances launch with the new version, then just start killing old instances and let Auto Scaling bring the new ones up.