Redundancy with self hosted ServiceStack 3.x service [closed]

Question

Server side redundancy & failover:

That's a very broad question. A ServiceStack self hosted application is no different to any other web-facing resource. So you can treat it like a website.

Website Uptime Monitoring Services:

You can monitor it with regular website monitoring tools. These tools could be as simple as an uptime monitoring site that simply pings your web service at regular intervals to determine if it up, and if not take an action, such as triggering a restart of your server, or simply send you an email to say it's not working.

Cloud Service Providers:

If you are using a cloud provider such as Amazon EC2, they provide CloudWatch services that can be configured to monitor the health of your host machine and the Service. In the event of failure, it could restart your instance, or spin up another instance. Other providers provide similar tools.

DNS Failover:

You can also consider DNS failover. Many DNS providers can monitor service uptime, and in the event of a failover their service will change the DNS route to point to another standby service. So the failover will be transparent to the client.

Load Balancers:

Another option is to put your service behind a load balancer and have multiple instances running your service. The likelihood of all the nodes behind the load balancer failing is usually low, unless there is some catastrophically wrong with your service design.

Watchdog Applications:

As you are using a self hosted application, you may consider making another application on your system that simply checks that your service application host is running, and if not restarts it. This will handle cases where an exception has caused you app to terminate unexpectedly - of course this is not a long term solution, you will need to fix the exception.

High Availability Proxies (HAProxy, NGINX etc):

If you are run your ServiceStack application using Mono on a Linux platform there are many High Availability solutions, including HAProxy or NGINX. If you run on a Windows Server, they provide failover mechanisms.

Considerations:

The right solution will depend on your environment, your project budget, how quickly you need the failover to resolve. The ultimate considerations should be where will the service failover to?

Will you have another server running your service, simply on standby - just in case?
Will you use the cloud to start up another instance on demand?
Will you try and recover the existing application server?

Resources:

There are lots of articles out there about failover of websites, as your web service use HTTP like a website, they will also apply here. You should research into High Availability.

Amazon AWS has a lot of solutions to help with failover. Their Route 53 service is very good in this area, as are their loadbalancers.

Client side failover:

Client side failover is rarely practical. In your clients you can ultimately only ever test for connectivity.

Connectivity Checking:

When connectivity to your service fails you'll get an exception. Upon getting the exception, the only solution would be to change the target service URL, and retry the request. But there are a number of problems with this:

It can be as expensive as server side failover, as you have to keep the failover service(s) online all the time for the just-in-case moments. Some server side solutions would allow you to start up a failover service on demand, thus reducing cost significantly.
All clients must be aware of the URL(s) to failover too. If you managed the failover at DNS i.e. server side then clients wouldn't have to worry about this complexity.
Your client can only see connectivity failures, there may not be an issue with the server, it may be their connectivity. Imagine the client wifi goes down for a few seconds while servicing your request to your primary service server. During that time the client gets the connectivity exception and you try to send the request to the failover secondary service server, at which point their wifi comes online. Now you have clients using both the primary and secondary service. So their network connectivity issues become your data consistency problems.
If you are planning web based clients, then you will have to setup CORS support on the server, and all clients will require compatible browsers, so they can change the target service URL. CORS requests have the disadvantages of having more overhead that regular requests, because the client has to send OPTIONS requests too.
Connectivity error detection in clients is rarely fast. Sometimes it can take in excess of 30 seconds before a client times out a request as having failed.
If your service API is public, then you rely on the end-user implementing the failover mechanism. You can't guarantee they will do so, or that they will do so correctly, or that they won't take advantage of knowing your other service URLs and send requests there instead. Besides it look very unprofessional.
You can't guarantee that the failover will work when needed. It's difficult to guarantee that for any system, even big companies have issues with failover. Server side failover solutions sometimes fail to work properly but it's even more true for client side solutions because you can test the failover solution ahead of time, under all the different client side environmental factors. Just because your implementation of failover in the client worked in your deployment, will it work in all deployments? The point of the failover solution after all is to minimise risk. The risk of server side failover not working is far less than client, because it's a smaller controllable environment, which you can test.

Summary:

So while my considerations may not be favourable of client side failover, if you were going to do it, it's a case of catching connectivity exceptions, and deciding how to handle them. You may want to wait a few seconds and retry your request to the primary server before immediately swapping to the secondary just in case it was an intermittent error.

So:

Catch the connectivity exception
Retry the request (maybe after a small delay)
Still failing, change the target host and retry
If that fails, it's probably a client connectivity issue.