I have been developing an app that requires a cron task every minute. We handle our cron tasks with Spring Boot Scheduling. However, I am a little worried about the following:

One part of our product must be highly available for this task: if it fails even for one minute, it will have a major impact on our processes and customers. Is Google Cloud App Engine reliable enough to support these processes so that our product won't be easily affected? And if App Engine does fail, what options do we have for an application that cannot fail even for a single minute?


Solution

If you need high availability where even one minute of downtime is unacceptable, a single cloud provider is not enough. You need multiple providers to reach that level of availability, and even then you are still hoping that no single issue affects several providers at the same time. You also need internal processes and procedures in place that are far more challenging and demanding than the choice of cloud provider. In most cases, you also need to ensure that everything supporting the highly available module is itself highly available.
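A common building block for running on more than one provider is an active/standby pair: the primary records a heartbeat after each successful per-minute run, and a standby on a second provider takes over when that heartbeat goes stale. The sketch below shows only the decision logic under those assumptions; `should_take_over`, `RUN_INTERVAL`, and the 15-second grace period are hypothetical, and reading and writing the heartbeat (which needs its own highly available store) is left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: the primary is expected to complete one run per minute.
RUN_INTERVAL = timedelta(minutes=1)


def should_take_over(
    last_heartbeat: datetime,
    now: datetime,
    grace: timedelta = timedelta(seconds=15),  # hypothetical tolerance for a slow run
) -> bool:
    """Return True when the primary has missed its expected run plus a grace period."""
    return now - last_heartbeat > RUN_INTERVAL + grace


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(should_take_over(last, last + timedelta(seconds=70)))  # False: still within budget
    print(should_take_over(last, last + timedelta(seconds=90)))  # True: primary looks down
```

Even with this in place, the standby only acts after the expected run plus the grace period has passed, so a run can still be late or missed during failover, and you must also prevent both sides from running the job at the same time. That is part of why availability at this level is so demanding.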

When faced with the price tag for true high availability, most organizations discover that they don't really need it. Once a real cost-benefit analysis is done, downtime tends not to look so bad. The less downtime you can accept, the more it costs to ensure that, and that cost grows exponentially. If accepting an hour of downtime a year costs $D, limiting it to a few minutes a year is going to cost 10 to 20 times more, and the price paid to prevent losses can quickly eclipse the actual losses from downtime.

To give you an idea of just how extreme avoiding one minute of downtime is: a 99.9% uptime SLA still allows about 1.4 minutes of downtime per day, a 99.99% SLA allows about a minute per week, and a 99.999% SLA allows about five minutes per year. It's easy to quote SLAs with impressive-looking numbers, but a minute of downtime is an extremely short window. Everything has to be automated to maintain that level of uptime; you need to detect and mitigate issues without human interaction. App Engine offers a default SLA of only 99.95%, which on its own would not meet your needs if one minute of downtime is a problem.
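The downtime budgets above follow from simple arithmetic: allowed downtime is the period length multiplied by (1 - SLA). A minimal sketch that computes them is shown below; the names `PERIOD_MINUTES` and `allowed_downtime_minutes` are illustrative, and a 30-day month and a 365-day year are assumed.

```python
# Period lengths in minutes (assumes a 30-day month and a 365-day year).
PERIOD_MINUTES = {
    "day": 24 * 60,
    "week": 7 * 24 * 60,
    "30-day month": 30 * 24 * 60,
    "365-day year": 365 * 24 * 60,
}


def allowed_downtime_minutes(sla_percent: float, period_minutes: int) -> float:
    """Downtime an SLA permits over a period: period * (1 - SLA)."""
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    # SLA levels mentioned above, each over the period it was quoted for.
    cases = [
        (99.9, "day"),
        (99.99, "week"),
        (99.999, "365-day year"),
        (99.95, "30-day month"),  # App Engine's default SLA level
    ]
    for sla, period in cases:
        minutes = allowed_downtime_minutes(sla, PERIOD_MINUTES[period])
        print(f"{sla}% over a {period}: {minutes:.2f} minutes of downtime")
```

For App Engine's 99.95%, the same formula gives about 21.6 minutes of permitted downtime per 30-day month, far more than a single minute.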

Licensed under: CC-BY-SA with attribution