Question

Wondering if there are any good, established protocols that Magento merchants (or solutions providers) use to prioritize and triage incoming issues filed by the customer support team, marketing, sales, etc.

For example, levels of urgency / importance, such as:

  • Blocking customers from placing orders
  • Blocking customer support from placing orders in the backend
  • Feature with the potential to earn $X of additional revenue
  • Feature with the potential to increase customer satisfaction
  • ...

Solution

This is the classification I used at another eCommerce company and I believe it holds true today:

Fire

Anything that prevents checkout/payment/order placement:

  • Host / DNS / Server outage or downtime
  • Shipping method errors
  • Site slow / outage
  • Payment declines / processor down
  • Hardware failure
  • Anything that bothers the CEO*

An issue that affects all users, or the large majority (80% or more). It represents loss of income and should be addressed immediately, outside of the regular deployment cycle (i.e. RIGHT NOW).

Emergency

Anything that is incorrect, misleading, or out of date:

  • Premature publication of promotions
  • Incorrect messaging / promotional text / touts / banners
  • Incorrect pricing
  • SSL certificate invalidation/expiration

A loss of income opportunity or, in the case of promotions or incorrect advertising / content on the site, an otherwise ROI-negative issue.

May affect 50-80% of users, but is typically classified at this level because the majority of people are still able to convert without disruption. If waiting for the next deployment window is unacceptable, an out-of-cycle deployment can be scheduled, typically in off-peak/overnight hours.

SLA Response

  • Issues which are difficult to reproduce such as site bugs or display issues on varying devices
  • Anything that is not a fire or an emergency should be addressed within SLA and should slot into the next available deployment window.
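
To make the three levels concrete, here is a minimal sketch of how they could be encoded as a triage rule. The `Issue` fields and the `triage` function are hypothetical names, not part of Magento or any ticketing tool, and treating the 80% and 50% figures as hard cutoffs is my assumption for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    FIRE = "fire"            # fix right now, outside the deployment cycle
    EMERGENCY = "emergency"  # next window, or an off-peak out-of-cycle deploy
    SLA = "sla"              # next available deployment window, within SLA


@dataclass
class Issue:
    # All fields are hypothetical; map them onto your own ticket system.
    title: str
    blocks_orders: bool = False      # checkout, payment, or order placement is broken
    incorrect_content: bool = False  # wrong pricing, promo text, expired SSL cert, etc.
    affected_share: float = 0.0      # estimated fraction of users affected (0.0 to 1.0)


def triage(issue: Issue) -> Priority:
    """Classify an issue using the Fire / Emergency / SLA levels described above."""
    # Fire: nobody can order, or roughly 80%+ of users are affected (assumed cutoff).
    if issue.blocks_orders or issue.affected_share >= 0.8:
        return Priority.FIRE
    # Emergency: incorrect or misleading content, or roughly 50-80% affected
    # while most people can still convert (assumed cutoff).
    if issue.incorrect_content or issue.affected_share >= 0.5:
        return Priority.EMERGENCY
    # Everything else is handled within SLA.
    return Priority.SLA
```

For example, `triage(Issue("Payment processor down", blocks_orders=True))` returns `Priority.FIRE`, while a display bug seen by a small share of devices falls through to `Priority.SLA`.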

Other thoughts:

Recovery:

  • Immediate response: Redundancy, redundancy, redundancy. For third-party issues, such as payment processors or shipment APIs, you should have an easy fallback, e.g. enable the alternative method. For other providers, disable their functionality. If it's a CDN outage or the like, address it in software and deploy the fix. You get the picture...

  • Down 1 hour: You're probably hoping the storm passes - maybe it's a good idea to wait it out. If the problem is within your control, such as an integration failure, perhaps put up some helpful messaging to your customers on the site. Display a maintenance page at the least. If your servers aren't operational, consider a temporary DNS switch to an offsite cloud box to host this maintenance page. Give your customers alternative means of checking out - e.g. call center or the direct office line. Give them your info@ email and encourage them to contact you for questions. Display a contact form or discount code for when the site comes back online. You're trying to salvage potential customers at this point.

  • Down 4 hours+: Maybe you're fortunate enough to have a parallel environment. After 4 hours or so of downtime you should have switched to your backup plan by now. Hopefully you're hosting DNS somewhere and you have low TTLs.

  • Down 12 hours+: Now you're going to have to worry about how you're going to get all of the data from your backup plan folded back into your production instance. I hope your hosting provider/co-lo/cloud service has a managed backup service. I hope you tested your backups, too.
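
The escalation timeline above can be written down as a simple runbook lookup. This is only a sketch: `recovery_steps` is a hypothetical helper, the step wording is my paraphrase of the bullets, and using 1, 4 and 12 hours as exact thresholds is an assumption.

```python
from typing import List


def recovery_steps(hours_down: float) -> List[str]:
    """Return the cumulative recovery actions for a given length of outage."""
    # Immediate response applies from the first minute of the outage.
    steps = [
        "Enable fallbacks: switch to the alternative payment/shipping method, "
        "disable other failing third-party features",
    ]
    if hours_down >= 1:
        steps.append(
            "Show a maintenance page with call center / contact details "
            "to salvage potential customers"
        )
    if hours_down >= 4:
        steps.append("Switch traffic to the backup plan / parallel environment via DNS")
    if hours_down >= 12:
        steps.append(
            "Plan how to fold data from the backup environment "
            "back into the production instance"
        )
    return steps
```

Calling `recovery_steps(5)` returns the first three actions, in the order they should already have happened by that point.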


* While said in jest, learning how to prioritize issues the way your clients and your employers prioritize issues will take you far in life.

Licensed under: CC-BY-SA with attribution
Not affiliated with magento.stackexchange