Question

Preface: We want to extend the monitoring of one of our webshops because the provider had trouble with the PHP configuration and parts of the live webshop crashed (backend and checkout not working). I don't want to discuss moving to another provider here.

As we are now thinking about possibilities to monitor the webshop itself and the availability of certain parts (like "Is the checkout working?"), the question is:

What tools and strategies do you suggest to monitor a live website?

Some ideas:

  • Do you automatically check whether the checkout is still working on a live website?
  • What are good parameters to monitor to detect failure? Last order < 1 day ago, last user login, ...
  • Using cron jobs: for example, checking the last order date/time and, if it's too long ago, sending an email and/or checking manually whether the checkout still works?
  • Using software/tools like Icinga, Uptime Robot, ...
  • Sending out warning emails to admins, ...

Looking forward to your answers :)

Solution

There are a couple of things you could automate.

  1. Unit tests are a nice way of detecting whether certain functionality has stopped working.
  2. To test the frontend, I use phpQuery on a remote server to periodically look for certain DOM elements on key pages: 'are there still products on the category list', 'is there a footer* on the homepage', etc.
  3. Set up a simple cron job that pings your host to see if it's still available.
  4. Use the native Magento order RSS feed to check whether orders are still coming in. On high-traffic shops, no order for an hour on a Friday evening is a good indicator that something is wrong :)
  5. Monitor your Payment Service Provider. In the Netherlands we use iDeal for handling payments. This website displays their uptime; your PSP might provide a similar service.

*If there's no footer on a page, that could point to a PHP error halting rendering.
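To make the DOM check from point 2 concrete, here is a minimal sketch. It uses Python's standard-library HTML parser rather than phpQuery, and the tag names and sample pages are made up for illustration, so treat it as one possible shape of the check, not the setup described above.

```python
from html.parser import HTMLParser


class ElementFinder(HTMLParser):
    """Records which of the wanted tag names appear as start tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in self.wanted:
            self.found.add(tag)


def missing_elements(html, wanted):
    """Return the wanted tag names that never appear in the HTML."""
    finder = ElementFinder(wanted)
    finder.feed(html)
    return sorted(set(wanted) - finder.found)


# Hypothetical page whose rendering stopped before the footer.
broken_page = "<html><body><ul class='products'><li>Shirt</li></ul>"
healthy_page = broken_page + "<footer>Imprint</footer></body></html>"

print(missing_elements(broken_page, ["ul", "footer"]))
print(missing_elements(healthy_page, ["ul", "footer"]))
```

In a real check the HTML would come from fetching the key page (for example with urllib.request.urlopen) from a remote machine on a schedule, and a non-empty result would trigger the alert.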

These are a couple of solutions that we're using. They just need some setup time and are free to run.
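The order check from point 4 boils down to comparing the time since the last order with a threshold that depends on the time of day. A rough sketch follows, where the busy hours and both thresholds are assumptions you would tune to your own shop, and the last order time would come from the order feed or the database:

```python
from datetime import datetime, timedelta

# Assumed values for illustration only; tune them to your shop's traffic.
BUSY_HOURS = range(17, 23)             # evening peak, 17:00 to 22:59
MAX_GAP_BUSY = timedelta(hours=1)      # longest acceptable gap at peak time
MAX_GAP_QUIET = timedelta(hours=12)    # longest acceptable gap otherwise


def order_gap_too_long(last_order_at, now):
    """Return True if the time since the last order exceeds the threshold."""
    max_gap = MAX_GAP_BUSY if now.hour in BUSY_HOURS else MAX_GAP_QUIET
    return now - last_order_at > max_gap


evening = datetime(2024, 5, 17, 20, 0)
print(order_gap_too_long(datetime(2024, 5, 17, 18, 30), evening))
print(order_gap_too_long(datetime(2024, 5, 17, 19, 30), evening))
```

Run from cron every few minutes, a True result would send the warning email.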

Great question by the way, I'm really looking forward to all the answers!

OTHER TIPS

I'll add the following to Sander's fantastic answer. It assumes you've set up and use a monitoring service like Pingdom*:

  • Watch for content on the page; usually the closing </html> tag. I have seen so many before_body_end scripts fail with 3rd parties (uncaught exceptions, etc.) that are invisible to end-users but return 500 status -- very bad for SEO / Google / Webmaster Tools
  • Set up Webmaster Tools to notify you when errors are increasing above a certain threshold
  • Set up alerts for invalidated SSL on the page
  • Set up alerts for JavaScript errors on the page
  • Use email groups/bcc for payment failed emails, error reports.
  • Get in tight with your call center people and make sure they know how to take screenshots of issues - they're usually the first to point out when things are going wrong.
  • A slow site is as bad as a down site. Make sure your alerts fire when your site takes longer to load than usual.
  • Subscribe to Twitter feeds for all of your key 3rd party / hosted services. Larger hosts usually have Twitter triggers for when there are issues. You can configure Twitter to email/text you when certain accounts post.
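If you ever script the page checks yourself instead of configuring them in a hosted service, the status, content and speed rules from the list above could look roughly like this. The function name and the 3-second limit are my own placeholders, not anything Pingdom provides.

```python
def check_response(status, body, elapsed_seconds, max_seconds=3.0):
    """Return a list of problems found in one fetched page."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    # A missing closing tag suggests rendering stopped part-way through.
    if "</html>" not in body.lower():
        problems.append("closing </html> tag missing")
    # A slow site is as bad as a down site, so treat slowness as a problem.
    if elapsed_seconds > max_seconds:
        problems.append(f"slow response: {elapsed_seconds:.1f}s")
    return problems


print(check_response(200, "<html><body>ok</body></html>", 0.4))
print(check_response(500, "<html><body>", 5.0))
```

An empty list means the page passed; anything else is what you would put in the alert.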

Devops:

  • Set up Nagios for monitoring critical systems and sending alerts
  • Set up a syslog server or Splunk (free up to a certain daily data volume) to aggregate logs and issue alerts based on log data
  • Configure a scripted, routine check of your network equipment. I've seen (on more than one occasion) NICs drop from 1 Gbit/s to 10 Mbit/s without us noticing.
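As one example of a scripted network check: on Linux the negotiated link speed of an interface can be read from sysfs in Mbit/s. The sketch below assumes a Linux host, an interface called eth0 and an expected speed of 1000 Mbit/s; other platforms and managed switches need a different approach.

```python
from pathlib import Path


def parse_link_speed(raw):
    """Parse the content of a sysfs speed file into Mbit/s, or None if unknown."""
    try:
        speed = int(raw.strip())
    except ValueError:
        return None
    # The kernel reports -1 when the speed is not known.
    return speed if speed > 0 else None


def link_is_degraded(raw, expected_mbps=1000):
    """Return True if the link speed is unknown or below the expected value."""
    speed = parse_link_speed(raw)
    return speed is None or speed < expected_mbps


def read_link_speed(interface="eth0"):
    """Read the raw speed value; may raise OSError if the link is down."""
    return Path(f"/sys/class/net/{interface}/speed").read_text()


print(link_is_degraded("1000\n"))
print(link_is_degraded("10\n"))
```

On the server itself you would call link_is_degraded(read_link_speed()) from cron and alert on True.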

For larger teams:

  • Set up a CI server (Travis, Jenkins/Hudson, Capistrano) to warn you of failing tests after commits.
  • Set up pre-commit hooks in your source control to enforce code standards or to check for blatant issues like broken code.
  • Like Sander said, set up something to monitor the RSS feeds for orders and volume by time of day. A benefit here is that the feed is uncached, so if you set the notification threshold low enough, a potential issue will typically trip it immediately.
  • Use Selenium. A LOT. Have scripted tests that run through the checkout process every hour or two.
  • Set up calendar reminders and specific alerts for SSL expiration.
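For the SSL expiration alert, a small script can do the job of the calendar reminder. This is a sketch using only Python's standard library; the 30-day warning window is an assumption, and the date string in the example is invented.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now):
    """Days between `now` and a certificate's notAfter string."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after, now, warn_days=30):
    """Return True once the certificate is within the warning window."""
    return days_until_expiry(not_after, now) <= warn_days


def fetch_not_after(host, port=443):
    """Fetch the notAfter field of a live certificate.

    With the default context, an already expired or otherwise invalid
    certificate raises an SSL error during the handshake instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


now = datetime(2025, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
print(days_until_expiry("Jun 15 12:00:00 2025 GMT", now))
```

In practice you would pass fetch_not_after("your-shop-hostname") and datetime.now(timezone.utc) to needs_renewal and alert when it returns True or when the handshake fails.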

You're going to generate a LOT of data and potentially false positives; don't become immune to alerts.


*I'm not affiliated with Pingdom. I just love their (free) product.

If you only have problems with your hoster and not with the payment, you could set up a hidden product and write a Selenium test that puts it in the cart, adds a coupon to make it free, and then steps through the checkout.

There are already some great answers here, depending on your setup. I use New Relic to monitor server and transaction stats, and I have set up key transactions for every step of the checkout process. That way, I can look at a single screen on my phone and see whether the expected number of people are still getting through the whole checkout, and whether they are getting appropriate response times. If I see a bunch of throughput on everything up to the last step, I know that PayPal is probably broken, as nobody is able to process their cards. I also get alerts if there are a lot of errors, response times are off, etc. You don't strictly need New Relic to do this, but it is very simple and quick to set up and I didn't have time to build my own dashboard/app/alerting system.
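If you did want to build the funnel check yourself, the core idea is to compare throughput between consecutive checkout steps and flag the first step where it collapses. A minimal sketch, with made-up step names and counts and an assumed 20% cut-off (this does not use any New Relic API; the counts could come from access logs or your own metrics):

```python
def find_funnel_break(steps, min_ratio=0.2):
    """Return the first step whose count collapses relative to the previous one.

    `steps` is an ordered list of (name, request_count) pairs.
    Returns None if no step falls below min_ratio of its predecessor.
    """
    for (_, prev_count), (name, count) in zip(steps, steps[1:]):
        if prev_count > 0 and count < prev_count * min_ratio:
            return name
    return None


# Hypothetical throughput per checkout step over the last hour.
healthy = [("cart", 200), ("shipping", 120), ("payment", 90), ("success", 70)]
broken = [("cart", 200), ("shipping", 120), ("payment", 90), ("success", 2)]

print(find_funnel_break(healthy))
print(find_funnel_break(broken))
```

A break at the last step is the "payment provider is probably down" signal described above.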

I like New Relic and PagerDuty for this. They work really well together and notify you (email, text and call) within a minute if your site or any part of it is down. They can even notify you when CPU or memory usage goes beyond a specified percentage and makes the site unresponsive.

  • Set up New Relic with all the pages you want to monitor and the monitoring frequency. Example: homepage, any 1 category page, any 1 product page, cart page, checkout page, etc.
  • In PagerDuty, add users (who gets notifications), schedules (the days and times you prefer to receive notifications), services (New Relic alerts) and escalation policies, and choose the types of notification you want (email, text, call).

https://www.pagerduty.com/docs/guides/new-relic-integration-guide/

Disclaimer: I am not affiliated with any of the above services.

MageMonitoring - https://github.com/magento-hackathon/Hackathon_MageMonitoring A great free, open source tool which tracks server and Magento health, sends emails with exceptions and system logs, etc.

  • Munin on provider side to get historical values for all servers (LB, App, DB, Redis, etc) and all services (memory, load, io etc.)
  • Nagios/Icinga on the provider or local side for near-live monitoring of load on all servers
  • Pingdom to collect response time for "important" urls like front page, checkout etc.
  • Pingdom for real user monitoring, you get a value similar to APDEX and see the historical development
  • Pingdom to check urls and their correct content
  • Reporting with the last X orders in auto-reload mode. With it I can spot possible breaks
  • Automated tests with Selenium on an identical staging system. I am not a fan of automated checkouts on my live system. You will get problems with your accounting later :)
  • Zapier and Twilio for Email2SMS. Critical errors are sent as SMS to a phone
  • freeboard.io and dweet.io to display everything on a nice dashboard.
Licensed under: CC-BY-SA with attribution
Not affiliated with magento.stackexchange