Question

The performance benefits of the full page cache (FPC) in Magento Enterprise are fairly well known. What may not be so well known is that, to get the full benefit, the cache must be fully populated and hot. This matters most on large catalogs, where there are too many pages for organic traffic alone to prime it quickly enough.

Magento includes a built-in cronjob to crawl the site and warm the FPC early in the morning.

I've seen and heard of issues caused by early-morning jobs taking too long to run and blocking other jobs, and would like to know what others use or would suggest for this. A couple of ideas I have are:

  • Put together a shell script to crawl every page in the generated sitemap file.
  • Use a separate crontab entry and a short PHP script to bootstrap Magento and execute the crawler process directly.

Any thoughts and/or experience on this is welcome!


Solution

You could use siege in combination with the sitemap.xml file, like MageSpeedTest does.

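# Categories (assumes they are published in the sitemap with priority 0.5)
curl -s http://yourmagentostore.com/sitemap.xml | sed 's/<url>/\n<url>/g' | grep 'priority>0\.5<' | sed 's/.*<loc>\(.*\)<\/loc>.*/\1/' > urls.txt
# Products (assumes they are published in the sitemap with priority 1.0)
curl -s http://yourmagentostore.com/sitemap.xml | sed 's/<url>/\n<url>/g' | grep 'priority>1\.0<' | sed 's/.*<loc>\(.*\)<\/loc>.*/\1/' >> urls.txt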

Then run

siege -i -c 1 -t 7200s -f urls.txt

Content sourced from here.
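
If the sitemap priorities have been customised, the priority filter above may miss pages. A rough alternative, offered as a sketch rather than as anything MageSpeedTest or siege ships, is to pull every <loc> value regardless of priority. The helper name extract_locs is made up for illustration, and the store URL is the same placeholder as above.

```shell
#!/bin/sh
# Sketch: print every <loc> value found in sitemap XML read from stdin.
# extract_locs is a hypothetical helper name, not part of siege or Magento.
extract_locs() {
  # grep -o emits each <loc>...</loc> match on its own line, even when the
  # whole sitemap is one long line; sed then strips the surrounding tags.
  grep -o '<loc>[^<]*</loc>' | sed -e 's/^<loc>//' -e 's/<\/loc>$//'
}

# Usage (placeholder store URL):
#   curl -s http://yourmagentostore.com/sitemap.xml | extract_locs > urls.txt
```

The resulting urls.txt can be fed to siege exactly as before.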

Other tips

We just don't - at all. Ever. We'll say this over and over again but

Caching != Performance

Your site needs to be fast without the addition of FPC (or Varnish, for that matter). There is always going to be a time when the content isn't primed (your scenario above).

On an unloaded store, page load times with FPC shouldn't be that much more impressive than non-FPC; Magento is quite happily capable of < 400ms page load times on standard caches (on category/product/search pages). FPC will bring that down to < 80ms - but comes with caveats.

  1. Stock/price information is out of date until invalidation or TTL expiry
  2. New items/more relevant search is out of date until invalidation or TTL expiry

    etc.

Why reliance on FPC (or Varnish) is A Bad Idea

If you're looking to continually ensure caches are primed manually, there are likely a couple of reasons

  1. You don't have enough natural footfall to keep the caches primed (see 'Where FPC is useful')
  2. Your site is too slow without them

You can't cache everything

If you take a store with just 5 categories, nested 2 levels deep, 5 filterable attributes, 5 attribute options each and 1000 products; that is a lot of possible combinations.

25 options to choose from, picking one up to 5 times in a row - I'm no statistician, but I'm aware that is ... (assuming the number of attribute options doesn't decrease completely)

25 possible URLs on the first selection
20 possible URLs on the second selection
15 possible URLs on the third selection
10 possible URLs on the fourth selection
5  possible URLs on the fifth selection

5^5 = 3,125 possible combinations (for top level categories)
5^4 = 625 possible combinations (for 2nd level categories)

OK, the above is not a likely scenario; I would imagine that within 3 clicks the number of available products would have decreased sufficiently for the customer to find their product. So even if it were ...

25 possible URLs on the first selection
10 possible URLs on the second selection
3 possible URLs on the third selection

5^3 = 125 possible URL combinations 

Then multiply that by 5 categories and you get 625 URLs. At this stage, we're talking about a tiny catalogue, and completely ignoring all the product URLs.
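
For anyone wanting to check the totals, here is the same arithmetic as a small shell sketch. It only recomputes the figures quoted above. Treating the worst case as the sum of the top-level and 2nd-level figures (3,125 + 625) is my reading of the numbers, not something stated outright, and the variable names are made up.

```shell
#!/bin/sh
# Recompute the URL counts quoted above with plain integer arithmetic.
top_level=$(( 5 * 5 * 5 * 5 * 5 ))          # 5^5, top level categories
second_level=$(( 5 * 5 * 5 * 5 ))           # 5^4, 2nd level categories
max_urls=$(( top_level + second_level ))    # assumed worst case: the two summed
realistic_urls=$(( 5 * 5 * 5 * 5 ))         # 5^3 combinations times 5 categories

echo "worst case: $max_urls URLs"           # prints "worst case: 3750 URLs"
echo "realistic:  $realistic_urls URLs"     # prints "realistic:  625 URLs"
```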

We're also not factoring in that if you had nested categories with is_anchor on, the number is going to increase exponentially.

So to crawl that volume of pages - you've either got to hope that your page load times are nice and low to begin with, so that it is a quick lightweight process (thus defeating the purpose of the crawl) - or that you have enough time for it to complete before the TTL expires.

If your pages had a page load time of 0.4s and you had an 8-core CPU - then ...

625 * 0.4 = 250 seconds; 250 / 8 ~ 31 seconds

0.5 minutes, not bad - but let's imagine you had 2s page load times

625 * 2 = 1,250 seconds; 1,250 / 8 ~ 156 seconds

But if you took the maximum possible scenario

3,750 * 2 = 7,500 seconds; 7,500 / 8 ~ 937 seconds ~ 15 minutes

So that's your production server, under 100% CPU load for 15 minutes. You would reduce the crawl speed proportionately to the TTL that you want.
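
The estimate above is just URLs times page load time, divided by the number of cores working in parallel. A minimal sketch for plugging in your own numbers follows. The function name crawl_seconds is hypothetical, and it assumes every core is fully dedicated to the crawl, as the figures above do.

```shell
#!/bin/sh
# Sketch: estimated crawl duration in seconds.
# Usage: crawl_seconds <url_count> <seconds_per_page> <parallel_workers>
crawl_seconds() {
  awk -v urls="$1" -v secs="$2" -v workers="$3" \
    'BEGIN { printf "%.2f\n", urls * secs / workers }'
}

crawl_seconds 625 0.4 8     # prints 31.25
crawl_seconds 625 2 8       # prints 156.25
crawl_seconds 3750 2 8      # prints 937.50
```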

So if you want the content to have a 3600s TTL, the crawl could be 4 times slower - i.e. only 25% CPU dedicated to the crawl. That's a lot of resource just to keep category content primed - we haven't even factored in products, search terms or additional store views at this stage.
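
The "25%" figure is the full-speed crawl time spread across the TTL window. A rough sketch of that ratio is below. The function name crawl_cpu_percent is made up, and it assumes CPU use scales linearly with crawl speed, which is a simplification.

```shell
#!/bin/sh
# Sketch: share of CPU needed to finish one full crawl within one TTL window.
# Usage: crawl_cpu_percent <full_speed_crawl_seconds> <ttl_seconds>
crawl_cpu_percent() {
  awk -v crawl="$1" -v ttl="$2" \
    'BEGIN { printf "%.0f\n", crawl / ttl * 100 }'
}

crawl_cpu_percent 937 3600   # prints 26, i.e. roughly a quarter of the CPU
```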

In fact, just looking at the sheer size of combinations in the catalog_url_rewrites table (which isn't even factoring in parameters from the layered navigation) will give an idea as to how many URLs you could end up needing to crawl.

Every store will certainly be different, but what I'm trying to drive home is that crawling the site to prime FPC isn't practical. Just ensure your store is fast to begin with.

Where FPC is useful

Where the benefits of FPC come into play is on a heavily loaded store - where you have genuinely high levels of traffic and the caches are naturally and continually primed by sheer foot-fall alone.

FPC then comes into play by reducing infrastructure overheads on commonly requested content - cutting down on those repeated calls to the Magento backend.

So we've found that FPC is great to deploy when you've got very high traffic levels - not to reduce page load time - but to reduce resource usage.

Who cares, I still want to crawl

Well, then you've got two options

  1. Crawl from a template (e.g. sitemap)
  2. Extract links page by page and crawl each

There are many utilities to do both of these; here are some I know of

  1. mage-perftest
  2. HTTrack
  3. Nutch
  4. Sphider
  5. Crawler4j

Using Mage-Perftest

You can crawl your store with Mage-Perftest pretty easily. First, download it and make it executable

wget http://sys.sonassi.com/mage-perftest          # 64-bit, OR
wget http://sys.sonassi.com/mage-perftest-i386     # 32-bit
chmod +x mage-perftest*

Then define the crawl process using the Magento sitemap (you can customise this by making a sitemap of any URLs, provided the urls are wrapped in <loc></loc> tags). The following command will read all the URLs from the sitemap file, then crawl (PHP only) the URLs over the course of 1440 minutes (1 day). If the server exceeds 20% CPU or a load average of 2 - the crawl will pause temporarily.

./mage-perftest -u www.example.com -s www.example.com/sitemap.xml -r auto -b -d 1440 -z -a 20 -l 2  

If you have 1000 URLs, crawled over 1 day, that will be approx. 1 request every 86 second(s) ~ target of 0.011 RPS
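
The pacing above is simply the crawl window divided by the number of URLs. A small sketch for working it out with other values follows. The function name crawl_interval is hypothetical, and this is not how mage-perftest computes it internally, just the same arithmetic.

```shell
#!/bin/sh
# Sketch: request spacing needed to spread a crawl evenly over a time window.
# Usage: crawl_interval <url_count> <window_minutes>
crawl_interval() {
  awk -v urls="$1" -v minutes="$2" 'BEGIN {
    interval = minutes * 60 / urls    # seconds between requests
    printf "1 request every %.1f s (%.4f requests/s)\n", interval, 1 / interval
  }'
}

crawl_interval 1000 1440   # prints "1 request every 86.4 s (0.0116 requests/s)"
```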

I'll save my full rant for a blog post one of these days, but in the meantime have a peek at my little cache warmer, wfpc.

Testing performance

You can test the performance of your Magento site

./wfpc -t http://mymagentosite.com/sitemap.xml

Finished testing your Magento site performance
Total download time (in seconds)   : 5.0269110202789
Total download time (formatted)    : 0:0:5.026
Average page time (in milliseconds): 502.69110202789

FPC Warming

And you can warm the FPC, which will hit every URL in sitemap.xml.

./wfpc -w http://mymagentosite.com/sitemap.xml

You can also put a delay between requests if you like, here's a 1 second delay between requests.

./wfpc -w -d=1 http://mymagentosite.com/sitemap.xml
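
The delay idea isn't specific to wfpc. As a rough illustration, and not wfpc's actual implementation, the sketch below requests each URL from a list with a pause in between, using plain curl. The function name warm_urls is made up, and urls.txt is assumed to hold one URL per line.

```shell
#!/bin/sh
# Sketch: request every URL read from stdin, pausing between requests.
# Usage: warm_urls <delay_seconds> < urls.txt
warm_urls() {
  delay="${1:-0}"
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # skip blank lines
    # -s: silent, -o /dev/null: discard the body,
    # -w: print the HTTP status and total request time
    curl -s -o /dev/null -w '%{http_code} %{time_total}s ' "$url"
    echo "$url"
    sleep "$delay"
  done
}

# Example: one request per second
#   warm_urls 1 < urls.txt
```

Each output line shows the status code, the request time and the URL, so slow or failing pages are easy to spot.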

The test mode only hits 10 URLs randomly, so once you've warmed your FPC, you can run the test mode to find out how much of a difference the FPC makes!

Thoughts

Personally, I think a warmer makes sense... On a small site with about 40 pages, download time is cut roughly in half by the FPC. On a large site with nearly 40,000 products using Lesti_FPC with APCu as the backend, I'm using a little over 200MB for the cache, which frankly is nothing on the 8GB production server.

Licensed under: CC-BY-SA with attribution
Not affiliated with magento.stackexchange