Question

First, a little background of the setup.

We are using Magento EE 1.14.2.4 and have cache/sessions stored in a Redis cluster (a separate instance for cache, FPC, and session data).

The client has an unusual setup in that they have 1,000+ customer groups. With 10,000 products and 2 websites, this means the price index tables are very large (around 30,000,000 rows). Products are updated on a daily basis, and quite often a full price reindex is needed because new customer groups have been added.

The full price index process is very slow and can take hours to finish. To improve this, we've overridden the full reindex process so that it adds the product IDs to the index changelog; the changelog indexer then processes the products in batches rather than in a single query, which is much more efficient.
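
The override boils down to something like this (a simplified sketch; the concrete class name and the catalog_product_index_price_cl changelog table name are illustrative, following EE conventions, and the real changelog name should be checked against enterprise_mview_metadata on your install):

    <?php
    // Simplified sketch of the full-reindex override: instead of running the
    // monolithic price reindex, append every product ID to the price index
    // changelog so the changelog indexer picks them up in batches.
    class My_Catalog_Model_Index_Action_Product_Price_Refresh
        extends Enterprise_Catalog_Model_Index_Action_Product_Price_Refresh
    {
        public function execute()
        {
            $resource = Mage::getSingleton('core/resource');
            $write    = $resource->getConnection('core_write');

            // Gather all product IDs.
            $productIds = $write->fetchCol(
                $write->select()->from(
                    $resource->getTableName('catalog/product'),
                    array('entity_id')
                )
            );

            // Append them to the changelog in chunks; the changelog indexer
            // then processes them batch by batch on the next cron run.
            $changelog = $resource->getTableName('catalog_product_index_price_cl');
            foreach (array_chunk($productIds, 1000) as $chunk) {
                $write->insertArray(
                    $changelog,
                    array('entity_id'),
                    array_map(function ($id) { return array($id); }, $chunk)
                );
            }

            return $this;
        }
    }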

However, even though this works and the indexing completes, at the end of the process, when it sets the changelog as valid, we get Redis timeout errors:

exception 'CredisException' with message 'Read operation timed out.' in /var/www/vhosts/httpdocs/app/code/local/Credis/Client.php:1150

This error comes from the _setChangelogValid() method in Enterprise_Index_Model_Action_Abstract. I think it's trying to get the DDL cache for the enterprise_mview_metadata table. When this exception happens, the index doesn't get flagged as 'valid', and the next cron run then kicks off another full reindex.

I've had a look at the Redis slowlog when these errors happen. This is a sample from it:

1) (integer) 683
2) (integer) 1470228203
3) (integer) 14124
4)  1) "del"
    2) "zc:k:5f7_E810651E43193868704A20D1661FC9F0C9E04B2B"
    3) "zc:k:5f7_PRICE_SEPARATORS_Q_5424_STORE_1_CAT_2_CUSTGROUP_1_CC5A2A9EBB46E536E66809075B030909"
    4) "zc:k:5f7_C41AB18297BCE4D094322846EEAC4B04D84F52CE"
    5) "zc:k:5f7_D41A73404C90FF47F4582EFF8E79A3B1BCC62FE5"
    6) "zc:k:5f7_PRICE_SEPARATORS_Q_14564_STORE_1_CAT_2_CUSTGROUP_0_A748D6B6B50C8241FA189821D3E9E809"
    7) "zc:k:5f7_PRICE_SEPARATORS_Q_3204_STORE_1_CAT_2_CUSTGROUP_1_0EA47949F002C6572566B935024CCFA8"
    8) "zc:k:5f7_BA5630C3264FCB5185FF55D28C447997E0B2AFA1"
    9) "zc:k:5f7_BCAED8A49F173618DA1D52926E21A8F56D336085"
   10) "zc:k:5f7_PRICE_SEPARATORS_STORE_1_CAT_953_CUSTGROUP_0_9162FF9FDE1EB95454CE7AA8100BB679"
   11) "zc:k:5f7_C352846B3B16BB14222BF5C0E8E12846370F42EE"
   12) "zc:k:5f7_PRICE_SEPARATORS_STORE_1_CAT_1217_CUSTGROUP_0_183570F8E8687FE08D93962E168D420B"
   13) "zc:k:5f7_PRICE_SEPARATORS_Q_3345_STORE_1_CAT_2_CUSTGROUP_1_554230DC4BB7D2E3E3812C0AF50B8163"
   14) "zc:k:5f7_LAYOUT_1B6A182FB16EB3A80E9A1F610E935FA58"
   15) "zc:k:5f7_PRICE_SEPARATORS_Q_3073_STORE_1_CAT_2_CUSTGROUP_18914_B8E030F635CC94A96EE095BC81BDAA79"
   16) "zc:k:5f7_133897D71A873E8FB44597F784247EB5"
   17) "zc:k:5f7_F9135A77222E5FAC6598D1E4A2CA6D346210555E"
   18) "zc:k:5f7_263C0E9D984EC3EF91736B2059F25F4D"
   19) "zc:k:5f7_LAYOUT_1FA90BDDCF6BFCA4DC8B7799373029071"
   20) "zc:k:5f7_PRICE_SEPARATORS_Q_4410_STORE_1_CAT_2_CUSTGROUP_18465_414071F2A286B8DFF05266A5787E093F"
   21) "zc:k:5f7_PRICE_SEPARATORS_Q_4479_STORE_1_CAT_2_CUSTGROUP_18914_D92E70C4B98F1429E4D44049F08B6547"
   22) "zc:k:5f7_115877AC1325BD3D713D0219101B6EB4"
   23) "zc:k:5f7_47472E3E209AF7333E82D1AE624082BB1374DBDE"
   24) "zc:k:5f7_C10D6FD04F622ED5234593318108837FE191AE99"
   25) "zc:k:5f7_07B7F0BF4E453BDA42784D4352100284E345D570"
   26) "zc:k:5f7_PRICE_SEPARATORS_Q_9513_STORE_1_CAT_2_CUSTGROUP_1_1924529607620139F454122EE8895479"
   27) "zc:k:5f7_PRICE_SEPARATORS_STORE_1_CAT_764_CUSTGROUP_0_FA5B0A7EE2364113D5E52430D36A0D20"
   28) "zc:k:5f7_LAYOUT_164D248F2CA0F7FFBD90E26AA1F7B83C9"
   29) "zc:k:5f7_PRICE_SEPARATORS_Q_6848_STORE_1_CAT_2_CUSTGROUP_0_627973B284C580088B904AA537BCEACC"
   30) "zc:k:5f7_LAYOUT_12FFF138110E6CA61C1CDD4BF30B208FB"
   31) "zc:k:5f7_D26148610550DEF8390635D2E9C9CCEC9362EEE5"
   32) "... (4834 more arguments)"

So, what I think is happening is that Redis is taking too long to process that enormous DEL command and the client connection times out.

Does anybody have any suggestions on how to solve this? Is it a case of catching the exception and retrying until it works? Has anybody else encountered this issue?

Thanks!


Solution

CAUSE:

The line in question that throws the exception in Credis:
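
    // Paraphrased from Credis_Client::read_reply(); the exact code at line
    // 1150 may differ by version. When fgets() on the Redis socket fails,
    // Credis inspects the stream metadata to distinguish a read timeout
    // from a dropped connection.
    $reply = fgets($this->redis);
    if ($reply === false) {
        $info = stream_get_meta_data($this->redis);
        if ($info['timed_out']) {
            throw new CredisException('Read operation timed out.');
        }
        throw new CredisException('Lost connection to Redis server.');
    }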

NOTE: the timeout is detected by reading the metadata of the Redis stream via stream_get_meta_data(). More than likely you've hit a timeout and/or a memory limit.

More details on stream_get_meta_data() can be found in the PHP manual.


FIRST STEPS:

First, make sure you are using the latest versions of Colin Mollenhour's Redis modules for Magento (Cm_Cache_Backend_Redis and Cm_RedisSession).

One thing to note in Cm_RedisSession is:

A process may lose its lock if another process breaks it, in which case the session will not be written.

Also set up a cache garbage-collection cron script, as noted in the Cm_Cache_Backend_Redis documentation.
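
The gist of that script (a sketch adapted from the module's README; the path to Mage.php is an assumption for a script living in the document root):

    <?php
    // Cron entry point: clean expired/orphaned cache records so Redis memory
    // does not fill up with stale tag data.
    require_once __DIR__ . '/app/Mage.php';
    Mage::app();

    // Garbage-collect the main cache backend...
    Mage::app()->getCache()->getBackend()->clean(Zend_Cache::CLEANING_MODE_OLD);

    // ...and the Enterprise full page cache, which uses its own instance.
    Enterprise_PageCache_Model_Cache::getCacheInstance()
        ->getFrontend()->getBackend()->clean(Zend_Cache::CLEANING_MODE_OLD);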

Be sure to review the local.xml configuration and Redis's own configuration, particularly timeouts and max memory limits, and compare them against your log data and the known hardware limits of the server.
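
For the cache backend specifically, Cm_Cache_Backend_Redis exposes a read_timeout option in local.xml; raising it gives Redis room to finish a long-running DEL before the client gives up. A sketch (this goes inside the global node; server, database, and timeout values are illustrative):

    <cache>
        <backend>Cm_Cache_Backend_Redis</backend>
        <backend_options>
            <server>127.0.0.1</server> <!-- adjust to your cache instance -->
            <port>6379</port>
            <database>0</database>
            <connect_retries>2</connect_retries>
            <read_timeout>30</read_timeout> <!-- seconds; raise if large DELs time out -->
        </backend_options>
    </cache>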

More than likely, with such a large set of customer groups and 2 websites, the amount of data to be indexed is, as you mentioned, very large. Most likely the indexing process is hitting a timeout or a max limit, causing the indexing to fail, and with such a long-running operation in Redis the current connection hits its read timeout.

Ultimately the best solution is to leverage asynchronous indexing and class-path caching, disable the logging tables, and disable any core Magento modules you don't require. With these in place, monitor the processes closely with the tools mentioned below to help pinpoint which limits are being hit.
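
As a stopgap, the catch-and-retry approach raised in the question can also work. A sketch, placed in the existing local override of the price refresh action (this is a suggestion rather than an existing module's behaviour; the attempt count and delay are arbitrary):

    // Retry the changelog-valid flag on Redis read timeouts so one slow DEL
    // does not invalidate the whole multi-hour reindex.
    protected function _setChangelogValid()
    {
        for ($attempt = 1; ; $attempt++) {
            try {
                return parent::_setChangelogValid(); // original EE logic
            } catch (CredisException $e) {
                if ($attempt >= 3) {
                    throw $e; // persistent failure: surface the real error
                }
                sleep(5); // give Redis time to drain the oversized DEL
            }
        }
    }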


PERFORMANCE:

Disable logging and any core modules you don't use:
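
The conventional way to disable unused core modules is a declaration file under app/etc/modules (a sketch; the filename and the module list are assumptions, so check for dependencies before disabling anything):

    <?xml version="1.0"?>
    <!-- app/etc/modules/Local_DisableModules.xml -->
    <config>
        <modules>
            <Mage_Log>
                <active>false</active>
            </Mage_Log>
            <Mage_Poll>
                <active>false</active>
            </Mage_Poll>
        </modules>
    </config>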

Some modules to consider when you need more insight into the indexing/caching mechanisms:

Aoe_Index improves index management in Magento by enabling it to process indexes asynchronously. I would highly recommend giving this a try, as it chunks up the data that needs to be indexed and spawns the work across separate PHP processes. By default, Magento only uses one CPU core when indexing, so on a large multi-core system this helps tremendously with indexing performance.

Another free option available on GitHub that looks to be more up to date:

There is a paid-for solution as well, since Aoe_Index hasn't been updated in over 4 years; it claims to be EE 1.14+ compatible:

With as many files and folders as Magento has, class-path caching helps the autoloader find the exact file it's looking for instead of repeatedly seeking through your server's filesystem to locate it. It's not a direct solution to your problem, but it will greatly speed up Magento's performance overall.


ANALYSIS:

Utility for managing Redis services directly in Magento's admin:

This module adds some statistics to System -> Index Management for the changelog-based indexers introduced in Magento Enterprise Edition 1.13+.

Recommended profiler, instead of Magento's built-in one:

