Question

This question has gone unanswered for me for a while now, and I'd like a definitive answer from somebody about how it works.

Scenario:

Running Magento 1.14.2.1, I have set all my indexers to run on schedule, and I run the Magento cron which runs enterprise_refresh_index whenever anything needs to be partially (or fully?) indexed.

Before the upgrade to this version, I was running 1.13.1 and was not using the cron. I had the indexers set to update on save, and whenever I needed to force any of the indexers to run I'd do it either via shell/indexer.php or using the following code example:

// Example of indexers passed in:
$indexers = ['catalog_product_flat', 'catalog_category_flat'];

foreach ($indexers as $code) {
    $process = Mage::getSingleton('index/indexer')
                   ->getProcessByCode($code)
                   ->reindexEverything();
}

Previously...

This worked fine previously in terms of having the indexers running immediately when I want them to, and having the various changes required propagate relatively quickly to the frontend, etc.

Obviously we ran into deadlock/table lock issues and indexers conflicting with each other when multiple admin users were saving products etc. at the same time, hence our move to the new "best practice" of using the Magento cron to handle it all on schedule.

My questions:

  1. How do you replace the old style of forcing a product flat reindex (php shell/indexer.php --reindex catalog_product_flat) with the equivalent through the scheduled cron job? All I can see is that you can run the enterprise_refresh_index cron job, but that handles all of the indexers, right? Is there no way to single out certain parts?
  2. Is it even necessary to do this any more? By this I mean: would the previous command still have indexed only what it needed to within the catalog product flat realm, and does the scheduled task do the same thing but combine all indexers into one? If this is the case, that's fine; it's just quite confusing to understand.

Background:

I asked Magento EE support this question. They came back to say that it falls outside the realm of the support agreements, but they still got an answer from one of the Magento devs and replied to me saying that "changes to the database are detected with triggers and have changelog records created." As I understand it, the scheduled indexer then decides what needs to be done based on those changelog records.

I had previously assumed that if I needed to reindex the product flat tables (all products, everything), I'd do it from the command line as in the previous example. How do I achieve the same thing now if it's only going to reindex the records it knows about via the changelogs? Is there a way to reindex all of the product flat data without catching all the other indexers with it via enterprise_refresh_index?

I don't think it's relevant, but we use the AOE_Scheduler module to give better insight into what the cron is doing.


I know it's long. Please let me know if I can summarize a little better. I've had no luck in getting a definitive answer, and some admin users are not convinced that the Magento cron is reindexing everything effectively every time. When this kind of statement is made, I'd normally reindex all product flat data from the CLI, but I can't do that anymore because it conflicts with the scheduled indexer and all hell breaks loose.


Solution

public function refreshIndex(Mage_Cron_Model_Schedule $schedule)
{
    /** @var $helper Enterprise_Index_Helper_Data */
    $helper = Mage::helper('enterprise_index');

    /** @var $lock Enterprise_Index_Model_Lock */
    $lock   = Enterprise_Index_Model_Lock::getInstance();

    if ($lock->setLock(self::REINDEX_FULL_LOCK)) {

        /**
         * Workaround for fatals and memory crashes: Invalidating indexers that are in progress
         * Successful lock setting is considered that no other full reindex processes are running
         */
        $this->_invalidateInProgressIndexers();

        $client = Mage::getModel('enterprise_mview/client');
        try {

            //full re-index
            $inactiveIndexes = $this->_getInactiveIndexersByPriority();
            $rebuiltIndexes = array();
            foreach ($inactiveIndexes as $inactiveIndexer) {
                $tableName  = (string)$inactiveIndexer->index_table;
                $actionName = (string)$inactiveIndexer->action_model->all;
                $client->init($tableName);
                if ($actionName) {
                    $client->execute($actionName);
                    $rebuiltIndexes[] = $tableName;
                }
            }

            //re-index by changelog
            $indexers = $helper->getIndexers(true);
            foreach ($indexers as $indexerName => $indexerData) {
                $indexTable = (string)$indexerData->index_table;
                $actionName = (string)$indexerData->action_model->changelog;
                $client->init($indexTable);
                if (isset($actionName) && !in_array($indexTable, $rebuiltIndexes)) {
                    $client->execute($actionName);
                }
            }

        } catch (Exception $e) {
            $lock->releaseLock(self::REINDEX_FULL_LOCK);
            throw $e;
        }

        $lock->releaseLock(self::REINDEX_FULL_LOCK);
    }

    return $this;
}

This runs on every execution of the cron job. It does a full reindex of the indexes that need one and processes the changelog for those that don't.
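To make the two passes easier to see, here is a toy model of that decision in plain PHP. It is only a sketch of the flow in refreshIndex(), not Magento code: the function name planRefresh() and the 'valid'/'invalid' status strings are made up for illustration.

```php
<?php
// Toy model only: plain PHP, no Magento classes involved.
// planRefresh() and the status strings are hypothetical names for illustration.
function planRefresh(array $indexStatuses)
{
    $plan = array();

    // Pass 1: every index flagged invalid gets a full rebuild.
    foreach ($indexStatuses as $table => $status) {
        if ($status === 'invalid') {
            $plan[$table] = 'full';
        }
    }

    // Pass 2: every index not rebuilt in pass 1 only replays its changelog.
    foreach ($indexStatuses as $table => $status) {
        if (!isset($plan[$table])) {
            $plan[$table] = 'changelog';
        }
    }

    return $plan;
}

$plan = planRefresh(array(
    'catalog_product_flat'  => 'invalid',
    'catalog_category_flat' => 'valid',
));
print_r($plan);
```

In this model an index is never processed twice in one run, which mirrors the !in_array($indexTable, $rebuiltIndexes) check in the method above.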

Also, you can still run the CLI shell scripts, but they won't consider the partial changelog and will blindly do a full reindex.

If you are getting deadlocks, you may want to set all of your indexers to Manual and set up alternative processes to rebuild the indexes during off-peak hours (when admins are away). Only Product Prices and Stock Status should be set to Update on Save.

Also keep in mind that with partial reindexing, the status in the index_process table is no longer used; it is calculated from enterprise_mview_metadata instead.
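If you want to look at that table yourself, something like the following should work as a standalone script. It is a minimal sketch that assumes it is run from the Magento root directory (so that app/Mage.php resolves) and that the table is named enterprise_mview_metadata as mentioned above. It selects all columns rather than guessing at column names.

```php
<?php
// Sketch only: assumes the script is run from the Magento root directory.
require_once 'app/Mage.php';
Mage::app();

$resource = Mage::getSingleton('core/resource');
$read     = $resource->getConnection('core_read');

// getTableName() applies the configured table prefix, if any.
$table = $resource->getTableName('enterprise_mview_metadata');

// Select every column so no column names have to be assumed.
$rows = $read->fetchAll($read->select()->from($table));

foreach ($rows as $row) {
    echo implode(' | ', $row), PHP_EOL;
}
```

This only reads the table, so it should be safe to run while the scheduled indexer is active.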

Disabling some internal modules like Mage_Rss can also help reduce how often the indexes are invalidated.


OTHER TIPS

As an aside, I recently came across the AOE_EeIndexerStats module from AOEpeople on GitHub - from a quick glance it looks like it will allow you to control certain parts of the indexers or whole indexers, which are all covered under enterprise_refresh_index. This would potentially be an equivalent to the previous process, where you could target the catalog_product_flat index on its own.

It should also be accessible in code as such:

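// Table of the index to invalidate; 'catalog_product_flat' is only an example value.
$tablename = 'catalog_product_flat';
$client = Mage::getModel('enterprise_mview/client'); /* @var $client Enterprise_Mview_Model_Client */
// Note: the refreshIndex() code quoted above calls $client->init($tableName) for the same step.
$client->initByTableName($tablename);
$metadata = $client->getMetadata();
// Flag the index as invalid; the next enterprise_refresh_index run should then rebuild it in full.
$metadata->setInvalidStatus();
$metadata->save();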
Licensed under: CC-BY-SA with attribution