Question

I have an external file containing product details, I'm looping through this file and for each product in said file I have to check if the product already exists in magento.

The value to evaluate if the product already exists is an attributename and it's value.

So I check this by using (per product)

$collection = Mage::getModel('catalog/product')
                ->getCollection()
                ->addAttributeToSelect('sku')
                ->addAttributeToSelect("$supplier_name")
                ->addAttributeToSelect('production_nr')
                ->addAttributeToSelect('sku_supplier')
                ->addAttributeToFilter($supplier_name,array('eq' => $suppliers_product_nr));

This selection itself doesn't seem to take much time:

            echo 'check: ',round(microtime(true) - $time, 2) , "s<br/>\n";

reports 0.00 s,

however to check if it's an empty collection (ie the product exists in my magento database takes about 0.34-0.40 s

$collection->getSize()

Considering I've got several hundred of thousands of products to check. This will add up quickly. Very quickly. I was hoping for something more akin to 0.01 or lower in time.

So I'm looking for the fastest way to check if a product exists. I have quite a bit of freedom in how to change and implement the code so if there's an completely different way of appreaching this problem I'd like to hear about it.


Update:

I've changed it slightly so that instead of checking with magento if a product exists product by product, I instead get an array of all products that have the attribute to check against. I use this array to check if the attribute to check against exists.

This is way faster, but I fear for the overhead impact (primarily ram or cpu) this will have once the amount of products returned becomes too great (we've got about 40.000 products in our magento installation)

Was it helpful?

Solution

The best call would be getAllIds() in my knowledge.

Explanation

$collection = Mage::getModel('catalog/product')
                ->getCollection()
                ->addAttributeToSelect('sku')
                ->addAttributeToSelect("$supplier_name")
                ->addAttributeToSelect('production_nr')
                ->addAttributeToSelect('sku_supplier')
                ->addAttributeToFilter($supplier_name,array('eq' => $suppliers_product_nr));

This one won't take any time since it is just preparation of a collection query. This query will be actually run only when the collection is loaded through load() call or through foreach() loop or through count() call etc.

Here you can check the collection count status in many ways. I am listing out them in their best performance basis (performance decreases while go down).

  • $collection->getAllIds() - Best Option
  • $collection->getSize()
  • count($collection)
  • $collection->count() - Least option

Why This Order ?

count() defines in Varien_Data_Collection and it loads the collection first and then take the count by counting collection items. Since it involves collection load, it will take highest time.

count($collection) do not have much difference with above one.

getSize() defines in Varien_Data_Collection_Db and it avoids collection load. This is the big advantage over here. This will give you a good performance when use it for checking. See answer from Marius for more details.

getAllIds() is the best choice. This method is available in Mage_Eav_Model_Entity_Collection_Abstract and it has a very subtle difference with getSize() in definition and which makes this best choice.

getAllIds() internally calls getAllIdsSql() and see the definition here:

public function getAllIdsSql()
{
    $idsSelect = clone $this->getSelect();
    $idsSelect->reset(Zend_Db_Select::ORDER);
    $idsSelect->reset(Zend_Db_Select::LIMIT_COUNT);
    $idsSelect->reset(Zend_Db_Select::LIMIT_OFFSET);
    $idsSelect->reset(Zend_Db_Select::COLUMNS);
    $idsSelect->reset(Zend_Db_Select::GROUP);
    $idsSelect->columns('e.'.$this->getEntity()->getIdFieldName());

    return $idsSelect;
}

Here $idsSelect->columns('e.'.$this->getEntity()->getIdFieldName()); is what make this method finest in the all available methods. In fact, this line is absent in getSize().

OTHER TIPS

The fastes way for a single call is to use getIdBySku from catalog/product:

if (Mage::getSingleton('catalog/product')->getIdBySku('yoursku')) {
  // Product exists
} else {
  // Product does not exist
}

Mind the getSingleton instead of getModel for this usage!

Alternative:

If you need to read them all in once you can use getAllIds() from a collection.

As mentioned below by Rajeev K Tomy the speed of query using getAllIds() method seems fastest in magento. But you can get needed attribute values also (such as 'production_nr' or 'sku_supplier') without loading collection. This method slightly more slowly then getAllIds().

//addAttributeToSelect use only for EAV attributes
$collection = Mage::getModel('catalog/product')
    ->getCollection()
    ->addAttributeToSelect('meta_title', true) 
    ->addAttributeToSelect('meta_description', true)
    ->addAttributeToFilter(
        'name',
        array('like' => '%a%')
    );
//columns() use only for static attributes (product entity fields)
$collection->getSelect()->reset('columns')->columns(['entity_id', 'sku']);

/** @var Varien_Db_Select $select */
$select = $collection->getSelect();

/** @var Mage_Core_Model_Resource $res */
$res = Mage::getSingleton('core/resource');

/** @var Magento_Db_Adapter_Pdo_Mysql $conn */
$conn = $res->getConnection('core_write');
$res = $conn->fetchAll($select, Zend_Db::FETCH_ASSOC);

P.S. Do not add addAttributeToSelect('name') to collection if you use 'name' attribute for filtering method.
P.S.2 See my question/answer about fastest getting raw product data without loading collection here

Licensed under: CC-BY-SA with attribution
Not affiliated with magento.stackexchange
scroll top