How to get the total count of every product attribute/filter like Newegg

https://stackoverflow.com/questions/4222635

26-09-2019

Question

If you go to newegg.com (just one example), you'll notice that while you browse products, the left-hand sidebar shows the number of items next to each product attribute.

With so many attributes on some items and so many different configurations of product filters, how do they calculate all of those totals so fast?

Solution

Newegg.com uses a faceted navigation technology provided by Endeca.

In a nutshell, Endeca takes data supplied as XML/CSV, or retrieves it directly from any database (not just MySQL), calculates similarity, and groups the results into its own format.

Endeca is not free; open-source alternatives include Sphinx and Lucene/Solr.

OTHER TIPS

Newegg uses Endeca, and they were probably one of Endeca's earlier customers. In retrospect, Endeca might have been a big contributor to their success. Faceted navigation works very well on complex electronics like computer parts.

There are a few things to consider in faceted navigation:

1) Do you want faceted navigation only on category-driven queries, or do you also want it to work on search? In fact, categories are a hierarchical facet of sorts.

2) Does the de-normalized inverted index model of Solr cause you problems?
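Point 1 treats categories as a hierarchical facet. A minimal sketch of what that means for counting, assuming each product stores a single slash-separated category path (a hypothetical format, as are the names below): a product in `A/B/C` also counts toward `A/B` and `A`, so the counts roll up the tree.

```python
from collections import Counter

# Hypothetical category paths for a few products (product id -> path)
CATEGORY_PATHS = {
    1: "Computers/Components/Memory",
    2: "Computers/Components/Memory",
    3: "Computers/Components/CPUs",
    4: "Computers/Laptops",
}

def category_counts(paths, product_ids):
    """Count products at every level of the category tree."""
    counts = Counter()
    for pid in product_ids:
        parts = paths[pid].split("/")
        # Credit every prefix of the path: A, A/B, A/B/C
        for depth in range(1, len(parts) + 1):
            counts["/".join(parts[:depth])] += 1
    return counts
```

For all four products this yields 4 under `Computers`, 3 under `Computers/Components`, and 2 under `Computers/Components/Memory`. Passing only the ids that match a search makes the same tree usable as a facet on search results.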

If the answer to 1) is yes (it probably is), you'll need some inverted indices. Inverted indices are pretty much the only practical way to do keyword search, and they can also do faceting, with some caveats.

Essentially, you can consider each facet as an inverted index (in fact, keyword search might be considered a special facet with ranking functions). Then, to compute counts, you intersect (AND) the result set of the current query and filters with the posting list of every other facet value. However, this model can lead to problems if you need to represent sparse product sets (see 2).
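The intersection approach can be sketched as follows. This is a toy illustration, not how Endeca or Solr implement it internally; the catalog, the `build_index` and `facet_counts` helpers, and the facet names are all hypothetical. Each `(facet, value)` pair gets a posting set of product ids, the selected filters are ANDed together, and every facet value is counted against the result.

```python
from collections import defaultdict

# Hypothetical toy catalog: product id -> {facet: value}
PRODUCTS = {
    1: {"brand": "Acme", "ram": "8GB", "color": "black"},
    2: {"brand": "Acme", "ram": "16GB", "color": "silver"},
    3: {"brand": "Zeta", "ram": "16GB", "color": "black"},
    4: {"brand": "Zeta", "ram": "32GB"},  # sparse: no color attribute
}

def build_index(products):
    """Map each (facet, value) pair to the set of product ids that have it."""
    index = defaultdict(set)
    for pid, attrs in products.items():
        for facet, value in attrs.items():
            index[(facet, value)].add(pid)
    return index

def facet_counts(index, all_ids, filters):
    """AND the selected filters together, then count each facet value
    by intersecting its posting set with the current result set."""
    result = set(all_ids)
    for key in filters:
        result &= index.get(key, set())
    counts = {}
    for key, postings in index.items():
        n = len(result & postings)
        if n:  # hide values that would lead to zero results
            counts[key] = n
    return result, counts
```

Filtering on `("ram", "16GB")` leaves products 2 and 3, so the sidebar would show one Acme, one Zeta, one black and one silver item. The cost is one set intersection per facet value per request, which is why real engines use compressed posting lists or bitsets rather than Python sets.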

If the answer to 2) is yes, it may help to think about facets in terms of OLAP instead. I don't know whether inverted indices can handle complex relationships without some additional abstraction.

It's fair to consider and implement faceted search/navigation as a blend of full-text search (typically implemented as an inverted index) and/or OLAP.

I'm pretty sure you can pull off faceting with a column store, but you'd still need to have an inverted index at your disposal to merge with if you want keyword search.
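A rough sketch of that combination, assuming a toy column layout (plain Python lists standing in for columns) and hypothetical names throughout: an inverted index narrows the rows by keyword, and facet counts come from scanning only those rows in each attribute column. `None` stands in for a missing attribute, so sparse products are simply skipped rather than needing an entry in every index.

```python
from collections import Counter, defaultdict

# Hypothetical column-oriented storage: one list per attribute, aligned by row id.
# None marks a product that lacks the attribute (sparse data).
COLUMNS = {
    "brand": ["Acme", "Acme", "Zeta", "Zeta"],
    "ram":   ["8GB", "16GB", "16GB", "32GB"],
    "color": ["black", "silver", "black", None],
}
TITLES = [
    "Acme gaming laptop",
    "Acme office laptop",
    "Zeta gaming desktop",
    "Zeta workstation",
]

def build_keyword_index(titles):
    """Inverted index for keyword search: term -> set of row ids."""
    index = defaultdict(set)
    for row, title in enumerate(titles):
        for term in title.lower().split():
            index[term].add(row)
    return index

def search_with_facets(query, keyword_index, columns):
    # AND all query terms using the inverted index.
    rows = None
    for term in query.lower().split():
        postings = keyword_index.get(term, set())
        rows = postings if rows is None else rows & postings
    rows = sorted(rows or ())
    # Scan each facet column only at the matching rows.
    counts = {}
    for name, values in columns.items():
        counts[name] = Counter(values[r] for r in rows if values[r] is not None)
    return rows, counts
```

Searching for `"gaming"` matches rows 0 and 2, giving one Acme and one Zeta and two black items. A real column store would scan compressed columns with a selection bitmap, but the division of labor is the same: the inverted index answers the keyword part, the columns answer the counting part.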

@Dan Grossman:

It might seem so, but consider this.

Have you thought about how many facet combinations there are? You can't cache that many pages. There are probably more combinations on Newegg.com than stars in the sky.

Add in multiple selection and it's even worse. Game over.
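To make the combinatorics concrete, here is a back-of-the-envelope count for a single hypothetical category with 10 facets of 8 values each (the numbers are illustrative, not Newegg's). With single selection, each facet has 9 states (unset or one of 8 values); with multiple selection, any subset of a facet's values can be chosen, giving 2^8 states per facet.

```python
from math import prod

def single_select_states(values_per_facet):
    """Each facet is either unset or set to one of its k values: k + 1 states."""
    return prod(k + 1 for k in values_per_facet)

def multi_select_states(values_per_facet):
    """Each facet can have any subset of its k values selected: 2**k states."""
    return prod(2 ** k for k in values_per_facet)
```

For `[8] * 10`, the first function gives 3,486,784,401 distinct filter states and the second gives 2**80, roughly 1.2 × 10^24, before keyword search, sorting, or paging are even considered.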

You can only cache some cases, such as unfiltered and commonly filtered views. If you try to spider Newegg.com without limiting the recursion depth, you'll kill the spider. Faceted sites cause problems for search engines in general for this very reason. See http://www.searchmarketingstandard.com/facets-navigational-seo-powerhouse-part
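One way to express "cache only the common cases" is a sketch like the one below, assuming a hypothetical `compute` callback that produces facet counts for a set of filters and a hypothetical depth threshold. Filter selections are normalized into a `frozenset`, so the same selections in a different order hit the same entry; shallow combinations are cached, and deep, rarely repeated ones are computed on demand.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Filter = Tuple[str, str]  # (facet, value)

class ShallowFacetCache:
    """Cache facet counts only for shallow filter combinations (hypothetical policy)."""

    def __init__(self, compute: Callable[[FrozenSet[Filter]], Dict], max_depth: int = 1):
        self.compute = compute
        self.max_depth = max_depth
        self.store: Dict[FrozenSet[Filter], Dict] = {}

    def get(self, filters) -> Dict:
        key = frozenset(filters)  # order-insensitive: same selections -> same key
        if len(key) > self.max_depth:
            return self.compute(key)  # deep, rare combination: compute on demand
        if key not in self.store:
            self.store[key] = self.compute(key)
        return self.store[key]
```

The store here is unbounded to keep the sketch short; a production cache would also cap its size and evict entries.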

You do not know that they calculate them fast. You only know that they render them fast. They could spend hours calculating those totals and rendering their pages, cache the results, and serve those static files until they decide to refresh the data.
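The "precompute, then serve until refresh" idea can be sketched as a snapshot with a maximum age. The `rebuild` callback and the refresh policy are hypothetical. Here the rebuild runs lazily on the first request after the snapshot goes stale; a real system might run it as a scheduled batch job instead. The clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional

class PrecomputedCounts:
    """Serve a precomputed snapshot of facet counts, rebuilding it only
    when it is at least `max_age` seconds old (hypothetical policy)."""

    def __init__(self, rebuild: Callable[[], Dict], max_age: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rebuild = rebuild
        self.max_age = max_age
        self.clock = clock
        self.snapshot: Optional[Dict] = None
        self.built_at = 0.0

    def get(self) -> Dict:
        now = self.clock()
        if self.snapshot is None or now - self.built_at >= self.max_age:
            # The expensive computation happens here, not on every request.
            self.snapshot = self.rebuild()
            self.built_at = now
        return self.snapshot
```

Every request between rebuilds gets the same snapshot, so request latency is independent of how long the rebuild takes, at the price of counts that may be slightly out of date.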

Licensed under: CC-BY-SA with attribution
