Question

I am amazed at how well (and fast) this software works. I hovered my phone's camera over a small area of a book cover in dim light and it only took a couple of seconds for Google Shopper to identify it. It's almost magical. Does anyone know how it works?


Solution

I have no idea how Google Shopper actually works. But it could work like this:

  • Take your image and convert to edges (using an edge filter, preserving color information).
  • Find points where edges intersect and make a list of them (including colors and perhaps angles of intersecting edges).
  • Convert to a rotation-independent metric by selecting pairs of high-contrast points and measuring distance between them. Now the book cover is represented as a bunch of numbers: (edgecolor1a,edgecolor1b,edgecolor2a,edgecolor2b,distance).
  • Pick pairs of the most notable distance values and take the ratios of the distances; ratios are unchanged by scaling, so the representation becomes scale-independent as well.
  • Send this data as a query string to Google, where it finds the most similar vector (possibly with a direct nearest-neighbor computation, or perhaps with an appropriately trained classifier, probably a support vector machine). A rough code sketch of these steps follows the list.
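Here is a minimal sketch of those steps in Python with OpenCV, purely as a guess at how such a pipeline might look; the Canny edge detector, Shi-Tomasi corner detector, and every threshold below are my own choices, not anything Google has published:

```python
import cv2
import numpy as np

def cover_signature(image_path, n_points=32, n_ratios=16):
    """Guess at a Shopper-like signature: edges -> notable points ->
    rotation-invariant distances -> scale-invariant ratios."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    # Step 1: edge map (color could also be sampled at each edge pixel).
    edges = cv2.Canny(gray, 100, 200)

    # Step 2: high-contrast points, restricted to edge pixels; corners
    # approximate the places where edges intersect.
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=n_points,
                                      qualityLevel=0.01, minDistance=10,
                                      mask=edges)
    pts = corners.reshape(-1, 2)

    # Step 3: pairwise point distances do not change under rotation.
    diff = pts[:, None, :] - pts[None, :, :]
    pairwise = np.linalg.norm(diff, axis=-1)
    dists = np.sort(pairwise[np.triu_indices(len(pts), k=1)])[::-1]

    # Step 4: ratios of the largest distances are also scale-independent.
    ratios = dists[1:n_ratios + 1] / dists[0]

    # Step 5: this small vector is what would be sent as the query.
    return ratios
```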

Google Shopper could also send the entire picture, in which case Google could use considerably more powerful processors to do the image processing, which means it could use more sophisticated preprocessing (I've chosen the steps above to be simple enough to run on a smartphone).

Anyway, the general steps are very likely to be: (1) extract scale- and rotation-invariant features, (2) match that feature vector against a library of pre-computed features.
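The matching step could be as simple as a brute-force nearest-neighbor lookup over a library of precomputed vectors. This toy version (the `library` dict and the vector format are my own invention) shows the idea:

```python
import numpy as np

def nearest_product(query_vec, library):
    """library: dict mapping product id -> precomputed feature vector
    (e.g., the ratio vector computed above). Returns the closest match."""
    ids = list(library)
    vectors = np.stack([library[i] for i in ids])
    distances = np.linalg.norm(vectors - query_vec, axis=1)
    return ids[int(np.argmin(distances))]
```

At Google's scale this would presumably be an approximate nearest-neighbor index rather than a linear scan, but the interface is the same: vector in, product id out.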

OTHER TIPS

In any case, pattern recognition/machine learning methods of this kind are often based on:

  1. Extract features from the image that can be described as numbers: for instance, edges (as Rex Kerr explained above), color, texture, etc. A set of numbers that describes or represents an image is called a "feature vector" or sometimes a "descriptor". Once the feature vector of an image has been extracted, images can be compared using a distance or (dis)similarity function (see the sketch after this list).
  2. Extract text from the image. There are several methods for doing this, most often based on OCR (optical character recognition).
  3. Perform a search on a database using the features and the text in order to find the closest related product.
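As an illustration of items 1 and 2 (not Shopper's actual features), a color histogram is about the simplest feature vector one can extract, and OCR is a single call through the pytesseract wrapper:

```python
import cv2
import numpy as np

def color_histogram(image_path, bins=8):
    """Item 1: a simple color feature vector (bins x bins x bins BGR histogram)."""
    img = cv2.imread(image_path)
    hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256]).flatten()
    return hist / hist.sum()  # normalize so differently sized images compare

def dissimilarity(a, b):
    """Smaller value means more similar images."""
    return float(np.linalg.norm(a - b))

# Item 2, assuming Tesseract and the pytesseract package are installed:
# import pytesseract
# text = pytesseract.image_to_string(cv2.imread(image_path))
```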

    It is also likely that the image is cut into subimages, since the algorithm often looks for a specific logo in the image.

    In my opinion, the image features are sent to several pattern classifiers (algorithms that predict a "class" from an input feature vector) in order to recognize logos and, afterwards, the product itself; a toy example of such a classifier follows.
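This sketch uses scikit-learn's SVC on made-up descriptor data; the class names, descriptor dimension, and noise level are all placeholders:

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in data: 3 "logo" classes, 20 noisy 16-dimensional descriptors each.
rng = np.random.default_rng(0)
centers = rng.random((3, 16))
X = np.vstack([c + 0.05 * rng.standard_normal((20, 16)) for c in centers])
y = np.repeat(["logo_a", "logo_b", "logo_c"], 20)

clf = SVC().fit(X, y)  # trained offline on precomputed descriptors

query = centers[1] + 0.05 * rng.standard_normal(16)
print(clf.predict([query]))  # -> ['logo_b']
```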

    This approach can be local, remote, or mixed. If local, all processing is carried out on the device, and just the feature vector and text are sent to the server where the database lives. If remote, the whole image goes to the server. If mixed (I think this is the most probable option), processing is split between the device and the server.

    Another interesting piece of software is Google Goggles, which uses CBIR (content-based image retrieval) to search for other images related to the picture taken by the smartphone. It addresses a problem closely related to the one Shopper solves.


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow