Question

I am new to machine learning and I am stuck on one thing. Please help.

I am developing an app to sort images based on user preferences. Initially I have n data points (images) with m features each. When the user creates an album and adds some images to it, my algorithm should find similar images in the gallery and add them to the same album. This should still work when the user creates more albums. Not all images need to be sorted, only the relevant ones.

What algorithms should I use to achieve this?

Solution

Well, that's a question that university institutes, among others, have been trying to solve for several decades, and on which every country spends millions each year.

Short answer: for now, no general-purpose method is known to perform well.

Long answer: it depends entirely on your image domain. Do you want to recognize suns in a picture? That is possible, up to some false discovery rate. Do you want to recognize familiar faces? Possible; see automated passport controls at airports. Emotion recognition? Good luck with that! Grouping by person? Possible, but statues will be included. And so on.

Something really general: grouping by colors? Totally viable, but it will not mean much. Something that includes some hint of the structure? Well, the easiest way is to compute color histograms and use them as feature vectors for a nearest-neighbor calculation (as suggested in the other answer). Watch out for normalization, color space, and other sources of information distortion.
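To make the color histogram idea concrete, here is a minimal sketch using only the Python standard library. It assumes each image is already available as a list of (r, g, b) tuples with values from 0 to 255; the function names color_histogram and nearest_images, the choice of 8 bins per channel, and the Euclidean distance are all illustrative assumptions, not something the answer prescribes.

```python
import math


def color_histogram(pixels, bins=8):
    """Concatenated per-channel histogram of (r, g, b) pixels.

    The result is normalized to sum to 1, so images of different
    sizes remain comparable.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map a 0..255 value to a bin index in 0..bins-1.
            hist[channel * bins + value * bins // 256] += 1
    total = 3 * len(pixels)
    return [count / total for count in hist]


def nearest_images(query, gallery, k=3):
    """Indices of the k gallery histograms closest to the query (Euclidean)."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: math.dist(query, gallery[i]))
    return ranked[:k]


# Toy "images": 16 pixels each, hypothetical data for illustration only.
red = [(255, 0, 0)] * 16
dark_red = [(250, 5, 5)] * 16
blue = [(0, 0, 255)] * 16

gallery = [color_histogram(blue), color_histogram(dark_red)]
closest = nearest_images(color_histogram(red), gallery, k=1)
```

The normalization step is one instance of the caveat above: without it, a large image and a small image with the same colors would look far apart. The RGB color space is also just a default here; another color space may match human perception better.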

Basic answer: take some Computer Vision courses, for example online (e.g. https://youtu.be/skaQfPQFSyY?list=PL7v9EfkjLswLfjcI-qia-Z-e3ntl9l6vp).

There is also an interesting TED talk on that topic: https://youtu.be/40riCqvRoMs

On the other hand: don't get discouraged! Understand the basics of data processing and break the structures! Develop that algorithm we are all hoping to use some day!

OTHER TIPS

You could use k-nearest neighbors (k-NN) to do this. Nearest neighbors is probably what you are looking for. You can try out different distance metrics. Cheers.
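As a rough sketch of how this could look for the album use case, the following uses a 1-nearest-neighbor rule with a swappable distance metric. It also adds a distance cutoff so that images far from every album stay unsorted, which matches the requirement in the question that only relevant images are assigned. The names suggest_album, euclidean, manhattan, the max_distance value of 0.5, and the two-dimensional toy feature vectors are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def suggest_album(features, albums, metric=euclidean, max_distance=0.5):
    """Album of the nearest already-sorted image, or None if it is too far.

    `albums` maps an album name to the feature vectors of the images the
    user has placed in it. `max_distance` is a hypothetical cutoff that
    would need tuning on real data.
    """
    best_album, best_dist = None, float("inf")
    for album, members in albums.items():
        for member in members:
            d = metric(features, member)
            if d < best_dist:
                best_album, best_dist = album, d
    return best_album if best_dist <= max_distance else None


# Toy feature vectors, hypothetical data for illustration only.
albums = {
    "beach": [[0.9, 0.1], [0.8, 0.2]],
    "forest": [[0.1, 0.9]],
}

first = suggest_album([0.85, 0.15], albums)
second = suggest_album([0.2, 0.8], albums, metric=manhattan)
unsorted = suggest_album([5.0, 5.0], albums)
```

Because each album is just another label, adding a new album only means adding a new key to the dictionary; nothing has to be retrained. For large galleries, a library implementation with an index structure would likely be faster than this linear scan.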

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange