Algorithm for clustering pictures based on date taken

https://stackoverflow.com/questions/618054

03-07-2019

Question

Does anyone know of an algorithm that will group pictures into events based on the date each picture was taken? Obviously I can group by date, but I'd like something a little more sophisticated that would (or might) be able to group pictures spanning multiple days based on the frequency over a certain timespan. Consider the following groupings:

1/2/2009 15 photos
1/3/2009 20 photos
1/4/2009 13 photos
1/5/2009 19 photos
1/15/2009 5 photos

Potentially these would be grouped into two groups:

1/2/2009 -> 1/5/2009
1/15/2009

Obviously some tolerances would need to be established.

Is there any well-established way of doing this, other than inventing my own top-down approach?

Solution

You can apply pretty much any standard clustering technique to this; it's just a matter of defining your distance function correctly. When you build the matrix of distances between your photos, consider combining the physical distance between their locations (if you have it) with the temporal distance between their creation timestamps. Normalise them, put them on separate dimensions, and you may even be able to just take a regular Euclidean distance.
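As a rough sketch of that idea, the snippet below builds a pairwise distance matrix from normalised time and location. The `(datetime, latitude, longitude)` tuple layout and the function names are assumptions made for illustration, and treating latitude and longitude as plain coordinates is a simplification rather than a true geographic distance.

```python
import math
from datetime import datetime

def normalise(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def distance_matrix(photos):
    """photos: list of (datetime, latitude, longitude) tuples (hypothetical layout)."""
    start = min(p[0] for p in photos)
    # Seconds since the earliest photo, so no time zone handling is needed.
    t = normalise([(p[0] - start).total_seconds() for p in photos])
    lat = normalise([p[1] for p in photos])
    lon = normalise([p[2] for p in photos])
    points = list(zip(t, lat, lon))
    # Plain Euclidean distance over the normalised dimensions.
    return [[math.dist(a, b) for b in points] for a in points]

photos = [
    (datetime(2009, 1, 2, 10, 0), 48.85, 2.35),
    (datetime(2009, 1, 2, 11, 0), 48.86, 2.35),
    (datetime(2009, 1, 15, 9, 0), 40.71, -74.00),
]
distances = distance_matrix(photos)
```

The resulting matrix can be fed to whichever clustering method you pick. If one dimension should count more than the other, scaling it after normalisation is one way to weight it.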

Best of luck.

OTHER TIPS

Just group together the pictures that were taken on successive days, that is, runs of days with no picture-free day in between.
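A minimal sketch of that, assuming you have already reduced each photo to its calendar date (the function name is made up for this example):

```python
from datetime import date, timedelta

def group_consecutive_days(days):
    """Group dates into runs that contain no day without pictures."""
    groups = []
    for day in sorted(set(days)):
        # Extend the current run only if this day directly follows its last day.
        if groups and day - groups[-1][-1] == timedelta(days=1):
            groups[-1].append(day)
        else:
            groups.append([day])
    return groups

days = [date(2009, 1, 2), date(2009, 1, 3), date(2009, 1, 4),
        date(2009, 1, 5), date(2009, 1, 15)]
groups = group_consecutive_days(days)
```

With the dates from the question this gives two groups: 1/2/2009 through 1/5/2009, and 1/15/2009 on its own.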

You might try to calculate the tolerance dynamically, based on how many clusters you want to create or how big (in absolute terms or as a percentage) they should be.
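One possible reading of this, sketched below: if you know you want `k` clusters, cut the sorted timeline at the `k - 1` widest gaps, so the tolerance falls out of the data instead of being fixed up front. The function name and the parameter `k` are assumptions for the example.

```python
from datetime import datetime

def split_into_k_groups(timestamps, k):
    """Split timestamps into k groups by cutting at the k - 1 widest gaps."""
    ts = sorted(timestamps)
    if k < 1 or k > len(ts):
        raise ValueError("k must be between 1 and the number of timestamps")
    # Index i stands for the gap between ts[i] and ts[i + 1].
    gap_indices = range(len(ts) - 1)
    widest = sorted(gap_indices, key=lambda i: ts[i + 1] - ts[i], reverse=True)[:k - 1]
    groups, start = [], 0
    for i in sorted(widest):
        groups.append(ts[start:i + 1])
        start = i + 1
    groups.append(ts[start:])
    return groups

taken = [
    datetime(2009, 1, 2, 10, 0),
    datetime(2009, 1, 2, 18, 0),
    datetime(2009, 1, 3, 12, 0),
    datetime(2009, 1, 5, 9, 0),
    datetime(2009, 1, 15, 14, 0),
]
two_groups = split_into_k_groups(taken, 2)
three_groups = split_into_k_groups(taken, 3)
```

Asking for two groups cuts before 1/15/2009. Asking for three additionally cuts at the next widest gap, between 1/3/2009 and 1/5/2009.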

To get a useful clustering of pictures according to date, you need the following:

1) The number of clusters should be variable, not fixed a priori.

2) The diameter of each cluster should not exceed a specific amount.

The clustering algorithm that best satisfies both requirements is the QT (quality threshold) clustering algorithm. From Wikipedia:

QT (quality threshold) clustering (Heyer, Kruglyak, Yooseph, 1999) is an alternative method of partitioning data, invented for gene clustering. It requires more computing power than k-means, but does not require specifying the number of clusters a priori, and always returns the same result when run several times.

Although it is mainly used for gene clustering, I think it would fit very well with what you need.
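Here is a small, unoptimised sketch of QT clustering for one-dimensional data such as capture times expressed in days. It follows the general scheme (grow a candidate cluster around every point, keep the largest, remove it, repeat). The function name, the `max_diameter` parameter, and the tie-breaking by index are my own choices, not something taken from the paper.

```python
def qt_cluster(values, max_diameter):
    """Quality-threshold clustering of 1-D values (e.g. capture times in days)."""
    remaining = list(values)
    clusters = []
    while remaining:
        best = []
        for seed in range(len(remaining)):
            candidate = [seed]
            lo = hi = remaining[seed]
            others = set(range(len(remaining))) - {seed}
            while others:
                # Pick the point that keeps the cluster diameter smallest;
                # ties are broken by index so the result is deterministic.
                nxt = min(others, key=lambda j: (max(hi, remaining[j]) - min(lo, remaining[j]), j))
                new_lo, new_hi = min(lo, remaining[nxt]), max(hi, remaining[nxt])
                if new_hi - new_lo > max_diameter:
                    break  # adding any further point would exceed the threshold
                candidate.append(nxt)
                lo, hi = new_lo, new_hi
                others.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        # Keep the largest candidate, remove its points, and start over.
        chosen = set(best)
        clusters.append(sorted(remaining[i] for i in best))
        remaining = [v for i, v in enumerate(remaining) if i not in chosen]
    return clusters

# Days of January 2009 on which pictures were taken, one value per day here.
clusters = qt_cluster([2, 3, 4, 5, 15], max_diameter=4)
```

With a maximum diameter of 4 days, the days from the question end up as `[2, 3, 4, 5]` and `[15]`. This version is roughly cubic in the number of points, so for large collections you would want something smarter.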

Try to detect the gaps instead of the clusters: a long pause between consecutive pictures marks the boundary between two events.
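A minimal sketch of gap detection, assuming a fixed threshold: any pause of at least `min_gap` between consecutive pictures starts a new event. The two-day default and the function name are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

def split_on_gaps(timestamps, min_gap=timedelta(days=2)):
    """Start a new event whenever the pause since the previous picture is >= min_gap."""
    events = []
    for t in sorted(timestamps):
        if events and t - events[-1][-1] < min_gap:
            events[-1].append(t)
        else:
            events.append([t])
    return events

taken = [
    datetime(2009, 1, 2, 10, 0),
    datetime(2009, 1, 3, 9, 0),
    datetime(2009, 1, 4, 20, 0),
    datetime(2009, 1, 5, 12, 0),
    datetime(2009, 1, 15, 16, 0),
]
events = split_on_gaps(taken)
```

Because it works on full timestamps rather than calendar days, a late-night session that runs past midnight stays in one event.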

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow