Question

I currently have a problem, where I want to compare two 1D distributions. I my case I am trying to find how much I have to move one of the distribution to turn it into another, so I would use the earth movers distance for comparison. However I am also interested in the actual direction which the distribution is moved to be turned into another.

Is there a directed version of the earth movers distance in 1D?

To be more precise I have two distribution f : {1...N} -> |R and g : {1...N} -> |R I have to turn into each other. Now I want to also multiply the amount of dirt moved from bins in f to bins in g, but I want to account for the diretion. I.e. if "dirt" is moved from bin x to bin y I want to multiply the amount I have moved by x-y instead of d(x,y) as in the standard earth movers distance. Then I want to find the the movement for which the total amount of moved earth is minimised.

Is there a known algorithm which I can use for this? I think I should be able to modify the original Hungarian algorithm for this, but I am not sure yet how to do this, since I have never used this algorithm before.

Was it helpful?

Solution

So if I understand your question correctly then I think the code below (in Go) would solve in. If you can assume that the integral over each distribution is equal (meaning that it is possible to transform one into the other), then just move through your distribution, left to right, and at each point figure out how much dirt needs to be moved there from the right, or how much excess dirt needs to be moved from there to the left. In all cases you can assume that any dirt you need you will be able to get, so it is ok to have 'negative' dirt temporarily.

// Finds out how to move dirt to transform b into a.
// assumes that the integrals over a and b are equal
func earthMover(a, b []int) []int {
  r := make([]int, len(a))
  for i := 0; i < len(a); i++ {
    // if diff is positive it means that there is too much dirt here and it
    // must be moved to the right.  if it is negative it means some dirt
    // needs to be pulled in from the right.  In either case dirt can be
    // moved over multiple spaces.
    diff := b[i] - a[i]
    r[i] = diff
    b[i] -= diff
    if i < len(a) - 1 {
      b[i+1] += diff
    }
  }
  return r
}

Here is an example. Negative numbers mean that dirt is pulled in from the right, positive numbers mean it is pushed from the left to the right. I think for your metric you would just sum up all of the numbers in the move array, so the solution for this case is 5.

target: [ 3  0  1  0  2  6]
source: [ 1  1  5  0  1  4]
  move: [-2 -1  3  3  2  0]

Now that I see that, if that's actually what you want, then you are really looking for the differences in the weigheted averages, or center of mass (dirt), of the distributions. For example:

            0  1  2  3  4  5
  target: [ 3  0  1  0  2  6]
weighted:   0 +0 +2 +0 +8+30 = 40

  source: [ 1  1  5  0  1  4]
weighted:   0 +1+10 +0 +4+20 = 35

As you can see, the center of mass of the target minus the center of the mass of the source is the number we got earlier, 40 - 35 = 5.

OTHER TIPS

Your "distance" is mean(f) - mean(g). To minimize the total amount of earth moved without regard to how far it is moved, a greedy algorithm achieves the optimum, which is the statistical distance between the distributions.

https://github.com/wihoho/VideoRecognition

  • Adapt the author's C implementation with python module through a file interface
  • The modified C codes are under the folder EarthMoverDistance SourceCode
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top