Question

Given an Array A of N non negative integers, find an integer k, such that the sum of difference of each element with k is minimum.
i.e. Summation(abs(A[i] - k)), 1 <= i <= N, or simply
|A[1] - k| + |A[2] - k| + |A[3] - k| + ... + |A[N] - k|

My Approach:

low = minimumValueInArray(A);
high= maximumValueInArray(A);
ans = Infinite;
for (k = low; k <= high; k++) {
    temp = 0;
    for (i = 1; i <= N; i++) {
        temp += abs(A[i] - K);
    }
    ans = min(ans, temp);
}
return ans;

This is simply brute force approach trying to solve for all values of K.Can we optimise it.
What is the smarter way of doing this?

Reference: This was my logic behind this Codejam Round 1B problem

Was it helpful?

Solution

I get essentially the median. To minimize the cost function, take the derivative.

enter image description here

To find the minimum of the cost function, find where the derivative is zero. The cost function is not continuously differentiable everywhere, but it is piecewise differentiable. Let S = the sorted numbers of A and si be the ith largest number.

Case 1. If m is odd

There will be floor(m/2) ai's less than the median and floor(m/2) ai's greater than the median. Selecting x = the median, gives -m/2 + 0 + m/2 = 0.

enter image description here

Case 2. If m is even

The derivative will be zero for values in between si and s(i+1); i = m/2. Then one can select any number k such that

enter image description here

Simple example 1 2 4 50

picking 2: 1 + 0 + 2 + 48 = 51

picking 4: 3 + 2 + 0 + 46 = 51

picking 3: 2 + 1 + 1 + 47 = 51

So arbitrarily pick s(m/2)

For a better algorithm, O(NlogN) you can sort the numbers and then pick (m+1)/2 or m/2 as appropriate from above or you can use the kth largest element which can compute the answer in O(N).

OTHER TIPS

Given a set of numbers A_1,...,A_N, it is known (see e.g. http://homepages.gac.edu/~holte/courses/mcs256/problems/median.pdf ) that the median minimizes the sum of absolute deviation.

Since you have an array of integers, either:

(a) N is odd and the median is an integer m, which you set k as.

(b) N is even, and the median is the average of two integers, a and b. In this case it turns out that all numbers between a and b minimize the sum of the absolute deviation. So you can pick k as a, b or any integer in-between (if there is one).

For example, if the numbers are 1,3,4,6,9 - the median is 4 which you set k as. If the numbers are 4,7,12,15 the median is (7+12)/2 and you can set k as any number from 7 to 12. For example k=7 gives total deviation (7-4)+(7-7)+(12-7)+(15-7)=16 and k=12 gives total deviation (12-4)+(12-7)+(12-12)+(15-12) = 16.

You can represent all the values in the array in the form of points lying on the number line.

For example the points lying on the number line can look like:

_____________________4___________7_______________________12___________15

So according to the problem definition, we can say that the point on the number line that is equidistant from all the points should give us the correct answer.

So what you have to do is find the average of all the numbers. Return the answer as the integer that is more closer to the avg. eg For 2.66 return 3, for 2.2 return 2, for 2.5 return any of 2 or 3.

Running Time: O(N) to calculate the average of the numbers.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top