What is wrong with this algorithmic solution of mine that checks whether a given function returns a sorted array when an array is given as input?

https://softwareengineering.stackexchange.com/questions/355170

17-01-2021
|

Question

An interviewer asked me this question:

Given a function f(ar[]) where ar[] is an array of integers, this functions claims to sort ar[] and returns it's sorted version ars[]. Determine if this function works correctly.

I approached this question as:

First check if the returned array ars[] is actually sorted in either non increasing or non decreasing order. This one is easy to check, ars[] should either follow the sequence ar[i + 1] >= ar[i] (for an array sorted in non decreasing order) or ar[i + 1] <= ar[i] (for an array sorted in non increasing order) for every i in the range [1, n], where n is the size of ars[]. The time complexity for this should be O(n).

Then check if sizes of both the input array ar[] as well as the output array ars[] are same.

Finally check if every element of ar[] is also present in ars[]. Since we have already examined at step 1 that ars[] is sorted and at step 2 that sizes of ar[] and ars[] are same we can use Binary Search algorithm to perform this action. The worst case time complexity for this should be O(n * log(n)).

If all the above 3 checks succeeds then the function is working fine else it is not. The overall time complexity of this algorithm should O(n * log(n))

But to my surprise the interviewer said that this solution is not correct and it's time complexity can be improved. I can not understand what actually is wrong with my solution, did I miss any corner case or the entire approach is wrong? Also what can be better approach to this(in terms of time complexity)?

PS: The interviewer mentioned no additional information or any additional constraint for this problem.

Solution

As already mentioned by @Dipstick, step 3 can fail if there are duplicates in the input array. To resolve this and improve the time complexity, one can use a dictionary with the array elements as keys and their number of occurrence as values. Such a dictionary can be created from the sorted and the unsorted array as well in O(n), and you need to test if the resulting dictionaries are identical, which can be done in O(n), too.

One can combine this using only one dictionary counting the total number of occurrences in the unsorted array minus the number of occurrences in the sorted array. In pseudo code (assuming a default of 0 for values in the dictionary when the key is used the first time):

 for(e in ar)
     noOfOccurence[e]+=1; 
 for(e in ars)
     noOfOccurence[e]-=1;

 for(e in noOfOccurence.Keys)
     if(noOfOccurence[e] != 0)
         return false;

 return true;

OTHER TIPS

Your method is only testing that the original and sorted arrays contain the same values, not that they contain the same number of each value; e.g. 1112 would pass for 1221

In step 3 you could, for example, mark that a particular value in the sorted array has already been matched (removing from the array would be too time consuming) but then the search would no longer be a binary search as you would hit already used values.

[Obviously this isn't a problem if the values are unique but that isn't stated]

For speed improvements you can test with known arrays with either presorted answers to check against or simply run through the returned array ra checking ra[n] =< ra[n+1] for all n in the range 0..len(ra-1) complexity of which is very low. You can determine if every element is present in each array by counting the instances of each value and then comparing the counts.

Your testing should also include corner cases such as ar=[1] and ar=[] and for an interview I would at least mention testing for invalid inputs such as arrays of non-integer values, non-arrays, etc., I know that the behaviour in such cases is undefined in the case you were given but a part of a testers job is to highlight specification omissions and ambiguities such as what is the error handling. If you only test for what is specifically in the specification you will not make a good tester and this sort of issue is one of the things that the interviewer will be looking for.

While you say the interviewer provided no other information, was it possible to ask questions? As there is only one argument to the function, I would be under the assumption that the function would always return the array sorted it one directory, most likely ascending. i.e.

fn([4,1,5]) -> [1,4,5]

But obviously that's something the interviewer should confirm for you as it affects the answer. (Also it would be very strange for the function to return asc OR desc, without any way to specify).

As such, my layman's approach would be to pass an unsorted array to the function, and then compare that against a second array I already knew was sorted -

expected = [1,4,5]
sorted = fn([4,1,5])
// compare expected and sorted...

As far as basic testing goes, the process of comparing every element against each other, and also comparing every single element to the source array does seem convoluted - compared to just comparing against a known correct list.

  > But to my surprise the interviewer said that this solution is not correct

If i were a job interviewer i would expect to get the answer: write a simple unittest for the sort method with a sample unsorted input and the expected output (see @USD Matt answer)

  > time complexity can be improved.

Your solution looks OK on the first glance and more complete/complicated than the simple example.

If i were searching for an experienced developer i would expect that the applicant knows unittests and does not more than neccessary.

I assume that the complex/complete automated testing is overkill and not necessary for testdriven development. A simple example should be enough.

If you keep track of the min and max a array of size max - min + 1 will work if the range is not large.

Subtract min from each value so it starts at 0

for(e in input)
     ar[e - min] +=1; 
for(e in sorter)
     ar[e - min] -=1;

for(e in ar)
     if(e != 0)
         return false;

return true;

You could check for duplicates as you test for sort
Yes binary would be O(n logn) but is might be faster than dictionary

Take a step back and look at this from a simplistic standpoint. You are making your first step be the most time and compute intensive test. I'd start with a series of sanity checks which catch the most glaring issues and avoid having to (re)sort the array and compare element-by-element.

Put your test to ensure the number of items in the result equals the number of items fed in because if the number of items don't match you can fail the test and not go any further.

Next test that the first element is less than the last element.

If that passes, spot check some elements in between, for example the first and last, and the items at the 1/4th mark, 1/2 mark, and 3/4th mark - and make sure first < item at 1/4th < item at 1/2 < item at 3/4 < last.

Assuming those tests pass, then I'd dig into iterating all the elements to test the integrity of the sort. But I wouldn't resort the input to compare, but simply iterate the results in order, comparing this item with the previous one and making sure this item is bigger than the previous one. Bail out with a failed test as soon as you hit one that is smaller than its predecessor.

The runtime of the test will be the same as the time it takes to sort the list but only if the sort is good. If the sort is not good, the runtime will be some degree shorter than it would've taken to re-sort and test. How much smaller depends on if the failed value is on the front end of the data or the back end of the data group.

Typically, there is something about the data being sorted, and especially where that data is coming from which can point you to the cases where sorting errors are most likely to crop up. If you have the luxury of having insight into these things, then craft sanity checks to catch those cases and put that before the visitation of every item to verify sorting.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange