Domanda

Here is the code for merge sort.. The code works very well and sorts the numbers. But if we go larger data that has to be sorted, something goes wrong. The data to be sorted contains numbers which are not repeated. But after sorting , there are certain numbers which gets repeated many times. I dont understand why is that.. And this happens when i give a dataset of 100000 numbers. when smaller data is to be sorted, the code works very well.

def mergeSort(alist):
    if len(alist)>1:
        mid = len(alist)/2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        mergeSort(lefthalf)
        mergeSort(righthalf)

        i=0
        j=0
        k=0
        while i<len(lefthalf) and j<len(righthalf):
            if lefthalf[i]<righthalf[j]:
                alist[k]=lefthalf[i]
                i=i+1
            else:
                alist[k]=righthalf[j]
                j=j+1
            k=k+1

        while i<len(lefthalf):
            alist[k]=lefthalf[i]
            i=i+1
            k=k+1

        while j<len(righthalf):
            alist[k]=righthalf[j]
            j=j+1
            k=k+1


print("select the file: ")
file_name =  tkFileDialog.askopenfile(mode='r', title='Select word list file')
inv_data = np.loadtxt(file_name,dtype='float', comments='#', delimiter=None, converters=None, skiprows=0, usecols=None,
                unpack=False, ndmin=0)
mergeSort(inv_data)
print("sorted list :", inv_data)

The data set is here in the below link http://spark-public.s3.amazonaws.com/algo1/programming_prob/IntegerArray.txt

È stato utile?

Soluzione

I guess you tested with Python lists when you found the code working very well. Now you a using it with a NumPy array. The crucial difference is that slicing a NumPy array as in here

lefthalf = alist[:mid]
righthalf = alist[mid:]

creates views over the original array, while slicing a list creates copies.

When your algorithm overwrites alist by merging lefthalf and righthalf, all three lists must be separate; otherwise it may overwrite items lefthalf that have not yet been merged.

The bug can be triggered by a small array:

>>> l = np.arange(10,0,-1)
>>> l
array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])
>>> mergeSort(l)
>>> l
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top