How to find the kth largest element in an unsorted array of length n in O(n)?

https://stackoverflow.com/questions/251781

05-07-2019

Question

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?

Solution

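This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) that takes O(n) average time and O(n^2) worst-case time. There is also a pretty complicated non-randomized algorithm that takes O(n) worst-case time. This answer calls it introselect, though the deterministic algorithm itself is usually known as median of medians. There's some info on Wikipedia, but it's not very good.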

~~Everything you need is in these PowerPoint slides~~. Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):

Select(A,n,i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group’s median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)

It is also described in detail in the book "Introduction to Algorithms" by Cormen et al.
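As a sketch only, here is the Select pseudocode above translated into Python, assuming a 1-based i as in the pseudocode. It builds new lists instead of partitioning in place. It also adds an explicit branch for elements equal to the pivot, which the pseudocode leaves implicit. The function name select is illustrative.

```python
def select(a, i):
    """Return the i-th smallest element of a (1-based i) in worst-case O(n).

    Mirrors Select(A, n, i) above; len(a) plays the role of n."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Divide the input into ceil(n/5) groups of (at most) 5 and take each median
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Partition on the median of medians, found recursively (rank ceil(n/10))
    pivot = select(medians, (len(medians) + 1) // 2)
    L = [x for x in a if x < pivot]
    G = [x for x in a if x > pivot]
    equal = len(a) - len(L) - len(G)  # copies of the pivot (handles duplicates)
    # Find the i-th element in L, among the pivot copies, or in G
    if i <= len(L):
        return select(L, i)
    if i <= len(L) + equal:
        return pivot
    return select(G, i - len(L) - equal)


print(select([9, 1, 8, 2, 7, 3, 6, 4, 5, 0], 4))  # 3
```

For the kth largest, call select(a, len(a) - k + 1).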

Other answers

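If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect. It's basically quicksort where you throw out the partition you're not interested in. My prof has a great writeup, with the runtime analysis: (reference)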

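The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.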

Here is the algorithm.

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to length(A)
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2)))
  else
    # it's equal to the pivot
    return pivot

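What is the running time of this algorithm? If an adversary flips the coins for us, we may find that the pivot is always the largest element and k is always 1. That gives a running time of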

T(n) = Theta(n) + T(n-1) = Theta(n²)

But if the choices are indeed random, the expected running time is given by

T(n) <= Theta(n) + (1/n) ∑_{i=1 to n}T(max(i-1, n-i))

Here we make the not-entirely-reasonable assumption that the recursion always lands in the larger of A1 or A2.

Let's guess that T(n) <= an for some a. Then we get

T(n) 
 <= cn + (1/n) ∑_{i=1 to n}T(max(i-1, n-i))
 = cn + (1/n) ∑_{i=1 to floor(n/2)} T(n-i) + (1/n) ∑_{i=floor(n/2)+1 to n} T(i-1)
 <= cn + 2 (1/n) ∑_{i=floor(n/2) to n} T(i)
 <= cn + 2 (1/n) ∑_{i=floor(n/2) to n} ai

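Now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑_{i=n/2 to n} an, we get roughly 2(1/n)(n/2)an = an. But this is too big: there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula: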

∑_{i=floor(n/2) to n} i  
 = ∑_{i=1 to n} i - ∑_{i=1 to floor(n/2)-1} i
 = n(n+1)/2 - (floor(n/2)-1)floor(n/2)/2
 <= n²/2 - (n/4)²/2  
 = (15/32)n²

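Here we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with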

cn + 2 (1/n) ∑_{i=floor(n/2) to n} ai,
 <= cn + (2a/n) (15/32) n²
 = n (c + (15/16)a)
 <= an

provided a > 16c.

This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).
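The two numeric facts this argument leans on can be spot-checked with a short Python sketch. It is illustrative only, and the helper names are hypothetical. sum_bound_holds tests the (15/32)n² bound on ∑_{i=floor(n/2) to n} i exactly, using integers. expected_cost evaluates the pessimistic recurrence with c = 1 and T(0) = 0.

```python
def sum_bound_holds(n):
    """Exact integer check of: sum_{i=floor(n/2)}^{n} i <= (15/32) * n^2."""
    total = sum(range(n // 2, n + 1))
    return 32 * total <= 15 * n * n


def expected_cost(n_max, c=1.0):
    """Evaluate T(n) = c*n + (1/n) * sum_{i=1}^{n} T(max(i-1, n-i)), T(0) = 0."""
    T = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        T[n] = c * n + sum(T[max(i - 1, n - i)] for i in range(1, n + 1)) / n
    return T


# "Sufficiently large" is small here: among n < 20 the bound fails only for
# these n, and it holds for every n >= 10
print([n for n in range(1, 20) if not sum_bound_holds(n)])  # [1, 2, 3, 4, 5, 6, 7, 9]

# With c = 1, T(n)/n stays below 4, well inside the a > 16c used in the proof
T = expected_cost(1000)
print(max(T[n] / n for n in range(1, 1001)) < 4)  # True
```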

A quick Google search for "kth largest element array" returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

"Make one pass through tracking the three largest values so far."

(it was specifically for the 3rd largest)

and this answer:

Build a heap/priority queue.  O(n)
Pop top element.  O(log n)
Pop top element.  O(log n)
Pop top element.  O(log n)

Total = O(n) + 3 O(log n) = O(n)
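Here is a minimal Python sketch of this heap approach, generalized from 3 pops to k, using the standard heapq module. heapify is O(n) and each pop is O(log n), so the total is O(n + k log n). heapq only provides a min-heap, so the values are negated. The function name is illustrative.

```python
import heapq


def kth_largest_heap(values, k):
    """Return the k-th largest value (1-based k) via heapify + (k - 1) pops."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    heap = [-v for v in values]  # negate so the min-heap behaves as a max-heap
    heapq.heapify(heap)          # O(n) build
    for _ in range(k - 1):
        heapq.heappop(heap)      # O(log n) per pop
    return -heap[0]              # the root is now the k-th largest


print(kth_largest_heap([3, 1, 4, 1, 5, 9, 2, 6], 3))  # 5
```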

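You do it like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked. If it is the kth element, you're done; otherwise, you repeat with the bin (higher or lower) that the kth element would fall into. Statistically speaking, the time it takes to find the kth element grows with n, O(n).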

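The C++ Standard Library has almost exactly that function, nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.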

const int N = ...;
double a[N];
// ... 
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] now holds the element that would be at index m if a were sorted (the (m+1)-th smallest)

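Although I'm not very sure about the O(n) complexity, it will surely be between O(n) and n log(n), and surely closer to O(n) than to n log(n). The function is written in Java: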

import java.util.ArrayList;
import java.util.Random;

// nthSmallest is 1-based: nthSmallest = 1 returns the smallest element
public int quickSelect(ArrayList<Integer>list, int nthSmallest){
    //Choose random number in range of 0 to array length
    Random random =  new Random();
    //This gives a random index from 0 to list.size() - 1 (the bound is exclusive)
    int pivotIndex = random.nextInt(list.size());

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two. 
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for(int i=0; i<list.size(); i++){
        if(list.get(i)<pivot){
            smallerNumberList.add(list.get(i));
        }
        else if(list.get(i)>pivot){
            greaterNumberList.add(list.get(i));
        }
        else{
            //Do nothing
        }
    }

    //If nthSmallest is at most smallerNumberList's size, the nthSmallest number must be in this list
    if(nthSmallest <= smallerNumberList.size()){
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //This step is a bit tricky. If it is confusing, please see the above loop once again for clarification.
    else if(nthSmallest > (list.size() - greaterNumberList.size())){
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in 
        //smallerNumberList
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList,nthSmallest);
    }
    else{
        return pivot;
    }
}

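I implemented finding the kth minimum in n unsorted elements using dynamic programming, specifically the tournament method. The execution time is O(n + k log(n)). The mechanism is listed as one of the methods on the Wikipedia page about selection algorithms (as mentioned in one of the posts above). You can read about the algorithm and find the code (Java) on my blog page, Finding Kth Minimum. In addition, the logic can partially order the list, returning the first K min (or max) values in O(k log(n)) time.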

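Although the code provided returns the kth minimum, similar logic can be used to find the kth maximum in O(k log(n)), ignoring the up-front work done to create the tournament tree.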
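The blog code referenced above is not reproduced here. As an independent sketch of the tournament idea (hypothetical Python, not the author's code), an array-based tournament tree gives the O(n + k log(n)) bound. Building the tree plays n - 1 matches. After each winner is knocked out, only the matches on its leaf-to-root path are replayed.

```python
import math


def kth_smallest_tournament(values, k):
    """Return the k-th smallest value (1-based k) using a tournament tree.

    Building the tree costs O(n); each of the k - 1 eliminations replays
    one leaf-to-root path, O(log n), for O(n + k log n) in total."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k out of range")
    size = 1
    while size < n:
        size *= 2
    # Leaves live at tree[size:]; padding leaves hold +inf so they never win
    tree = [math.inf] * (2 * size)
    tree[size:size + n] = values
    for j in range(size - 1, 0, -1):      # play every match bottom-up
        tree[j] = min(tree[2 * j], tree[2 * j + 1])
    for _ in range(k - 1):
        j = 1                             # follow the winner down to its leaf
        while j < size:
            j = 2 * j if tree[2 * j] == tree[j] else 2 * j + 1
        tree[j] = math.inf                # knock the current winner out
        j //= 2
        while j >= 1:                     # replay only the matches on its path
            tree[j] = min(tree[2 * j], tree[2 * j + 1])
            j //= 2
    return tree[1]


print(kth_smallest_tournament([7, 2, 9, 4, 1, 8], 3))  # 4
```

For the kth largest, negate the values (or play max-matches instead of min-matches).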

You can do it in O(n + kn) = O(n) time (for constant k) and O(k) space by keeping track of the k largest elements you've seen.

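For each element in the array, you can scan the list of the k largest and replace the smallest element with the new one if the new one is bigger.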

Warren's priority heap solution is neater, though.
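Here is a minimal Python sketch of the scan-based idea; the function name is illustrative. Seeding the buffer with the first k elements, rather than with a fixed starting value, means it also works for negative numbers.

```python
def kth_largest_scan(values, k):
    """Return the k-th largest value (1-based k) by keeping the k largest seen so far.

    O(k) extra space; each element costs an O(k) scan, so O(kn) time in total."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    best = list(values[:k])  # seed with the first k elements (no sentinel values needed)
    for v in values[k:]:
        i = min(range(k), key=best.__getitem__)  # index of the smallest kept value
        if v > best[i]:
            best[i] = v                          # replace it with the bigger newcomer
    return min(best)                             # smallest of the k largest


print(kth_largest_scan([3, 1, 4, 1, 5, 9, 2, 6], 3))  # 5
```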

Sexy quickselect in Python

import random

def quickselect(arr, k):
    '''
     k = 1 returns first element in ascending order.
     can be easily modified to return first element in descending order
    '''

    r = random.randrange(0, len(arr))

    a1 = [i for i in arr if i < arr[r]]  # partition
    a2 = [i for i in arr if i > arr[r]]

    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr)-len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return arr[r]

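Find the median of the array in linear time. Then use the partition procedure, exactly as in quicksort, to divide the array into two parts: values less than (<) the median go to the left and values greater than (>) it go to the right. That can also be done in linear time. Now go to the part of the array where the kth element lies. The recurrence becomes T(n) = T(n/2) + cn, which gives O(n) overall.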

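Here is a link to a full implementation, with a fairly extensive explanation of how the algorithm for finding the Kth element in an unsorted array works. The basic idea is to partition the array like in QuickSort. To avoid extreme cases, a special pivot selection called the median-of-medians algorithm is applied. An example of such a case is choosing the smallest element as the pivot in every step, which makes the algorithm degenerate to O(n^2) running time. The whole solution runs in O(n) time in both the worst and the average case.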

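Here is the link to the full article (it is about finding the Kth smallest element, but the principle is the same for finding the Kth largest):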

Finding Kth Smallest Element in an Unsorted Array

According to this paper, Finding the Kth largest item in a list of n items, the following algorithm takes O(n) time in the worst case.

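Divide the array into n/5 lists of 5 elements each.

Find the median in each sub-array of 5 elements.

Recursively find the median of all the medians; let's call it M.

Partition the array into two sub-arrays. The 1st sub-array contains the elements larger than M; let's call it a1. The other sub-array contains the elements smaller than M; let's call it a2.

If k <= |a1|, return selection(a1, k).

If k − 1 = |a1|, return M.

If k > |a1| + 1, return selection(a2, k − |a1| − 1).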

Analysis: as suggested in the original paper:


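We use the median to partition the list into two halves, recursing into the upper half if k <= n/2 and the lower half otherwise. For some constant c, this algorithm takes time cn at the first level of recursion and cn/2 at the next level (since we recurse in a list of size n/2). It then takes cn/4 at the third level, and so on. The total time taken is cn + cn/2 + cn/4 + .... = 2cn = O(n).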

Why is the partition size 5 and not 3?

As mentioned in the original paper:


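Dividing the list by 5 assures a worst-case split of 70−30. At least half of the medians are greater than the median of medians. Hence at least half of the n/5 blocks have at least 3 elements greater than it, which gives a 3n/10 split. This means the other partition is 7n/10 in the worst case. That gives T(n) = T(n/5) + T(7n/10) + O(n). Since n/5 + 7n/10 < n, the worst-case running time is O(n).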
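The following makes the arithmetic in that quote explicit. It ignores floors and small additive constants, and takes T(m) ≤ am for smaller m as the induction hypothesis. With groups of 5, the two recursive calls together see only 9n/10 of the input. With groups of 3, the same counting guarantees only n/3 elements on one side, leaving up to 2n/3 on the other. Those two fractions add up to the whole input.

```latex
\begin{aligned}
\text{groups of 5:}\quad T(n) &\le T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + cn
  \le \tfrac{an}{5} + \tfrac{7an}{10} + cn
  = \left(\tfrac{9a}{10} + c\right) n \le an \quad \text{whenever } a \ge 10c,\\
\text{groups of 3:}\quad T(n) &\le T\!\left(\tfrac{n}{3}\right) + T\!\left(\tfrac{2n}{3}\right) + cn,
  \qquad \tfrac{1}{3} + \tfrac{2}{3} = 1 .
\end{aligned}
```

With groups of 3, every level of the recursion tree still costs about cn over Θ(log n) levels, so this recurrence only yields O(n log n).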

Now I tried to implement the above algorithm as:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public static int findKthLargestUsingMedian(Integer[] array, int k) {
    // Step 1: Divide the list into n/5 lists of 5 elements each.
    int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
    // Step 2: Find the pivotal element, aka the median of medians.
    int medianOfMedian = findMedianOfMedians(array, noOfRequiredLists);
    // Now split the elements using medianOfMedian as the pivot.
    List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
    List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
    for (Integer element : array) {
        if (element < medianOfMedian) {
            listWithSmallerNumbers.add(element);
        } else if (element > medianOfMedian) {
            listWithGreaterNumbers.add(element);
        }
    }
    // Copies of the pivot value (there can be more than one if the input has duplicates).
    int equalCount = array.length - listWithGreaterNumbers.size() - listWithSmallerNumbers.size();
    // Next step: recurse into the part that contains the kth largest element.
    if (k <= listWithGreaterNumbers.size())
        return findKthLargestUsingMedian(listWithGreaterNumbers.toArray(new Integer[0]), k);
    else if (k <= listWithGreaterNumbers.size() + equalCount)
        return medianOfMedian;
    else
        return findKthLargestUsingMedian(listWithSmallerNumbers.toArray(new Integer[0]),
                k - listWithGreaterNumbers.size() - equalCount);
}

public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
    int[] medians = new int[noOfRequiredLists];
    for (int count = 0; count < noOfRequiredLists; count++) {
        int startOfPartialArray = 5 * count;
        // The last group may hold fewer than 5 elements.
        int endOfPartialArray = Math.min(startOfPartialArray + 5, mainList.length);
        Integer[] partialArray = Arrays.copyOfRange(mainList, startOfPartialArray, endOfPartialArray);
        // Step 2: Find the median of each sublist (it has to be sorted first).
        Arrays.sort(partialArray);
        medians[count] = partialArray[partialArray.length / 2];
    }
    // Step 3: Find the median of the medians. Sorting them keeps this short;
    // the textbook algorithm selects it recursively instead.
    Arrays.sort(medians);
    return medians[medians.length / 2];
}

For completeness, another algorithm uses a priority queue and takes O(n log n) time:

import java.util.PriorityQueue;

public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
    int p = 0;
    int numElements = nums.length;
    // create priority queue where all the elements of nums will be stored
    PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

    // place all the elements of the array to this priority queue
    for (int n : nums) {
        pq.add(n);
    }

    // extract the kth largest element
    while (numElements - k + 1 > 0) {
        p = pq.poll();
        k++;
    }

    return p;
}

Both algorithms can be tested as:

public static void main(String[] args) {
    Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
    System.out.println(findKthLargestUsingMedian(numbers, 8));
    System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
}

As expected, the output is:

18
18

How about this approach:

Maintain a buffer of length k and a tmp_max. Getting tmp_max is O(k) and it is done n times, so O(kn).

Is this right, or am I missing something?

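Although it doesn't beat the average case of quickselect or the worst case of the median statistics method, it is pretty easy to understand and implement.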

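Iterate through the list. If the current value is larger than the stored largest value, store it as the largest value and bump 1-4 down, with 5 dropping off the list. If not, compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. This should do it in O(n).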

I would like to suggest one answer.

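Take the first k elements and sort them into a linked list of k values.

Then insertion-sort each of the remaining n−k values into it. Even in the worst case, that costs k*(n−k) comparisons. Sorting the first k values costs k*(k−1). In total it comes out to (nk−k), which is O(n).
Cheers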

An explanation of the median-of-medians algorithm to find the kth largest integer out of n can be found here: http://cs.indstate.edu/~spitla/presentation.pdf

An implementation in C++ is below:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int findMedian(vector<int> vec) {
    // Find the median of a (small) vector; the copy is sorted first
    sort(vec.begin(), vec.end());
    return vec[vec.size() / 2];
}

int findMedianOfMedians(vector<vector<int> > values) {
    vector<int> medians;
    for (size_t i = 0; i < values.size(); i++) {
        medians.push_back(findMedian(values[i]));
    }
    return findMedian(medians);
}

void selectionByMedianOfMedians(const vector<int> values, int k) {
    // Divide the list into n/5 lists of 5 elements each
    vector<vector<int> > vec2D;
    size_t count = 0;
    while (count != values.size()) {
        int countRow = 0;
        vector<int> row;
        while ((countRow < 5) && (count < values.size())) {
            row.push_back(values[count]);
            count++;
            countRow++;
        }
        vec2D.push_back(row);
    }

    cout << endl << endl << "Printing 2D vector : " << endl;
    for (size_t i = 0; i < vec2D.size(); i++) {
        for (size_t j = 0; j < vec2D[i].size(); j++) {
            cout << vec2D[i][j] << " ";
        }
        cout << endl;
    }
    cout << endl;

    // Calculating a new pivot for making splits
    int m = findMedianOfMedians(vec2D);
    cout << "Median of medians is : " << m << endl;

    // Partition the list into unique elements larger than 'm' (call this sublist L1) and
    // those smaller than 'm' (call this sublist L2)
    vector<int> L1, L2;
    for (size_t i = 0; i < vec2D.size(); i++) {
        for (size_t j = 0; j < vec2D[i].size(); j++) {
            if (vec2D[i][j] > m) {
                L1.push_back(vec2D[i][j]);
            } else if (vec2D[i][j] < m) {
                L2.push_back(vec2D[i][j]);
            }
        }
    }

    // Checking the splits as per the new pivot 'm'
    cout << endl << "Printing L1 : " << endl;
    for (size_t i = 0; i < L1.size(); i++) {
        cout << L1[i] << " ";
    }
    cout << endl << endl << "Printing L2 : " << endl;
    for (size_t i = 0; i < L2.size(); i++) {
        cout << L2[i] << " ";
    }

    // Recursive calls
    if ((k - 1) == (int)L1.size()) {
        cout << endl << endl << "Answer :" << m;
    } else if (k <= (int)L1.size()) {
        selectionByMedianOfMedians(L1, k);
    } else if (k > ((int)L1.size() + 1)) {
        selectionByMedianOfMedians(L2, k - ((int)L1.size()) - 1);
    }
}

int main() {
    int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6,
                    10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
    vector<int> vec(values, values + 25);

    cout << "The given array is : " << endl;
    for (size_t i = 0; i < vec.size(); i++) {
        cout << vec[i] << " ";
    }

    selectionByMedianOfMedians(vec, 8);
    return 0;
}

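There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.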

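In more detail: I used Vladimir Zabrodsky's MODIFIND optimization and median-of-3 pivot selection, and paid some attention to the final steps of the partitioning part of the algorithm. With these, I came up with the following algorithm (imaginatively named "LefSelect"):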

#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

/* Note: The code needs more than 2 elements to work */
float lefselect(float a[], const int n, const int k) {
    int l = 0, m = n - 1, i = l, j = m;
    float x;

    while (l < m) {
        /* median-of-3: order a[i], a[k], a[j] so that a[k] holds the median */
        if (a[k] < a[i]) F_SWAP(a[i], a[k]);
        if (a[j] < a[i]) F_SWAP(a[i], a[j]);
        if (a[j] < a[k]) F_SWAP(a[k], a[j]);

        x = a[k];
        while (j > k && i < k) {
            do i++; while (a[i] < x);
            do j--; while (a[j] > x);

            F_SWAP(a[i], a[j]);
        }
        i++; j--;

        if (j < k) {
            while (a[i] < x) i++;
            l = i; j = m;
        }
        if (k < i) {
            while (x < a[j]) j--;
            m = j; i = l;
        }
    }
    return a[k];
}

In the benchmarks I ran, LefSelect is 20-30% faster than QuickSelect.

Haskell solution:

kthElem index list = sort list !! index

withShape ~[] []         = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys

sort [] = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
  where
   ls = filter (<  x) xs
   rs = filter (>= x) xs

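This implements the median-of-medians solution by using the withShape method to discover the size of a partition without actually computing it.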

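Here is a C++ implementation of Randomized QuickSelect. The idea is to pick a pivot element at random. To implement the randomized partition, we use the random function rand() to generate an index between l and r. We swap the element at that index with the last element, and finally call the standard partition process, which uses the last element as the pivot.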

#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;

int randomPartition(int arr[], int l, int r);

// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method. ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
    // If k is smaller than number of elements in array
    if (k > 0 && k <= r - l + 1)
    {
        // Partition the array around a random element and
        // get position of pivot element in sorted array
        int pos = randomPartition(arr, l, r);

        // If position is same as k
        if (pos-l == k-1)
            return arr[pos];
        if (pos-l > k-1)  // If position is more, recur for left subarray
            return kthSmallest(arr, l, pos-1, k);

        // Else recur for right subarray
        return kthSmallest(arr, pos+1, r, k-pos+l-1);
    }

    // If k is more than number of elements in array
    return INT_MAX;
}

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Standard partition process of QuickSort(). It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++)
    {
        if (arr[j] <= x) // arr[j] belongs left of the pivot, so move it into the "small" prefix
        {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]); // swap the pivot
    return i;
}

// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
    int n = r-l+1;
    int pivot = rand() % n;
    swap(&arr[l + pivot], &arr[r]);
    return partition(arr, l, r);
}

// Driver program to test above methods
int main()
{
    int arr[] = {12, 3, 5, 7, 4, 19, 26};
    int n = sizeof(arr)/sizeof(arr[0]), k = 3;
    cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
    return 0;
}

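The worst-case time complexity of the above solution is still O(n²), because the random function may always pick a corner element. The expected time complexity of the above randomized QuickSelect is Θ(n).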
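Here is the same idea in Python, as a sketch rather than a line-by-line port of the C++ above. A loop replaces the recursion, and the Lomuto partition works in place, so no extra arrays are allocated. Unlike the C++ version it does not assume distinct elements, although many duplicates can still make it slow. The function name is illustrative.

```python
import random


def kth_smallest_inplace(arr, k):
    """Randomized quickselect with an in-place Lomuto partition.

    Reorders arr; k is 1-based. Expected O(n) time, O(1) extra space."""
    if not 1 <= k <= len(arr):
        raise ValueError("k out of range")
    lo, hi, target = 0, len(arr) - 1, k - 1
    while lo < hi:
        # move a random pivot to the end, as randomPartition() does above
        p = random.randint(lo, hi)
        arr[p], arr[hi] = arr[hi], arr[p]
        pivot, store = arr[hi], lo
        for j in range(lo, hi):
            if arr[j] <= pivot:           # smaller-or-equal values go to the front
                arr[store], arr[j] = arr[j], arr[store]
                store += 1
        arr[store], arr[hi] = arr[hi], arr[store]  # pivot lands at its sorted index
        if store == target:
            return arr[store]
        if store > target:
            hi = store - 1                # answer is left of the pivot
        else:
            lo = store + 1                # answer is right of the pivot
    return arr[lo]


print(kth_smallest_inplace([12, 3, 5, 7, 4, 19, 26], 3))  # 5
```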

Create a priority queue.

Insert all the elements into the heap.

Call poll() k times.

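import java.util.PriorityQueue;

public static int getKthLargestElements(int[] arr, int k) {
    // max-heap: the comparator reverses the natural order (Integer.compare avoids y - x overflow)
    PriorityQueue<Integer> pq = new PriorityQueue<>((x, y) -> Integer.compare(y, x));
    //insert all the elements into heap
    for (int ele : arr)
        pq.offer(ele);
    // call poll() k times; the last value polled is the kth largest
    int result = 0;
    for (int i = 0; i < k; i++) {
        result = pq.poll();
    }
    return result;
}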

Here is an implementation in JavaScript.

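If you drop the constraint that the array cannot be modified, you can avoid extra memory by using two indexes to identify the "current partition". This is the classic quicksort style: http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/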

function kthMax(a, k){
    var size = a.length;

    var pivot = a[ Math.floor(Math.random()*size) ]; //Another choice could have been (size / 2)

    //Create an array with all element lower than the pivot and an array with all element higher than the pivot
    var i, lowerArray = [], upperArray = [];
    for (i = 0; i < size; i++){
        var current = a[i];

        if (current < pivot) {
            lowerArray.push(current);
        } else if (current > pivot) {
            upperArray.push(current);
        }
    }

    //Which one should I continue with?
    if(k <= upperArray.length) {
        //Upper
        return kthMax(upperArray, k);

    } else {
        var newK = k - (size - lowerArray.length);

        if (newK > 0) {
            ///Lower
            return kthMax(lowerArray, newK);
        } else {
            //None ... it's the current pivot!
            return pivot;
        }
    }
}

If you want to test how it performs, you can use this variation:

function kthMax (a, k, logging) {
    var comparisonCount = 0; //Number of comparison that the algorithm uses
    var memoryCount = 0;     //Number of integers in memory that the algorithm uses
    var _log = logging;

    // k is 1-based, so valid values are 1..a.length
    if (k < 1 || k > a.length) {
        if (_log) console.log ("k is out of range");
        return false;
    }

    function _kthmax(a, k){
        var size = a.length;
        var pivot = a[Math.floor(Math.random()*size)];
        if (_log) console.log("Inputs:", a, "size="+size, "k="+k, "pivot="+pivot);

        // This should never happen. Just a nice check in this exercise
        // if you are playing with the code to avoid never ending recursion
        if (typeof pivot === "undefined") {
            if (_log) console.log ("Ops...");
            return false;
        }

        var i, lowerArray = [], upperArray = [];
        for (i = 0; i < size; i++){
            var current = a[i];
            if (current < pivot) {
                comparisonCount += 1;
                memoryCount++;
                lowerArray.push(current);
            } else if (current > pivot) {
                comparisonCount += 2;
                memoryCount++;
                upperArray.push(current);
            }
        }
        if (_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);

        if (k <= upperArray.length) {
            comparisonCount += 1;
            return _kthmax(upperArray, k);
        } else if (k > size - lowerArray.length) {
            comparisonCount += 2;
            return _kthmax(lowerArray, k - (size - lowerArray.length));
        } else {
            comparisonCount += 2;
            return pivot;
        }
        /*
         * BTW, this is the logic for kthMin if we want to implement that... ;-)
         *
        if (k <= lowerArray.length) {
            return kthMin(lowerArray, k);
        } else if (k > size - upperArray.length) {
            return kthMin(upperArray, k - (size - upperArray.length));
        } else
            return pivot;
        */
    }

    var result = _kthmax(a, k);
    return {result: result, iterations: comparisonCount, memory: memoryCount};
}

The rest of the code just creates a playground:

function getRandomArray (n){
    var ar = [];
    for (var i = 0, l = n; i < l; i++) {
        ar.push(Math.round(Math.random() * l))
    }

    return ar;
}

//Create a random array of 50 numbers
var ar = getRandomArray (50);

Now run your tests a few times. Because of Math.random(), it will produce different results every time:

kthMax(ar, 2, true);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 34, true);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);

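If you test it a few times, you can even see empirically that the number of iterations is, on average, O(n) ≈ constant * n, and that the value of k does not affect the algorithm.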

I came up with this algorithm, and it seems to be O(n):

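Say k = 3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If the array item is greater than the minimum, we replace the min variable with the item's value. We keep doing this until the end of the array. The minimum of our three variables is then the 3rd largest item in the array.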

define variables a=0, b=0, c=0
iterate through the array items
    find minimum a,b,c
    if item > min then replace the min variable with item value
    continue until end of array
the minimum of a,b,c is our answer

And to find the Kth largest item, we need K variables.

Example: (k = 3)

[1,2,4,1,7,3,9,5,6,2,9,8]

Final variable values:

a=7 (answer)
b=8
c=9

Can someone please review this and let me know what I am missing?

Here is an implementation of the algorithm eladv suggested (I also included an implementation with a random pivot):

import java.util.Arrays;

public class Median {

    public static void main(String[] s) {

        int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
        System.out.println(selectK(test,8));

        /*
        int n = 100000000;
        int[] test = new int[n];
        for(int i=0; i<test.length; i++)
            test[i] = (int)(Math.random()*test.length);

        long start = System.currentTimeMillis();
        random_selectK(test, test.length/2);
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        */
    }

    public static int random_selectK(int[] a, int k) {
        if(a.length <= 1)
            return a[0];

        int r = (int)(Math.random() * a.length);
        int p = a[r];

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return random_selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return random_selectK(temp,k-small-equal);
        }
    }

    public static int selectK(int[] a, int k) {
        if(a.length <= 5) {
            Arrays.sort(a);
            return a[k-1];
        }

        int p = median_of_medians(a);

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return selectK(temp,k-small-equal);
        }
    }

    private static int median_of_medians(int[] a) {
        int[] b = new int[a.length/5];
        int[] temp = new int[5];
        for(int i=0; i<b.length; i++) {
            for(int j=0; j<5; j++)
                temp[j] = a[5*i + j];
            Arrays.sort(temp);
            b[i] = temp[2];
        }

        return selectK(b, b.length/2 + 1);
    }
}

It is similar to the quickSort strategy: we pick an arbitrary pivot and bring the smaller elements to its left and the larger ones to its right.

public static int kthElInUnsortedList(List<int> list, int k)
{
    if (list.Count == 1)
        return list[0];

    List<int> left = new List<int>();
    List<int> right = new List<int>();

    int pivotIndex = list.Count / 2;
    int pivot = list[pivotIndex]; //arbitrary

    for (int i = 0; i < list.Count; i++)
    {
        if (i == pivotIndex)
            continue; // skip the pivot itself, but keep scanning the rest of the list

        int currentEl = list[i];
        if (currentEl < pivot)
            left.Add(currentEl);
        else
            right.Add(currentEl);
    }

    if (k == left.Count + 1)
        return pivot;

    if (left.Count < k)
        return kthElInUnsortedList(right, k - left.Count - 1);
    else
        return kthElInUnsortedList(left, k);
}

Go to the end of this link:

http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/

You can find the kth smallest element in O(n) time and constant space, if we assume the array contains only integers.

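The approach is to binary search over the range of the array's values. If min_value and max_value are both in the integer range, we can binary search over that range. We can write a comparator function that tells us whether a given value is the kth smallest, smaller than the kth smallest, or bigger than the kth smallest. Keep binary searching until you reach the kth smallest number.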

Here is the code for that:

class Solution:

    def _iskthsmallest(self, A, val, k):
        less_count, equal_count = 0, 0
        for i in range(len(A)):
            if A[i] == val: equal_count += 1
            if A[i] < val: less_count += 1

        if less_count >= k: return 1
        if less_count + equal_count < k: return -1
        return 0

    def kthsmallest_binary(self, A, min_val, max_val, k):
        if min_val == max_val:
            return min_val
        mid = (min_val + max_val) // 2  # integer midpoint (floor division)
        iskthsmallest = self._iskthsmallest(A, mid, k)
        if iskthsmallest == 0: return mid
        if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
        return self.kthsmallest_binary(A, mid+1, max_val, k)

    # @param A : tuple of integers
    # @param k : integer
    # @return an integer
    def kthsmallest(self, A, k):
        if not A: return 0
        if k > len(A): return 0
        min_val, max_val = min(A), max(A)
        return self.kthsmallest_binary(A, min_val, max_val, k)

There is also an algorithm that outperforms quickselect. It's called the Floyd-Rivest (FR) algorithm.

Original article: https://doi.org/10.1145/360680.360694

Downloadable version: https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

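I implemented quickselect and the FR algorithm in C++. I also compared them with the standard C++ library's std::nth_element, which is basically an introselect hybrid of quickselect and heapselect. Quickselect and nth_element performed comparably on average, but the FR algorithm ran about twice as fast as both.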

Sample code I used for the FR algorithm:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
    size_t leftIdx = left;
    size_t rightIdx = right;

    while (rightIdx > leftIdx)
    {
        if (rightIdx - leftIdx > 600)
        {
            // Narrow the range first: recursively select n inside a smaller window
            // (floating-point arithmetic, so z, s and sd are not truncated)
            double range = rightIdx - leftIdx + 1;
            double i = (double)n - leftIdx + 1;
            double z = std::log(range);
            double s = 0.5 * std::exp(2 * z / 3);
            double sd = 0.5 * std::sqrt(z * s * (range - s) / range) * sgn(i - range / 2);

            size_t newLeft = (size_t)std::max((double)leftIdx, n - i * s / range + sd);
            size_t newRight = (size_t)std::min((double)rightIdx, n + (range - i) * s / range + sd);

            _FRselect(data, newLeft, newRight, n);
        }
        T t = data[n];
        size_t i = leftIdx;
        size_t j = rightIdx;
        // arrange pivot and right index
        std::swap(data[leftIdx], data[n]);
        if (data[rightIdx] > t)
            std::swap(data[rightIdx], data[leftIdx]);

        while (i < j)
        {
            std::swap(data[i], data[j]);
            ++i; --j;
            while (data[i] < t) ++i;
            while (data[j] > t) --j;
        }

        if (data[leftIdx] == t)
            std::swap(data[leftIdx], data[j]);
        else
        {
            ++j;
            std::swap(data[j], data[rightIdx]);
        }
        // adjust left and right towards the boundaries of the subset
        // containing the (k - left + 1)th smallest element
        if (j <= n)
            leftIdx = j + 1;
        if (n <= j)
            rightIdx = j - 1;
    }
    // data[n] now holds the element of rank n (leftIdx can end up at n + 1)
    return data[n];
}

template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
    if (n == 0)
        return *(std::min_element(data.begin(), data.end()));
    else if (n == data.size() - 1)
        return *(std::max_element(data.begin(), data.end()));
    else
        return _FRselect(data, 0, data.size() - 1, n);
}

What I would do is this:

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

您可以简单地存储指向链接列表中第一个和最后一个元素的指针。它们仅在更新列表时更改。

Update:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element
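Here is a minimal Python sketch of the sorted-container idea. It uses a plain list kept in ascending order with the standard bisect module in place of a balanced tree; that substitution is an assumption, since the answer does not name a concrete structure. Here the smallest kept value, window[0], plays the role of "the last element of l". bisect.insort finds the position with O(log k) comparisons, but shifting the list costs O(k). The function name is illustrative.

```python
import bisect


def kth_largest_sorted_window(values, k):
    """Keep the k largest values seen so far in an ascending sorted list."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    window = []
    for e in values:
        if len(window) < k:
            bisect.insort(window, e)   # still filling the window
        elif e > window[0]:
            bisect.insort(window, e)   # e belongs among the k largest
            window.pop(0)              # drop the smallest so only k values remain
    return window[0]                   # smallest of the k largest = k-th largest


print(kth_largest_sorted_window([3, 1, 4, 1, 5, 9, 2, 6], 3))  # 5
```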

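First, we can build a BST from the unsorted array, which takes O(n) time. From the BST we can then find the kth smallest element in O(log(n)), so the whole thing is of order O(n).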

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow