How to find the kth largest element in an unsorted array of length n in O(n)?

https://stackoverflow.com/questions/251781

05-07-2019

Question

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?

Solution

This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.

~~Everything you need is in these powerpoint slides~~. Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):

Select(A,n,i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group’s median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)

It's also very nicely detailed in the book Introduction to Algorithms by Cormen et al.
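For readers who want something runnable, here is a minimal Python sketch of the Select pseudocode above (the technique is commonly known as median of medians). The function names `select` and `kth_largest` are illustrative, not from the original answer, and the three-way split on values equal to the pivot is an added assumption so that duplicates are handled; the pseudocode itself assumes distinct elements.

```python
def select(a, i):
    """Return the i-th smallest element of a (i is 1-based)."""
    # Base case: small inputs are simply sorted.
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Median of each group of at most 5 elements.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Recursively pick the median of the medians as the pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    lower = [x for x in a if x < pivot]
    higher = [x for x in a if x > pivot]
    equal_count = len(a) - len(lower) - len(higher)
    if i <= len(lower):
        return select(lower, i)
    if i <= len(lower) + equal_count:
        return pivot
    return select(higher, i - len(lower) - equal_count)


def kth_largest(a, k):
    """k-th largest (1-based) expressed through the i-th smallest."""
    return select(a, len(a) - k + 1)
```

The sketch copies sublists for clarity; an in-place version would avoid the extra memory at the cost of more index bookkeeping.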

Other answers

If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.

Here is the algorithm.

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to length(A)
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2)))
  else
    # it's equal to the pivot
    return pivot

What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

T(n) = Theta(n) + T(n-1) = Theta(n²)

But if the choices are indeed random, the expected running time is given by

T(n) <= Theta(n) + (1/n) ∑_{i=1 to n}T(max(i-1, n-i))

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

Let's guess that T(n) <= an for some a. Then we get

T(n) 
 <= cn + (1/n) ∑_{i=1 to n}T(max(i-1, n-i))
 = cn + (1/n) ∑_{i=1 to floor(n/2)} T(n-i) + (1/n) ∑_{i=floor(n/2)+1 to n} T(i)
 <= cn + 2 (1/n) ∑_{i=floor(n/2) to n} T(i)
 <= cn + 2 (1/n) ∑_{i=floor(n/2) to n} ai

and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑_{i=n/2 to n} an, we get roughly 2(1/n)(n/2)an = an. But this is too big: there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

∑_{i=floor(n/2) to n} i  
 = ∑_{i=1 to n} i - ∑_{i=1 to floor(n/2)} i  
 = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2  
 <= n²/2 - (n/4)²/2  
 = (15/32)n²

where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with

cn + 2 (1/n) ∑_{i=floor(n/2) to n} ai,
 <= cn + (2a/n) (15/32) n²
 = n (c + (15/16)a)
 <= an

provided a > 16c.

This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).
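As a quick check of where the threshold on a comes from, the last inequality of the derivation can be solved for a directly (same constants a and c as above; this worked step is an addition, not part of the original writeup):

```latex
c\,n + \tfrac{15}{16}\,a\,n \;\le\; a\,n
\quad\Longleftrightarrow\quad
c \;\le\; \tfrac{1}{16}\,a
\quad\Longleftrightarrow\quad
a \;\ge\; 16\,c
```

So any constant a that is at least 16c makes the guess T(n) <= an go through.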

A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

"Make one pass through tracking the three largest values so far."

(it was specifically for the 3rd largest)

and this answer:

Build a heap/priority queue.  O(n)
Pop top element.  O(log n)
Pop top element.  O(log n)
Pop top element.  O(log n)

Total = O(n) + 3 O(log n) = O(n)
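The heap idea quoted above can be sketched in Python with the standard `heapq` module. The function name `kth_largest_heap` is illustrative, and the sketch assumes numeric values so that negation can turn the min-heap into a max-heap. Note that the total is O(n + k log n), which is O(n) only when k is a constant (or at most about n / log n).

```python
import heapq


def kth_largest_heap(values, k):
    """Return the k-th largest value (k is 1-based) using a max-heap."""
    # heapq implements a min-heap, so store negated values.
    heap = [-v for v in values]
    heapq.heapify(heap)          # builds the heap in O(n)
    for _ in range(k - 1):       # each pop costs O(log n)
        heapq.heappop(heap)
    return -heap[0]
```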

You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done; otherwise you repeat with the bin (higher or lower) that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).

A Programmer's Companion to Algorithm Analysis gives a version that IS O(n), although the author states that the constant factor is so high that you'd probably prefer the naive sort-the-list-then-select method.

I answered the letter of your question :)

The C++ standard library has almost exactly that function, called nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.

const int N = ...;
double a[N];
// ... 
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a

I'm not entirely sure about the O(n) complexity, but it is certainly between O(n) and O(n log n), and closer to O(n) than to O(n log n). The function is written in Java:

public int quickSelect(ArrayList<Integer>list, int nthSmallest){
    //Choose a random pivot index in the range 0 to list.size() - 1
    Random random =  new Random();
    //nextInt(bound) excludes the bound, so passing list.size() covers every index
    int pivotIndex = random.nextInt(list.size()); 

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two. 
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for(int i=0; i<list.size(); i++){
        if(list.get(i)<pivot){
            smallerNumberList.add(list.get(i));
        }
        else if(list.get(i)>pivot){
            greaterNumberList.add(list.get(i));
        }
        else{
            //Do nothing
        }
    }

    //nthSmallest is 1-based: if it is at most smallerNumberList size, the nthSmallest number must be in this list 
    if(nthSmallest <= smallerNumberList.size()){
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is bit tricky. If confusing, please see the above loop once again for clarification.
    else if(nthSmallest > (list.size() - greaterNumberList.size())){
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in 
        //smallerNumberList
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList,nthSmallest);
    }
    else{
        return pivot;
    }
}

I implemented finding the kth minimum in n unsorted elements using dynamic programming, specifically the tournament method. The execution time is O(n + k log(n)). The mechanism used is listed as one of the methods on the Wikipedia page about the selection algorithm (as indicated in one of the posts above). You can read about the algorithm and also find the code (Java) on my blog page Finding Kth Minimum. In addition, the logic can do a partial ordering of the list: it returns the first K min (or max) in O(k log(n)) time.

Though the code provided finds the kth minimum, similar logic can be employed to find the kth maximum in O(k log(n)), ignoring the pre-work done to create the tournament tree.

You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.

For each element in the array you can scan the list of the k largest and replace the smallest element with the new one if it is bigger.

Warren's priority heap solution is neater though.
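A middle ground between the linear scan and a full heap is to keep the k largest elements seen so far in a min-heap of size k. The sketch below is one way to write that in Python; the name `kth_largest_bounded` is illustrative and the code assumes 1 <= k <= len(values). It runs in O(n log k) time and O(k) space, and duplicates are counted as separate elements.

```python
import heapq


def kth_largest_bounded(values, k):
    """Return the k-th largest value (1-based) using a min-heap of size k."""
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest of the current top k: swap it in.
            heapq.heapreplace(heap, v)
    # The root is the smallest of the k largest values seen.
    return heap[0]
```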

Sexy quickselect in Python

import random

def quickselect(arr, k):
    '''
     k = 1 returns first element in ascending order.
     can be easily modified to return first element in descending order
    '''

    r = random.randrange(0, len(arr))

    a1 = [i for i in arr if i < arr[r]]  # partition
    a2 = [i for i in arr if i > arr[r]]

    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr)-len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return arr[r]

Find the median of the array in linear time, then use the partition procedure exactly as in quicksort to divide the array into two parts: values less than (<) the median to its left and values greater than (>) the median to its right. That too can be done in linear time. Now go to the part of the array where the kth element lies. The recurrence becomes T(n) = T(n/2) + cn, which gives O(n) overall.

Below is the link to the full implementation with quite an extensive explanation of how the algorithm for finding the Kth element in an unsorted array works. The basic idea is to partition the array as in QuickSort. But in order to avoid extreme cases (e.g. when the smallest element is chosen as pivot in every step, so that the algorithm degenerates into O(n^2) running time), a special pivot selection is applied, called the median-of-medians algorithm. The whole solution runs in O(n) time in the worst case and in the average case.

Here is the link to the full article (it is about finding the Kth smallest element, but the principle is the same for finding the Kth largest):

Finding Kth Smallest Element in an Unsorted Array

As per this paper, Finding the Kth largest item in a list of n items, the following algorithm will take O(n) time in the worst case.

Divide the array into n/5 lists of 5 elements each.
Find the median in each sub-array of 5 elements.
Recursively find the median of all the medians, call it M.
Partition the array into two sub-arrays. The first sub-array contains the elements larger than M, call this sub-array a1, while the other sub-array contains the elements smaller than M, call this sub-array a2.
If k <= |a1|, return selection(a1, k).
If k − 1 = |a1|, return M.
If k > |a1| + 1, return selection(a2, k − |a1| − 1).

Analysis: As suggested in the original paper:

We use the median to partition the list into two halves (the first half if k <= n/2, and the second half otherwise). This algorithm takes time cn at the first level of recursion for some constant c, cn/2 at the next level (since we recurse in a list of size n/2), cn/4 at the third level, and so on. The total time taken is cn + cn/2 + cn/4 + ... = 2cn = O(n).

Why is the partition size taken as 5 and not 3?

As mentioned in the original paper:

Dividing the list by 5 assures a worst-case split of 70−30. At least half of the medians are greater than the median-of-medians, hence at least half of the n/5 blocks have at least 3 elements, and this gives a 3n/10 split, which means the other partition is 7n/10 in the worst case. That gives T(n) = T(n/5) + T(7n/10) + O(n). Since n/5 + 7n/10 = 9n/10 < n, the worst-case running time is O(n).
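To see why this recurrence is linear, one can verify a guess by substitution. The following worked step is an addition to the quoted analysis; it writes the O(n) term as cn and guesses T(n) <= 10cn:

```latex
T(n) \;\le\; T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + c\,n
     \;\le\; 10c\cdot\tfrac{n}{5} + 10c\cdot\tfrac{7n}{10} + c\,n
     \;=\; 2cn + 7cn + cn
     \;=\; 10\,c\,n
```

The argument works because the two subproblem sizes add up to 9n/10, strictly less than n. With groups of 3 the corresponding recurrence is T(n) = T(n/3) + T(2n/3) + cn, whose subproblem sizes add up to exactly n, so the same substitution no longer closes and the bound degrades to O(n log n).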

Now I have tried to implement the above algorithm as:

public static int findKthLargestUsingMedian(Integer[] array, int k) {
        // Step 1: Divide the list into n/5 lists of 5 element each.
        int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
        // Step 2: Find pivotal element aka median of medians.
        int medianOfMedian =  findMedianOfMedians(array, noOfRequiredLists);
        //Now we need two lists split using medianOfMedian as pivot. All elements in listWithGreaterNumbers will be greater than medianOfMedian and listWithSmallerNumbers will have elements less than medianOfMedian.
        List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
        List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
        for (Integer element : array) {
            if (element < medianOfMedian) {
                listWithSmallerNumbers.add(element);
            } else if (element > medianOfMedian) {
                listWithGreaterNumbers.add(element);
            }
        }
        // Next step.
        if (k <= listWithGreaterNumbers.size()) return findKthLargestUsingMedian((Integer[]) listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
        else if ((k - 1) == listWithGreaterNumbers.size()) return medianOfMedian;
        else if (k > (listWithGreaterNumbers.size() + 1)) return findKthLargestUsingMedian((Integer[]) listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k-listWithGreaterNumbers.size()-1);
        return -1;
    }

    public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
        int[] medians = new int[noOfRequiredLists];
        for (int count = 0; count < noOfRequiredLists; count++) {
            int startOfPartialArray = 5 * count;
            // Clamp the end so the last group is not padded with nulls.
            int endOfPartialArray = Math.min(startOfPartialArray + 5, mainList.length);
            Integer[] partialArray = Arrays.copyOfRange(mainList, startOfPartialArray, endOfPartialArray);
            // Step 2: Find median of each of these sublists (sort the group first).
            Arrays.sort(partialArray);
            int medianIndex = partialArray.length / 2;
            medians[count] = partialArray[medianIndex];
        }
        // Step 3: Find median of the medians. Sorting the n/5 medians costs O(n log n);
        // a strictly linear version would select this median recursively instead.
        Arrays.sort(medians);
        return medians[medians.length / 2];
    }

Just for completeness, another algorithm makes use of a priority queue and takes O(n log n) time.

public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
        int p = 0;
        int numElements = nums.length;
        // create priority queue where all the elements of nums will be stored
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

        // place all the elements of the array to this priority queue
        for (int n : nums) {
            pq.add(n);
        }

        // extract the kth largest element
        while (numElements - k + 1 > 0) {
            p = pq.poll();
            k++;
        }

        return p;
    }

Both of these algorithms can be tested as:

public static void main(String[] args) throws IOException {
        Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
        System.out.println(findKthLargestUsingMedian(numbers, 8));
        System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
    }

As expected, the output is: 18 18

How about this kind of approach

Maintain a buffer of length k and a tmp_max; getting tmp_max is O(k) and is done n times, so something like O(kn)

Is it right or am I missing something?

Although it doesn't beat the average case of quickselect or the worst case of the median statistics method, it's pretty easy to understand and implement.

Iterate through the list. If the current value is larger than the stored largest value, store it as the largest value, bump values 1-4 down, and let 5 drop off the list. If not, compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. This should do it in O(n).

I would like to suggest one answer

If we take the first k elements and sort them into a linked list of k values

now, for every other value, even in the worst case, if we do an insertion sort for the remaining n-k values, the number of comparisons will be k*(n-k), and for the first k values to be sorted let it be k*(k-1), so it comes out to be (nk-k), which is O(n) for constant k

cheers
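The sorted-buffer idea from the last few answers can be sketched as follows. This is only an illustration: it uses a Python list with the standard `bisect` module in place of a linked list, the name `kth_largest_sorted_buffer` is made up for the example, and it assumes 1 <= k <= len(values). The cost is O(k log k) for the initial sort plus O(k) per remaining element, so O(nk) overall.

```python
import bisect


def kth_largest_sorted_buffer(values, k):
    """Keep the k largest values seen so far in an ascending sorted list."""
    buffer = sorted(values[:k])
    for v in values[k:]:
        if v > buffer[0]:
            buffer.pop(0)             # drop the current smallest, O(k)
            bisect.insort(buffer, v)  # insert v at its sorted position, O(k)
    # The smallest of the k largest values is the k-th largest overall.
    return buffer[0]
```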

An explanation of the median-of-medians algorithm to find the k-th largest integer out of n can be found here: http://cs.indstate.edu/~spitla/presentation.pdf

The implementation in C++ is below:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int findMedian(vector<int> vec){
//    Find median of a vector (vec is passed by value, so sorting it leaves the caller's data intact)
    sort(vec.begin(), vec.end());
    size_t size = vec.size();
    return vec[(size/2)];
}

int findMedianOfMedians(vector<vector<int> > values){
    vector<int> medians;

    for (int i = 0; i < values.size(); i++) {
        int m = findMedian(values[i]);
        medians.push_back(m);
    }

    return findMedian(medians);
}

void selectionByMedianOfMedians(const vector<int> values, int k){
//    Divide the list into n/5 lists of 5 elements each
    vector<vector<int> > vec2D;

    int count = 0;
    while (count != values.size()) {
        int countRow = 0;
        vector<int> row;

        while ((countRow < 5) && (count < values.size())) {
            row.push_back(values[count]);
            count++;
            countRow++;
        }
        vec2D.push_back(row);
    }

    cout<<endl<<endl<<"Printing 2D vector : "<<endl;
    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            cout<<vec2D[i][j]<<" ";
        }
        cout<<endl;
    }
    cout<<endl;

//    Calculating a new pivot for making splits
    int m = findMedianOfMedians(vec2D);
    cout<<"Median of medians is : "<<m<<endl;

//    Partition the list into unique elements larger than 'm' (call this sublist L1) and
//    those smaller them 'm' (call this sublist L2)
    vector<int> L1, L2;

    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            if (vec2D[i][j] > m) {
                L1.push_back(vec2D[i][j]);
            }else if (vec2D[i][j] < m){
                L2.push_back(vec2D[i][j]);
            }
        }
    }

//    Checking the splits as per the new pivot 'm'
    cout<<endl<<"Printing L1 : "<<endl;
    for (int i = 0; i < L1.size(); i++) {
        cout<<L1[i]<<" ";
    }

    cout<<endl<<endl<<"Printing L2 : "<<endl;
    for (int i = 0; i < L2.size(); i++) {
        cout<<L2[i]<<" ";
    }

//    Recursive calls
    if ((k - 1) == L1.size()) {
        cout<<endl<<endl<<"Answer :"<<m;
    }else if (k <= L1.size()) {
        return selectionByMedianOfMedians(L1, k);
    }else if (k > (L1.size() + 1)){
        return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
    }

}

int main()
{
    int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};

    vector<int> vec(values, values + 25);

    cout<<"The given array is : "<<endl;
    for (int i = 0; i < vec.size(); i++) {
        cout<<vec[i]<<" ";
    }

    selectionByMedianOfMedians(vec, 8);

    return 0;
}

There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.

In more detail: using Vladimir Zabrodsky's MODIFIND optimization and the median-of-3 pivot selection, and paying some attention to the final steps of the partitioning part of the algorithm, I've come up with the following algorithm (imaginably named "LefSelect"):

#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

// Note: The code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
    int l=0, m = n-1, i=l, j=m;
    float x;

    while (l<m) {
        if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
        if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
        if( a[j] < a[k] ) F_SWAP(a[k],a[j]);

        x=a[k];
        while (j>k && i<k) {
            do i++; while (a[i]<x);
            do j--; while (a[j]>x);

            F_SWAP(a[i],a[j]);
        }
        i++; j--;

        if (j<k) {
            while (a[i]<x) i++;
            l=i; j=m;
        }
        if (k<i) {
            while (x<a[j]) j--;
            m=j; i=l;
        }
    }
    return a[k];
}

In benchmarks that I did here, LefSelect is 20-30% faster than QuickSelect.

Haskell solution:

kthElem index list = sort list !! index

withShape ~[]     []     = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys

sort []     = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
  where
   ls = filter (<  x) xs
   rs = filter (>= x) xs

This implements the median of medians solution by using the withShape method to discover the size of a partition without actually computing it.

Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand(), to generate an index between l and r, swap the element at the randomly generated index with the last element, and finally call the standard partition process, which uses the last element as pivot.

#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;

int randomPartition(int arr[], int l, int r);

// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method.  ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
    // If k is smaller than number of elements in array
    if (k > 0 && k <= r - l + 1)
    {
        // Partition the array around a random element and
        // get position of pivot element in sorted array
        int pos = randomPartition(arr, l, r);

        // If position is same as k
        if (pos-l == k-1)
            return arr[pos];
        if (pos-l > k-1)  // If position is more, recur for left subarray
            return kthSmallest(arr, l, pos-1, k);

        // Else recur for right subarray
        return kthSmallest(arr, pos+1, r, k-pos+l-1);
    }

    // If k is more than number of elements in array
    return INT_MAX;
}

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Standard partition process of QuickSort().  It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++)
    {
        if (arr[j] <= x) //arr[i] is bigger than arr[j] so swap them
        {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]); // swap the pivot
    return i;
}

// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
    int n = r-l+1;
    int pivot = rand() % n;
    swap(&arr[l + pivot], &arr[r]);
    return partition(arr, l, r);
}

// Driver program to test above methods
int main()
{
    int arr[] = {12, 3, 5, 7, 4, 19, 26};
    int n = sizeof(arr)/sizeof(arr[0]), k = 3;
    cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
    return 0;
}

The worst case time complexity of the above solution is still O(n^2). In the worst case, the randomized function may always pick a corner element. The expected time complexity of the above randomized QuickSelect is Θ(n).

Create a priority queue.
Insert all the elements into the heap.

Call poll() k times.

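public static int getKthLargestElements(int[] arr, int k)
{
    // Max-heap: Integer.compare(y, x) reverses the order without the overflow risk of y - x
    PriorityQueue<Integer> pq =  new PriorityQueue<>((x , y) -> Integer.compare(y, x));
    //insert all the elements into heap
    for(int ele : arr)
       pq.offer(ele);
    // call poll() k times; the last value polled is the kth largest
    int result = 0;
    for(int i = 0; i < k; i++)
     {
       result = pq.poll();
     } 
   return result;        
}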

This is an implementation in Javascript.

If you release the constraint that you cannot modify the array, you can avoid the use of extra memory by using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).

function kthMax(a, k){
    var size = a.length;

    var pivot = a[ parseInt(Math.random()*size) ]; //Another choice could have been (size / 2) 

    //Create an array with all element lower than the pivot and an array with all element higher than the pivot
    var i, lowerArray = [], upperArray = [];
    for (i = 0; i  < size; i++){
        var current = a[i];

        if (current < pivot) {
            lowerArray.push(current);
        } else if (current > pivot) {
            upperArray.push(current);
        }
    }

    //Which one should I continue with?
    if(k <= upperArray.length) {
        //Upper
        return kthMax(upperArray, k);
    } else {
        var newK = k - (size - lowerArray.length);

        if (newK > 0) {
            ///Lower
            return kthMax(lowerArray, newK);
        } else {
            //None ... it's the current pivot!
            return pivot;
        }   
    }
}
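The in-place variant mentioned before the code, where two indexes delimit the current partition instead of allocating new arrays, could look roughly like the following Python sketch. The name `kth_max_in_place` and the choice of a Lomuto-style partition are assumptions made for illustration; the sketch expects 1 <= k <= len(a) and reorders the input list.

```python
import random


def kth_max_in_place(a, k):
    """Return the k-th largest element (k is 1-based); reorders a in place."""
    lo, hi = 0, len(a) - 1
    target = len(a) - k  # index of the answer in ascending order
    while True:
        # Move a random pivot to the end, then partition a[lo..hi] around it.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final index
        if store == target:
            return a[store]
        if store < target:
            lo = store + 1   # answer lies to the right of the pivot
        else:
            hi = store - 1   # answer lies to the left of the pivot
```

Only the indexes `lo` and `hi` change between rounds, so the extra memory is constant; the expected running time stays linear for inputs without many repeated values.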

If you want to test how it performs, you can use this variation:

    function kthMax (a, k, logging) {
         var comparisonCount = 0; //Number of comparison that the algorithm uses
         var memoryCount = 0;     //Number of integers in memory that the algorithm uses
         var _log = logging;

         if(k < 1 || k > a.length) { // k is 1-based: valid values are 1..a.length
            if (_log) console.log ("k is out of range"); 
            return false;
         }      

         function _kthmax(a, k){
             var size = a.length;
             var pivot = a[parseInt(Math.random()*size)];
             if(_log) console.log("Inputs:", a,  "size="+size, "k="+k, "pivot="+pivot);

             // This should never happen. Just a nice check in this exercise
             // if you are playing with the code to avoid never ending recursion            
             if(typeof pivot === "undefined") {
                 if (_log) console.log ("Ops..."); 
                 return false;
             }

             var i, lowerArray = [], upperArray = [];
             for (i = 0; i  < size; i++){
                 var current = a[i];
                 if (current < pivot) {
                     comparisonCount += 1;
                     memoryCount++;
                     lowerArray.push(current);
                 } else if (current > pivot) {
                     comparisonCount += 2;
                     memoryCount++;
                     upperArray.push(current);
                 }
             }
             if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);

             if(k <= upperArray.length) {
                 comparisonCount += 1;
                 return _kthmax(upperArray, k);
             } else if (k > size - lowerArray.length) {
                 comparisonCount += 2;
                 return _kthmax(lowerArray, k - (size - lowerArray.length));
             } else {
                 comparisonCount += 2;
                 return pivot;
             }
     /* 
      * BTW, this is the logic for kthMin if we want to implement that... ;-)
      * 

             if(k <= lowerArray.length) {
                 return kthMin(lowerArray, k);
             } else if (k > size - upperArray.length) {
                 return kthMin(upperArray, k - (size - upperArray.length));
             } else 
                 return pivot;
     */            
         }

         var result = _kthmax(a, k);
         return {result: result, iterations: comparisonCount, memory: memoryCount};
     }

The rest of the code is just to create a playground:

    function getRandomArray (n){
        var ar = [];
        for (var i = 0, l = n; i < l; i++) {
            ar.push(Math.round(Math.random() * l))
        }

        return ar;
    }

    //Create a random array of 50 numbers
    var ar = getRandomArray (50);

Now, run your tests a few times. Because of Math.random() it will produce different results every time:

    kthMax(ar, 2, true);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 34, true);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);

If you run it a few times you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n and the value of k does not affect the algorithm.

I came up with this algorithm and it seems to be O(n):

Let's say k = 3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If the array item is greater than our minimum, we would replace the min variable with the item value. We continue the same thing until the end of the array. The minimum of our three variables is the 3rd largest item in the array.

define variables a=0, b=0, c=0
iterate through the array items
    find minimum a,b,c
    if item > min then replace the min variable with item value
    continue until end of array
the minimum of a,b,c is our answer

And, to find the Kth largest item we need K variables.

Example: (k = 3)

[1,2,4,1,7,3,9,5,6,2,9,8]

Final variable values:

a=7 (answer)
b=8
c=9

Can someone please review this and let me know what I am missing?

Here is the implementation of the algorithm eladv suggested (I also put here the implementation with random pivot):

import java.util.Arrays;

public class Median {

    public static void main(String[] s) {

        int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
        System.out.println(selectK(test,8));

        /*
        int n = 100000000;
        int[] test = new int[n];
        for(int i=0; i<test.length; i++)
            test[i] = (int)(Math.random()*test.length);

        long start = System.currentTimeMillis();
        random_selectK(test, test.length/2);
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        */
    }

    public static int random_selectK(int[] a, int k) {
        if(a.length <= 1)
            return a[0];

        int r = (int)(Math.random() * a.length);
        int p = a[r];

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return random_selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return random_selectK(temp,k-small-equal);
        }
    }

    public static int selectK(int[] a, int k) {
        if(a.length <= 5) {
            Arrays.sort(a);
            return a[k-1];
        }

        int p = median_of_medians(a);

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return selectK(temp,k-small-equal);
        }
    }

    private static int median_of_medians(int[] a) {
        int[] b = new int[a.length/5];
        int[] temp = new int[5];
        for(int i=0; i<b.length; i++) {
            for(int j=0; j<5; j++)
                temp[j] = a[5*i + j];
            Arrays.sort(temp);
            b[i] = temp[2];
        }

        return selectK(b, b.length/2 + 1);
    }
}

This is similar to the quicksort strategy: we pick an arbitrary pivot, move the smaller elements to its left and the larger ones to its right, and then recurse only into the side that contains the kth element.

    public static int kthElInUnsortedList(List<int> list, int k)
    {
        if (list.Count == 1)
            return list[0];

        List<int> left = new List<int>();
        List<int> right = new List<int>();

        int pivotIndex = list.Count / 2;
        int pivot = list[pivotIndex]; //arbitrary

        for (int i = 0; i < list.Count; i++)
        {
            // Skip the pivot itself but keep scanning the elements after it.
            if (i == pivotIndex)
                continue;
            int currentEl = list[i];
            if (currentEl < pivot)
                left.Add(currentEl);
            else
                right.Add(currentEl);
        }

        if (k == left.Count + 1)
            return pivot;

        if (left.Count < k)
            return kthElInUnsortedList(right, k - left.Count - 1);
        else
            return kthElInUnsortedList(left, k);
    }
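The same partition idea can also be applied in place instead of allocating new left and right lists at every step. The following is a minimal Python sketch under stated assumptions, not the answer's own code: the function name `kth_smallest_partition`, the random pivot choice, and the Lomuto-style partition are all choices made for this illustration. It works on a copy so the caller's list is not reordered.

```python
import random


def kth_smallest_partition(items, k):
    """Return the k-th smallest element of items (k is 1-based)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    a = list(items)          # work on a copy
    lo, hi = 0, len(a) - 1
    target = k - 1           # index the answer would have in sorted order
    while True:
        if lo == hi:
            return a[lo]
        # Assumption of this sketch: a random pivot, moved to the end.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        store = lo
        # Lomuto partition: elements smaller than the pivot go left of store.
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final index
        if target == store:
            return a[store]
        if target < store:
            hi = store - 1   # keep only the left part
        else:
            lo = store + 1   # keep only the right part
```

Because only one side is kept after each partition, the expected running time is O(n), with an O(n^2) worst case when the pivots are consistently bad.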

Go to the end of this link:

http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/

You can find the kth smallest element in O(n) time and constant space, provided the array contains only integers. Strictly, each step of the search costs one O(n) pass over the array, so the total is O(n log R) for a value range of size R; for fixed-width integers, log R is bounded by the word size.

The approach is to do a binary search over the range of values in the array. If the minimum and maximum values both lie in the integer range, we can binary search over that range. We can write a comparison function that tells us whether a value is the kth smallest, smaller than the kth smallest, or larger than the kth smallest. Run the binary search until you reach the kth smallest number.

Here is the code for that:

class Solution:

    def _iskthsmallest(self, A, val, k):
        # Returns 1 if val is larger than the kth smallest,
        # -1 if it is smaller, and 0 if it is the kth smallest.
        less_count, equal_count = 0, 0
        for i in range(len(A)):
            if A[i] == val: equal_count += 1
            if A[i] < val: less_count += 1

        if less_count >= k: return 1
        if less_count + equal_count < k: return -1
        return 0

    def kthsmallest_binary(self, A, min_val, max_val, k):
        if min_val == max_val:
            return min_val
        # Integer (floor) division keeps mid an int in Python 3.
        mid = (min_val + max_val) // 2
        iskthsmallest = self._iskthsmallest(A, mid, k)
        if iskthsmallest == 0: return mid
        if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
        return self.kthsmallest_binary(A, mid+1, max_val, k)

    # @param A : tuple of integers
    # @param k : integer
    # @return an integer
    def kthsmallest(self, A, k):
        if not A: return 0
        if k > len(A): return 0
        min_val, max_val = min(A), max(A)
        return self.kthsmallest_binary(A, min_val, max_val, k)
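The value-range binary search described in this answer can also be written without recursion or a class. The sketch below is one possible standalone variant, not the answer's code: the name `kth_smallest_by_value_search` is hypothetical, and instead of a three-way comparison it looks for the smallest value v such that at least k elements are less than or equal to v, which is the kth smallest element.

```python
def kth_smallest_by_value_search(values, k):
    """Return the k-th smallest (1-based) of a non-empty list of integers."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    lo, hi = min(values), max(values)
    while lo < hi:
        mid = (lo + hi) // 2  # floor division, also correct for negatives
        # One O(n) pass: how many elements are <= mid?
        at_most_mid = sum(1 for v in values if v <= mid)
        if at_most_mid >= k:
            hi = mid          # the answer is mid or something smaller
        else:
            lo = mid + 1      # too few elements <= mid, answer is larger
    return lo
```

The loop runs about log2(max - min) times and uses only a few integer variables, which is where the constant-space property comes from.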

There is also an algorithm that outperforms quickselect. It is called the Floyd-Rivest (FR) algorithm.

Original article: https://doi.org/10.1145/360680.360694

Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf

Wikipedia article: https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

I tried implementing quickselect and the FR algorithm in C++. I also compared them with the standard C++ library implementation std::nth_element (which is basically an introselect hybrid of quickselect and heapselect). The result was that quickselect and nth_element performed comparably on average, but the FR algorithm ran approximately twice as fast as they did.

Sample code that I used for the FR algorithm:

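#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward declarations so the templates below can call each other.
template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n);
template <typename T>
int sgn(T val);

template <typename T>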
T FRselect(std::vector<T>& data, const size_t& n)
{
    if (n == 0)
        return *(std::min_element(data.begin(), data.end()));
    else if (n == data.size() - 1)
        return *(std::max_element(data.begin(), data.end()));
    else
        return _FRselect(data, 0, data.size() - 1, n);
}

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
    size_t leftIdx = left;
    size_t rightIdx = right;

    while (rightIdx > leftIdx)
    {
        if (rightIdx - leftIdx > 600)
        {
            size_t range = rightIdx - leftIdx + 1;
            long long i = n - (long long)leftIdx + 1;
            long long z = log(range);
            long long s = 0.5 * exp(2 * z / 3);
            long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);

            size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
            size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);

            _FRselect(data, newLeft, newRight, n);
        }
        T t = data[n];
        size_t i = leftIdx;
        size_t j = rightIdx;
        // arrange pivot and right index
        std::swap(data[leftIdx], data[n]);
        if (data[rightIdx] > t)
            std::swap(data[rightIdx], data[leftIdx]);

        while (i < j)
        {
            std::swap(data[i], data[j]);
            ++i; --j;
            while (data[i] < t) ++i;
            while (data[j] > t) --j;
        }

        if (data[leftIdx] == t)
            std::swap(data[leftIdx], data[j]);
        else
        {
            ++j;
            std::swap(data[j], data[rightIdx]);
        }
        // adjust left and right towards the boundaries of the subset
        // containing the (k - left + 1)th smallest element
        if (j <= n)
            leftIdx = j + 1;
        if (n <= j)
            rightIdx = j - 1;
    }

    return data[leftIdx];
}

template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}

What I would do is this:

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

You can simply store pointers to the first and last elements of the linked list. They only change when the list is updated.

Update:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element
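The "keep only the k largest seen so far" idea from the pseudocode above can be sketched in Python as follows. Note the substitution: instead of a sorted tree, this sketch uses a min-heap from the standard `heapq` module, whose root plays the role of the last (smallest) element of l. The function name `kth_largest_bounded_heap` is hypothetical. Like the tree version, this takes O(n log k) time rather than O(n).

```python
import heapq


def kth_largest_bounded_heap(values, k):
    """Return the k-th largest element (1-based) of a non-empty list."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    heap = []  # min-heap holding the k largest values seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)       # still filling up to k items
        elif v > heap[0]:
            heapq.heapreplace(heap, v)    # drop the smallest, insert v
    # The smallest of the k largest values is the k-th largest overall.
    return heap[0]
```

The heap never holds more than k elements, so the extra memory is O(k) regardless of the array length.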

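First we can build a BST from the unsorted array, and from the BST we can find the kth smallest element in O(log(n)), given a balanced tree that stores subtree sizes. Note that building the tree by comparisons takes O(n log(n)) in general rather than O(n), so overall this approach is O(n log(n)) unless the tree can be built in linear time.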

Licensed under: CC-BY-SA with attribution
