Picking a random element from a set

https://stackoverflow.com/questions/124671

02-07-2019

Question

How do I pick a random element from a set? I'm particularly interested in picking a random element from a HashSet or a LinkedHashSet, in Java. Solutions for other languages are also welcome.

Solution

int size = myHashSet.size();
int item = new Random().nextInt(size); // In real code, share one Random instance instead of creating one per call
int i = 0;
for (Object obj : myHashSet)
{
    if (i == item)
        return obj;
    i++;
}

Other suggestions

Somewhat related, did you know:

There are useful methods in java.util.Collections for shuffling whole collections: Collections.shuffle(List<?>) and Collections.shuffle(List<?> list, Random rnd).
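The sketch below shows Collections.shuffle(List<?> list, Random rnd) used for sampling. It assumes you want k distinct random elements rather than just one. It copies the set into a list, shuffles the copy with a caller-supplied Random, and takes a prefix. The class and method names (ShuffleSample, sample) are hypothetical. Both the copy and the shuffle are O(n), so this only pays off when you need several elements at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class ShuffleSample {
    /**
     * Returns min(k, coll.size()) distinct elements of coll in random order.
     * k must be non-negative. The collection is copied first, so the caller's
     * set is never modified.
     */
    public static <E> List<E> sample(Collection<? extends E> coll, int k, Random rnd) {
        List<E> copy = new ArrayList<>(coll);   // O(n) copy
        Collections.shuffle(copy, rnd);         // O(n) shuffle using the caller's Random
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>(Arrays.asList("alpha", "bravo", "charlie", "delta"));
        // A fixed seed makes the result reproducible between runs
        System.out.println(sample(set, 2, new Random(42)));
    }
}
```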

Quick solution for Java using an ArrayList and a HashMap [element -> index].

Motivation: I needed a set of objects with the RandomAccess property, in particular to pick a random object from the set (see the pollRandom method). Random navigation in a binary tree is not accurate: trees are not perfectly balanced, so it would not produce a uniform distribution.

import java.util.*;

public class RandomSet<E> extends AbstractSet<E> {

    List<E> dta = new ArrayList<E>();
    Map<E, Integer> idx = new HashMap<E, Integer>();

    public RandomSet() {
    }

    public RandomSet(Collection<? extends E> items) {
        for (E item : items) {
            add(item); // go through add() so duplicate items are skipped
        }
    }

    @Override
    public boolean add(E item) {
        if (idx.containsKey(item)) {
            return false;
        }
        idx.put(item, dta.size());
        dta.add(item);
        return true;
    }

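    /**
     * Removes the element at position <code>id</code> by overwriting it with
     * the last element, so removal is O(1).
     * @param id index of the element to remove
     * @return the removed element, or null if id is out of range
     */
    public E removeAt(int id) {
        if (id < 0 || id >= dta.size()) {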
            return null;
        }
        E res = dta.get(id);
        idx.remove(res);
        E last = dta.remove(dta.size() - 1);
        // skip filling the hole if last is removed
        if (id < dta.size()) {
            idx.put(last, id);
            dta.set(id, last);
        }
        return res;
    }

    @Override
    public boolean remove(Object item) {
        @SuppressWarnings(value = "element-type-mismatch")
        Integer id = idx.get(item);
        if (id == null) {
            return false;
        }
        removeAt(id);
        return true;
    }

    public E get(int i) {
        return dta.get(i);
    }

    public E pollRandom(Random rnd) {
        if (dta.isEmpty()) {
            return null;
        }
        int id = rnd.nextInt(dta.size());
        return removeAt(id);
    }

    @Override
    public int size() {
        return dta.size();
    }

    @Override
    public Iterator<E> iterator() {
        return dta.iterator();
    }
}

This is faster than the for-each loop in the accepted answer:

int index = rand.nextInt(set.size());
Iterator<Object> iter = set.iterator();
for (int i = 0; i < index; i++) {
    iter.next();
}
return iter.next();

The for-each construct calls Iterator.hasNext() on every iteration, but since index < set.size(), that check is unnecessary overhead. I saw a 10-20% speedup, but YMMV. (Also, this compiles without needing an extra return statement.)

Note that this code (like most of the other answers) can be applied to any Collection, not just to a Set. As a generic method:

public static <E> E choice(Collection<? extends E> coll, Random rand) {
    if (coll.size() == 0) {
        return null; // or throw IAE, if you prefer
    }

    int index = rand.nextInt(coll.size());
    if (coll instanceof List) { // optimization
        return ((List<? extends E>) coll).get(index);
    } else {
        Iterator<? extends E> iter = coll.iterator();
        for (int i = 0; i < index; i++) {
            iter.next();
        }
        return iter.next();
    }
}

If you want to do this in Java, you should consider copying the elements into some kind of random-access collection (such as an ArrayList). Unless your set is small, accessing the selected element will otherwise be expensive (O(n) instead of O(1)). [ed: copying into a list is also O(n)]

Alternatively, you could look for another Set implementation that better fits your needs. ListOrderedSet from Commons Collections looks promising.
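Copying into a list costs O(n), so it only helps if you copy once and then pick many times from a set that does not change. The sketch below assumes exactly that situation. It uses a hypothetical SnapshotPicker class that takes an ArrayList snapshot in its constructor; after that, each pick is an O(1) index lookup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical helper: pays the O(n) copy once, then every pick is O(1).
 * The snapshot does not see later changes to the set; rebuild it if the set changes.
 */
public class SnapshotPicker<E> {
    private final List<E> snapshot;
    private final Random rnd;

    public SnapshotPicker(Set<? extends E> set, Random rnd) {
        this.snapshot = new ArrayList<>(set); // O(n), done once
        this.rnd = rnd;
    }

    /** Returns a uniformly chosen element, or null if the set was empty. */
    public E pick() {
        if (snapshot.isEmpty()) {
            return null;
        }
        return snapshot.get(rnd.nextInt(snapshot.size())); // O(1) on an ArrayList
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            set.add(i);
        }
        SnapshotPicker<Integer> picker = new SnapshotPicker<>(set, new Random());
        for (int i = 0; i < 5; i++) {
            System.out.println(picker.pick());
        }
    }
}
```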

In Java:

Set<Integer> set = new LinkedHashSet<Integer>(3);
set.add(1);
set.add(2);
set.add(3);

Random rand = new Random(System.currentTimeMillis());
Integer[] setArray = set.toArray(new Integer[0]); // toArray() returns Object[], which cannot be cast to int[]
for (int i = 0; i < 10; ++i) {
    System.out.println(setArray[rand.nextInt(set.size())]);
}

List<Object> asList = new ArrayList<>(mySet);
Collections.shuffle(asList);
return asList.get(0);

Clojure solution:

(defn pick-random [set] (let [sq (seq set)] (nth sq (rand-int (count sq)))))

Perl 5

@hash_keys = (keys %hash);
$rand = int(rand(@hash_keys));
print $hash{$hash_keys[$rand]};

Here's one way to do it.

C++. This should be reasonably fast, since it doesn't require iterating over the whole set or sorting it. It should work out of the box with most modern compilers, assuming they support tr1. If not, you may need to use Boost.

The Boost docs are helpful for explaining this, even if you don't use Boost.

The trick is to exploit the fact that the data has been split into buckets, and to quickly identify a randomly chosen bucket (with the appropriate probability).

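// With C++11 or later, the standard header works instead of tr1:
//   #include <unordered_set>
//   (and drop the "using namespace std::tr1;" line)
// With Boost:
//   #include <boost/unordered_set.hpp>
//   using namespace boost;
#include <tr1/unordered_set>
using namespace std::tr1;
#include <iostream>
#include <cstdlib>
#include <cassert>
using namespace std;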

int main() {
  unordered_set<int> u;
  u.max_load_factor(40);
  for (int i=0; i<40; i++) {
    u.insert(i);
    cout << ' ' << i;
  }
  cout << endl;
  cout << "Number of buckets: " << u.bucket_count() << endl;

  for(size_t b=0; b<u.bucket_count(); b++)
    cout << "Bucket " << b << " has " << u.bucket_size(b) << " elements. " << endl;

  for(size_t i=0; i<20; i++) {
    size_t x = rand() % u.size();
    cout << "we'll quickly get the " << x << "th item in the unordered set. ";
    size_t b;
    for(b=0; b<u.bucket_count(); b++) {
      if(x < u.bucket_size(b)) {
        break;
      } else
        x -= u.bucket_size(b);
    }
    cout << "it'll be in the " << b << "th bucket at offset " << x << ". ";
    unordered_set<int>::const_local_iterator l = u.begin(b);
    while(x>0) {
      l++;
      assert(l!=u.end(b));
      x--;
    }
    cout << "random item is " << *l << ". ";
    cout << endl;
  }
}

The solution above is about latency, but it does not guarantee the same probability of selecting each index.
If that matters, try reservoir sampling: http://en.wikipedia.org/wiki/Reservoir_sampling
Collections.shuffle() (as a few people suggested) uses a related algorithm, the Fisher-Yates shuffle.

Since you said "Solutions for other languages are also welcome", here's the Python version:
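Below is a sketch of reservoir sampling with a reservoir of a single element; the names Reservoir and pickOne are illustrative, not from any library. The i-th element seen (1-based) replaces the current pick with probability 1/i. After n elements, each one has therefore been kept with probability 1/n. This is still a single O(n) pass, so it is no faster than walking an iterator to a random index. Its advantage is that it does not need the size up front, which helps with streams or iterators of unknown length.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

public class Reservoir {
    /** Uniformly picks one element in a single pass; returns null if items is empty. */
    public static <E> E pickOne(Iterable<? extends E> items, Random rnd) {
        E chosen = null;
        int seen = 0;
        for (E item : items) {
            seen++;
            // nextInt(seen) is 0 with probability 1/seen: replace the current pick
            if (rnd.nextInt(seen) == 0) {
                chosen = item;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("alpha", "bravo", "charlie"));
        Random rnd = new Random();
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < 30000; i++) {
            counts.merge(pickOne(set, rnd), 1, Integer::sum);
        }
        System.out.println(counts); // how often each element was picked
    }
}
```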

>>> import random
>>> random.choice([1,2,3,4,5,6])
3
>>> random.choice([1,2,3,4,5,6])
4

Can't you just get the size/length of the set/array, generate a random number between 0 (inclusive) and the size/length (exclusive), and then get the element whose index matches that number? HashSet has a .size() method, I'm pretty sure.

In pseudocode:

function randFromSet(target){
 var targetLength:uint = target.length();
 var randomIndex:uint = random(0, targetLength); // assumed to return an integer in [0, targetLength)
 return target[randomIndex];
}

PHP, assuming the "set" is an array:

$foo = array("alpha", "bravo", "charlie");
$index = array_rand($foo);
$val = $foo[$index];

The Mersenne Twister functions are better, but there is no MT equivalent of array_rand in PHP.

Icon has a set type and a random-element operator, unary "?", so the expression

? set( [1, 2, 3, 4, 5] )

will produce a random number between 1 and 5.

The random seed is initialized to 0 when a program starts, so to get different results on each run, call randomize().

In C#:

        // Needs: using System; using System.Collections.Specialized; (for OrderedDictionary)
        Random random = new Random((int)DateTime.Now.Ticks);

        OrderedDictionary od = new OrderedDictionary();

        od.Add("abc", 1);
        od.Add("def", 2);
        od.Add("ghi", 3);
        od.Add("jkl", 4);


        int randomIndex = random.Next(od.Count);

        Console.WriteLine(od[randomIndex]);

        // Can access via index or key value:
        Console.WriteLine(od[1]);
        Console.WriteLine(od["def"]);

JavaScript solution ;)

function choose (set) {
    return set[Math.floor(Math.random() * set.length)];
}

var set  = [1, 2, 3, 4], rand = choose (set);

Or alternatively:

Array.prototype.choose = function () {
    return this[Math.floor(Math.random() * this.length)];
};

[1, 2, 3, 4].choose();

In Lisp:

(defun pick-random (set)
       (nth (random (length set)) set))

In Mathematica:

a = {1, 2, 3, 4, 5}

a[[ ⌈ Length[a] Random[] ⌉ ]]

Or, in recent versions, simply:

RandomChoice[a]

This got a downvote, perhaps because it lacks an explanation, so here is one:

Random[] generates a pseudorandom float between 0 and 1. This is multiplied by the length of the list, and the ceiling function then rounds it up to the next integer. The element at that index is then extracted from a.

Since hash-table functionality is often implemented with rules in Mathematica, and rules are stored in lists, you could use:

a = {"Badger" -> 5, "Bird" -> 1, "Fox" -> 3, "Frog" -> 2, "Wolf" -> 4};

How about:

public static <A> A getRandomElement(Collection<A> c, Random r) {
  return new ArrayList<A>(c).get(r.nextInt(c.size()));
}

This is identical to the accepted answer (Khoth), but with the unnecessary size and i variables removed.

    int random = new Random().nextInt(myhashSet.size());
    for(Object obj : myhashSet) {
        if (random-- == 0) {
            return obj;
        }
    }

Even with those two variables removed, the solution above is still random, because we rely on random (which starts at a randomly selected index) decrementing toward 0 on each iteration.

Unfortunately, this cannot be done efficiently (better than O(n)) with any of the standard library's set containers.

This is odd, since it is very easy to add a random-choice function to both hash sets and binary-tree sets. In a hash set that isn't sparse, you can try random entries until you get a hit. For a binary tree, you can choose randomly between the left and right subtrees, weighted by their sizes, which takes O(log n) steps if the tree is balanced. I've implemented a demo below:

import random

class Node:
    def __init__(self, object):
        self.object = object
        self.value = hash(object)
        self.size = 1
        self.a = self.b = None

class RandomSet:
    def __init__(self):
        self.top = None

    def add(self, object):
        """ Add any hashable object to the set.
            Notice: In this simple implementation you shouldn't add two
                    identical items. """
        new = Node(object)
        if not self.top: self.top = new
        else: self._recursiveAdd(self.top, new)
    def _recursiveAdd(self, top, new):
        top.size += 1
        if new.value < top.value:
            if not top.a: top.a = new
            else: self._recursiveAdd(top.a, new)
        else:
            if not top.b: top.b = new
            else: self._recursiveAdd(top.b, new)

    def pickRandom(self):
        """ Pick a random item in O(log n) time for a balanced tree
            (O(depth) in general), with one call to random per level. """
        return self._recursivePickRandom(self.top)
    def _recursivePickRandom(self, top):
        r = random.randrange(top.size)
        if r == 0: return top.object
        elif top.a and r <= top.a.size: return self._recursivePickRandom(top.a)
        return self._recursivePickRandom(top.b)

if __name__ == '__main__':
    s = RandomSet()
    for i in [5,3,7,1,4,6,9,2,8,0]:
        s.add(i)

    dists = [0]*10
    for i in range(10000):
        dists[s.pickRandom()] += 1
    print(dists)

I got [995, 975, 971, 995, 1057, 1004, 966, 1052, 984, 1001] as output, so the distribution looks good.

I've struggled with the same problem myself, and I haven't yet decided whether the performance gain of this more efficient choice is worth using a Python-based collection. I could of course refine it and translate it to C, but that's too much work for me today :)

In Java 8:

static <E> E getRandomSetElement(Set<E> set) {
    return set.stream().skip(new Random().nextInt(set.size())).findFirst().orElse(null);
}

PHP, using MT:

$items_array = array("alpha", "bravo", "charlie");
$last_pos = count($items_array) - 1;
$random_pos = mt_rand(0, $last_pos);
$random_item = $items_array[$random_pos];

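For fun, I wrote a RandomHashSet based on rejection sampling. It's a bit hacky, since HashMap doesn't let us access its table directly, but the idea is that it should still work.

It uses no additional memory, and lookup time is amortized O(1) (because Java's hash table is dense). Note, however, that java.util.HashMap.get compares the stored hash code with the key's hash code before calling equals. A RandKey lookup therefore only succeeds when its random hash exactly matches an element's hash code, so with java.util.HashMap the loop in next() can run for a very long time.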

import java.util.*;

class RandomHashSet<V> extends AbstractSet<V> {
    private Map<Object,V> map = new HashMap<>();
    public boolean add(V v) {
        return map.put(new WrapKey<V>(v),v) == null;
    }
    @Override
    public Iterator<V> iterator() {
        return new Iterator<V>() {
            RandKey key = new RandKey();
            @Override public boolean hasNext() {
                return true;
            }
            @Override public V next() {
                while (true) {
                    key.next();
                    V v = map.get(key);
                    if (v != null)
                        return v;
                }
            }
            @Override public void remove() {
                throw new UnsupportedOperationException(); // standard exception instead of the internal sun.* NotImplementedException
            }
        };
    }
    @Override
    public int size() {
        return map.size();
    }
    static class WrapKey<V> {
        private V v;
        WrapKey(V v) {
            this.v = v;
        }
        @Override public int hashCode() {
            return v.hashCode();
        }
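        @Override public boolean equals(Object o) {
            if (o instanceof RandKey)
                return true;
            // compare the wrapped values so that adding a duplicate is detected
            return o instanceof WrapKey && v.equals(((WrapKey<?>) o).v);
        }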
    }
    static class RandKey {
        private Random rand = new Random();
        int key = rand.nextInt();
        public void next() {
            key = rand.nextInt();
        }
        @Override public int hashCode() {
            return key;
        }
        @Override public boolean equals(Object o) {
            return true;
        }
    }
}
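java.util.HashMap does not expose its table, so the rejection-sampling idea is easiest to try on a table you own. The minimal sketch below assumes a hypothetical open-addressing set named ProbingRandomSet, using linear probing and with no removal support. Its pickRandom method draws random slots until it hits an occupied one. Every element occupies exactly one slot, so each is returned with probability 1/size, and the expected number of draws is table.length / size. The table is only doubled when it would become more than half full, so it stays reasonably dense. Supporting removal would need tombstones or shrinking to keep that bound.

```java
import java.util.Random;

/** Hypothetical open-addressing hash set with rejection-sampled random picks. */
public class ProbingRandomSet<E> {
    private Object[] table = new Object[8];
    private int size = 0;

    public boolean add(E e) {
        if (e == null) throw new NullPointerException("null elements are not supported");
        if (2 * (size + 1) > table.length) {
            resize(); // keep the table at most half full
        }
        if (insert(table, e)) {
            size++;
            return true;
        }
        return false; // already present
    }

    public boolean contains(Object o) {
        int i = slot(o, table.length);
        while (table[i] != null) { // terminates: the table always has empty slots
            if (table[i].equals(o)) return true;
            i = (i + 1) % table.length;
        }
        return false;
    }

    public int size() {
        return size;
    }

    /** Draws random slots until one is occupied; null if the set is empty. */
    @SuppressWarnings("unchecked")
    public E pickRandom(Random rnd) {
        if (size == 0) return null;
        while (true) {
            Object candidate = table[rnd.nextInt(table.length)];
            if (candidate != null) return (E) candidate; // hit: accept
            // miss: reject and draw again
        }
    }

    private static int slot(Object o, int length) {
        return Math.floorMod(o.hashCode(), length);
    }

    /** Inserts e into t unless already present; returns true if inserted. */
    private static boolean insert(Object[] t, Object e) {
        int i = slot(e, t.length);
        while (t[i] != null) {
            if (t[i].equals(e)) return false;
            i = (i + 1) % t.length;
        }
        t[i] = e;
        return true;
    }

    private void resize() {
        Object[] bigger = new Object[table.length * 2];
        for (Object o : table) {
            if (o != null) insert(bigger, o);
        }
        table = bigger;
    }

    public static void main(String[] args) {
        ProbingRandomSet<String> set = new ProbingRandomSet<>();
        for (String s : new String[] {"alpha", "bravo", "charlie", "delta"}) {
            set.add(s);
        }
        Random rnd = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(set.pickRandom(rnd));
        }
    }
}
```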

You can also copy the set into an array and use that; it will probably work fine at small scale. As far as I can see, the for loop in the most-voted answer is O(n) anyway.

Object[] arr = set.toArray();

int v = (int) arr[rnd.nextInt(arr.length)];

If you really just want to pick "any" object from the Set, with no guarantee of randomness, the simplest approach is to take the first one returned by the iterator.

    Set<Integer> s = ...
    Iterator<Integer> it = s.iterator();
    if(it.hasNext()){
        Integer i = it.next();
        // i is a "random" object from set
    }

The simplest option with Java 8 is:

outbound.stream().skip(n % outbound.size()).findFirst().get()

where n is a random non-negative integer (Stream.skip throws IllegalArgumentException for a negative argument, and % can return a negative value in Java). Of course, it performs worse than a plain for (elem : coll) loop.

A generic solution, using Khoth's answer as a starting point:

/**
 * @param set a Set in which to look for a random element
 * @param <T> generic type of the Set elements
 * @return a random element in the Set or null if the set is empty
 */
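public <T> T randomElement(Set<T> set) {
    int size = set.size();
    if (size == 0) {
        return null; // Random.nextInt(0) would throw IllegalArgumentException
    }
    int item = random.nextInt(size); // random: a shared java.util.Random field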
    int i = 0;
    for (T obj : set) {
        if (i == item) {
            return obj;
        }
        i++;
    }
    return null;
}

If the set is not large, this can be done using an array:

// Type stands for the element type of the set
HashSet<Type> someSet; // assumed to be populated elsewhere
Type[] randData = someSet.toArray(new Type[0]);
int random = new Random(System.currentTimeMillis()).nextInt(someSet.size());
Type sResult = randData[random];

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow