문제

A source provides a stream of items $x_1, x_2,\dots$ . At each step $n$ we want to save a random sample $S_n \subseteq \{ (x_i, i)|1 \le i \le n\}$ of size $k$, i.e. $S_n$ should be a uniformly chosen sample from all $\tbinom{n}{k}$ possible samples consisting of seen items. So at each step $n \ge k$ we must decide whether to add the next item to $S$ or not. If so we must also decide which of the current items to remove from $S$ .

State an algorithm for the problem. Prove its correctness.

도움이 되었습니까?

해결책

Due to the dubious nature of the question, I only provide hints.

Have you tried the obvious? With probability $\frac{1}{n}$, add the new element to the sample. If it is added, choose one of the elements already in the sample uniformly at random and drop it. Sounds about fair, does it not?

For a proof, you will have to proceed inductively. In the step, you assume that $S_{n-1}$ is indeed a uniform sample. From this and the way you choose whether to include $x_n$ and which element to drop, you have to get that $S_n$ is a uniform sample, too.

Try if you can prove above idea correct. If it is not, find out where the problem is and fix it. See this answer to a similar question for a detailed application of this technique.

다른 팁

The best algorithm for your problem is Reservoir Sampling Algorithm. Read this

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 cs.stackexchange
scroll top