Problem
What is the simplest way of reading a random block of characters from a text file using bash?
A block is a set of characters that begins with X and ends with X, where X is a character sequence; usually it will be "\n\n".
We can assume that the file has short lines, less than 200 characters each. Blocks don't have more than 20 lines.
I have seen threads like "get random line" and "get text from between two tokens", but that's not exactly what I need.
I can write a simple program in C that will count how many blocks are in the file, pick a random number from that range, and then search for the block with that ID, but there must be an easier way.
Example:
X = "\n\n"
File (the .'s are not in the file; I used them to make an "empty" line at the beginning and end of the code):
.
first line
second line and some other text

fourth line

sixth line
seventh line, more textęęę
.
Running the script for the first time, output:
fourth line
Running the script for the second time, output:
first line
second line and some other text
Yours faithfully, user2420535
Solution
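To get a uniformly random block from a file of blank-line-separated blocks in one pass:
awk -v RS='\n\n' '
BEGIN { srand() }              # seed the generator from the current time
rand() < 1.0/NR { s = $0 }     # replace the kept block with probability 1/NR
END { print s }                # print whichever block survived
' file
This is a simple case of reservoir sampling with a reservoir of size one: block i is kept with probability 1/i and then survives each later block j with probability 1 - 1/j, so each of the n blocks ends up selected with probability 1/n.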
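If portability matters, here is a sketch of the same idea using awk's paragraph mode (an empty RS). POSIX leaves a multi-character RS such as '\n\n' unspecified, although gawk and mawk treat it as a regular expression. Paragraph mode also skips the leading blank line in the example file and treats runs of blank lines as one separator. The function name random_block and the optional SEED argument are my own additions for illustration, not part of the original answer.

```shell
#!/bin/sh
# random_block FILE [SEED]
# Print one uniformly chosen blank-line-separated block from FILE.
# NOTE: random_block and SEED are hypothetical names used for this sketch.
random_block() {
    awk -v RS= -v seed="${2:-}" '
    BEGIN {
        # With no argument, srand() seeds from the clock;
        # an explicit seed makes the choice reproducible.
        if (seed != "") srand(seed); else srand()
    }
    # Keep record number NR with probability 1/NR (reservoir of size one).
    rand() < 1.0 / NR { s = $0 }
    # Print nothing for an empty file.
    END { if (NR > 0) print s }
    ' "$1"
}
```

Call it as `random_block file`, or `random_block file 42` for a repeatable pick. Without a seed, many awk implementations seed from the time of day in whole seconds, so two runs within the same second may print the same block.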