Question

What is the simplest way to read a random block of characters from a text file using bash?

A block is a run of characters that begins and ends with X, where X is a character sequence, usually "\n\n".

We can assume that the file has short lines, less than 200 characters each. Blocks don't have more than 20 lines.

I have seen threads like "get random line" and "get text from between two tokens", but they're not exactly what I need.

I could write a simple C program that counts the blocks in the file, picks a random number in that range, and then searches for the block with that ID, but there must be an easier way.

Example: X = "\n\n"

File: (the .'s are not in the file, I used them to make an "empty" line at the beginning and end of the code)

.
first line
second line and some other text

fourth line

sixth line
seventh line, more textęęę
.

Running the script for the first time, output:

fourth line

Running the script for the second time, output:

first line
second line and some other text

Yours faithfully, user2420535

Solution

To get a uniformly random block from a file of blank-line-separated blocks in one pass:

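# RS= selects awk paragraph mode: records are separated by blank lines.
# (A multi-character RS such as '\n\n' is a gawk extension and would keep
# the leading blank line of the example file in the first block.)
awk -v RS= '
  BEGIN { srand() }             # seed from the current time
  rand() < 1 / NR { s = $0 }    # keep the NR-th block with probability 1/NR
  END { print s }               # print the block that survived
' file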

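This is a simple case of reservoir sampling: the NR-th block replaces the stored one with probability 1/NR, so after N blocks each one is kept with probability 1/N.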
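One practical caveat: POSIX awk seeds srand() from the time of day, so two runs within the same second may well pick the same block. A minimal sketch of a wrapper that works around this is shown below. The function name random_block and its optional seed argument are hypothetical, not part of the original answer, and the sketch assumes blocks are separated by blank lines (awk paragraph mode, RS set to the empty string).

```shell
# random_block FILE [SEED]
# Print one blank-line-separated block of FILE, chosen uniformly at random.
# SEED is an optional, hypothetical argument added here for repeatable runs.
random_block() {
  awk -v RS= -v seed="${2:-}" '
    BEGIN {
      if (seed != "") srand(seed)   # caller-supplied seed: repeatable choice
      else srand()                  # default: seed from the time of day
    }
    rand() < 1 / NR { s = $0 }      # reservoir step: keep block NR with probability 1/NR
    END { if (NR) print s }         # print nothing when the file has no blocks
  ' "$1"
}
```

Calling random_block file picks a block using the time-based seed, while random_block file "$RANDOM" uses bash's own random number as the seed, which should vary between calls made in quick succession.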

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow