Threads to multi search file

https://stackoverflow.com/questions/19888083

30-07-2022
Question

I open the file:

FILE *fp = fopen("test.txt", "r");

The file looks like this:

cookie    
monster 
test
...

I create 10 threads with pthread_create.

Each thread will look for a string inside the file. I want to break the file into sections.

Problem

I don't want to read the whole file into memory because the .txt file will be huge. So my approach is to seek: I would give each thread a range of lines to search.

So Thread1 gets lines 1 to 50, Thread2 gets lines 51 to 100, Thread3 gets lines 101 to 150.

How do I seek to line 51, then to line 101, etc.?

fseek(fp, 51, SEEK_SET); // moves to byte offset 51, not to line 51

Solution

Since the lines presumably differ in length, you cannot know where a given line starts without reading all the lines before it. So you need to read through the file, counting newlines, to learn the byte offset of each line. If you decide to do this, do it once before spawning the threads; otherwise every thread will duplicate the work.
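A minimal sketch of that single pre-pass, under some assumptions: the helper name index_line_starts and its stride parameter are hypothetical, lines end in '\n', and the file is opened in binary mode so the counted bytes match what fseek expects. It records the byte offset of the first line of every block of stride lines, so each thread can later be handed one offset.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: scans the file once and records the byte offset at
 * which every stride-th line starts (lines 1, 1+stride, 1+2*stride, ...).
 * Returns a malloc'd array and stores its length in *count; NULL on error. */
long *index_line_starts(const char *path, long stride, size_t *count)
{
    if (stride <= 0 || count == NULL)
        return NULL;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 16, n = 0;
    long *offsets = malloc(cap * sizeof *offsets);
    if (offsets == NULL) {
        fclose(fp);
        return NULL;
    }

    long pos = 0;           /* byte offset of the character about to be read */
    long line = 0;          /* zero-based index of the current line */
    int at_line_start = 1;  /* true until the first byte of a line is seen */
    int c;

    while ((c = getc(fp)) != EOF) {
        if (at_line_start) {
            if (line % stride == 0) {
                if (n == cap) {             /* grow the offset array */
                    long *tmp = realloc(offsets, 2 * cap * sizeof *tmp);
                    if (tmp == NULL) {
                        free(offsets);
                        fclose(fp);
                        return NULL;
                    }
                    offsets = tmp;
                    cap *= 2;
                }
                offsets[n++] = pos;
            }
            at_line_start = 0;
        }
        pos++;
        if (c == '\n') {
            line++;
            at_line_start = 1;
        }
    }

    fclose(fp);
    *count = n;
    return offsets;
}
```

With a stride of 50, offsets[0] is where line 1 starts, offsets[1] where line 51 starts, and so on. On platforms where long is 32 bits, a truly huge file would need off_t with the POSIX fseeko and ftello functions instead.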

Alternatively, you could divide the file by bytes instead of by lines: seek to a given byte offset, then skip forward from there to the next newline. Let the thread that owns the preceding byte range read past the end of its range up to the end of that line, so that the line straddling the boundary is still covered.
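The byte-based split could be sketched roughly as follows. The names byte_range, next_line_start and split_by_bytes are made up for illustration, and the sketch assumes '\n' line endings and a platform (such as POSIX) where seeking to SEEK_END on a binary stream gives the file size. Each nominal boundary is pushed forward to the next line start, which has the same effect as letting the preceding range run to the end of its last line.

```c
#include <stdio.h>

/* Hypothetical type: half-open byte range [begin, end) of whole lines. */
typedef struct {
    long begin;
    long end;
} byte_range;

/* Returns the offset of the first line that starts at or after offset,
 * or size if there is none, or -1 on a seek error. */
static long next_line_start(FILE *fp, long offset, long size)
{
    if (offset <= 0)
        return 0;
    if (offset >= size)
        return size;
    /* Start one byte early: if that byte is '\n', offset already
     * begins a line and is returned unchanged. */
    if (fseek(fp, offset - 1, SEEK_SET) != 0)
        return -1;
    long pos = offset - 1;
    int c;
    while ((c = getc(fp)) != EOF) {
        pos++;
        if (c == '\n')
            return pos;
    }
    return size;
}

/* Splits the file into nparts ranges of roughly equal byte size, each
 * aligned to line boundaries. Returns 0 on success, -1 on error. */
int split_by_bytes(const char *path, int nparts, byte_range *parts)
{
    if (nparts <= 0 || parts == NULL)
        return -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return -1;
    }

    long begin = 0;
    for (int i = 0; i < nparts; i++) {
        long end = (i == nparts - 1)
                       ? size
                       : next_line_start(fp, (size / nparts) * (i + 1), size);
        if (end < 0) {
            fclose(fp);
            return -1;
        }
        parts[i].begin = begin;
        parts[i].end = end;
        begin = end;
    }
    fclose(fp);
    return 0;
}
```

A range can come out empty (begin equal to end) when a single line is longer than the nominal chunk size; a thread given such a range simply has nothing to search.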

Also, you need to give each thread its own handle on the file, otherwise actions in one thread will affect the file position in other threads.

Other tips

Each thread needs its own handle to the file. Start 10 threads, call fopen in each thread, and fseek in each thread to an offset computed from the thread id and a stride.
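As a rough illustration of one handle per thread, the sketch below has every worker call fopen and fseek itself. The search_job struct and the search_range and run_search functions are hypothetical names. It assumes each job's byte range already starts on a line boundary, that lines fit in the 4096-byte buffer, and that counting lines containing the search string is what is wanted.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-thread job description. */
typedef struct {
    const char *path;    /* file to search */
    const char *needle;  /* string to look for */
    long begin, end;     /* byte range [begin, end), line-aligned */
    long hits;           /* result: matching lines, or -1 on error */
} search_job;

/* Thread entry point: opens a private handle, seeks to the start of its
 * range, and scans whole lines until the range is exhausted. */
static void *search_range(void *arg)
{
    search_job *job = arg;
    job->hits = 0;

    FILE *fp = fopen(job->path, "rb");  /* handle owned by this thread */
    if (fp == NULL || fseek(fp, job->begin, SEEK_SET) != 0) {
        if (fp != NULL)
            fclose(fp);
        job->hits = -1;
        return NULL;
    }

    char line[4096];  /* assumption: no line is longer than this */
    long pos = job->begin;
    while (pos < job->end && fgets(line, sizeof line, fp) != NULL) {
        pos += (long)strlen(line);
        if (strstr(line, job->needle) != NULL)
            job->hits++;
    }
    fclose(fp);
    return NULL;
}

/* Starts one thread per job and waits for all of them.
 * Returns the total number of matching lines, or -1 on any failure. */
long run_search(search_job *jobs, int njobs)
{
    if (njobs <= 0)
        return -1;
    pthread_t *tids = malloc((size_t)njobs * sizeof *tids);
    if (tids == NULL)
        return -1;

    int started = 0;
    while (started < njobs &&
           pthread_create(&tids[started], NULL, search_range,
                          &jobs[started]) == 0)
        started++;

    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (jobs[i].hits < 0)
            total = -1;
        else if (total >= 0)
            total += jobs[i].hits;
    }
    free(tids);
    return started == njobs ? total : -1;
}
```

Because the threads share no FILE pointer and each writes only to its own job struct, no locking is needed; the totals are added up after pthread_join. Depending on the toolchain, the program may need to be built with -pthread.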

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow