Reading a log file into an array reversed, is it best method when looking for keyword near the bottom?

https://stackoverflow.com/questions/17841256

04-06-2022
|

Question

I am reading from log files which can be anything from a small log file up to 8-10mb of logs. The typical size would probably be 1mb. Now the key thing is that the keyword im looking for is normally near the end of the document, in probably 95% of the cases. Then i extract 1000 characters after the keyword.

If i use this approach:

$lines = explode("\n",$body);
$reversed = array_reverse($lines);
foreach($reversed AS $line) {
// Search for my keyword
}

Would it be more efficent than using:

$pos = stripos($body,$keyword);  
$snippet_pre = substr($body, $pos, 1000);

What i am not sure on is with stripos does it just start searching through the document 1 character at a time so in theory if there is 10,000 characters after the keyword then i wont have to read those into memory, whereas the first option would have to read everything into memory even though it probably only needs the last 100 lines, could i alter it to read 100 lines into memory, then search another 101-200 lines if the first 100 was not successful or is the query so light that it doesnt really matter.

I have a 2nd question and this assumes the reverse_array is the best approach, how would i extract the next 1000 characters after i have found the keyword, here is my woeful attempt

$body = $this_is_the_log_content;

$lines = explode("\n",$body);
$reversed = array_reverse($lines);
foreach($reversed AS $line) {

$pos = stripos($line,$keyword);  
$snippet_pre = substr($line, $pos, 1000);

}

Why i don't think that will work is because each $line might only be a few hundred characters so would the better solution be to explode it every say 2,000 lines and also keep the previous $line as a backup variable so something like this.

$body = $this_is_the_log_content;

$lines = str_split($body, 2000);
$reversed = array_reverse($lines);
$previous_line = $line;
foreach($reversed AS $line) {

$pos = stripos($line,$keyword);  
    if ($pos) {
    $line = $previous_line . ' ' . $line;
    $pos1 = stripos($line,$keyword); 
    $snippet_pre = substr($line, $pos, 1000);
    }

}

Im probably massively over-complicating this?

Solution

I would strongly consider using a tool like grep for this. You can call this command line tool from PHP and use it to search the file for the word you are looking for and do things like give you the byte offset of the matching line, give you a matching line plus trailing context lines, etc.

Here is a link to grep manual. http://unixhelp.ed.ac.uk/CGI/man-cgi?grep

Play with the command a bit on the command line to get it the way you want it, then call it from PHP using exec(), passthru(), or similar depending on how you need to capture/display the content.

Alternatively, you can simply fopen() the file with the pointer at the end and move the file pointer forward in the file using fseek() searching for the string as you move along the way. Once you find you needle, you can then read the file from that offset until you get to the end of file or the number of log entries.

Either of these might be preferable to reading the entire log file into memory and then trying to work with it.

The other thing to consider is whether 1000 characters is meaningful. Typically log files would have lines that vary in length. To me it would seem that you should be more concerned about getting the next X lines from the log file, not the next Y characters. What if a line has 2000 characters, are you saying you only want to get half of it? That may not be meaningful at all.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow