Question

Trying to wrap my head around look-ahead and look-behind in regex processing.

Let's assume I have a file listing PIDs and other things. I want to build a regex to match the PID format \d{1,5} but that also excludes a certain PID.

$myself = $$;
@file = `cat $FILE`;
@pids = grep /\d{1,5}(?<!$myself)/, @file;

In this regex I try to combine the digits match with the exclusion using a negative look-behind by using the (?<!TO_EXCLUDE) construct. This doesn't work.

Sample file:

456
789
4567
345
22743
root
bin
sys

Would appreciate if someone could point me in the right direction.

Also would be interested to find out if this negative look-behind would be the most efficient in this scenario.


Solution

"Look behind" really looks behind. So, you can check whether a PID is preceded by something, not whether it matches something. If you just want to exclude $$, you can be more straightforward:

@file = `cat $FILE`;
@pids = grep /(\d{1,5})/ && $1 ne $$, @file;
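To make "preceded by" concrete, here is a minimal sketch of what look-behind actually tests. The input string and variable names are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

my $text = 'pid=123 uid=456';   # hypothetical input, for illustration only

# Positive look-behind: digits that are preceded by "pid="
my @after_pid = $text =~ /(?<=pid=)(\d+)/g;

# Negative look-behind: whole numbers that are NOT preceded by "pid="
my @not_after_pid = $text =~ /(?<!pid=)\b(\d+)/g;

print "after pid=: @after_pid\n";          # 123
print "not after pid=: @not_after_pid\n";  # 456
```

In both cases the assertion inspects the text to the left of the current position. It says nothing about what the digits themselves are, which is why it cannot directly express "digits that are not my PID".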

OTHER TIPS

I've upvoted choroba's solution; I just wanted to explain why your original approach didn't work.

See, the regex engine is a complicated beast: it suffers from an internal struggle between trying to match as many characters as possible and trying to match at any cost. And the latter, well, usually wins.

For example, let's analyze the following:

my $test_line = '22743';
my $pid = '22743';
print 'Matched?', "\n" if $test_line =~ /\d{1,5}(?<!$pid)/;
print $&, "\n";

Why did it print 'Matched?', you may ask? Because that's what happened: first the engine tried to consume all five digits and then match the next subexpression, and failed (that was the point of the negative lookbehind, wasn't it?).

If it were you, you would have stopped right there, but not the engine! It still feels that dark desire to match no matter what. So it backtracks and tries the next shorter repetition count, four digits instead of five. Now the text behind the current position is '2274', not '22743', so the negative lookbehind is destined to succeed. That's quite easy to check by examining what's printed by print $&;
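The backtracking can be made visible by forcing each repetition count in turn, in the order the greedy quantifier would try them. This is only a sketch: the anchored \d{$n} patterns and the @trace array are illustration devices, not something the engine exposes.

```perl
use strict;
use warnings;

my $test_line = '22743';
my $pid       = '22743';

# Try 5, 4, 3, 2, 1 digits from the start of the line and record
# whether the negative lookbehind holds after each attempt.
my @trace;
for my $n (reverse 1 .. 5) {
    my $ok = $test_line =~ /^\d{$n}(?<!$pid)/;
    push @trace, sprintf '%d digit(s): lookbehind %s',
        $n, $ok ? 'succeeds' : 'fails';
}
print "$_\n" for @trace;

# The original pattern settles on the first count that works.
my ($got) = $test_line =~ /(\d{1,5})(?<!$pid)/;
print "matched: $got\n";   # 2274
```

Only the five-digit attempt fails, so the overall match succeeds with the first four digits.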

Can it still be solved within the realm of regular expressions? Yep, with a so-called atomic group, (?>...), which never gives back what it has matched:

print 'No match for ya!', "\n" unless $test_line =~ /(?>\d{1,5})(?<!$pid)/;

But that's usually considered dark magic, I guess.
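A quick check of the atomic version against a few inputs, as a sketch. The PID 456 and the three test lines are assumptions chosen to show one caveat: without a boundary, a longer number that merely ends in the excluded PID is rejected too.

```perl
use strict;
use warnings;

my $pid = '456';   # hypothetical PID to exclude

my %matched;
for my $line (qw(456 4567 1456)) {
    # The atomic group keeps every digit it consumed, so the lookbehind
    # is only ever tested at the end of the digit run.
    $matched{$line} = $line =~ /(?>\d{1,5})(?<!$pid)/ ? 1 : 0;
    print "$line: ", ($matched{$line} ? 'match' : 'no match'), "\n";
}
```

Here 456 is excluded as intended and 4567 still matches, but 1456 is excluded as well, because the three characters behind the end of the run are 456. Adding \b inside the lookbehind, or anchoring the pattern, avoids that.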

And if you are curious how it could be done with a regex alone, here are some examples:

/\b\d{1,5}+(?<!\b$pid)/

/\b\d{1,5}\b(?<!\b$pid)/

/\b(?!$pid\b)\d+/

/^(?!$pid$)\d+$/
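As a sanity check, the four patterns can be run against the sample file from the question. This sketch assumes the lines are already chomped and uses 22743 as a stand-in for the PID to exclude; the hash keys are just labels. The possessive quantifier {1,5}+ needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $pid  = '22743';                                    # stand-in for $$
my @file = qw(456 789 4567 345 22743 root bin sys);   # chomped sample lines

my %patterns = (
    possessive    => qr/\b\d{1,5}+(?<!\b$pid)/,
    word_boundary => qr/\b\d{1,5}\b(?<!\b$pid)/,
    lookahead     => qr/\b(?!$pid\b)\d+/,
    anchored      => qr/^(?!$pid$)\d+$/,
);

my %result;
for my $name (sort keys %patterns) {
    # Keep only the lines the pattern accepts.
    $result{$name} = join ' ', grep { $_ =~ $patterns{$name} } @file;
    print "$name: $result{$name}\n";
}
```

On this input every pattern keeps 456, 789, 4567 and 345 and drops both 22743 and the non-numeric lines.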

How about:

chomp(@file);      # remove newlines that will otherwise mess things up
my @pids = grep /\d{1,5}/, @file;
my %pids = map { $_ => 1 } @pids;   # no comma after the block

delete $pids{$$};  # delete one specific pid

@pids = keys %pids;

I.e. funnel the list of PIDs through a hash and delete your own PID. The lines read from the file need to be chomped first, otherwise the hash keys keep their trailing newline and never equal $$.

I feel pretty sure there's a module on CPAN that handles processes though.

ETA:

If you are reading the values from readdir as you mentioned in comments, something like this might be your best option (untested):

opendir my $dh, "/proc" or die $!;
my @pids;
while ( my $line = readdir $dh ) {     # iterate through directory content
    next unless $line =~ /^\d{1,5}$/;  # skip non-numbers
    next if $line == $$;               # skip own PID
    push @pids, $line;
}

A slightly different way (I try to avoid @file = `cat text.txt`):

my @pids;
open my $fi, "<", "pids.txt" or die "Cannot open pids.txt: $!";
while (<$fi>) {
   if (/(\d{1,5})/) {
      push @pids, $1 if $1 ne $$;
   }
}
close $fi;

print join(", ", @pids), "\n";

This is my second post to SO, I hope it's ok offering an alternate method.

Licensed under: CC-BY-SA with attribution