Question

I have many logs with commands in them. I filtered all log lines containing "useradd", but now I want to discard some false positives:

  • ... /etc/default/useradd ...
  • ... .../man8/useradd ...

The problem is that I still want to see lines that contain both a false positive AND a real command (see test cases).

I can only use (one or more) Python regular expressions because I am using a log analyzer program, so no real Python program. These are the expressions I tried:

(!/etc/default/|/man8/)useradd # no match
(?<!/etc/default/|/man8/)useradd # look-behind requires fixed-width pattern
(?<!fault/|/man8/)useradd # works, but that's strange
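
The error in the second attempt comes from Python's `re` module requiring each look-behind to have a single fixed width: `/etc/default/` is 13 characters and `/man8/` is 6. The third attempt compiles only because `fault/` and `/man8/` happen to be the same length (6 characters). A minimal sketch to check this, assuming CPython's standard `re` module. The `STACKED` pattern (two separate look-behinds, each fixed-width on its own) is a possible workaround added here for illustration, not part of the question or the answer, and the helper `compiles` is hypothetical:

```python
import re

# Attempt 2 from the question: the alternatives inside the negative
# look-behind have different lengths (13 vs 6 characters).
VARIABLE_WIDTH = r"(?<!/etc/default/|/man8/)useradd"

# Attempt 3: "fault/" and "/man8/" are both 6 characters long,
# so the look-behind has one fixed width and compiles.
EQUAL_WIDTH = r"(?<!fault/|/man8/)useradd"

# Possible workaround (not from the original answer): stack two
# look-behinds, each of which is fixed-width on its own.
STACKED = r"(?<!/etc/default/)(?<!/man8/)useradd"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


for name, pattern in [("variable width", VARIABLE_WIDTH),
                      ("equal width", EQUAL_WIDTH),
                      ("stacked", STACKED)]:
    print(f"{name}: {'compiles' if compiles(pattern) else 're.error'}")
```

Running it reports `re.error` for the variable-width pattern and `compiles` for the other two. The stacked version can also handle the mixed test cases when used with `re.search`, because a look-behind only inspects the text directly before the particular `useradd` being tried, so a later, unprefixed `useradd` on the same line can still match. Note also that in the first attempt, `(!...)` is an ordinary capturing group whose first alternative starts with a literal `!`; it does not negate anything.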

In answers to other questions, the regex was rewritten so that a lookahead could be used, but I don't see how that is possible here.

[Edit: added some test cases]

## no match
cat /etc/default/useradd 
less /usr/share/man/ja/man8/useradd.8.gz
## match:
useradd evil
/usr/sbin/useradd
cat /etc/default/useradd; useradd evil
cat /etc/default/useradd; /usr/sbin/useradd evil
cat /etc/default/useradd; cd /usr/lib/; ../sbin/useradd evil

Solution

You can use a lookahead assertion instead:

^(?!.*(?:/etc/default|/man8)/useradd(?!.*useradd)).*useradd

Explanation:

^               # Start of string
(?!             # Assert that it's impossible to match...
 .*             # any string, followed by...
 (?:            # this non-capturing group containing...
  /etc/default  # either "/etc/default"
 |              # or
  /man8         # "/man8"
 )              # End of group, followed by...
 /useradd       # "/useradd"
 (?!.*useradd)  # UNLESS another "useradd" follows further ahead.
)               # End of lookahead
.*              # Match anything, then match
useradd         # "useradd"
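
Since the explanation is laid out like a verbose regex, it can be pasted almost verbatim into Python with `re.VERBOSE`, which ignores unescaped whitespace and treats `#` as the start of a comment. A sketch, assuming you can test the pattern in plain Python; whether the log analyzer accepts verbose mode (or the inline `(?x)` flag) is an assumption, so the one-line form is the safer one to paste there:

```python
import re

# The answer's regex in verbose mode, using the explanation's comments.
# re.VERBOSE ignores unescaped whitespace and treats "#" as a comment.
VERBOSE_PATTERN = re.compile(
    r"""
    ^               # Start of string
    (?!             # Assert that it's impossible to match...
     .*             # any string, followed by...
     (?:            # this non-capturing group containing...
      /etc/default  # either "/etc/default"
     |              # or
      /man8         # "/man8"
     )              # End of group, followed by...
     /useradd       # "/useradd"
     (?!.*useradd)  # UNLESS another "useradd" follows further ahead.
    )               # End of lookahead
    .*              # Match anything, then match
    useradd         # "useradd"
    """,
    re.VERBOSE,
)

# The same pattern on one line, exactly as given in the answer.
COMPACT_PATTERN = re.compile(
    r"^(?!.*(?:/etc/default|/man8)/useradd(?!.*useradd)).*useradd"
)

samples = [
    "cat /etc/default/useradd",
    "less /usr/share/man/ja/man8/useradd.8.gz",
    "cat /etc/default/useradd; useradd evil",
]
for s in samples:
    # Both spellings of the pattern should agree on every sample.
    assert bool(VERBOSE_PATTERN.search(s)) == bool(COMPACT_PATTERN.search(s))
    print(bool(VERBOSE_PATTERN.search(s)), s)
```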

See it live on regex101.com.
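
To check the pattern offline against the question's test cases, here is a short Python harness. The list names are illustrative, and each test case is assumed to be a separate log line (the log analyzer presumably matches line by line):

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"^(?!.*(?:/etc/default|/man8)/useradd(?!.*useradd)).*useradd")

# Test cases copied from the question (the trailing space on the
# first line is kept as in the original).
SHOULD_NOT_MATCH = [
    "cat /etc/default/useradd ",
    "less /usr/share/man/ja/man8/useradd.8.gz",
]
SHOULD_MATCH = [
    "useradd evil",
    "/usr/sbin/useradd",
    "cat /etc/default/useradd; useradd evil",
    "cat /etc/default/useradd; /usr/sbin/useradd evil",
    "cat /etc/default/useradd; cd /usr/lib/; ../sbin/useradd evil",
]

for line in SHOULD_NOT_MATCH + SHOULD_MATCH:
    # re.search is used; since the pattern starts with ^ and re.MULTILINE
    # is not set, it can only match at the start of the line.
    result = "match" if PATTERN.search(line) else "no match"
    print(f"{result:8}  {line}")
```

Every line in `SHOULD_NOT_MATCH` prints `no match` and every line in `SHOULD_MATCH` prints `match`.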

Licensed under: CC-BY-SA with attribution