Question

I just had a look at regex and I'm a bit confused. I wrote a program which analyses the "auth.log" file in realtime, line by line. Now I need different informations out of the entries.

if "sshd" in line
    if "Accepted password" in line
        REGEX Query to get the username and ip
    elif "session closed" in line
        REGEX Query to get the username

This are the entries in the log file:

Nov 29 13:20:33 Debian sshd[4043]: Accepted password for patrick from ::1 port 50864 ssh2
Nov 29 13:20:33 Debian sshd[4043]: pam_unix(sshd:session): session opened for user patrick by (uid=0)
Nov 29 13:21:23 Debian sshd[4043]: pam_unix(sshd:session): session closed for user patrick

Which Tool should I choose to do this? re.search?

Was it helpful?

Solution

Since log entries are strongly formatted, you might not need to use a regex:

$ cat t.txt 
Nov 29 13:20:33 Debian sshd[4043]: Accepted password for patrick from ::1 port 50864 ssh2
Nov 29 13:20:33 Debian sshd[4043]: pam_unix(sshd:session): session opened for user patrick by (uid=0)
Nov 29 13:21:23 Debian sshd[4043]: pam_unix(sshd:session): session closed for user patrick
$ cat t.py 
#/usr/bin/env python
for line in open('t.txt'):
    if "sshd" in line:
        if "Accepted password" in line:
            print "User: ", line.split()[8]
            print "IP: ", line.split()[10]
        if "session closed" in line:
            print "User: ", line.split()[10]
$ python t.py 
User:  patrick
IP:  ::1
User:  patrick

Of course you need to be more careful with lines like if "sshd" in line: but you get the idea.

OTHER TIPS

Here is how I find IPv6 and IPv4 addresses:

import re
ip6 =   '''(?:(?x)(?:(?:[0-9a-f]{1,4}:){1,1}(?::[0-9a-f]{1,4}){1,6})|
(?:(?:[0-9a-f]{1,4}:){1,2}(?::[0-9a-f]{1,4}){1,5})|
(?:(?:[0-9a-f]{1,4}:){1,3}(?::[0-9a-f]{1,4}){1,4})|
(?:(?:[0-9a-f]{1,4}:){1,4}(?::[0-9a-f]{1,4}){1,3})|
(?:(?:[0-9a-f]{1,4}:){1,5}(?::[0-9a-f]{1,4}){1,2})|
(?:(?:[0-9a-f]{1,4}:){1,6}(?::[0-9a-f]{1,4}){1,1})|
(?:(?:(?:[0-9a-f]{1,4}:){1,7}|:):)|
(?::(?::[0-9a-f]{1,4}){1,7})|
(?:(?:(?:(?:[0-9a-f]{1,4}:){6})(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}))|
(?:(?:(?:[0-9a-f]{1,4}:){5}[0-9a-f]{1,4}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}))|
(?:(?:[0-9a-f]{1,4}:){5}:[0-9a-f]{1,4}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:[0-9a-f]{1,4}:){1,1}(?::[0-9a-f]{1,4}){1,4}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:[0-9a-f]{1,4}:){1,2}(?::[0-9a-f]{1,4}){1,3}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:[0-9a-f]{1,4}:){1,3}(?::[0-9a-f]{1,4}){1,2}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:[0-9a-f]{1,4}:){1,4}(?::[0-9a-f]{1,4}){1,1}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?:(?:(?:[0-9a-f]{1,4}:){1,5}|:):(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})|
(?::(?::[0-9a-f]{1,4}){1,5}:(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}))
'''
ip4 =   '(?:[12]?\\d?\\d\\.){3}[12]?\\d?\\d'
ip = re.findall(ip4 + '|' + ip6, "111:111::1 1.1.1.1")

I got the regex for IPv6 from an other website Regular expression that matches valid IPv6 addresses

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top