What would the regex be to match the hostname in this string?
Question
I'm trying to match the hostname in this extracted SNMP packet
2012-07-27 12:16:03 SUP-V5-ISA-1 [10.165.26.10] (via UDP: [10.165.26.10]:61151->[0.0.0.0]:0) TRAP, SNMP v1, community public
ISAMANAGER-MIB::isaManager Enterprise Specific Trap (ISAMANAGER-MIB::clipModified) Uptime: 1:22:15.08
ISAMANAGER-MIB::vClipId = INTEGER: 42059
SUP-V5-ISA-1 is the hostname, and unusually in this instance it's not an FQDN (it depends on the system it's coming from)
I'm trying to feed it into splunk, but I can't for the life of me get my head around how I'd choose the 3rd word, and not treat hyphens as word boundaries. I've been able to choose the 3rd 'word' every time, being '27' and SUP, but never to grab the whole 'word'
It always follows a timestamp, and is always followed by an IP in square brackets, but generally doesn't include as many hyphens.
Solution
In Splunk you can transform the host name at index time by extracting the field from your log event.
To do this you would add entries to two files in $SPLUNK_HOME/etc/apps/yourapp/local
Replace yourapp and yoursourcetype to fit your environment.
props.conf
[yoursourcetype]
TRANSFORMS-h1=set-host-name
SHOULD_LINEMERGE=false
transforms.conf
[set-host-name]
DEST_KEY = MetaData:Host
REGEX =^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s((?:\w|-)+)\s\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\].+$
FORMAT = host::$1
OTHER TIPS
Go with the regex ^\S+\s+\S+\s+(\S+) and use the first capture group as the result.
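As a quick sanity check outside Splunk, the same idea can be tried against the sample line from the question. This is only a sketch: the pattern is rewritten with POSIX character classes, because \S and \s are not guaranteed to work in every sed, and the variable names are assumptions made for illustration.

```shell
# First line of the trap, copied from the question.
record='2012-07-27 12:16:03 SUP-V5-ISA-1 [10.165.26.10] (via UDP: [10.165.26.10]:61151->[0.0.0.0]:0) TRAP, SNMP v1, community public'

# POSIX ERE spelling of ^\S+\s+\S+\s+(\S+): skip two whitespace-separated
# tokens (date and time), then capture the third one.
hostname=$(printf '%s\n' "$record" |
    sed -nE 's/^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+([^[:space:]]+).*/\1/p')

echo "$hostname"
```

Because the capture is "anything that is not whitespace", hyphens and dots in the hostname are kept, so the same pattern should also cope with an FQDN.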
Use Field-Splitting Instead of Regular Expressions
Regular expressions aren't always the best solution. This problem has a trivial solution if you can split your log record into fields, since it's much easier to get the third field of a record than to create a bullet-proof regular expression.
I'm not familiar with Splunk, but consider the following illustrative examples. Assuming that your log record is stored in the record variable:
# Field-splitting with awk.
hostname=$(echo "$record" | awk '{print $3}')
# Field-splitting with cut.
hostname=$(echo "$record" | cut -d' ' -f3)
In both cases, hostname is set to SUP-V5-ISA-1 by parsing the record into fields and extracting the value of the third field. That seems like a more reliable approach for this particular use case, assuming your framework allows you to perform comparable operations.
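One caveat with the awk version: if the record variable holds the whole multi-line trap rather than just its first line, awk prints the third field of every line. A minimal sketch of one way around that, assuming the hostname is always on the first line as in the question, is to restrict awk to that line:

```shell
# Full three-line trap, copied from the question.
record='2012-07-27 12:16:03 SUP-V5-ISA-1 [10.165.26.10] (via UDP: [10.165.26.10]:61151->[0.0.0.0]:0) TRAP, SNMP v1, community public
ISAMANAGER-MIB::isaManager Enterprise Specific Trap (ISAMANAGER-MIB::clipModified) Uptime: 1:22:15.08
ISAMANAGER-MIB::vClipId = INTEGER: 42059'

# NR == 1 limits the action to the first line; exit stops awk from
# reading the remaining lines of the record.
hostname=$(printf '%s\n' "$record" | awk 'NR == 1 { print $3; exit }')

echo "$hostname"
```

Unlike cut -d' ', awk's default field splitting also treats runs of spaces or tabs as a single separator, which helps if the timestamp is ever padded.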