Need to extract and re-format with RegEx

Question 1

There are two ways you can achieve what you're looking to do...

Using search

Extract the fields with rex and use eval to concatenate the values.

| rex field=_raw "Hub:\[(?<Hub>[^\]]*)\]\sComp:\[(?<Comp>[^\]]*)\]" | eval someNewField=Hub."-".Comp

The rex command runs a regular expression against a field; _raw is a special field that contains the entire event data. The regex captures the characters between [ and ] and extracts them into the field named inside the <>.

This is the easiest way, as you don't need to modify any configuration. The drawback is that you'll need to add this to your search string every time to get the values extracted and formatted the way you want.
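To check the regex outside Splunk, here is a rough Python equivalent of the rex + eval pair above. It is only a sketch: Python spells named groups (?P<name>...) rather than (?<name>...), and the function name and the sample event text are made up for illustration.

```python
import re

# Same pattern as the rex example, using Python's (?P<name>...) named groups.
HUB_COMP = re.compile(r"Hub:\[(?P<Hub>[^\]]*)\]\sComp:\[(?P<Comp>[^\]]*)\]")

def hub_dash_comp(raw_event):
    """Return 'Hub-Comp' for one event, or None when the pattern does not match."""
    m = HUB_COMP.search(raw_event)
    if m is None:
        return None
    # Rough equivalent of: eval someNewField=Hub."-".Comp
    return m.group("Hub") + "-" + m.group("Comp")

# Hypothetical event text, made up for illustration.
sample_event = "2013-10-01 12:00:00 INFO Hub:[A1B2] Comp:[C3D4] status=ok"
print(hub_dash_comp(sample_event))  # A1B2-C3D4
```

Because [^\]]* accepts any number of non-] characters, this also accepts IDs of other lengths, including empty ones.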

Using search time extraction with props.conf and transforms.conf

In transforms.conf, add a transform to extract the fields...

[hubCompExtract]
REGEX = Hub:\[(?<Hub>[^\]]*)\]\sComp:\[(?<Comp>[^\]]*)\]

In props.conf, execute the extract and concatenate the values using an eval...

[yourSourceTypeName]
REPORT-fieldExtract = hubCompExtract
EVAL-yourNewFieldName = Hub."-".Comp

No need to add anything to your search string, but it does require config file changes.

Regex example

gSkinner example (without the capture group names).

Question 2

I'm not familiar with Splunk, but I assume its regexp engine supports named groups.

To write a fully correct regexp I need to know a couple of things:

Is it always formatted like Hub:[HHHH] Comp:[CCCC]? Always Hub, a single space, then Comp?
Are the IDs always 4 characters long?
Are they only letters and numbers, or can they contain anything, such as a special character like *?
How do you retrieve the IDs? With some kind of match function, or with replace?

This is my regex: Hub:\s*\[(?<Hub>.{4})\]\s+Comp:\s*\[(?<Comp>.{4})\]

And a sample in C# (assuming the str variable contains a line with one record)

var regEx = new Regex(@"Hub:\s*\[(?<Hub>.{4})\]\s+Comp:\s*\[(?<Comp>.{4})\]");

var m = regEx.Match(str);

Console.WriteLine(String.Format("{0}-{1}", m.Groups["Hub"], m.Groups["Comp"]));

Explanation:

If you use Match, you only care about your IDs, so nothing but the IDs needs to be in parentheses. To locate them easily, we use named groups: (?<someName>pattern)

Assuming the IDs are always 4 characters long, we use {4}. Any character is allowed, so .{4}.
If you want to ensure there are only letters and numbers, you can change it to [A-Z0-9]{4}.
If you don't know how many letters/numbers there will be, you can change {4} to +, which is the same as {1,} (one or more).
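The three choices for the ID part can be compared side by side. Below is a small Python sketch (not C#, and with Python's (?P<name>...) group syntax); the helper names and sample strings are hypothetical.

```python
import re

def build_pattern(id_pattern):
    """Build the Hub/Comp regex around a given sub-pattern for the IDs."""
    return re.compile(
        r"Hub:\s*\[(?P<Hub>" + id_pattern + r")\]"
        r"\s+Comp:\s*\[(?P<Comp>" + id_pattern + r")\]"
    )

ANY_FOUR = build_pattern(r".{4}")                 # any 4 characters
ALNUM_FOUR = build_pattern(r"[A-Z0-9]{4}")        # exactly 4 letters/digits
ALNUM_ANY_LENGTH = build_pattern(r"[A-Z0-9]+")    # 1 or more letters/digits

def ids(pattern, text):
    """Return (Hub, Comp) or None if the text does not match."""
    m = pattern.search(text)
    return (m.group("Hub"), m.group("Comp")) if m else None

# Made-up inputs: one with a special character, one with odd lengths.
print(ids(ANY_FOUR, "Hub:[AB*1] Comp:[C3D4]"))          # ('AB*1', 'C3D4')
print(ids(ALNUM_FOUR, "Hub:[AB*1] Comp:[C3D4]"))        # None
print(ids(ALNUM_ANY_LENGTH, "Hub:[AB123] Comp:[C3]"))   # ('AB123', 'C3')
```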

In the example you posted, there is an extra space between the colon and the bracket, so I used :\s*\[.
This matches :[, : [ or any amount of any other whitespace in between.

Assuming that Comp comes right after the closing bracket of Hub: \]\s+Comp allows one or more whitespace characters between them.
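A quick way to see what the \s* and \s+ parts accept and reject, sketched in Python with made-up input strings:

```python
import re

PATTERN = re.compile(r"Hub:\s*\[(?P<Hub>.{4})\]\s+Comp:\s*\[(?P<Comp>.{4})\]")

def matches(text):
    """True if the Hub/Comp pattern is found anywhere in the text."""
    return PATTERN.search(text) is not None

print(matches("Hub:[A1B2] Comp:[C3D4]"))        # True: no space after the colon
print(matches("Hub: [A1B2]   Comp:\t[C3D4]"))   # True: spaces and a tab are fine
print(matches("Hub:[A1B2]Comp:[C3D4]"))         # False: \s+ needs at least one whitespace
```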

FYI: If you plan to use it with the Replace method, add .* at the beginning and at the end, meaning anything else.

var regEx = new Regex(@".*Hub:\s*\[(?<Hub>.{4})\]\s+Comp:\s*\[(?<Comp>.{4})\].*");
Console.WriteLine(regEx.Replace(str, @"${Hub}-${Comp}"));

But using Replace instead of Match may give unexpected results: when the string does not match the pattern, the output string is the same as the input. So in cases like this (extracting values), always use the Match method.
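The same pitfall exists in other languages. Here is a Python sketch of the difference (re.sub stands in for Replace, re.search for Match); the function names and sample lines are hypothetical.

```python
import re

PATTERN = r"Hub:\s*\[(?P<Hub>.{4})\]\s+Comp:\s*\[(?P<Comp>.{4})\]"

def via_replace(line):
    # Wrapped in .* so the whole line is replaced.
    # A line that does not match comes back unchanged, with no error.
    return re.sub(r".*" + PATTERN + r".*", r"\g<Hub>-\g<Comp>", line)

def via_match(line):
    # A failed match is explicit: the caller gets None.
    m = re.search(PATTERN, line)
    return f"{m.group('Hub')}-{m.group('Comp')}" if m else None

print(via_replace("x Hub:[A1B2] Comp:[C3D4] y"))  # A1B2-C3D4
print(via_replace("nothing to see"))              # nothing to see
print(via_match("nothing to see"))                # None
```

With the replace version, a caller cannot tell a reformatted line from an untouched one without comparing input and output.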

Question 3

You were close. Try capturing your targets:

Hub:\[([A-Z0-9]{4})\] Comp:\[([A-Z0-9]{4})\]

Then use groups in your output:

$1-$2

Note that I am unfamiliar with Splunk, so the syntax for groups may be the backslash variety, i.e. \1-\2
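As an illustration of how the group syntax differs between tools, here is how it looks in Python, which is one of the backslash flavours. This is a sketch only and says nothing about what Splunk accepts; the pattern used here includes the closing brackets, and the sample text is made up.

```python
import re

PATTERN = re.compile(r"Hub:\[([A-Z0-9]{4})\] Comp:\[([A-Z0-9]{4})\]")

def reformat(text):
    # Python's re.sub refers to numbered groups as \1, \2 (or \g<1>, \g<2>).
    return PATTERN.sub(r"\1-\2", text)

def reformat_dollar(text):
    # In Python, "$1-$2" is not a group reference; it is inserted literally.
    return PATTERN.sub("$1-$2", text)

print(reformat("Hub:[A1B2] Comp:[C3D4]"))         # A1B2-C3D4
print(reformat_dollar("Hub:[A1B2] Comp:[C3D4]"))  # $1-$2
```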

Question 4

Have a look at this regex:

(?:Hub|Comp):\[[A-Z0-9]{4}\]

Demo

http://regexr.com?37gkh

There is more

You can also match the whole line: ^(.*?)(Hub:\[[A-Z0-9]{4}\])(.*?)(Comp:\[[A-Z0-9]{4}\])(.*?)$ and then replace it with $2-$4. I assume Hub always comes before Comp.
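Here is a sketch of the whole-line replacement in Python (which writes the groups as \2 and \4 instead of $2 and $4). The sample line is invented. Note that groups 2 and 4 capture the labels and brackets too, so they end up in the output.

```python
import re

WHOLE_LINE = re.compile(
    r"^(.*?)(Hub:\[[A-Z0-9]{4}\])(.*?)(Comp:\[[A-Z0-9]{4}\])(.*?)$"
)

def keep_hub_and_comp(line):
    """Replace the whole line with group 2, a dash, and group 4."""
    return WHOLE_LINE.sub(r"\2-\4", line)

print(keep_hub_and_comp("ts=1 Hub:[A1B2] x Comp:[C3D4] end"))
# Hub:[A1B2]-Comp:[C3D4]
```

To get only the IDs, the parentheses would have to move inside the brackets.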

Question 5

You can do this (if I understand correctly):

pattern: Hub:\[([^\]]+)\] Comp:\[([^\]]+)\]
replacement: $1-$2

[^\]] means all characters except ]

The pattern can be shortened to Hub:\[([^]]+)\] Comp:\[([^]]+)] in regex flavors that don't require escaping closing square brackets.

Your approach doesn't work because you use lookbehinds, which are zero-width assertions: they only check what precedes the current position and are not part of the match.
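Python's re module is one flavor where the shorter form works: a ] placed first in a character class (after the ^) is taken literally, and so is an unescaped ] outside a class. A minimal check, with made-up sample text:

```python
import re

ESCAPED = re.compile(r"Hub:\[([^\]]+)\] Comp:\[([^\]]+)\]")
SHORT = re.compile(r"Hub:\[([^]]+)\] Comp:\[([^]]+)]")

def reformat(pattern, text):
    """Apply the Hub-Comp replacement with the given compiled pattern."""
    return pattern.sub(r"\1-\2", text)

sample = "Hub:[A1B2] Comp:[C3D4]"
print(reformat(ESCAPED, sample))  # A1B2-C3D4
print(reformat(SHORT, sample))    # A1B2-C3D4
```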
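To make the zero-width point concrete, here is a Python sketch. The lookbehind pattern is a hypothetical stand-in, since the original attempt is not quoted here. The prefix Hub:[ is checked but is not part of the match, so a replacement never touches it.

```python
import re

# Hypothetical lookbehind pattern, for illustration only.
LOOKBEHIND = re.compile(r"(?<=Hub:\[)[A-Z0-9]{4}")

def matched_text(text):
    """Return only what the pattern consumed, or None."""
    m = LOOKBEHIND.search(text)
    return m.group(0) if m else None

def replace_match(text, replacement):
    """Replace what the pattern consumed; the lookbehind part stays in place."""
    return LOOKBEHIND.sub(replacement, text)

sample = "Hub:[A1B2] Comp:[C3D4]"
print(matched_text(sample))        # A1B2
print(replace_match(sample, "X"))  # Hub:[X] Comp:[C3D4]
```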

Question 6

Here you go:

Hub:\[([^\]]{4})\] Comp:\[([^\]]{4})\]

Here is the gskinner.com link

In order to format it use the backreferences $1 and $2 like so:

[$1]-[$2]

This works assuming Comp always comes after Hub and that there are exactly 4 characters between the brackets.

I was tempted to do the same as Alex; however, that brings up three problems:

There is no way to reformat because it has no backreferences to extract the inside of the brackets only.
There is no way to know which is which thus rendering formatting impossible.
Matching is done individually for each component, treating Hub and Comp as different matches and, again, rendering formatting impossible unless you use some other form of processing.

It is a good approach though; less regex is better whenever you can manage it.
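The problems listed above are easy to see when both patterns are run against the same text. Here is a Python sketch with a made-up input (Python writes the backreferences as \1 and \2 rather than $1 and $2):

```python
import re

ALTERNATION = re.compile(r"(?:Hub|Comp):\[[A-Z0-9]{4}\]")
CAPTURING = re.compile(r"Hub:\[([^\]]{4})\] Comp:\[([^\]]{4})\]")

def separate_matches(text):
    # No capture groups: findall returns each full match, labels and brackets included.
    return ALTERNATION.findall(text)

def formatted(text):
    # One match with two groups, so the IDs can be rearranged in the replacement.
    return CAPTURING.sub(r"[\1]-[\2]", text)

sample = "Hub:[A1B2] Comp:[C3D4]"
print(separate_matches(sample))  # ['Hub:[A1B2]', 'Comp:[C3D4]']
print(formatted(sample))         # [A1B2]-[C3D4]
```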