Question
I'm having an issue where my regex matches too much. I've tried making it as non-greedy as possible. My regex is:
define host( |\t)*{(.*\n)*?( |\t)*host_name( |\t)*HOST_B(.*\n)*?( |\t)*}
Meaning:
"define host", followed by any number of spaces or tabs, followed by "{". Then any text and newlines, up to any number of spaces or tabs followed by "host_name", any number of spaces or tabs, and "HOST_B". Then any text and newlines, up to any number of spaces or tabs followed by "}".
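For readability, here is a minimal sketch (assuming Python's re module, which the answers below also use) of the same pattern written in verbose mode, with one comment per clause of the description above. It is only a readability aid: it matches exactly what the compact pattern matches, overmatch included. SAMPLE is made-up input.

```python
import re

# The question's pattern, unchanged except that the braces are escaped.
COMPACT = r"define host( |\t)*\{(.*\n)*?( |\t)*host_name( |\t)*HOST_B(.*\n)*?( |\t)*\}"

# The same pattern in verbose mode: unescaped whitespace is ignored, so literal spaces are written as "\ ".
VERBOSE = re.compile(r"""
    define\ host (\ |\t)* \{       # 'define host', any spaces or tabs, then '{'
    (.*\n)*?                       # any text and newlines, as few lines as possible
    (\ |\t)* host_name (\ |\t)*    # any spaces or tabs, 'host_name', any spaces or tabs
    HOST_B                         # the host being searched for
    (.*\n)*?                       # any text and newlines, as few lines as possible
    (\ |\t)* \}                    # any spaces or tabs, then the closing '}'
""", re.VERBOSE)

SAMPLE = "define host{\nuse things\nhost_name HOST_B\nalias ughj\n}\n"

compact_match = re.search(COMPACT, SAMPLE).group()
verbose_match = VERBOSE.search(SAMPLE).group()
print(compact_match == verbose_match)  # True
```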
My text is:
define host{
field stuff
}
define timeperiod{
sunday 00:00-03:00,07:00-24:00
}
define stuff{
hostgroup_name things
service_description load
dependent_service_description cpu_util
execution_failure_criteria n
notification_failure_criteria w,u,c
}
define host{
use things
host_name HOST_A
alias stuff
}
define host{
use things
host_name HOST_B
alias ughj
address 1.6.7.6
}
define host{
use things
host_name HOST_C
}
The match runs from the first "define host" to HOST_B's closing brace. It does not include HOST_C's block (which is correct), but I want only HOST_B's block, not everything before it.
Any help? My regex is rusty. You can test on http://regexpal.com/
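The behaviour can be reproduced with a minimal sketch, assuming Python's re module and an abbreviated copy of the configuration above (indentation dropped, the "define stuff" block omitted). The match begins at the very first "define host", swallows HOST_A's block, and stops at HOST_B's closing brace:

```python
import re

# Abbreviated copy of the question's configuration.
CONFIG = """define host{
field stuff
}
define timeperiod{
sunday 00:00-03:00,07:00-24:00
}
define host{
use things
host_name HOST_A
alias stuff
}
define host{
use things
host_name HOST_B
alias ughj
address 1.6.7.6
}
define host{
use things
host_name HOST_C
}
"""

# The question's pattern (braces escaped for clarity; behaviour is unchanged).
ORIGINAL = r"define host( |\t)*\{(.*\n)*?( |\t)*host_name( |\t)*HOST_B(.*\n)*?( |\t)*\}"

overmatch = re.search(ORIGINAL, CONFIG).group()
print(overmatch.startswith("define host{\nfield stuff"))  # True: starts at the first block
print("HOST_A" in overmatch)  # True: HOST_A's block is swallowed
print("HOST_C" in overmatch)  # False: it does stop at HOST_B's closing brace
```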
Solution
I have not tested it, but I guess you need to replace .* with [^{]*. This way your regex cannot eat the next "{".
This looks strange to me: (.*\n)*?
Have a look at DOTALL: if you set this flag, the dot also matches newlines.
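A minimal sketch of the first suggestion, assuming Python's re module and an abbreviated copy of the question's configuration. Each (.*\n)*? is replaced by the lazy class [^{]*?, which cannot step over the "{" that opens the next block; a negated class also matches newlines without any flag. The last lines show what DOTALL changes for the dot.

```python
import re

# Abbreviated copy of the question's configuration.
CONFIG = """define host{
field stuff
}
define host{
use things
host_name HOST_A
}
define host{
use things
host_name HOST_B
alias ughj
address 1.6.7.6
}
define host{
use things
host_name HOST_C
}
"""

# [^{] matches any character, newline included, except "{",
# so a match can never run into the following "define ...{" block.
FIXED = r"define host[ \t]*\{[^{]*?host_name[ \t]+HOST_B[^{]*?\}"
block = re.search(FIXED, CONFIG).group()
print(block)

# Without DOTALL "." stops at a newline; with it, "." also matches "\n".
no_dotall = re.search(r"HOST_B.*address", CONFIG)
with_dotall = re.search(r"HOST_B.*address", CONFIG, re.DOTALL)
print(no_dotall)  # None
print(with_dotall.group())
```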
OTHER TIPS
It's a bit different from what you asked for, but I think you may like the results. This parses all your host definitions and loads them into Python dictionaries. From there, manipulating them should be easy.
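import re

# Capture the body of every "define host{...}" block in a (the config text);
# re.S lets . match newlines.
mDefHost = re.findall(r"define host\{(.*?)\}", a, re.S)
# Each line inside a block is a "key value" pair.
mInHost = re.compile(r"(\S+)\s+(\S+)")
hostDefs = []
for item in mDefHost:
    hostDefs.append(dict(mInHost.findall(item)))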
Example output, run on a string a that holds only HOST_B's definition:
>>> m = re.findall(r"define host\{(.*?)\}",a,re.S)
>>> m
['\n use things\n host_name HOST_B\n alias ughj\n address 1.6.7.6\n ']
>>> item = m[0]
>>> item
'\n use things\n host_name HOST_B\n alias ughj\n address 1.6.7.6\n '
>>> results = re.findall(r"(\S+)\s+(\S+)",item)
>>> results
[('use', 'things'), ('host_name', 'HOST_B'), ('alias', 'ughj'), ('address', '1.6.7.6')]
>>> dict(results)
{'alias': 'ughj', 'use': 'things', 'host_name': 'HOST_B', 'address': '1.6.7.6'}
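Building on that answer, here is a minimal sketch of picking out the one host the question asked about. The helper name find_host and the abbreviated configuration are illustrative assumptions. Instead of (\S+)\s+(\S+) it splits each line once on whitespace, so a value containing spaces (for example a multi-word alias) stays in one piece.

```python
import re

# Abbreviated copy of the question's configuration.
CONFIG = """define host{
use things
host_name HOST_A
}
define host{
use things
host_name HOST_B
alias ughj
address 1.6.7.6
}
"""

def find_host(config, name):
    """Hypothetical helper: return the fields of the host whose host_name is `name`, or None."""
    for body in re.findall(r"define host\{(.*?)\}", config, re.S):
        fields = {}
        for line in body.splitlines():
            parts = line.split(None, 1)  # key, then the rest of the line
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        if fields.get("host_name") == name:
            return fields
    return None

print(find_host(CONFIG, "HOST_B"))
```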
The problem is that you're searching the entire string for a substring whose start looks exactly like the start of the entire string. You can't use non-greedy matching to push your starting point as late as possible: the regex engine returns the match that starts at the leftmost possible position, and the non-greedy modifier only affects how far ahead it looks from that position, i.e. where the match ends.
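A tiny illustration of that point, assuming Python's re module and made-up input: the engine tries start positions from left to right and keeps the first one that succeeds, so a lazy quantifier shortens the end of a match but never moves its start. Excluding the opening character from the middle of the pattern is what forces a later start.

```python
import re

text = "a a b"

# Lazy, but the match still starts at the first "a" (index 0).
lazy = re.search(r"a.*?b", text)
print(lazy.start(), lazy.group())  # 0 a a b

# [^a] forbids another "a" in between, so only the last "a" can start a match.
excluded = re.search(r"a[^a]*b", text)
print(excluded.start(), excluded.group())  # 2 a b
```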
What you need is to make sure that there are no closing braces between your "define host" and your "HOST_B". Try this (untested):
define host\s*\{[^}]*HOST_B.*?\}
(Make sure you use a flag, such as DOTALL, that allows . to match newlines.)
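A quick check of the pattern with [^}]* (the * matters: [^}] on its own matches exactly one character), assuming Python's re module and an abbreviated copy of the question's configuration. The same search returns None without DOTALL, because .*? cannot then cross the newlines between HOST_B and its closing brace.

```python
import re

# Abbreviated copy of the question's configuration.
CONFIG = """define host{
field stuff
}
define host{
use things
host_name HOST_A
}
define host{
use things
host_name HOST_B
alias ughj
address 1.6.7.6
}
define host{
use things
host_name HOST_C
}
"""

# [^}]* cannot cross a closing brace, so the match must start in HOST_B's own block.
PATTERN = r"define host\s*\{[^}]*HOST_B.*?\}"

with_dotall = re.search(PATTERN, CONFIG, re.DOTALL)
without_dotall = re.search(PATTERN, CONFIG)

print(with_dotall.group())
print(without_dotall)  # None: "." cannot match the newline after HOST_B
```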