One of the objectives of writing code is to make it maintainable, which means making it easy to read and understand for whoever takes care of that code later.
Regular expressions are often a maintenance nightmare. In my experience they can usually be simplified, or replaced entirely, and the resulting code is just as useful. Parsing this sort of text is a great example of when not to use a complex pattern.
I'd do it this way:
str1 = <<eos
Burp
FirstName: Al Bundy
Ref person:
Some address: loststreet 4
Some other address: loststreet 4
Zip code:
eos
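def get_value(s)
  # Keep only the text after the first colon; the label is discarded.
  # A value that itself contains ':' would be cut short at that colon.
  _, value = s.split(':')
  value.strip if value
end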
rows = str1.split("\n")
firstname = get_value(rows[1]) # => "Al Bundy"
ref_person = get_value(rows[2]) # => nil
some_address = get_value(rows[3]) # => "loststreet 4"
some_other_address = get_value(rows[4]) # => "loststreet 4"
zip_code = get_value(rows[5]) # => nil
Split the text into rows, and pick out the data needed.
That can be reduced into something more succinct using map:
firstname, ref_person, some_address, some_other_address, zip_code =
  rows[1..-1].map { |s| get_value(s) }
firstname # => "Al Bundy"
ref_person # => nil
some_address # => "loststreet 4"
some_other_address # => "loststreet 4"
zip_code # => nil
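If the values might themselves contain a colon (a time such as 09:30, say), the same no-regex idea can be sketched with String#partition, which cuts only at the first separator. This is a minimal sketch, not a drop-in replacement: the helper name value_after_colon and the "Opened" sample line are made up for illustration, and treating an empty value as nil is an assumption chosen to match the results above.

```ruby
# Hypothetical helper for illustration; not part of the code above.
# Returns the text after the first colon, or nil if there is no colon
# or nothing but whitespace after it.
def value_after_colon(line)
  _label, separator, value = line.partition(':')
  return nil if separator.empty? # the line has no colon at all

  value = value.strip
  value.empty? ? nil : value
end

value_after_colon('FirstName: Al Bundy') # => "Al Bundy"
value_after_colon('Ref person:')         # => nil
value_after_colon('Opened: 09:30')       # => "09:30"
value_after_colon('Burp')                # => nil
```

Because partition always returns three strings, there is no need to guard against a missing value before calling strip.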
If you absolutely have to have a regex, just to have a regex, then simplify it and isolate its task. It's possible to write a regex that spans multiple lines, skipping and capturing text as it goes, but getting there is painful, the pattern becomes more fragile as it grows, and it will likely break if the incoming text changes. Reducing its complexity avoids much of that fragility and makes your code more robust:
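def get_value(s)
  # Capture the label before the first colon and everything after it.
  s[/^([^:]+):(.*)/]
  name, value = $1, $2
  value.strip! if value
  # Normalize the label into a hash key, e.g. "Zip code" => "zip_code".
  [name.downcase.tr(' ', '_'), value]
end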
data_hash = Hash[
  str1.split("\n").select { |s| s[':'] }.map { |s| get_value(s) }
]
data_hash # => {"firstname"=>"Al Bundy", "ref_person"=>"", "some_address"=>"loststreet 4", "some_other_address"=>"loststreet 4", "zip_code"=>""}
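The same single-line pattern can also be written with named captures, which avoids relying on the $1 and $2 globals and lets the match itself decide which lines to skip. Treat this as one possible sketch rather than the definitive version: LINE_PATTERN, parse_fields, and the shortened sample text are names and data invented for this example.

```ruby
# Hypothetical names for illustration. The pattern still handles one
# line at a time: a label without colons, a colon, then the rest.
LINE_PATTERN = /\A(?<name>[^:]+):(?<value>.*)\z/

def parse_fields(text)
  text.lines.each_with_object({}) do |line, fields|
    match = LINE_PATTERN.match(line.chomp)
    next unless match # skip lines that have no "name: value" shape

    key = match[:name].strip.downcase.tr(' ', '_')
    fields[key] = match[:value].strip
  end
end

sample = "Burp\nFirstName: Al Bundy\nRef person:\nZip code:\n"
fields = parse_fields(sample)
fields['firstname']  # => "Al Bundy"
fields['ref_person'] # => ""
fields.key?('burp')  # => false
```

Keeping the pattern small and confined to one line means a change in the incoming text affects at most the lines that no longer match, instead of breaking one large multi-line expression.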