Question

My example string is listed below. I want to split it so that each value ends up in an array or hash, where I can process each element.

<div id="test">
           accno:          123232323 <br>
           id:            5443534534534 <br>
           name:            test_name <br>
           url:                  www.google.com <br>

 </div>

How can I fetch each value into a hash or array?

Solution

With regex it's easy:

s = '<div id="test">
           accno:          123232323 <br>
           id:            5443534534534 <br>
           name:            test_name <br>
           url:                  www.google.com <br>

 </div>'

 p s.scan(/\s+(.*?)\:\s+(.*?)<br>/).each_with_object({}) { |i, h| h[i[0].to_sym] = i[1].strip }

Or you can restrict the keys (accno, id, name, url) with ([a-z]+) if they contain only lowercase letters:

 p s.scan(/\s+([a-z]+)\:\s+(.*?)<br>/).each_with_object({}) { |i, h| h[i[0].to_sym] = i[1].strip }

Result:

 {:accno=>"123232323", :id=>"5443534534534", :name=>"test_name", :url=>"www.google.com"}
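
To see why the one-liner works: scan returns one [key, value] pair per match, and the block only has to turn those pairs into a hash. Here is a minimal sketch of the same idea using Array#to_h with a block (this needs Ruby 2.6 or later). The variable names pairs and fields are only for illustration:

```ruby
s = '<div id="test">
           accno:          123232323 <br>
           id:            5443534534534 <br>
           name:            test_name <br>
           url:                  www.google.com <br>

 </div>'

# scan returns an array of [key, value] pairs, one per "key: value <br>" line
pairs = s.scan(/\s+([a-z]+):\s+(.*?)<br>/)

# to_h with a block converts each pair; strip removes the space left before <br>
fields = pairs.to_h { |key, value| [key.to_sym, value.strip] }

p fields
```

Each value still carries its trailing space when it comes out of scan, which is why the strip call is needed in both versions.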

Update

If the input is on a single line instead:

<div id="test"> accno: 123232323 id: 5443534534534 name: test_name url: www.google.com </div>

the regex becomes:

 /([a-z]+)\:\s*(.*?)\s+/

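([a-z]+) is the hash key. If keys can also contain - or _, add them to the character class: ([a-z_-]+). This pattern assumes that each key is followed by : (optionally with whitespace after it) and then a value that runs up to the next whitespace. If the last value can end without a trailing space, as in url: www.google.com</div>, end the pattern with (\s+|<) instead of \s+ (or the non-capturing (?:\s+|<), which keeps scan returning plain key/value pairs).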
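
A minimal sketch of the single-line case, assuming keys may contain lowercase letters, hyphens, and underscores, and that the last value may be followed directly by a tag. The sample string with the account-no and user_id keys is made up for illustration, and the terminator is written here as a lookahead so the whitespace or < is checked but not consumed:

```ruby
# Hypothetical input: keys with "-" and "_", last value ends right before "</div>"
s = '<div id="test"> account-no: 123232323 user_id: 5443534534534 url: www.google.com</div>'

# key: lowercase letters, "-" or "_"; value: non-greedy text up to whitespace or "<"
re = /([a-z_-]+):\s*(.*?)(?=\s|<)/

fields = s.scan(re).to_h { |key, value| [key.to_sym, value] }
p fields
```

Note that a key containing a hyphen becomes a quoted symbol such as :"account-no". If that is awkward, keep the keys as strings by dropping to_sym.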

OTHER TIPS

If you are processing HTML, use an HTML/XML parser such as Nokogiri to pull out the text content of the required <div> tag using a CSS selector, then parse the text into fields.

To install nokogiri:

gem install nokogiri

Then process the page and text:

require "nokogiri"
require "open-uri"

# re matches: spaces (word) colon spaces (anything) space
re_fields  = /\s+(?<field>\w+):\s+(?<data>.*?)\s/

# Somewhere to store the results
record = {}

# URI.open comes from open-uri; plain Kernel#open no longer opens URLs as of Ruby 3.0
page      = Nokogiri::HTML( URI.open("http://example.com/divtest.html") )

# Select the text from <div id=test> and scan into fields with the regex 
page.css( "div#test" ).text.scan( re_fields ){ |field, data|
    record[ field ] = data
}
p record

Results in:

{"accno"=>"123232323", "id"=>"5443534534534", "name"=>"test_name", "url"=>"www.google.com"}

The result of page.css( "blah" ) is a NodeSet, which behaves like an array, so if the selector matches multiple elements you can loop over them with .each:

# Somewhere to store the results
records    = []

# For each matching <div>, scan its text into a separate record
page.css( "div#test" ).each{ |div| 
    record = {}
    div.text.scan( re_fields ){ |field, data|
        record[field] = data
    }
    records.push record
}
p records
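
The re_fields regex above stops each value at the first whitespace, so a value containing spaces would be cut short. As a hedged alternative, here is a sketch that parses the extracted text line by line and splits each line on its first colon. The text variable stands in for what the div's .text would return for the sample markup (an assumption, hard-coded here so the snippet runs without Nokogiri or a network call), and the "Test User" value is invented to show a value with a space:

```ruby
# Assumed shape of the div's text content: one "field: value" pair per line
text = "\n  accno:   123232323 \n  id:   5443534534534 \n  name:   Test User \n  url:   www.google.com \n\n "

record = {}
text.each_line do |line|
  # partition splits on the first ":" only, so later colons stay in the value
  field, colon, data = line.partition(":")
  next if colon.empty?   # skip blank lines that hold no "field: value" pair
  record[field.strip] = data.strip
end

p record
```

This relies on each field sitting on its own line, so it would not work for the single-line layout, where a regex is still the simpler tool.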
Licensed under: CC-BY-SA with attribution