Question

I'm trying to figure out how to extract dates from unstructured text using Ruby.

For example, I'd like to parse the date out of this string "Applications started after 12:00 A.M. Midnight (EST) February 1, 2010 will not be considered."

Any suggestions?

Solution

Assuming you just want dates and not datetimes:

require 'date'

string = "Applications started after 12:00 A.M. Midnight (EST) February 1, 2010 will not be considered."
# Match a full month name, a one- or two-digit day, and a four-digit year.
r = /(January|February|March|April|May|June|July|August|September|October|November|December) (\d{1,2}), (\d{4})/
if string[r]
  date = Date.parse(string[r])
  puts date # prints 2010-02-01
end
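
If the text may contain more than one date, the same idea extends to String#scan. This is only a minimal sketch, assuming the "Month D, YYYY" format from the question; the helper name extract_dates and the sample sentence are hypothetical and not part of the original answer.

```ruby
require 'date'

# Full month names from the standard library (index 0 of MONTHNAMES is nil).
MONTHS = Date::MONTHNAMES.compact.join('|')
DATE_PATTERN = /\b(?:#{MONTHS}) \d{1,2}, \d{4}\b/

# Hypothetical helper: returns a Date for every "Month D, YYYY" match in text.
def extract_dates(text)
  dates = text.scan(DATE_PATTERN).map do |match|
    begin
      Date.parse(match)
    rescue ArgumentError
      nil # skip impossible dates such as "February 31, 2010"
    end
  end
  dates.compact
end

text = "Open from January 15, 2010 until February 1, 2010."
puts extract_dates(text)
```

Because the pattern has no capturing groups, scan returns the full matched strings, which can be handed straight to Date.parse.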

OTHER TIPS

Try Chronic (http://chronic.rubyforge.org/); it might be able to parse that. Otherwise, you're going to have to use Date.strptime.
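
For the Date.strptime route, the date substring has to be isolated first, and the format string must match it exactly. A rough sketch, assuming the "Month D, YYYY" layout from the question; the loose regex here is only illustrative and would also match non-month words, in which case strptime raises ArgumentError.

```ruby
require 'date'

string = "Applications started after 12:00 A.M. Midnight (EST) February 1, 2010 will not be considered."

# Isolate a candidate substring; strptime does not search the sentence itself.
candidate = string[/[A-Z][a-z]+ \d{1,2}, \d{4}/]

# %B = full month name, %d = day of the month, %Y = four-digit year.
date = candidate && Date.strptime(candidate, '%B %d, %Y')
puts date
```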

You can also try a gem that finds dates in a string, such as dates_from_string.

Example:

input = 'circa 1960 and full date 07 Jun 1941'
dates_from_string = DatesFromString.new
dates_from_string.get_structure(input)

# => returns
# [{:type=>:year, :value=>"1960", :distance=>4, :key_words=>[]},
# {:type=>:day, :value=>"07", :distance=>1, :key_words=>[]},
# {:type=>:month, :value=>"06", :distance=>1, :key_words=>[]},
# {:type=>:year, :value=>"1941", :distance=>0, :key_words=>[]}]
Licensed under: CC-BY-SA with attribution