Question

I am new to UIMA.

I want to develop an app using UIMA and uimaFIT that can parse any email related to air tickets (confirmation emails, cancellation emails, etc.) and extract the valuable information from it, such as ticket number, flight number, departure time, arrival time, and passenger name. How can I achieve this using uimaFIT? So far I have only used uimaFIT to read a String and tried to extract the information with regular expressions, but this seems too complicated because the emails are not structured. Any suggestions on how to connect to the emails and parse them without using regex?


Solution

Is your set of email types (confirmation, cancellation, etc.) small enough? If so, as a first step, try a simple classification of each email by type. In the following steps you can then apply different tools or rules depending on the type of email.
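As a rough illustration of that first classification step, here is a minimal sketch in plain Java. The class name EmailTypeClassifier, the EmailType values, and the keyword list are all assumptions made up for this example; a real system would likely need a richer keyword set or a trained classifier. In a uimaFIT pipeline, the same logic could be called from the process(JCas) method of an annotator that extends JCasAnnotator_ImplBase.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical keyword-based classifier; the keywords are illustrative assumptions. */
final class EmailTypeClassifier {

    enum EmailType { CONFIRMATION, CANCELLATION, UNKNOWN }

    // Checked in insertion order: cancellation keywords come first, because a
    // cancellation notice may quote the original "confirmed" wording.
    private static final Map<String, EmailType> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("cancel", EmailType.CANCELLATION);
        KEYWORDS.put("refund", EmailType.CANCELLATION);
        KEYWORDS.put("confirm", EmailType.CONFIRMATION);
        KEYWORDS.put("itinerary", EmailType.CONFIRMATION);
    }

    /** Returns the type of the first keyword found in the subject or body. */
    static EmailType classify(String subject, String body) {
        String text = (subject + "\n" + body).toLowerCase(Locale.ROOT);
        for (Map.Entry<String, EmailType> entry : KEYWORDS.entrySet()) {
            if (text.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return EmailType.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("Your booking is confirmed", "Flight AI 202"));
    }
}
```

Once the type is known, each type can be routed to its own set of extraction rules, which keeps every rule set small.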

For the rest, I think it's best to use regexes, even if it is tedious. You might want to look at UIMA TextMarker (since renamed UIMA Ruta) to quickly implement your regexes and rules.

  • Ticket Number: regex
  • Flight Number: regex
  • Departure Time, Arrival Time: regex
  • Passenger Name: person NER (for example, a UIMA NER annotator), or match against the email's To: field
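The regex-based fields in the list above can be sketched in plain Java as follows. Everything specific here is an assumption for illustration only: the class name TicketFieldExtractor, the map keys, and above all the field formats (a 13-digit ticket number, a two-letter airline code followed by digits, and times labelled "Departure" or "Arrival"). Real emails will need patterns tuned per airline and per email type. In a uimaFIT pipeline, each match would typically become an annotation created inside an annotator's process(JCas) method instead of a map entry.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex extractor; all field formats are assumed, not standard. */
final class TicketFieldExtractor {

    // Assumed format: 3-digit prefix, optional hyphen, 10 digits.
    private static final Pattern TICKET_NUMBER =
            Pattern.compile("\\b(\\d{3}-?\\d{10})\\b");
    // Assumed format: two uppercase letters, optional space, 1 to 4 digits.
    private static final Pattern FLIGHT_NUMBER =
            Pattern.compile("\\b([A-Z]{2}\\s?\\d{1,4})\\b");
    // Assumed format: a "Departure"/"Arrival" label followed on the same line by HH:mm.
    private static final Pattern LABELLED_TIME =
            Pattern.compile("(?i)\\b(departure|arrival)\\b[^\\d\\n]*(\\d{1,2}:\\d{2})");

    /** Returns the first match for each field; absent fields are simply omitted. */
    static Map<String, String> extract(String emailText) {
        Map<String, String> fields = new LinkedHashMap<>();

        Matcher m = TICKET_NUMBER.matcher(emailText);
        if (m.find()) {
            fields.put("ticketNumber", m.group(1));
        }

        m = FLIGHT_NUMBER.matcher(emailText);
        if (m.find()) {
            fields.put("flightNumber", m.group(1));
        }

        m = LABELLED_TIME.matcher(emailText);
        while (m.find()) {
            // Produces the keys "departureTime" and "arrivalTime".
            String key = m.group(1).toLowerCase(Locale.ROOT) + "Time";
            fields.putIfAbsent(key, m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String email = "Dear John Doe,\n"
                + "Ticket Number: 098-1234567890\n"
                + "Flight: AI 202\n"
                + "Departure: 10:45\n"
                + "Arrival: 14:05\n";
        System.out.println(extract(email));
    }
}
```

Anchoring the time pattern on its label is what keeps departure and arrival apart; an unlabelled time regex would match both without telling them apart.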
Licensed under: CC-BY-SA with attribution