Question

My friend has a small business where customers order services using email. He receives several emails a day and sorting thru it is becoming cumbersome.

There are about 10 different kind of tasks the customer can request, and for each there are one or two words that specify it. The other info present in the emails is the place where the service is to be delivered, the time, and the involved people's names. The email also contains an ID, a long number with a fairly standard format.

The emails are very unstructured, but all contain the key info above. My question is: what is the best method to sweep thru these emails and extract the key info (such as type of service, place, people's names, the ID etc)?

I thought about some kind of pre-processing, then pass it thru AlchemyAPI and then test the Alchemy output using Neural Networks for each feature (key info). This can be supervised learning as I can do a feedback loop all the time, as once the info is inputted, I can have someone to validate.

Any ideas? Thanks

No correct solution

OTHER TIPS

I guess some parts (ID, task, time) can be captured by a regular expression and dictionary matching. Have a look at GATE's JAPE tool.

It should be fairly easy to assemble a dictionary and then use the lookups for the "task", also you can reuse the available jape rules for date/time and write a new one for the ID (also, a simple regex could be fine).

For matching the location and people's names you should be careful, openCalais and alchemyAPI can give you good results if names and places are used in well defined sentences and will probably make more mistakes with some tabular or weird format. Also you can never be sure you captured the place and person correctly so don't rely on that for processing orders directly.

If you have more information about mails' structure or expected names and places (i.e. you have a "clients" table with all possible names), you would probably want to do your own tagging, otherwise I'd stick to openCalais or alchemyAPI + some regular expressions.

P.S. I assume all mails are in English.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top