Question

I'm not even sure "relevancy" is the most accurate word, so I'll just describe the problem:

I'm building an app that needs to parse product descriptions from a popular website (let's just say it's Amazon) and figure out which certifications the product has based on the text in the description alone. The descriptions for these products are not always written the same way (because they're written by different companies), but they do always contain certain keywords that I'm looking for, and the keywords have to be "close together" in the description in order to be considered for the result set.

For example, given the following CSV data:

ProductName,ProductDescription
Product1,Product1 is a really cool product that is certified for Certification1 on Region1
Product2,Product2 has Region2 which has Certification3 and Region3 with Certification4. It also has Certification5

I'd want to generate the following output:

{  
   "Product1":{  
      "Region1":"Certification1",
      "UnknownRegions": []
   },
   "Product2":{  
      "Region2":"Certification3",
      "Region3":"Certification4",
      "UnknownRegions":[  
         "Certification5"
      ]
   }
}

I have almost no idea how to solve this problem, other than one thought: can some NLP algorithm help me to achieve the desired output above? If so, which one? I've heard of a technique called Named Entity Extraction but I don't know if it applies here or not.

Any advice is much appreciated here. Thank you in advance!


Solution

Have a look at the Microsoft LUIS offering:

https://azure.microsoft.com/en-us/services/cognitive-services/language-understanding-intelligent-service/

I believe Amazon has a similar API offering as well.

These services let you use NLP and other AI systems in your app without having to build your own. You call the API and get JSON back with the sentence broken down into intents and entities.

Here is an example of what you get back from LUIS: (image: example LUIS response)
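
If a hosted service turns out to be more than you need, a plain keyword-proximity pass may already cover data shaped like the sample in the question. The following is only a rough sketch in Python, not anything LUIS does internally. It assumes that regions and certifications can be recognised with a simple pattern (the `Region<N>` / `Certification<N>` regex is a stand-in for a real keyword list), and that "close together" means adjacent keyword mentions no more than `MAX_GAP` characters apart. Both `MAX_GAP` and the function names are made up for illustration.

```python
import csv
import io
import json
import re

# Hypothetical keyword pattern: real data would need a list of actual
# region and certification names instead of these placeholder tokens.
MENTION = re.compile(r"\b(Region|Certification)\d+\b")
MAX_GAP = 40  # assumed limit on characters between two paired keywords


def extract_certifications(description, max_gap=MAX_GAP):
    """Pair each keyword with the next one when they differ in kind and are close."""
    mentions = list(MENTION.finditer(description))
    result = {}
    unknown = []
    i = 0
    while i < len(mentions):
        current = mentions[i]
        nxt = mentions[i + 1] if i + 1 < len(mentions) else None
        if (nxt is not None
                and current.group(1) != nxt.group(1)
                and nxt.start() - current.end() <= max_gap):
            # One of the two is the region, the other the certification.
            if current.group(1) == "Region":
                region, cert = current, nxt
            else:
                region, cert = nxt, current
            result[region.group(0)] = cert.group(0)
            i += 2
            continue
        # A certification with no nearby region goes to the unknown list;
        # a region with no nearby certification is simply skipped.
        if current.group(1) == "Certification":
            unknown.append(current.group(0))
        i += 1
    result["UnknownRegions"] = unknown
    return result


def parse_products(csv_text):
    """Map each ProductName to the pairs found in its ProductDescription."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ProductName"]: extract_certifications(row["ProductDescription"])
            for row in reader}


CSV_DATA = """ProductName,ProductDescription
Product1,Product1 is a really cool product that is certified for Certification1 on Region1
Product2,Product2 has Region2 which has Certification3 and Region3 with Certification4. It also has Certification5
"""

print(json.dumps(parse_products(CSV_DATA), indent=2))
```

On the sample CSV this produces the same structure as the desired output in the question. The pairing rule is deliberately naive: descriptions where a region and its certification are separated by another keyword would need something smarter, which is where an entity-extraction service or a trained NER model becomes worth the effort.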

Licensed under: CC-BY-SA with attribution