Best approach to analyze text in PHP?
21-08-2019
Question
I need to analyze a user's post and categorize it. For example: I have to categorize every post as a "buy" post or a "sell" post based on the text - "I'm looking to sell my house" is categorized as "sell". The problem is that it's often not so simple - "I'm looking to get rid of my old house" also needs to be categorized as "sell", while "I'm looking for a house" becomes "buy". I would also like to categorize these posts by the item in question - for example, "I'm looking for a house" would be categorized as both "buy" and "house".
Can anyone recommend a good approach, framework, or technique for analyzing and understanding user input? Thanks.
Solution
You're right; it's a hard thing to do.
Yahoo! has a Term Extraction API/web service you can use. It lets you run language analysis on your own text without writing a million lines of code yourself. I haven't used it, so I don't know how well it handles different phrasings with similar meanings, which is what your question is about.
OTHER TIPS
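What you're describing is basically a Bayesian filtering problem, the same technique used for spam filtering: you train a classifier on posts that are already labeled, and it picks the most probable label for a new post from the words it contains. See also this talk. It's a reasonably complicated area.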
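To make the Bayesian idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier in plain PHP (7.4+ assumed). The class name, method names, and the six training sentences are all made up for illustration; they don't come from any library, and a real classifier would need far more training data.

```php
<?php
declare(strict_types=1);

// Illustrative naive Bayes text classifier with add-one (Laplace) smoothing.
// All names here are hypothetical, not part of any existing library.
final class NaiveBayesClassifier
{
    /** @var array<string, array<string, int>> word counts per label */
    private array $wordCounts = [];
    /** @var array<string, int> number of training posts per label */
    private array $docCounts = [];
    /** @var array<string, bool> every word seen during training */
    private array $vocabulary = [];

    /** Split text into lowercase word tokens (letters and apostrophes only). */
    private function tokenize(string $text): array
    {
        $words = preg_split('/[^a-z\']+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        return $words === false ? [] : $words;
    }

    public function train(string $label, string $text): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        $this->wordCounts[$label] ??= [];
        foreach ($this->tokenize($text) as $word) {
            $this->wordCounts[$label][$word] = ($this->wordCounts[$label][$word] ?? 0) + 1;
            $this->vocabulary[$word] = true;
        }
    }

    public function classify(string $text): string
    {
        if ($this->docCounts === []) {
            throw new LogicException('Train the classifier before classifying.');
        }
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $tokens = $this->tokenize($text);
        $bestLabel = '';
        $bestScore = -INF;
        foreach ($this->docCounts as $label => $docCount) {
            // Work in log space: log P(label) + sum of log P(word | label).
            $score = log($docCount / $totalDocs);
            $labelTotal = array_sum($this->wordCounts[$label]);
            foreach ($tokens as $word) {
                if (!isset($this->vocabulary[$word])) {
                    continue; // words never seen in training carry no evidence
                }
                $count = $this->wordCounts[$label][$word] ?? 0;
                $score += log(($count + 1) / ($labelTotal + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $bestLabel = (string) $label;
            }
        }
        return $bestLabel;
    }
}

// Tiny made-up training set; real use needs many labeled posts.
$intent = new NaiveBayesClassifier();
$intent->train('sell', "I'm looking to sell my house");
$intent->train('sell', 'I want to get rid of my old car');
$intent->train('sell', 'Selling my bike, get it cheap');
$intent->train('buy', "I'm looking for a house");
$intent->train('buy', 'I want to buy a car');
$intent->train('buy', 'Looking for a cheap bike');

echo $intent->classify("I'm looking to get rid of my old house"), "\n"; // sell
echo $intent->classify("I'm looking for a house"), "\n";                // buy
```

The item category ("house", "car", and so on) could be handled the same way by training a second instance on posts labeled by item rather than by intent. Words such as "get rid of" only help because they appeared in a labeled "sell" example, so the quality of the result depends almost entirely on the training posts.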