Question

A recent endeavor of mine requires asking users how related two concepts are. Example: Kiwi and Fruit. Identical? Fairly Similar? Rather different? Unrelated?

It occurs to me that for certain combinations of terms (such as Strawberries and Tasty) people may have varying opinions. This, in effect, poses problems for an algorithm that tries to "average" the responses to get a global truth. Instead, it makes sense that for rather divergent answers one would have different niches or "universes of truth" that are self-consistent with the relationship data they provide.

One possible idea is that of basic context: if you live in a house, you are likely to think of a chair as something to sit on, but if you live in Asia, floor mats and cushions on the ground may satisfy that "idea". So which one is correct? Both are correct for their specific context.

So, given a bunch of relationships (Kiwi is a Fruit, Kiwi is Delicious, Kiwi is Not Delicious) divergent results will emerge. Are there any algorithms or studies that take this sort of divergence of tagging/labeling into account?


Solution

This is a fairly broad question. It's not clear to me whether you're looking for algorithms that discover the potential tags, or algorithms that can deal with the fact that the universe is not black and white.

For the first aspect, you'd certainly have to look at statistical clustering, neural-network algorithms, or other ML techniques. For the second, there is some older and simpler work that could help as well:

  • Fuzzy logic and other multivalued logics are forms of logic in which a predicate (e.g. "Kiwi is Delicious", "Kiwi is Not Delicious") can take several truth values. It can, for example, be 60% true and 40% false at the same time. Fuzzy logic deals, however, with a global view of truth (in general, according to most people, kiwis are delicious), not a contextual one.

  • Graph-based algorithms make it possible to keep sets of related knowledge and their interdependencies together. For example, a reason maintenance system keeps track of the sources of knowledge and facts, together with the truth-propagation algorithms/inference rules/agents that were applied, so that it can cope with conflicts between different beliefs (e.g. Doyle's Truth Maintenance System tracks the source of each truth it keeps, in order to backtrack when a conflict occurs) or with different universes of truth (e.g. De Kleer's ATMS, the Assumption-based Truth Maintenance System).
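To make the fuzzy-logic point concrete, here is a minimal sketch using the common min/max operators; the 60%/40% truth degrees are hypothetical values, not survey data:

```python
# In fuzzy logic a predicate holds to a degree in [0, 1]
# rather than being simply true or false.

def fuzzy_and(a, b):
    # Gödel t-norm: a conjunction is only as true as its weakest operand.
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1.0 - a

delicious = 0.6   # "Kiwi is Delicious" is 60% true
is_fruit = 0.95   # "Kiwi is a Fruit" is 95% true

fuzzy_not(delicious)             # ~0.4: "Kiwi is Not Delicious"
fuzzy_and(delicious, is_fruit)   # 0.6: "Kiwi is a delicious fruit"
```

Note that the degrees here are still global, averaged over everyone; they do not say *who* finds kiwi delicious.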
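The ATMS idea can be sketched even more simply: tag each fact with the sets of assumptions under which it is believed, so that contradictory facts can coexist in different contexts. The facts and assumption labels below are made up for illustration:

```python
# Each fact maps to a list of "environments" (assumption sets).
# frozenset() means the fact holds under no extra assumptions,
# i.e. in every context.
beliefs = {
    ("kiwi", "is", "fruit"):         [frozenset()],
    ("kiwi", "is", "delicious"):     [frozenset({"likes_tart_fruit"})],
    ("kiwi", "is", "not_delicious"): [frozenset({"dislikes_tart_fruit"})],
}

def holds(fact, assumptions):
    """A fact holds in a context if some recorded environment
    is a subset of the context's assumption set."""
    return any(env <= assumptions for env in beliefs.get(fact, []))

ctx = {"likes_tart_fruit"}
holds(("kiwi", "is", "fruit"), ctx)          # True in every context
holds(("kiwi", "is", "delicious"), ctx)      # True in this context
holds(("kiwi", "is", "not_delicious"), ctx)  # False in this context
```

A real ATMS also propagates environments through inference rules and prunes inconsistent ones ("nogoods"); this sketch only shows the contextual bookkeeping.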

Other tips

What you're describing is the task of cluster analysis. The goal is to find distinct clusters in the data whose elements are correlated. The attributes of the data form the "context" you refer to, and cluster analysis is usually used to classify new data after being trained. For example, whether a fruit is considered delicious may correlate with sales and sugar content; furniture choices may correlate with location.

For example, you ask a population of users "What is a kiwi?" Most users respond that a kiwi is a fruit, but there is a correlation among users from New Zealand answering that a kiwi is a fruit, a bird, and a people. Once you've trained your clustering algorithm, you can show a new user from outside NZ the tag "fruit" for kiwi, and show a user from NZ the tags "fruit", "bird", and "people".
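The kiwi scenario above can be sketched as a toy clustering step; the users, answers, and seed points are all made up, and a real system would use a proper clustering library rather than this single hand-rolled k-means assignment pass:

```python
# Users answer 1/0 to three questions: is a kiwi a fruit? a bird? a people?
users = {
    "alice": (1, 0, 0),   # outside NZ
    "bob":   (1, 0, 0),
    "carol": (1, 1, 1),   # NZ
    "dave":  (1, 1, 1),
}

def hamming(a, b):
    # Number of positions where two answer vectors disagree.
    return sum(x != y for x, y in zip(a, b))

def assign_clusters(points, seeds):
    """One assignment step of k-means with Hamming distance:
    each user joins the nearest seed's cluster."""
    groups = {s: [] for s in seeds}
    for name, vec in points.items():
        nearest = min(seeds, key=lambda s: hamming(vec, s))
        groups[nearest].append(name)
    return groups

groups = assign_clusters(users, seeds=[(1, 0, 0), (1, 1, 1)])
# Each resulting cluster is one self-consistent "universe of truth":
# one where a kiwi is only a fruit, one where it is also a bird and a people.
```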

The more data you have about your entities, the more successful the algorithm may be. For example, whether a person finds a food delicious is unlikely to correlate strongly with anything about that person, but it may correlate with the food's sugar/salt content and ingredients, and cluster analysis can help predict whether a new food will be considered delicious or not.
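As a follow-on sketch, once foods are placed in a feature space, a new food can be classified from its neighbours. The foods and their (sugar, salt) features here are invented for illustration, and a simple 1-nearest-neighbour rule stands in for a trained model:

```python
# Known foods: hypothetical (sugar g/100g, salt g/100g) -> tagged delicious?
foods = {
    (9.0, 0.0):  True,    # kiwi
    (10.4, 0.0): True,    # apple
    (0.3, 1.2):  False,   # plain rice cake
}

def predict_delicious(sugar, salt):
    # Classify a new food by copying the label of the nearest known food
    # (squared Euclidean distance in feature space).
    nearest = min(foods, key=lambda f: (f[0] - sugar) ** 2 + (f[1] - salt) ** 2)
    return foods[nearest]

predict_delicious(12.0, 0.1)   # lands near the fruit cluster -> True
predict_delicious(0.5, 1.0)    # lands near the rice cake -> False
```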

Licensed under: CC-BY-SA with attribution