Question

I'm looking for a good way to find which value from a given list appears in a piece of text. Example:

This computer has 16GB ram and with the best processor in it. Case is made from aluminium.

And I have a criterion such as amount of RAM, with these possible values:

  • 4GB
  • 6GB
  • 8GB
  • 16GB

I can't simply search the text for each value, because a plain substring search can match e.g. 6GB inside 16GB instead of 16GB itself.

It would also be great if this could find similar text and match it as well, e.g. healed and sealed, based on some correctness factor.

I've tried the Sørensen–Dice coefficient. It kind of works, but only with a low correctness factor.
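For reference, here is a minimal sketch of the Sørensen–Dice coefficient computed on character bigrams, which is one common way to apply it to short strings. The function names `bigrams` and `dice_coefficient` are made up for illustration, and the bigram choice is an assumption, not necessarily the variant that was tried.

```python
from collections import Counter

def bigrams(word):
    """Return the overlapping two-character chunks of a word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]

def dice_coefficient(a, b):
    """Sørensen–Dice similarity on bigram multisets: 2*|A∩B| / (|A| + |B|)."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((ca & cb).values())
    return 2 * overlap / total

print(dice_coefficient("healed", "sealed"))  # 0.8
print(dice_coefficient("6GB", "16GB"))       # 0.8
```

Note that under this variant 6GB and 16GB score the same as healed and sealed, so a similarity threshold alone cannot tell the RAM values apart.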


Solution

I wonder if you could convert the text to ideas instead. For example, start with an algorithm that picks out the words. Then each criterion, like "RAM", has its own search algorithm that goes through the list of words and decides whether each word is related to it or not. There are probably well-known algorithms for that, though I can't name one. This lets you break away from the visual representation of the words and into a "meta" space where a token can mean whatever you decide it means.
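As a rough illustration of picking out words first, the sketch below splits the text into tokens and compares whole tokens against the allowed values, so 6GB can no longer match inside 16GB. The helper names `tokenize` and `match_values` are hypothetical, and the tokenizer assumes the amount and unit are written without a space.

```python
import re

def tokenize(text):
    """Split text into alphanumeric tokens, so '16GB' stays one token."""
    return re.findall(r"[A-Za-z0-9]+", text)

def match_values(text, allowed_values):
    """Return the allowed values that appear as whole tokens (case-insensitive)."""
    tokens = {t.lower() for t in tokenize(text)}
    return [v for v in allowed_values if v.lower() in tokens]

text = ("This computer has 16GB ram and with the best processor in it. "
        "Case is made from aluminium.")
print(match_values(text, ["4GB", "6GB", "8GB", "16GB"]))  # ['16GB']
```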

To me, your question reads as "I need to identify face cards in a deck, but the words KING, EIGHT, and NINE all contain the letter I, so what do I do?" Well, if you have a criterion for "FACE CARD", and a face card is either "JACK", "QUEEN", or "KING", then it can iterate across the deck, collecting the face cards. You can then apply another criterion. From this, you could build pipelines of criteria, where each one either removes a card from the deck or leaves it there, such as running "CARDS ABOVE 5" and then "CARDS BELOW 9" right afterwards. Be careful about which cards each step removes: if "CARDS ABOVE 5" removed all of the cards above 5, "CARDS BELOW 9" would only see 1-5, and that would be a fail.
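The card analogy could be sketched roughly as follows. Each criterion is a plain function, and the pipeline keeps only the cards that pass every criterion in turn. All names here (`RANKS`, `run_pipeline`, and the criterion functions) are invented for the example.

```python
FACE_CARDS = {"JACK", "QUEEN", "KING"}
RANKS = {"ACE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5,
         "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9, "TEN": 10}

def is_face_card(card):
    return card in FACE_CARDS

def above_5(card):
    # Face cards have no numeric rank here, so they never pass.
    return RANKS.get(card, 0) > 5

def below_9(card):
    return card in RANKS and RANKS[card] < 9

def run_pipeline(deck, criteria):
    """Keep only the cards that satisfy every criterion, applied in order."""
    remaining = list(deck)
    for criterion in criteria:
        remaining = [card for card in remaining if criterion(card)]
    return remaining

deck = list(RANKS) + ["JACK", "QUEEN", "KING"]
print(run_pipeline(deck, [is_face_card]))      # ['JACK', 'QUEEN', 'KING']
print(run_pipeline(deck, [above_5, below_9]))  # ['SIX', 'SEVEN', 'EIGHT']
```

Matching on whole card names is what keeps KING, EIGHT, and NINE apart, even though they share letters.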

So for the criterion "RAM", you could program in "any token/word that ends in MB or GB, whose number is less than or equal to 1024". Then you can add another check to the criterion to say "AND followed in sequence by RAMIDENTIFIER". RAMIDENTIFIER is also not text (yet); it represents different ways to identify RAM, such as "Memory", "Mem", "ECC", etc., and has its own logic that is wholly separate, so you can chain things together.
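A minimal sketch of such a RAM criterion might look like this. The pattern, the identifier set (including "ram" itself, which is an addition), and the function name `find_ram` are assumptions made for illustration; it only handles amounts written without a space, such as 16GB.

```python
import re

# A size token: digits immediately followed by MB or GB (assumed format).
SIZE_PATTERN = re.compile(r"^(\d+)(MB|GB)$", re.IGNORECASE)
# Stand-in for RAMIDENTIFIER: words that mark the preceding size as RAM.
RAM_IDENTIFIERS = {"ram", "memory", "mem", "ecc"}

def find_ram(text, max_amount=1024):
    """Return the first size token that is followed by a RAM identifier, else None."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    for current, following in zip(tokens, tokens[1:]):
        match = SIZE_PATTERN.match(current)
        if (match and int(match.group(1)) <= max_amount
                and following.lower() in RAM_IDENTIFIERS):
            return match.group(1) + match.group(2).upper()
    return None

text = ("This computer has 16GB ram and with the best processor in it. "
        "Case is made from aluminium.")
print(find_ram(text))                           # 16GB
print(find_ram("A 6GB disk and 16GB memory"))  # 16GB
```

The returned value can then be compared against the list of possible values (4GB, 6GB, 8GB, 16GB) with a simple equality check.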

By pipeline I mean branches of logic against further criteria at that point. Make sense?

Licensed under: CC-BY-SA with attribution