Question

This question is a concept check. I have a string, 000.00-010.0.0.0, and I'd like to find its closest match in the list {000.00-012.0.0.0, 000.00-008.0.0.0}, combining the edit measure with a numerical distance measure. I'd like to take '012', '010' and '008' as tokens and measure the distance between them.

The standard approach to string matching looks for a change at each char position, sums the changes, and returns a distance. A modified distance also measures the ASCII distance between the chars: G is farther from E than D is.
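For concreteness, the two per-character measures described above might be sketched as follows. The function names are hypothetical and this is only an illustration of the idea, not the code actually in use.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def char_code_distance(a: str, b: str) -> int:
    """Sum the absolute character-code differences, position by position."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(abs(ord(x) - ord(y)) for x, y in zip(a, b))


query = "000.00-010.0.0.0"
for candidate in ("000.00-012.0.0.0", "000.00-008.0.0.0"):
    print(candidate, hamming(query, candidate), char_code_distance(query, candidate))
```

Measured per character, the '008' candidate comes out farther from the query than the '012' candidate, even though both tokens are numerically 2 away from '010'. That is the discrepancy the token idea is meant to remove.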

Measuring that '012' is as far from '010' as '008' is requires bundling three chars into a token. Can such a token be easily measured for both edit distance and numerical distance? The problem is further complicated by the removal of delimiters in the tree database.

My proposed solution, which I'd like a reality check on, is to convert '012', '010', and '008' into single ASCII chars, say ), *, and +, measure the char distance and string edit distance, then convert back into '012', '010', and '008' when printing.
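A minimal sketch of that substitution, assuming the token sits at a known offset (index 7 in the example string) and using a hypothetical lookup table that assigns the symbols in increasing numeric order:

```python
# Hypothetical table: tokens in ascending numeric order map to
# consecutive ASCII codes 41, 42, 43.
TOKEN_TO_CHAR = {"008": ")", "010": "*", "012": "+"}
CHAR_TO_TOKEN = {c: t for t, c in TOKEN_TO_CHAR.items()}


def encode(s: str, start: int, width: int = 3) -> str:
    """Replace the token at s[start:start+width] with its single-char symbol."""
    token = s[start:start + width]
    return s[:start] + TOKEN_TO_CHAR[token] + s[start + width:]


def decode(s: str, pos: int) -> str:
    """Expand the symbol at s[pos] back into its original token for printing."""
    return s[:pos] + CHAR_TO_TOKEN[s[pos]] + s[pos + 1:]


query = encode("000.00-010.0.0.0", 7)
low = encode("000.00-008.0.0.0", 7)
high = encode("000.00-012.0.0.0", 7)
print(abs(ord(query[7]) - ord(low[7])), abs(ord(query[7]) - ord(high[7])))
```

With this table both candidates end up one code point from the query, so the symmetry is restored. Note, though, that consecutive symbols encode only the rank of a token, not its magnitude, and the table has to list every token that can occur.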

Sample string: MER99.C0.00M.14.006.00.060.350

And, there are wildcards:

  • MER99.*.006.00.060.350
  • MER99.C0.00M.??.006.00.060.350

Since the strings are the same length (some need a dummy char for padding; '00M' is actually 'M'), matching uses the Hamming distance.

I do not need help with the match algorithm, the Hamming distance approach, wildcards, or the dummy char; I added these for context. Right now I treat each token as separate chars and get good results, but I know they are not as exact as they could be if each were handled as a token. The limiting factor is probably the inconsistency within the coding schema, but I'd like that to be the limit, not my algorithm.

Solution

Your strings contain alphanumeric characters, i.e. base-36 digits. Furthermore, these characters are grouped into 'tokens'. A token of three such characters can take 36^3 = 46,656 values, so it does not fit in a single 8-bit char, but you can store it in an int.
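As a rough sketch of the conversion (the helper name `token_value` is made up here, and which base suits which field is an assumption about your coding schema), Python's built-in `int` accepts a base argument:

```python
def token_value(token: str, base: int = 36) -> int:
    """Interpret a token as an integer in the given base (hypothetical helper)."""
    return int(token, base)


# Alphanumeric token read as base 36: M is digit 22.
print(token_value("00M"))

# Purely numeric tokens: the base changes the spacing between them.
for base in (10, 36):
    print(base, [token_value(t, base) for t in ("008", "010", "012")])
```

One caveat: read as base 36, '010' is 36, so '008' and '012' are no longer equally far from it. If a field is known to be purely decimal, parsing it in base 10 keeps the spacing you describe (8, 10, 12).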

Instead of storing plain ints in your tree, you can store a (char, int) pair, where the char tells the type of the value:

  • 0 for a numeric value
  • 1 for *
  • 2 for xxxx? (mask)
  • etc...
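
A minimal sketch of such a tagged pair, under some loud assumptions: the names `Token`, `make_token` and `tokenize` are hypothetical, the input still has its '.' delimiters, every non-wildcard token is read as base 36, and wildcard tokens carry a placeholder value of 0 (how a mask's pattern is stored is left open).

```python
from typing import List, NamedTuple

# Type tags, following the list above.
NUMERIC, STAR, MASK = 0, 1, 2


class Token(NamedTuple):
    kind: int   # one of NUMERIC, STAR, MASK
    value: int  # integer value for NUMERIC; placeholder 0 otherwise (assumption)


def make_token(text: str, base: int = 36) -> Token:
    """Classify one field and convert it to a (kind, value) pair."""
    if text == "*":
        return Token(STAR, 0)
    if "?" in text:
        return Token(MASK, 0)  # assumption: the mask pattern is handled elsewhere
    return Token(NUMERIC, int(text, base))


def tokenize(s: str) -> List[Token]:
    """Split a '.'-delimited string into tagged tokens."""
    return [make_token(field) for field in s.split(".")]


print(tokenize("MER99.C0.00M.??.006.00.060.350"))
print(tokenize("MER99.*.006.00.060.350"))
```

The distance between two NUMERIC tokens can then be the absolute difference of their values, while the tag lets the matcher treat wildcards however it already does.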
Licensed under: CC-BY-SA with attribution