Question

This question is a concept check. I have a string, 000.00-010.0.0.0, and I'd like to find its closest match in the list {000.00-012.0.0.0, 000.00-008.0.0.0}, combining the edit measure with a numerical distance measure. I'd like to treat '012', '010', and '008' as tokens and measure the distance between them.

The standard approach to string matching looks for a change at each character position, sums the changes, and returns a distance. A modified distance also measures the ASCII distance between the characters: G is farther from E than D is.
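As a rough illustration of the two per-character measures described above, here is a minimal Python sketch. The function names `hamming` and `ascii_distance` are made up for this example, and it assumes equal-length strings with no wildcards.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def ascii_distance(a: str, b: str) -> int:
    """Sum the absolute character-code differences, position by position."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(abs(ord(x) - ord(y)) for x, y in zip(a, b))


query = "000.00-010.0.0.0"
for candidate in ("000.00-012.0.0.0", "000.00-008.0.0.0"):
    # Prints the candidate, its Hamming distance, and its ASCII distance.
    print(candidate, hamming(query, candidate), ascii_distance(query, candidate))
```

Character by character, '012' comes out closer to '010' (one changed position, ASCII distance 2) than '008' does (two changed positions, ASCII distance 9), even though both are numerically 2 away. That gap is what treating the three characters as a single token is meant to close.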

To capture that '012' is as far from '010' as '008' is requires bundling three characters into a token. Can such a token easily be measured for both edit distance and numerical distance? The problem seems further complicated by the removal of delimiters in the tree database.

My proposed solution, which I want a reality check on, is to convert '012', '010', and '008' into single-character ASCII symbols, say ), *, and +, measure the character distance and string edit distance, then convert back to '012', '010', and '008' when printing.

Sample string: MER99.C0.00M.14.006.00.060.350

And, there are wildcards:

  • MER99.*.006.00.060.350
  • MER99.C0.00M.??.006.00.060.350

Since the strings are the same length (some need a dummy character for padding; '00M' is actually 'M'), matching uses the Hamming distance.

I do not need help with the match algorithm, the Hamming distance approach, wildcards, or the dummy character; I added these for context. Right now I treat each token as separate characters and get good results, but I know they are not as exact as they could be if each token were handled as a unit. The limiting factor is probably the inconsistency within the coding schema, but I'd like that to be the limit rather than my algorithm.


Solution

Your strings contain alphanumeric characters, i.e. base-36 digits. Furthermore, these characters are grouped into 'tokens'. A token cannot be stored in a char, but it can be stored in an int.
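A minimal Python sketch of the token-as-int idea follows. The helper names `token_value` and `token_distance` and the `base` parameter are assumptions for illustration, not part of any library. One caveat: in base 36, '010' is 36 and '008' is 8, so tokens that are meant as decimal numbers would need base 10 to stay equidistant.

```python
def token_value(token: str, base: int = 36) -> int:
    """Interpret a token such as '00M' or '012' as an integer in the given base."""
    return int(token, base)


def token_distance(a: str, b: str, base: int = 36) -> int:
    """Numerical distance between two tokens read in the same base."""
    return abs(token_value(a, base) - token_value(b, base))


print(token_value("00M"))                 # 22: 'M' is digit 22 in base 36
print(token_distance("010", "012", 10))   # 2 when read as decimal
print(token_distance("010", "008", 10))   # 2 when read as decimal
print(token_distance("010", "008"))       # 28 when read as base 36 (36 - 8)
```

Which base to use probably depends on whether a given field in the coding schema is purely numeric or genuinely alphanumeric.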

Instead of storing plain ints in your tree, you can store a (char, int) pair, where the char tells the type of the value:

  • 0 for a numeric value
  • 1 for *
  • 2 for xxxx? (mask)
  • etc...
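
One possible way to sketch the tagged pair in Python is shown below. The names `encode_token` and `encode` are hypothetical, and the value stored for a mask (here, its length) is purely an assumption, since the answer does not say what a mask's value should be. The sketch also assumes '.' is the only delimiter, as in the MER99 sample.

```python
from typing import List, Tuple

# Type tags, following the list above.
NUMERIC, STAR, MASK = 0, 1, 2


def encode_token(token: str, base: int = 36) -> Tuple[int, int]:
    """Turn one token into a (type, value) pair."""
    if token == "*":
        return (STAR, 0)
    if "?" in token:
        # Assumption: keep only the mask length as the value.
        return (MASK, len(token))
    return (NUMERIC, int(token, base))


def encode(code: str, sep: str = ".") -> List[Tuple[int, int]]:
    """Split a code on its delimiter and encode each token."""
    return [encode_token(t) for t in code.split(sep)]


print(encode("MER99.C0.00M.??.006.00.060.350"))
print(encode("MER99.*.006.00.060.350"))
```

A distance function over these pairs could then compare values only when both tags are numeric and decide separately how wildcards and masks should count.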
Licensed under: CC-BY-SA with attribution