Вопрос

I'm putting together a VBA script that compares two columns of data (about 15,000 rows each) and determines if any cell is a permutation/ of another cell.

For example, if A1 = 15091 and B52 = 19510 then the function would identify them as having the same set of characters.

I have a loop set up that checks each individual cell in column A against every other cell in column B and various functions within the loop, but have been unsuccessful thus far in anything that will accomplish this task.

In addition, the problem gets compounded by the fact that "number" formatted cells will drop all zeroes after the decimal so that 15091.1 will not be identified as the same set of characters as 15091.01.

Это было полезно?

Решение

You can do this without VBA, using a pure Excel approach. (Though find the VBA solution below) The idea is to build a kind of "hash-value" for each value that is the same for each permutation of a set of digits - without overlapping with other hashes.

One would to do so is to:

  1. Count the number of each digit 0-9 (e.g. 15091 and 19510 would be 1x0, 2x1, 1x5 and 1x9)
  2. Multiply each count with 10^digit(e.g. 1*10^0=1, 2*10^1=20, 1*10^5=100000, 1x10^9=1000000000)
  3. Sum these products, (e.g. 1000100021)

Then, all you need to do is to match these hashes against each other (using Excel's MATCH function) and see if something is found (using the ISERROR function).

Step by step instruction for Excel (assuming that your data is in Sheet1 and Sheet2, column A, starting in A1:

  1. In Sheet1:
  2. Insert two rows on top
  3. In B3, place this formula =TEXT(A3,"0") - this will get rid of the remainder each number and convert it to a text. Copy the formula down till the end of your range
  4. In C1:L1, place the number 0, 1, 2, ...
  5. In C2:L2, place the formula =10^C1
  6. In C3, place this formula: =LEN($B3)-LEN(SUBSTITUTE($B3,C$1,"")) - and copy it to the right till column L and down till the end of your list. This will count the number of digits
  7. In M3, place this formula: =SUMPRODUCT(C3:L3,$C$2:$L$2) - this will calculate the hash
  8. Repeat steps 2-7 in Sheet2
  9. In Sheet1, place this formula in N3: =NOT(ISERROR(MATCH(M3,Sheet2!$M:$M,0)))

Done!

Here's a VBA solution:

Option Explicit

Sub IdentifyMatches()
    Dim rngKeys As Range, rngToMatch As Range, rngCell As Range
    Dim dicHashes As Object
    'the range you want to have highlighted in case of a match
    Set rngKeys = Sheets("Sheet1").Range("A3:A5")

    'the range to search for matches
    Set rngToMatch = Sheets("Sheet2").Range("A3:A5")

    Set dicHashes = CreateObject("Scripting.Dictionary")

    'Create dictionary of hashes (dictionary is used for its .Exists property
    For Each rngCell In rngToMatch
        dicHashes(GetHash(rngCell)) = True
    Next

    'Check each cell in rngKey if it has a match
    For Each rngCell In rngKeys
        If dicHashes.Exists(GetHash(rngCell)) Then
            'Action to take in case of a match
            rngCell.Font.Bold = True
            Debug.Print rngCell.Value & " has a match!"
        Else
            rngCell.Font.Bold = False
        End If
    Next

End Sub


Function GetHash(rngValue As Range) As Long
    Dim strValue As String
    Dim i As Integer, digit As Integer
    Dim result As Long
    Dim digits(0 To 9) As Integer

    'Potentially add error check here
    strValue = Format(rngValue.Value, "0")

    For i = 1 To Len(strValue)
        digit = Int(Mid(strValue, i, 1))
        digits(digit) = digits(digit) + 1
    Next i

    For i = 0 To 9
        result = result + 10 ^ i * digits(i)
    Next i

    GetHash = result
End Function

Last but not least, here's the example file.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top