optimal code for detecting formatting differences?
Question
I need to compare two formatted strings. The text in the two of them is the same, only the formatting differs, meaning that some words are bold. The code should tell me if the location of the bold substrings are different e.g. the strings are formatted differently. So far I tried a char-to-char approach, but it is far too slow.
It's a plain legal current text in MS Word, with cca 10-500 chars per string. Two people independently formatted the strings.
my code so far:
Function collectBold(r As Range) As String
Dim chpos As Integer
Dim ch As Variant
Dim str, strTemp As String
chpos = 1
Do
If r.Characters(chpos).Font.Bold Then
Do
Dim boold As Boolean
strTemp = strTemp + r.Characters(chpos)
chpos = chpos + 1
If (chpos < r.Characters.Count) Then boold = r.Characters(chpos).Font.Bold
Loop While (boold And chpos < r.Characters.Count)
str = str + Trim(strTemp) + "/"
strTemp = ""
Else: chpos = chpos + 1
End If
Loop While (chpos < r.Characters.Count)
collectBold = str
End Function
This code collect all bold substrings (strTemp) and merges them into one string (str), separating them with "/". The function runs for both strings to compare, and then checks if the outputs are the same.
Solution
If you only need to see if they are different, this function will do it:
Function areStringsDifferent(range1 As Range, range2 As Range) As Boolean
Dim i As Integer, j As Integer
For i = 1 To range1.Words.Count
'check if words are different formatted
If Not range1.Words(i).Bold = range2.Words(i).Bold Then
areStringsDifferent = True
Exit Function
'words same formatted, but characters may not be
ElseIf range1.Words(i).Bold = wdUndefined Then
For j = 1 To range1.Words(i).Characters.Count
If Not range1.Words(i).Characters(j).Bold = range2.Words(i).Characters(j).Bold Then
areStringsDifferent = True
Exit Function
End If
Next
End If
Next
areStringsDifferent = False
End Function
It first looks if the words are different formatted... If they have the same format but the format is undefinied, it looks into the characters of the word.