How Do I Find All Acronyms in an MS Word Document Using a Macro?
Question
I have a document with many acronyms that need to be captured and put into an acronyms table at the end of the document.
The term acronym has various meanings. I'd like to create a table that has all of the words that are initialized; two or more capitalized letters that are short for a longer meaning. I.e., CD-ROM, USB, SYNC, MMR, ASCAP, etc.
How do I create a macro to do this?
Solution
Something like this might get you started. Add a reference to "Microsoft VBScript Regular Expressions" (Edit Macro: Tools > References). This library is the file, "vbscript.dll".
You may need to adjust the regexp if all your acronyms aren't only upper-case letters (eg some may contain numbers).
Sub Acronyms()
Dim dict, k, tmp
Dim regEx, Match, Matches
Dim rngRange As Range
Set regEx = New RegExp
Set dict = CreateObject("scripting.dictionary")
regEx.Pattern = "[A-Z]{2,}" '2 or more upper-case letters
regEx.IgnoreCase = False
regEx.Global = True
Set Matches = regEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
tmp = Match.Value
If Not dict.Exists(tmp) Then dict.Add tmp, 0
dict(tmp) = dict(tmp) + 1
Next
For Each k In dict.Keys
Debug.Print k, dict(k)
Next k
End Sub
OTHER TIPS
Thanks Tim, your code works great!
If it will be of any use to others, the pattern [A-Z]{1,}([a-z]*|\&|\.*)[A-Z]{1,}
will find more acronyms...
(I do not have permission to post comments, hence adding this as answer)
Edit (still no way to add comments): \b[A-Z]{1,}([a-z*]|\&|\.|\-)[A-Z]{1,}\b
is more robust, but will fail if the last character of the acronym is not capitalized.
I have found the following works well (where some business name acronyms are tolerable). I use this to test data entries in Access, it should also work for a Word document range.
objRegExp.Pattern = "([A-Z]{1,}((\&(?![A-Z]\s[\w]{3})\w*)+|\.\w*)+)|[A-Z]{2,}(?![A-Z]*\s[A-Z]{1}[a-z])"
- J&K =Match
- JK&S =Match
- J.S.S =Match
- JK&S.K =Match
- JSK =Match
- JK =Match
- DKD And Sons =No Match
- J&K Engineering =No Match
- PKF Rogers and Associates =No Match
I use RegExHero to test my expressions
I used the following to find abbreviations in my PhD thesis. They were all in "()".
regEx.Pattern = "\([A-Z]{1,}([a-z]*|\&|\.|\-*)[A-Z]{1,}\)"
You will be running a macro on the main Word document. Open a separate Word document that is blank. This will be used to store discovered the acronyms.
- Press "Record Macro". Choose a unique name, and assign a shortcut key such as CTRL + ALT + A.
- Open the Find dialogue (CTRL + F). Paste the following search text:
<[A-Z]{2,}>.
In the Find dialogue, choose "More" > check the box for "Use Wildcards". Click the Find Next button. - Right-click on the selected text, being careful not to change the highlight. Select Copy from the context menu.
- Navigate to the separate Word document (ALT + TAB, select the Word document). Paste the copied text, and hit Enter. ALT + TAB back to the original Word document.
- Close the find dialogue and click the right arrow once. This moves the cursor off the highlighted text, and readies it for the next search.
- Stop the macro recording.
You now have a macro that finds a word containing two or more capitalized letters, and saves the text to a separate document. In order to search for the remaining acronyms, press CTRL + ALT + A continuously until the end of the document has been reached. Or, edit the macro, and add while a loop.
Here is what the macro looks like (without the loop):
Sub GetAcronyms()
Selection.Find.ClearFormatting
With Selection.Find
.Text = "<[A-Z]{2,}>"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute
Selection.Copy
Windows("Document1.docx").Activate
Selection.PasteAndFormat (wdPasteDefault)
Selection.TypeParagraph
Windows("TheOriginalDocument.docx").Activate
Selection.MoveRight Unit:=wdCharacter, Count:=1
End Sub