
I did look around, but I couldn't find an answer that had what I needed. Apologies in advance: I am currently teaching myself regex (using VB in Excel), and I think I am having a syntax problem.

What I want:
To find all 5-digit numbers in a text document, associate each with a date, and print them to an Excel spreadsheet.

What I am getting:
A single instance of that number, per set, to each date.

What I think is wrong:
My regex pattern definition. I want to find a 5-digit number that can be followed by a comma or a space.

oRegEx.Pattern = "\d{5}(?=([\s]*)|[,])"

I'm pretty confident that is the issue here, and I'm also sure it is syntactical in nature, but I am so new to this that I don't know what I am doing wrong. I've posted my entire code below.

Public Sub ParseMail()
    Dim i As Integer
    Dim x As Integer

    Dim oFSO As Scripting.FileSystemObject
    Dim oFile As Scripting.TextStream
    Dim sHeaderDate As String
    Dim sIDList As String
    Dim sTemp As String
    Dim oRegEx As VBScript_RegExp_55.RegExp
    Dim oMatches As Object

    Set oFSO = New Scripting.FileSystemObject
    Set oFile = oFSO.OpenTextFile("C:\Users\source doc.txt", ForReading) 'Open the exported file. Change path as needed.
    Set oRegEx = New VBScript_RegExp_55.RegExp 'Instantiate RegEx object

    oRegEx.IgnoreCase = True
    oRegEx.Pattern = "\d{5}(?=([\s]*)|[,])" 'Regular expression to identify 5-digit numbers... not working well.

    i = 1 ' init variable to 1. This is the first row to start writing in spreadsheet.

    Do While Not oFile.AtEndOfStream ' Read the file until it reaches the end.
        sTemp = oFile.ReadLine 'Read the next line.
        'Debug.Print sTemp
        If Left(sTemp, 5) = "Sent:" Then 'Look for the date in the header.
            sHeaderDate = Mid(sTemp, 7) 'Set this variable starting at pos 7 of this line.
            'Debug.Print sHeaderDate
        Else
            'This is not the date header, so start checking for IDs.
            Set oMatches = oRegEx.Execute(sTemp)
            If Not oMatches Is Nothing Then 'Find anything?
                If oMatches.Count > 0 Then
                    For x = 0 To oMatches.Count - 1 'Walk through all found values and write to the active spreadsheet.
                        ActiveSheet.Cells(i, 1).Value = sHeaderDate
                        ActiveSheet.Cells(i, 2).Value = oMatches(x)
                        i = i + 1
                    Next x
                End If
            End If
        End If
    Loop

    oFile.Close 'Release the file handle before clearing the references.
    Set oFile = Nothing
    Set oFSO = Nothing
    Set oRegEx = Nothing

End Sub


For a regex that matches five digits followed by either a space or a comma, try:

\d{5}(?=[ ,])

Or, if you really want any whitespace character:

\d{5}(?=[\s,])

Note the literal space in the first lookahead. \s, which you used, matches any whitespace character, and that includes more than just the space (tabs and line breaks, for example).
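
To make the difference between a literal space and \s concrete, here is a minimal sketch. The procedure name and the sample string are made up for illustration, and it uses late binding via CreateObject rather than the early-bound reference from the question.

```vba
Public Sub DemoSpaceVersusWhitespace()
    ' Late binding, so no reference to the VBScript regex library is required.
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    ' Hypothetical sample line: a tab, not a space, follows the number.
    Dim sSample As String
    sSample = "12345" & vbTab & "next column"

    re.Pattern = "\d{5}(?=[ ,])"
    Debug.Print re.Test(sSample)   ' False: a tab is neither a space nor a comma

    re.Pattern = "\d{5}(?=[\s,])"
    Debug.Print re.Test(sSample)   ' True: \s also covers tabs
End Sub
```

Which variant is right depends on what actually separates the numbers in your file.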

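In your regex, you use

(?=([\s]*)|[,])

So the first alternative looks ahead for a whitespace character that occurs zero or more times. Zero occurrences always satisfies that, so the lookahead succeeds at every position and never constrains what follows the five digits; you may not be matching what you expect.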
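
A small sketch of that effect, assuming a made-up input and procedure name: because the lookahead can always succeed on zero whitespace characters, the original pattern behaves like a bare \d{5} and will pick five digits out of a longer number.

```vba
Public Sub DemoOriginalLookahead()
    Dim re As Object, oMatches As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, for illustration only
    re.Global = True
    re.Pattern = "\d{5}(?=([\s]*)|[,])"

    ' Hypothetical input: a 7-digit number with no space or comma after the fifth digit.
    Set oMatches = re.Execute("1234567")

    Debug.Print oMatches.Count      ' 1: the lookahead did not reject anything
    Debug.Print oMatches(0).Value   ' 12345: five digits taken from inside a longer number
End Sub
```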

With regard to your code:

oRegEx.IgnoreCase = True

is irrelevant here, since the pattern contains no letters, but you need to add

oRegEx.Global = True

in order to collect all the matches.
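
The effect of Global can be seen in a short sketch like the following. The sample line and the procedure name are assumptions for illustration, not taken from your file.

```vba
Public Sub DemoGlobalFlag()
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, for illustration only
    re.Pattern = "\d{5}(?=[ ,])"

    ' Hypothetical line containing three IDs.
    Dim sSample As String
    sSample = "12345, 23456, 34567 "

    re.Global = False                        ' the default setting
    Debug.Print re.Execute(sSample).Count    ' 1: Execute stops after the first match

    re.Global = True
    Debug.Print re.Execute(sSample).Count    ' 3: every match on the line is returned
End Sub
```

With Global left at its default, the For loop in your code only ever sees one match per line, which fits the "single instance" you describe.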

Other tips

Your regex to find all 5-digit numbers (and 5 digits only) would be

oRegEx.Pattern = "\b\d{5}\b"

\b is a word boundary and \d{5} matches exactly 5 digits, so a run of digits inside a longer number is skipped.

You can test this out here
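
As a rough illustration of the word-boundary pattern, the sketch below runs it against a made-up line; the procedure name and the input are hypothetical. Note that Global is set to True so that all matches are returned.

```vba
Public Sub DemoWordBoundary()
    Dim re As Object, oMatches As Object, oMatch As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, for illustration only
    re.Global = True
    re.Pattern = "\b\d{5}\b"

    ' Hypothetical line: a 5-digit ID, a 6-digit number, and an ID at the end of the line.
    Set oMatches = re.Execute("12345,123456 98765")

    For Each oMatch In oMatches
        Debug.Print oMatch.Value   ' prints 12345, then 98765; 123456 is skipped
    Next oMatch
End Sub
```

Because \b also counts letters and underscores as word characters, a number glued to a letter (for example A12345) would not match with this pattern.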

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow