Question

I looked around but couldn't find an answer that covered what I need. Apologies in advance: I am teaching myself regex (using VBA in Excel), and I think I have a syntax problem.

What I want:
To find all 5-digit numbers in a text document, associate each one with a date, and write them to an Excel spreadsheet.

What I am getting:
Only a single instance of the number, per set, is written for each date.

What I think is wrong:
My regex pattern definition. I want to find a 5-digit number that may be followed by a comma or a space.

oRegEx.Pattern = "\d{5}(?=([\s]*)|[,])"

I'm fairly confident that this is the issue and that it is syntactical in nature, but I am so new to this that I don't know what I am doing wrong. My entire code is posted below.

Public Sub ParseMail()
    Dim i As Integer
    Dim x As Integer

    Dim oFSO As Scripting.FileSystemObject
    Dim oFile As Scripting.TextStream
    Dim sHeaderDate As String
    Dim sIDList As String
    Dim sTemp As String
    Dim oRegEx As VBScript_RegExp_55.RegExp
    Dim oMatches As Object

    Set oFSO = New Scripting.FileSystemObject
    Set oFile = oFSO.OpenTextFile("C:\Users\source doc.txt", ForReading) 'Open the exported file. Change path as needed.
    Set oRegEx = New VBScript_RegExp_55.RegExp 'Instantiate RegEx object

    oRegEx.IgnoreCase = True
    oRegEx.Pattern = "\d{5}(?=([\s]*)|[,])" 'Regular expression to identify 5 digit numbers... not working well.


    i = 1 ' init variable to 1. This is the first row to start writing in spreadsheet.

    Do While Not oFile.AtEndOfStream ' Read the file until it reaches the end.
        sTemp = oFile.ReadLine 'Read the next line
        'Debug.Print sTemp
        If Left(sTemp, 5) = "Sent:" Then 'Look for the date in the header.
            sHeaderDate = Mid(sTemp, 7) 'set this variable starting at pos 7 of this line.
            'Debug.Print sHeaderDate
        Else
            'This is not the date header so start checking for IDs.
            Set oMatches = oRegEx.Execute(sTemp)
            If Not oMatches Is Nothing Then 'Find anything?
                If oMatches.Count > 0 Then
                    For x = 0 To oMatches.Count - 1 'walk thru all found values and write to active spreadsheet.
                        ActiveSheet.Cells(i, 1).Value = sHeaderDate
                        ActiveSheet.Cells(i, 2).Value = oMatches(x)
                        i = i + 1
                    Next
                End If
            End If
        End If
    Loop

    oFile.Close

    Set oFile = Nothing
    Set oFSO = Nothing
    Set oRegEx = Nothing

End Sub

Solution

For a regex that matches five digits followed by either a space or a comma, try:

\d{5}(?=[ ,])

or if you really want any whitespace character:

\d{5}(?=[\s,])

Note the literal space in the first lookahead. \s, which you used, matches any whitespace character, and that includes more than just the space (tabs and line breaks, for example).
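To illustrate the difference between a literal space and \s, here is a minimal sketch. The Sub name and the sample string are made up for illustration, and it assumes the VBScript.RegExp object can be created with CreateObject, as on a typical Windows installation of Excel.

```vba
Public Sub CompareSpaceAndWhitespace()
    ' Late binding, so no reference to the regex library is required.
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True

    ' Hypothetical sample: the number is followed by a tab, not a space.
    Dim sample As String
    sample = "12345" & vbTab & "next column"

    re.Pattern = "\d{5}(?=[ ,])"    ' literal space or comma only
    Debug.Print re.Execute(sample).Count    ' expected: 0

    re.Pattern = "\d{5}(?=[\s,])"   ' any whitespace character or comma
    Debug.Print re.Execute(sample).Count    ' expected: 1
End Sub
```

If the source text is tab-separated, the \s version is probably the one you want.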

In your regex, you use

(?=([\s]*)|[,])

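The lookahead contains two alternatives: ([\s]*) or [,]. The first looks for a whitespace character occurring zero or more times. Zero occurrences always succeeds, so the lookahead never rejects anything and the pattern behaves like a plain \d{5}, which is probably not what you expect.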
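The effect can be checked with a short sketch. The Sub name and the sample string are hypothetical; the two patterns are the ones discussed above. In the sample, the first number is followed by a letter, so a working lookahead should reject it.

```vba
Public Sub CompareLookaheads()
    ' Late binding, so no reference to the regex library is required.
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True

    ' Hypothetical sample: 12345 is followed by "x", 67890 by a comma.
    Dim sample As String
    sample = "ref 12345x, id 67890, end"

    ' Original pattern: the empty-matching alternative lets everything through.
    re.Pattern = "\d{5}(?=([\s]*)|[,])"
    Debug.Print re.Execute(sample).Count    ' expected: 2 (12345 and 67890)

    ' Suggested pattern: the next character must be whitespace or a comma.
    re.Pattern = "\d{5}(?=[\s,])"
    Debug.Print re.Execute(sample).Count    ' expected: 1 (67890 only)
End Sub
```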

With regard to your code:

oRegEx.IgnoreCase = True

is irrelevant because the pattern contains no letters, but you need to add

oRegEx.Global = True

in order to collect all the matches. Without it, Execute returns only the first match in each line.
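A minimal sketch of what the Global property changes, using a made-up sample line and a hypothetical Sub name:

```vba
Public Sub ShowGlobalFlag()
    ' Late binding, so no reference to the regex library is required.
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\d{5}(?=[\s,])"

    ' Hypothetical sample line containing three 5-digit numbers.
    Dim sample As String
    sample = "IDs: 11111, 22222, 33333 received"

    re.Global = False   ' the default: stop after the first match
    Debug.Print re.Execute(sample).Count    ' expected: 1

    re.Global = True    ' keep searching through the whole string
    Debug.Print re.Execute(sample).Count    ' expected: 3
End Sub
```

This would explain the symptom in the question: with Global left at its default, each line read from the file contributes at most one number to the spreadsheet.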

Other tips

Your regex to find all 5-digit numbers (and only 5-digit numbers) would be

oRegEx.Pattern = "\b\d{5}\b"

\b is a word boundary and \d{5} matches 5 digits, so a run of five digits inside a longer number does not match.

you can test this out here
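As a rough illustration of the word-boundary version, here is a sketch with a hypothetical Sub name and a made-up sample string. The sample includes a 6-digit number and a 5-digit number at the very end of the line, where no space or comma follows.

```vba
Public Sub ShowWordBoundary()
    ' Late binding, so no reference to the regex library is required.
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "\b\d{5}\b"

    ' Hypothetical sample: 222222 has six digits; 33333 ends the line.
    Dim sample As String
    sample = "11111, 222222 33333"

    Dim m As Object
    For Each m In re.Execute(sample)
        Debug.Print m.Value     ' expected: 11111, then 33333
    Next m
End Sub
```

One thing to keep in mind: \b treats letters and underscores as word characters too, so a number glued to a letter (A12345, for instance) would not match with this pattern.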

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow