Question

I have a small function that returns the text found between two other strings (for example between single quotes, double quotes, or a simple HTML tag). The pattern is built like this:

        Dim exp As String = String.Format("{0}(.*?){1}", beginMarker, endMarker)

Now, if I pass "<b>" as the beginMarker and "</b>" as the endMarker and I don't specify RegexOptions.IgnoreCase, it correctly returns the text for the matching lower case <b></b>. Once I specify IgnoreCase, however, it never returns a result (assuming the same input). The example function is below (remove RegexOptions.IgnoreCase and it works). Also, whether or not I escape the input markers doesn't seem to change the output; the only difference is the IgnoreCase.

My question is, what am I missing (I used a simple example because I'm not actually parsing HTML with attributes)?

Input: beginMarker = "<b>"
Input: endMarker = "</b>"
Input: searchText = "<b>this is a test</b>"
Input: includeMarkers = True or False (doesn't matter)

Public Shared Function GetStringInBetween(beginMarker As String, endMarker As String, searchText As String, includeMarkers As Boolean) As List(Of String)
    beginMarker = RegularExpressions.Regex.Escape(beginMarker)
    endMarker = RegularExpressions.Regex.Escape(endMarker)
    Dim exp As String = String.Format("{0}(.*?){1}", beginMarker, endMarker)
    Dim regEx As New RegularExpressions.Regex(exp)
    Dim returnList As New List(Of String)

    For Each m As Match In regEx.Matches(searchText, 0, RegexOptions.IgnoreCase)
        If includeMarkers = True Then
            returnList.Add(m.Value)
        Else
            returnList.Add(m.Value.TrimStart(beginMarker.ToCharArray).TrimEnd(endMarker.ToCharArray))
        End If
    Next

    Return returnList
End Function

Solution

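I wouldn't use a .NET class name as the name of a variable, as things could get confusing. VB is case-insensitive, so a variable named regEx is indistinguishable by name from the Regex class.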

This works. I also replaced the Trim calls with Substring, which removes the markers by length, so their case is ignored:

Imports System.Text.RegularExpressions

Module Module1

    Public Function GetStringInBetween(beginMarker As String, endMarker As String, searchText As String, includeMarkers As Boolean) As List(Of String)
        Dim exp As String = String.Format("{0}(.*?){1}", Regex.Escape(beginMarker), Regex.Escape(endMarker))
        Dim returnList As New List(Of String)

        For Each m As Match In Regex.Matches(searchText, exp, RegexOptions.IgnoreCase)
            If includeMarkers Then
                returnList.Add(m.Value)
            Else
                ' return the portion of the matched string without the markers
                returnList.Add(m.Value.Substring(beginMarker.Length, m.Value.Length - beginMarker.Length - endMarker.Length))
            End If
        Next

        Return returnList

    End Function

    Sub Main()
        ' include a \ to confirm the regex escaping 
        ' outputs: "hello, again"
        Console.WriteLine(String.Join(", ", GetStringInBetween("<x>", "</\x>", "<X>hello</\x> world <x>again</\x>", False).ToArray))
        Console.ReadLine()
    End Sub

End Module
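
As a side note on why the Trim calls had to go: TrimStart and TrimEnd treat their argument as a set of characters, not as a prefix or suffix, so they can eat into the text between the markers. The sketch below is only an illustration (the module and function names are hypothetical, not from the original post). It compares the Trim approach with reading capture group 1, which is an alternative to the Substring call above since the pattern already captures the inner text.

```vbnet
Option Strict On

Imports System
Imports System.Text.RegularExpressions

Module CaptureGroupDemo

    ' Mirrors the Trim-based approach from the question.
    ' TrimStart/TrimEnd strip ANY of the given characters, not the marker as a whole.
    Function StripWithTrim(matched As String, beginMarker As String, endMarker As String) As String
        Return matched.TrimStart(beginMarker.ToCharArray()).TrimEnd(endMarker.ToCharArray())
    End Function

    ' Alternative sketch: let the regex hand back the text captured by (.*?).
    Function StripWithGroup(searchText As String, beginMarker As String, endMarker As String) As String
        Dim pattern As String = String.Format("{0}(.*?){1}", Regex.Escape(beginMarker), Regex.Escape(endMarker))
        Dim m As Match = Regex.Match(searchText, pattern, RegexOptions.IgnoreCase)
        Return If(m.Success, m.Groups(1).Value, String.Empty)
    End Function

    Sub Main()
        ' The leading "b" of "bold" is in the trim set {<, b, >}, so it is removed too.
        Console.WriteLine(StripWithTrim("<b>bold</b>", "<b>", "</b>"))   ' prints "old"
        Console.WriteLine(StripWithGroup("<B>bold</B>", "<b>", "</b>"))  ' prints "bold"
    End Sub

End Module
```

Either Substring or Groups(1) avoids the problem; Groups(1) simply does not depend on the marker lengths.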

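Edit: Oh yeah, use Option Strict On too. There is no overload of Regex.Matches that takes (String, Int32, RegexOptions) as parameters. With Option Strict Off, the compiler most likely converts the 0 to the string "0" and binds the call to the shared Matches(String, String, RegexOptions) overload, so the search runs against the pattern "0" instead of your expression and finds nothing. With Option Strict On, that call would not compile.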
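
If you would rather keep a Regex instance than call the shared method, here is a minimal sketch under that assumption (OverloadDemo and FindBold are hypothetical names used only for illustration). The options go to the constructor, and the instance overload Matches(String, Int32) takes just the input and a start index.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module OverloadDemo

    ' Hypothetical helper: returns the text between <b> and </b>, ignoring case.
    Function FindBold(searchText As String) As List(Of String)
        Dim pattern As String = Regex.Escape("<b>") & "(.*?)" & Regex.Escape("</b>")

        ' With an instance, the options are passed to the constructor.
        Dim rx As New Regex(pattern, RegexOptions.IgnoreCase)

        Dim results As New List(Of String)
        ' Instance overload Matches(String, Int32): input and start index only.
        For Each m As Match In rx.Matches(searchText, 0)
            results.Add(m.Groups(1).Value)
        Next
        Return results
    End Function

    Sub Main()
        ' prints "this is a test, another"
        Console.WriteLine(String.Join(", ", FindBold("<B>this is a test</B> and <b>another</b>")))
    End Sub

End Module
```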

Licensed under: CC-BY-SA with attribution