Question

This question is heavily related to this one, but it has to do with grabbing methods that contain references to global variables (not commented out).

I'm using the following regular expression and test string to check whether it works, but it only partially works:

Regular Expression

^((?:(?:Public|Private)\s+)?(?:Function|Sub).+)[\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)$

(I need this part of the regular expression, with its capturing group, so that I can grab the name of the method as a sub-match.)

Test String

'-----------------------------------------------------------------------------------------
'
'   the code:   Header
'
'-----------------------------------------------------------------------------------------

Dim GLOBAL_VARIABLE_1
Dim GLOBAL_VARIABLE_2
Dim GLOBAL_VARIABLE_3

Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function


msgbox GLOBAL_VARIABLE_1



Public Function doThat(byVal xPath)
'' Created               : dd/mm/yyyy
'' Return                : array
' 'Param            : xPath

     return = split(mid(xPath, 2), "/")

     GLOBAL_VARIABLE_2 = 2 + 2


     doThat = return

End Function


GLOBAL_VARIABLE_2 = 2 + 2


Public Sub butDontDoThis()
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj

     For i = 0 To 5
          return = return & "bye" & " "

     Next

End Sub


GLOBAL_VARIABLE_3 = 3 + 3


Public Sub alsoDoThis(byRef obj)
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj, an xml document object

     For i = 0 To 4
          return = return & "hi" & " "

     Next

     GLOBAL_VARIABLE_1 = 1 + 1

End Sub


GLOBAL_VARIABLE_3 = 3 + 3

Using http://www.regexpal.com/, I'm able to highlight the first method that references a global variable. However, the regular expression doesn't do what I expect with the other methods: it also picks up methods that don't reference the specific global variable, and the match only ends at the last method that actually uses it. I've determined that the problem is the [\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)$ part: even though it is a minimal (non-greedy) match, it keeps scanning past the end of the current method until it finds an occurrence of GLOBAL_VARIABLE_1.

In summary, the expression should follow these rules:

  • It should stop scanning the method it is currently checking when it reaches the first end of a method declaration (End Function or End Sub). In this example, only the doThis and alsoDoThis methods should be matched for GLOBAL_VARIABLE_1, but I'm not sure what the regular expression should be.
  • It should only match methods that actually use global variables.
  • If GLOBAL_VARIABLE_1 is commented out, the method is not really using it, so a commented GLOBAL_VARIABLE_1 should not trigger a positive match for the method.

Solution

Description

I'd do this in two steps. First, identify each of your functions and subs. Here I'm using a back-reference \1 to ensure we're matching the corresponding End Function or End Sub. This regex also captures the function name into group 2, which you can use later if the test in part 2 succeeds.

(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1
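To make step 1 concrete, here is a minimal Python sketch of the same pattern, assuming Python's `re` module as a stand-in for the .NET engine used below (`re.DOTALL` plays the role of `RegexOptions.Singleline`). The helper name `split_procedures` and the shortened sample are hypothetical, for illustration only.

```python
import re

# Step 1 pattern from the answer; DOTALL lets .*? cross line breaks,
# and \1 forces "End Function" to pair with "Function" (and Sub with Sub).
PROC_RE = re.compile(
    r"(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1",
    re.IGNORECASE | re.DOTALL,
)

# Shortened version of the question's test string (hypothetical sample).
SAMPLE = """Public Function doThis(byVal xml)
     GLOBAL_VARIABLE_1 = 2 + 2
End Function

Public Function doThat(byVal xPath)
     GLOBAL_VARIABLE_2 = 2 + 2
End Function

Public Sub alsoDoThis(byRef obj)
     GLOBAL_VARIABLE_1 = 1 + 1
End Sub
"""


def split_procedures(source):
    """Hypothetical helper: return (kind, name, full_text) for each block."""
    return [(m.group(1), m.group(2), m.group(0)) for m in PROC_RE.finditer(source)]


for kind, name, _ in split_procedures(SAMPLE):
    print(kind, name)
# Function doThis
# Function doThat
# Sub alsoDoThis
```

Each block's full text can then be handed to the step-2 test on its own, so a match can never run past its own End Function or End Sub.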

Then test each of these to see whether it contains your variable. In this test I'm using multiline matching to ensure the comment character does not appear before GLOBAL_VARIABLE_1 on the same line. It also checks that GLOBAL_VARIABLE_1 is not preceded by any of the following:

  • alphanumeric characters, with or without a _ separator. This would need to be updated with all the characters you might find in a variable name. Including a hyphen - here might be confused with a minus sign used in an expression.
  • comment character '

^[^']*?(?<![a-z0-9_])GLOBAL_VARIABLE_1
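A minimal Python sketch of the step-2 check, assuming Python's `re` module. It expresses "not preceded by a letter, digit or _" as a lookbehind, and it uses `[^'\r\n]` rather than `[^']` so the scan stays on one line; both are my adjustments, not necessarily what the answer's engine requires. `uses_global` is a hypothetical helper name.

```python
import re

# MULTILINE makes ^ match at every line start inside the procedure body.
# [^'\r\n]*? : no comment character before the variable on that line.
# (?<![A-Za-z0-9_]) : reject names such as MY_GLOBAL_VARIABLE_1.
USE_RE = re.compile(
    r"^[^'\r\n]*?(?<![A-Za-z0-9_])GLOBAL_VARIABLE_1",
    re.IGNORECASE | re.MULTILINE,
)


def uses_global(body):
    """Hypothetical helper: True if some line uses GLOBAL_VARIABLE_1 uncommented."""
    return USE_RE.search(body) is not None


print(uses_global("     GLOBAL_VARIABLE_1 = 2 + 2"))   # True
print(uses_global("  '   GLOBAL_VARIABLE_1 = 2 + 2"))  # False (commented out)
print(uses_global("     MY_GLOBAL_VARIABLE_1 = 1"))     # False (different name)
```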


VB Part 1

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    Dim re As Regex = New Regex("(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1",RegexOptions.IgnoreCase OR RegexOptions.Singleline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

$matches Array:
(
    [0] => Array
        (
            [0] => Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function
            [1] => Public Function doThat(byVal xPath)
'' Created               : dd/mm/yyyy
'' Return                : array
' 'Param            : xPath

     return = split(mid(xPath, 2), "/")

     GLOBAL_VARIABLE_2 = 2 + 2


     doThat = return

End Function
            [2] => Public Sub butDontDoThis()
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj

     For i = 0 To 5
          return = return & "bye" & " "

     Next

End Sub
            [3] => Public Sub alsoDoThis(byRef obj)
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj, an xml document object

     For i = 0 To 4
          return = return & "hi" & " "

     Next

     GLOBAL_VARIABLE_1 = 1 + 1

End Sub
        )

    [1] => Array
        (
            [0] => Function
            [1] => Function
            [2] => Sub
            [3] => Sub
        )

    [2] => Array
        (
            [0] => doThis
            [1] => doThat
            [2] => butDontDoThis
            [3] => alsoDoThis
        )

)

VB Part 2

Found in this text

Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function

example

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

$matches Array:
(
    [0] => Array
        (
            [0] =>  Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1
        )

)

not found in this text

Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

  '   GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function

example

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

Matches Found:
NO MATCHES.

also not found in this text

Public Sub butDontDoThis()
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj

     For i = 0 To 5
          return = return & "bye" & " "

     Next

End Sub

example

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "Public Sub butDontDoThis()
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj

     For i = 0 To 5
          return = return & ""bye"" & "" ""

     Next

End Sub"
    Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

Matches Found:
NO MATCHES.

Disclaimer

There are a lot of edge cases that can trip this up, for example a comment containing ' end function, or assigning a string value to a variable, like thisstring = "end sub".
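To illustrate the string-literal edge case, here is a small Python sketch, assuming Python's `re` module behaves like the .NET engine for this pattern. The procedure name `tricky` is hypothetical.

```python
import re

# Same step-1 pattern as above.
PROC_RE = re.compile(
    r"(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical procedure whose body contains the text "end sub" in a string.
SOURCE = '''Public Sub tricky()
     thisstring = "end sub"
     GLOBAL_VARIABLE_1 = 1
End Sub
'''

match = PROC_RE.search(SOURCE)
# The lazy .*? stops at the "end sub" inside the string literal (IGNORECASE
# also applies to the \1 back-reference), so the captured block ends early
# and the real GLOBAL_VARIABLE_1 line is never handed to step 2.
print("GLOBAL_VARIABLE_1" in match.group(0))  # False
print(match.group(0).endswith('"end sub'))    # True
```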

Yes, I realize the OP asked about VBScript; I've included these VB.NET examples to demonstrate the overall logic and to show that the regular expressions work.

OTHER TIPS

Found the culprit. The issue is caused by the [\s\S]+?(GLOBAL_VARIABLE_1) part of your regular expression:

((?:(?:Public|Private)\s+)?(?:Function|Sub).+)[\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)

[\s\S]+? is a non-greedy match, but that does not necessarily mean it's the shortest match. Simplified example:

Public Function doThis(byVal xml)
  GLOBAL_VARIABLE_1
End Function

Public Function doThat(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub butDontDoThis()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

When the regular expression is applied to the sample text, it first matches the first function:

Public Function doThis(byVal xml)
  GLOBAL_VARIABLE_1
End Function

However, after that match, the first part of the expression (((?:(?:Public|Private)\s+)?(?:Function|Sub).+)) matches the next function definition (Public Function doThat(byVal xPath)), and [\s\S]+?(GLOBAL_VARIABLE_1) then matches all text until the next occurrence of GLOBAL_VARIABLE_1:

Public Function doThat(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub butDontDoThis()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

There is no implicit "don't include End Function" in [\s\S]+?.
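The behaviour described above can be replayed with a short Python sketch. This assumes Python's `re` module as a stand-in for the regexpal.com (JavaScript) tester, with ^ and $ in multiline mode; the pattern uses [\s\S] for "any character", so no DOTALL flag is needed.

```python
import re

# The question's pattern, unchanged apart from being a Python raw string.
OP_RE = re.compile(
    r"^((?:(?:Public|Private)\s+)?(?:Function|Sub).+)"
    r"[\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)$",
    re.MULTILINE,
)

SIMPLIFIED = """Public Function doThis(byVal xml)
  GLOBAL_VARIABLE_1
End Function

Public Function doThat(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub butDontDoThis()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub"""

matches = list(OP_RE.finditer(SIMPLIFIED))
for m in matches:
    print(m.group(1))
# Public Function doThis(byVal xml)
# Public Function doThat(byVal xPath)

# The second match starts at doThat and runs through butDontDoThis to the
# End Sub of alsoDoThis, because [\s\S]+? happily skips over End Function.
print("butDontDoThis" in matches[1].group(0))  # True
```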

The simplest solution to your problem may be a combination of regular expression and string match:

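' Read the whole source file into one string
Set fso = CreateObject("Scripting.FileSystemObject")
text = fso.OpenTextFile("C:\Temp\sample.txt").ReadAll

' SubMatches(0) = declaration line, SubMatches(1) = Function/Sub,
' SubMatches(2) = body; \2 pairs End Function/End Sub with the opening keyword
Set re = New RegExp
re.Pattern = "((?:(?:Public|Private)\s+)(Function|Sub).+)([\s\S]+?)End\s+\2"
re.Global  = True
re.IgnoreCase = True

For Each m In re.Execute(text)
  ' Plain substring test on the body; InStr returns 0 when not found
  If InStr(m.SubMatches(2), "GLOBAL_VARIABLE_1") > 0 Then
    WScript.Echo m.SubMatches(0)
  End If
Next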

It extracts the body of each procedure/function (SubMatches(2)) and then checks with InStr() if the body contains GLOBAL_VARIABLE_1.
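A minimal Python port of this regex-plus-substring idea, assuming Python's `re` module in place of VBScript's RegExp (Python's `in` stands in for `InStr(...) > 0`). The sample procedure names `usesIt` and `commentedOut` and the helper `procedures_using` are hypothetical. Note that a plain substring test does not look at comments, so it also reports the commented-out reference that the question's third rule wants to exclude.

```python
import re

# Same pattern as the VBScript version; without DOTALL, ".+" stops at the
# end of the declaration line, exactly as "." does in VBScript.
BODY_RE = re.compile(
    r"((?:(?:Public|Private)\s+)(Function|Sub).+)([\s\S]+?)End\s+\2",
    re.IGNORECASE,
)

TEXT = """Public Sub usesIt()
     GLOBAL_VARIABLE_1 = 1
End Sub

Public Sub commentedOut()
'    GLOBAL_VARIABLE_1 = 1
End Sub
"""


def procedures_using(text, name):
    """Hypothetical helper: declaration lines of blocks whose body contains name."""
    hits = []
    for m in BODY_RE.finditer(text):
        if name in m.group(3):       # group 3 here == SubMatches(2) in VBScript
            hits.append(m.group(1))  # group 1 here == SubMatches(0)
    return hits


print(procedures_using(TEXT, "GLOBAL_VARIABLE_1"))
# ['Public Sub usesIt()', 'Public Sub commentedOut()']
```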

Description

This regex breaks up the text into strings where each string contains a single function or sub. It also validates that the string has a non-commented GLOBAL_VARIABLE_1 by looking for the first line of code inside the function that doesn't have a ' preceding GLOBAL_VARIABLE_1. The expression also treats ' as a regular character when it is embedded in a double-quoted string, like variable = "sometext ' more text" + GLOBAL_VARIABLE_1

(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*)(?:(?!^End\s+\1\s+(?:$|\Z)).)*^(?:[^'\r\n]|"[^"\r\n]*")*GLOBAL_VARIABLE_1.*?^End\s\1\b


Groups

Group 0 will contain the entire matched function/sub

  1. will contain function or sub accordingly
  2. will contain the name of the function/sub
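As a cross-check of the single-pass expression and its groups, here is a Python sketch, assuming Python's `re` module as a stand-in for .NET with IgnoreCase, Multiline and Singleline (Python's `\Z` means end of string, which is close to but not identical to .NET's `\Z`). The sample is adapted from the examples below, with ValidEdgeCase1 closed by End Function so that the `\1` back-reference can pair it.

```python
import re

SINGLE_RE = re.compile(
    r"(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*)"
    # Walk forward, but never past this block's own End Function/End Sub line.
    r"(?:(?!^End\s+\1\s+(?:$|\Z)).)*"
    # A line whose text before GLOBAL_VARIABLE_1 has no ' outside "..." strings.
    r'''^(?:[^'\r\n]|"[^"\r\n]*")*GLOBAL_VARIABLE_1'''
    r".*?^End\s\1\b",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)

SAMPLE = """Public Function ValidEdgeCase1(byRef obj)
  SomeVariable = "some text with an embedded ' single quote" + GLOBAL_VARIABLE_1
End Function

Public Sub SkipEdgeCase(byRef obj)
  SomeVariable = "some text with an embedded ' single quote" ' + GLOBAL_VARIABLE_1
End Sub

Public Function FailCommented(byVal xml)
'  GLOBAL_VARIABLE_1
End Function

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
"""

found = [(m.group(1), m.group(2)) for m in SINGLE_RE.finditer(SAMPLE)]
print(found)
# [('Function', 'ValidEdgeCase1'), ('Sub', 'alsoDoThis')]
```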

Examples

Input text

Public Function ValidEdgeCase1(byRef obj)
  SomeVariable = "some text with an embedded ' single quote" + GLOBAL_VARIABLE_1
End Sub

Public Sub SkipEdgeCase(byRef obj)
  SomeVariable = "some text with an embedded ' single quote" ' + GLOBAL_VARIABLE_1
End Sub

Public Function FailCommented(byVal xml)
'  GLOBAL_VARIABLE_1
End Function

Public Function FAilWrongName1(byVal xPath)
  GLOBAL_VARIABLE_2
End Function

Public Sub FAilWrongName1()
  GLOBAL_VARIABLE_3
End Sub

Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

Public Sub IHeartKitten(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

Public Sub IHeartKitten2(byRef obj)
  GLOBAL_VARIABLE_1
End Sub

Public Function FailCommented(byVal xml)
'  GLOBAL_VARIABLE_1
End Function

Sample Code

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    ' Double quotes inside a VB string literal must be doubled ("")
    Dim re As Regex = New Regex("(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*)(?:(?!^End\s+\1\s+(?:$|\Z)).)*^(?:[^'\r\n]|""[^""\r\n]*"")*GLOBAL_VARIABLE_1.*?^End\s\1\b",RegexOptions.IgnoreCase OR RegexOptions.IgnorePatternWhitespace OR RegexOptions.Multiline OR RegexOptions.Singleline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

$matches Array:

(
    [0] => Array
        (
            [0] => Public Function ValidEdgeCase1(byRef obj)
  SomeVariable = "some text with an embedded ' single quote" + GLOBAL_VARIABLE_1
End Sub
            [1] => Public Sub alsoDoThis(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
            [2] => Public Sub IHeartKitten(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
            [3] => Public Sub IHeartKitten2(byRef obj)
  GLOBAL_VARIABLE_1
End Sub
        )

    [1] => Array
        (
            [0] => Function
            [1] => Sub
            [2] => Sub
            [3] => Sub
        )

    [2] => Array
        (
            [0] => ValidEdgeCase1
            [1] => alsoDoThis
            [2] => IHeartKitten
            [3] => IHeartKitten2
        )

)
Licensed under: CC-BY-SA with attribution