Question

I would like to extract the value of individual variables paying attention to the different ways they have been defined. For example, for dtime we want to extract 0.004. It also has to be able to interpret exponential numbers, like for example for variable vis it should extract 10e-6.

The problem is that each variable has a different number of whitespace characters between its name and the equals sign (I don't have control over how they were written).

Text to test:

dtime = 0.004D0

case = 0

newrun = 1

periodic = 0

iscalar = 1

ieddy = 1

mg_level = 5

nstep = 20000

vis = 10e-6

ak = 10e-6

g = 9.81D0

To extract dtime's value this REGEX works:

(?<=dtime    =\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?

To extract vis's value this REGEX works:

(?<=vis         =\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?

The problem is that I need to know the exact number of spaces between the variable name and the equal sign. I tried using \s+ but it does not work, why?

(?<=dtime\s+=\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?

Solution

If you are using PHP, Perl, or more generally PCRE, you can use the \K escape sequence to solve this problem like this:

dtime\s+=\s\K[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
           ^^
        Notice the \K: it resets the start of the reported match, so
        everything matched before it is left out of the result

Edit: If your engine supports neither variable-length lookbehinds nor \K, I think you need to capture the number in a capturing group and read that group instead of the whole match:

dtime\s*=\s*([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)
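
As a rough illustration of the capturing-group approach, here is a minimal Python sketch. The helper name read_value, the sample text, and the line-start anchor are assumptions added for the example, not part of the original answer. Python's built-in re module has no \K, so reading group 1 is the natural route there.

```python
import re

# Number pattern taken from the question (inner group made non-capturing).
NUMBER = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"

def read_value(name, text):
    """Return the numeric text assigned to `name`, or None if it is absent.

    Hypothetical helper: \\s* on both sides of '=' absorbs any amount of
    whitespace, and the ^ anchor (an assumption) keeps 'dtime' from
    matching inside a longer name.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(name) + r"\s*=\s*(" + NUMBER + r")",
        re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(1) if match else None

sample = """dtime    = 0.004D0
vis         = 10e-6
nstep = 20000
"""

print(read_value("dtime", sample))  # 0.004
print(read_value("vis", sample))    # 10e-6
```

The trailing D0 is simply left unmatched, which is what the question asks for; float() can then convert the returned string.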

OTHER TIPS

(?<=dtime\s+=\s) is a variable-length lookbehind because of \s+. Most (but not all) engines support only fixed-length lookbehinds.
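
Python's built-in re module is one of the engines with this restriction, so it makes a convenient way to see the difference. The sketch below is only an illustration; the helper name compiles and the shortened number pattern are assumptions made for the example.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for a variable-width lookbehind, among other pattern errors.
        return False

variable_width = r"(?<=dtime\s+=\s)[0-9.]+"     # \s+ has no fixed length
fixed_width = r"(?<=dtime    =\s)[0-9.]+"       # exactly four spaces

print(compiles(variable_width))  # False
print(compiles(fixed_width))     # True
```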

Also, your regex requires at least one digit immediately before the exponent part, so a value such as 1.e-6 is cut short at 1. Something like this might work:

 # dtime\s*=\s*([-+]?[0-9]*\.?[0-9]*(?:[eE][-+]?[0-9]+)?)

 dtime \s* = \s* 
 (                                     # (1)
      [-+]? [0-9]* \.? [0-9]* 
      (?: [eE] [-+]? [0-9]+ )?
 )
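
To see what this relaxed pattern does and does not accept, a small Python check can help. This is only a sketch: the helper name relaxed_value and the test strings are made up for the example. Note that because every part of the number is now optional, group 1 can come back empty.

```python
import re

# The relaxed pattern from above, unchanged.
RELAXED = re.compile(r"dtime\s*=\s*([-+]?[0-9]*\.?[0-9]*(?:[eE][-+]?[0-9]+)?)")

def relaxed_value(line):
    """Return group 1 of the relaxed pattern, or None if 'dtime =' is absent."""
    match = RELAXED.search(line)
    return match.group(1) if match else None

print(repr(relaxed_value("dtime = 0.004D0")))  # '0.004'
print(repr(relaxed_value("dtime = 1.")))       # '1.'
print(repr(relaxed_value("dtime = abc")))      # '' (matches, but no number)
```

Callers would therefore need to treat an empty group as "no number found".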

Edit: After review, I see you're trying to fold multiple optional forms into one regex.
I think this is not really that straightforward. Just as a point of interest, this is probably a baseline (it relies on conditionals, so it needs an engine that supports them, such as PCRE):

 # dtime\s*=\s*([-+]?(?(?=[\d.]+)(\d*\.\d+|\d+\.\d*|\d+|(?!))|)(?(?=[eE][-+]?\d+)([eE][-+]?\d+)|))(?(2)|(?(3)|(?!)))

 dtime \s* = \s* 
 (                         # (1 start)
      [-+]?                     # optional -+

      (?(?=                     # conditional check for \d*\.\d*
           [\d.]+ 
        )
           (                         # (2 start), yes, force a match on one of these
                \d* \. \d+                #  \. \d+
             |  \d+ \. \d*                #  \d+ \.
             |  \d+                       #  \d+
             |  (?!)                      # or, Fail the match,  the '.' dot is there without a number
           )                         # (2 end)
        |                            # no, match nothing

      )
      (?(?=                     # conditional check for [eE] [-+]? \d+
           [eE] [-+]? \d+ 
        )
           ( [eE] [-+]? \d+ )        # (3), yes, force a match on it
        |                            # no, match nothing
      )

 )                         # (1 end)
 (?(2)                     # Conditional check - did we match something? One of grp2 or grp3 or both
   |  (?(3)
        |  (?!)                      # Did not match a number, Fail the match
      )
 )
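
For engines without lookahead conditionals (Python's re module is one), a plain alternation can cover most of the same number forms. The sketch below is a swapped-in alternative, not the pattern above: it always requires a mantissa, so an exponent on its own is rejected, and it ignores a Fortran-style D exponent just like the patterns in the question. The names ASSIGNMENT and parse_values and the sample text are assumptions for the example.

```python
import re

ASSIGNMENT = re.compile(
    r"""
    ^\s*(?P<name>\w+)\s*=\s*          # variable name, any spacing around =
    (?P<value>
        [-+]?                         # optional sign
        (?: \d+\.\d* | \.\d+ | \d+ )  # 1.5 or 1.  |  .5  |  1
        (?: [eE][-+]?\d+ )?           # optional exponent
    )
    """,
    re.VERBOSE | re.MULTILINE,
)

def parse_values(text):
    """Map every 'name = number' line in `text` to a float."""
    return {
        m.group("name"): float(m.group("value"))
        for m in ASSIGNMENT.finditer(text)
    }

sample = """dtime    = 0.004D0
mg_level = 5
nstep = 20000
vis         = 10e-6
g = 9.81D0
"""

values = parse_values(sample)
print(values["dtime"], values["nstep"], values["g"])
```

Reading all assignments in one pass avoids building a separate pattern per variable name.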
Licensed under: CC-BY-SA with attribution