How do I write a regular expression to match numbers with or without leading zeros AND exclude numbers with certain text?

StackOverflow https://stackoverflow.com/questions/22817385

  •  26-06-2023

Question

I am trying to extract measurements from file names, and they are very inconsistent; for example:

  • FSTCAR.5_13UNC_1.00
  • FSTCAR.5_13UNC_1.00GR5P
  • FSTCAR.5_13UNC_1.00SS316

I have to be able to match all numbers (with decimals, and with and without leading zeros). I think I have that working with this:

/\d*\.?\d+/i
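As a quick check in Ruby, String#scan returns every match of this pattern in a filename. The filename is one of the samples above, and the variable names are only for illustration. The i flag is left off because the pattern contains no letters. Note that the 5 from GR5 is still picked up, which is the problem described next.

```ruby
# The question's pattern: optional integer part, optional ".", then one or more digits.
# It accepts "13", "1.00" and ".5" (no leading zero).
number = /\d*\.?\d+/

filename = "FSTCAR.5_13UNC_1.00GR5P"
matches = filename.scan(number)
p matches # => [".5", "13", "1.00", "5"]
```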

However, I also want to be able to exclude numbers preceded by SS or GR. Something like this seems to partially work:

/(?<!GR|SS)\d*\.?\d+/i

That will exclude the 5 from FSTCAR.5_13UNC_1.00GR5P above, but a number with more than one digit is not excluded as a whole: the 16 from 316 still matches. I am doing this in Ruby.
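The behaviour is easy to reproduce in Ruby. A lookbehind only inspects the characters just before the position where a match would start. When the 3 in SS316 is rejected, the engine simply retries at the next position, where 1 is preceded by S3, so the lookbehind passes. The i flag is omitted in this sketch because the sample names are upper case.

```ruby
# The question's second attempt: reject a match that starts right after "GR" or "SS".
pattern = /(?<!GR|SS)\d*\.?\d+/

with_gr = "FSTCAR.5_13UNC_1.00GR5P".scan(pattern)
with_ss = "FSTCAR.5_13UNC_1.00SS316".scan(pattern)

p with_gr # => [".5", "13", "1.00"]        ("5" after GR is rejected)
p with_ss # => [".5", "13", "1.00", "16"]  (retry at "1", which follows "S3")
```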


Solution

Any time you have to pick apart floating-point number strings, it's not a trivial feat.
This just takes your last regex and adds another lookbehind, (?<![.\d]), so a match can't start right after a digit or a decimal point.
That ensures the engine won't skip past the start of a number just to match its tail (like 16 from 316).

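 #  (?<!GR)(?<!SS)(?<![.\d])\d*\.?\d+

 # Same pattern in /x (free-spacing) form.
 # A single (?<! GR | SS | [.\d] ) mixes lengths 2 and 1, which Perl
 # before 5.30 rejects in a lookbehind, so it is split into three:
 (?<! GR )
 (?<! SS )
 (?<! [.\d] )
 \d* \.? \d+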
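Ruby accepts the same free-spacing layout with the x flag, which makes it easy to annotate each lookbehind. This is a sketch: the two test strings are made up for illustration (they are not from the question) and check that multi-digit and decimal numbers after GR or SS are dropped as a whole.

```ruby
# The answer's regex in Ruby's extended (/x) mode: whitespace and comments
# inside the literal are ignored, so each lookbehind can be annotated.
pattern = /
  (?<! GR )      # don't start right after "GR"
  (?<! SS )      # don't start right after "SS"
  (?<! [.\d] )   # don't start in the middle of a number
  \d* \.? \d+    # the number itself, e.g. 13, 1.00 or .5
/x

# Hypothetical inputs, made up for illustration.
p "abc_GR12.5_7".scan(pattern)  # => ["7"]
p "1.5SS316_.25".scan(pattern)  # => ["1.5", ".25"]
```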

Perl test case

 use strict;
 use warnings;

 my @ary = (
   'FSTCAR.5_13UNC_1.00',
   'FSTCAR.5_13UNC_1.00GR5P',
   'FSTCAR.5_13UNC_1.00SS316'
 );

 foreach my $fname (@ary)
 {
    print "filename:  $fname\n";
    # m//g in scalar context steps through each match in turn
    while ( $fname =~ /(?<!GR)(?<!SS)(?<![.\d])\d*\.?\d+/ig ) {
       print " found $&\n";
    }
 }

Output >>

 filename:  FSTCAR.5_13UNC_1.00
  found .5
  found 13
  found 1.00
 filename:  FSTCAR.5_13UNC_1.00GR5P
  found .5
  found 13
  found 1.00
 filename:  FSTCAR.5_13UNC_1.00SS316
  found .5
  found 13
  found 1.00
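Since the question is about Ruby, here is a sketch of the same test ported over. String#scan plays the role of Perl's while (... =~ /.../g) loop, and the i flag is dropped because the sample names are upper case. It should print the same lines as the Perl version.

```ruby
# Ruby version of the Perl test; the regex is the answer's compact form.
pattern = /(?<!GR)(?<!SS)(?<![.\d])\d*\.?\d+/

filenames = [
  "FSTCAR.5_13UNC_1.00",
  "FSTCAR.5_13UNC_1.00GR5P",
  "FSTCAR.5_13UNC_1.00SS316",
]

# scan returns every non-overlapping match, left to right.
results = filenames.map { |fname| [fname, fname.scan(pattern)] }.to_h

results.each do |fname, numbers|
  puts "filename:  #{fname}"
  numbers.each { |n| puts " found #{n}" }
end
```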

OTHER TIPS

To fix the SS and GR exclusion, try this:

/(?<!GR|SS)[\d\.]+/i

I'm not sure exactly what your layout is, but this might be faster for your negative lookbehind (note that it is broader than GR|SS: it also rejects numbers preceded by RS, SG, GG, and so on):

(?<![GRS]{2})
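The two lookbehinds are not equivalent, which is worth checking before swapping one for the other: [GRS]{2} means "any two characters from G, R and S". The sample strings below are hypothetical, and each pattern only looks for a single digit that is not blocked, to keep the comparison simple.

```ruby
exact = /(?<!GR|SS)\d/     # blocks a digit right after "GR" or "SS" only
broad = /(?<![GRS]{2})\d/  # blocks a digit after ANY two of G, R, S

samples = %w[GR5 SS5 RS5 SG5 GG5] # hypothetical inputs

blocked_exact = samples.reject { |s| s.match?(exact) }
blocked_broad = samples.reject { |s| s.match?(broad) }

p blocked_exact # => ["GR5", "SS5"]
p blocked_broad # => ["GR5", "SS5", "RS5", "SG5", "GG5"]
```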

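Edit: this still doesn't work, and greediness isn't the issue: when the lookbehind rejects the 3 in SS316, the engine just retries at the next position, so 16 still matches.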

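You might need to use two regexes: one to remove the GR/SS numbers, and one to match (note: I'm not very familiar with Ruby):

val = val.gsub(/[GRS]{2}[\d\.]+/, '')  # pass a Regexp; the string '/.../' would be matched literally
numbers = val.scan(/[\d\.]+/)          # scan returns all matches; =~ only returns the first match's index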
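A runnable sketch of the two-step idea, with a few assumptions: extract_numbers is a hypothetical helper name, step 1 uses (?:GR|SS) instead of [GRS]{2} so that only those two prefixes are stripped, and step 2 uses the question's \d*\.?\d+ because [\d\.]+ would also accept a lone ".".

```ruby
# Step 1: delete each GR/SS marker together with the number that follows it.
# Step 2: collect the numbers that remain.
def extract_numbers(fname)
  cleaned = fname.gsub(/(?:GR|SS)[\d.]+/, "")
  cleaned.scan(/\d*\.?\d+/)
end

%w[FSTCAR.5_13UNC_1.00 FSTCAR.5_13UNC_1.00GR5P FSTCAR.5_13UNC_1.00SS316].each do |fname|
  p extract_numbers(fname) # => [".5", "13", "1.00"] for all three
end
```

Compared with the single lookbehind regex, this trades one pass for two, but each step is simpler to read and to adjust if more prefixes need excluding.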
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow