Question

I need a regex to match against a path like this: "C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~". It should match all paths except those ending in '~' or '.dat'. My problem is that I don't understand how to match, and negate, the exact string '.dat' only at the end of the path. That is, I don't want to exclude the characters {d,a,t} when they appear elsewhere in the path.

I have built most of the regex, but it still needs to reject paths ending in .dat:

[\w\s:\.\\]*[^~]$[^\.dat]

[\w\s:\.\\]* matches word characters, whitespace, colons, periods, and backslashes. [^~]$ makes matches that end in '~' fail. It seems I should be able to follow that with a negated match for '.dat', [^\.dat], but the match fails in my regex tester.
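The failure can be reproduced outside the regex tester. This is a minimal sketch using Python's re module, chosen only for illustration (the file-watching program may use a different regex engine), with hypothetical sample paths based on the path in the question:

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"[\w\s:\.\\]*[^~]$[^\.dat]")

# Hypothetical sample paths: the first should be accepted, the other two rejected.
paths = [
    r"C:\Documents and Settings\User\My Documents\ScanSnap\382893.pdf",
    r"C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~",
    r"C:\Documents and Settings\User\My Documents\ScanSnap\382893.dat",
]

for path in paths:
    # These paths have no trailing newline, so `$` only matches at the very end,
    # and the mandatory [^\.dat] after it has no character left to consume.
    print(attempt.search(path))  # None for every path
```

Even the path that should be accepted fails, because the pattern requires one more character after the end of the string.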

Judging from what I've read, I think my answer lies in grouping. Could someone point me in the right direction? I should add that I am using a file-watching program that allows regex matching, and I have only one line in which to specify the regex.

This entry seems similar: Regex to match multiple strings


Solution

You want to use a negative look-ahead:

^((?!\.dat$)[\w\s:\.\\])*$
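A minimal sketch of how this pattern behaves, using Python's re module for illustration; the sample paths are hypothetical:

```python
import re

# Before each allowed character is consumed, the negative lookahead (?!\.dat$)
# checks that the rest of the string is not exactly ".dat".
no_dat = re.compile(r"^((?!\.dat$)[\w\s:\.\\])*$")


def accepts(path: str) -> bool:
    """Return True if the whole path matches the pattern."""
    return no_dat.search(path) is not None


print(accepts(r"C:\ScanSnap\382893.pdf"))  # True
print(accepts(r"C:\ScanSnap\382893.dat"))  # False: ends in ".dat"
print(accepts(r"C:\data\report.dat.pdf"))  # True: ".dat" is not at the end
print(accepts(r"C:\ScanSnap\382893.pd~"))  # False: '~' is not in the class
```

The third case shows that 'd', 'a' and 't' elsewhere in the path are unaffected; only ".dat" immediately before the end is rejected.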

By the way, your character class ([\w\s:\.\\]) doesn't allow a tilde (~) anywhere. Did you intend to allow a tilde in the filename as long as it isn't at the end? If so:

^((?!~$|\.dat$)[\w\s:\.\\~])*$
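The same kind of sketch for the tilde-tolerant variant, again in Python's re module with hypothetical paths:

```python
import re

# '~' is now part of the class; the lookahead rejects it only when it is the
# last character (~$), just as it rejects a final ".dat".
tilde_ok = re.compile(r"^((?!~$|\.dat$)[\w\s:\.\\~])*$")


def accepts(path: str) -> bool:
    """Return True if the whole path matches the pattern."""
    return tilde_ok.search(path) is not None


print(accepts(r"C:\Users\~backup\scan.pdf"))  # True: '~' is not at the end
print(accepts(r"C:\ScanSnap\382893.pd~"))     # False
print(accepts(r"C:\ScanSnap\382893.dat"))     # False
```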

OTHER TIPS

The following regex:

^.*(?<!\.dat|~)$

matches any string that does NOT end with a '~' or with '.dat'.

^             # the start of the string
.*            # gobble up the entire string (without line terminators!)
(?<!\.dat|~)  # looking back, there should not be '.dat' or '~'
$             # the end of the string

In plain English: match the string only if, looking back from its end, there is no substring '.dat' or '~'.
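Engines differ in what they allow inside a lookbehind. Python's built-in re module, used here only for illustration, requires each lookbehind to match a fixed width, so (?<!\.dat|~), whose branches are 4 and 1 characters long, is rejected at compile time. A sketch of one workaround under that constraint is to split it into two fixed-width negative lookbehinds checked at the same position (sample paths are hypothetical):

```python
import re

# One lookbehind whose branches have different lengths is not accepted by re.
try:
    re.compile(r"^.*(?<!\.dat|~)$")
except re.error as exc:
    print("rejected:", exc)

# Two fixed-width negative lookbehinds, both applied at the end of the string.
not_dat_or_tilde = re.compile(r"^.*(?<!\.dat)(?<!~)$")


def accepts(path: str) -> bool:
    """Return True if the path ends in neither '.dat' nor '~'."""
    return not_dat_or_tilde.search(path) is not None


print(accepts(r"C:\ScanSnap\382893.pdf"))  # True
print(accepts(r"C:\ScanSnap\382893.pd~"))  # False
print(accepts(r"C:\ScanSnap\382893.dat"))  # False
```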

Edit: the reason your attempt failed is that a negated character class, [^...], negates only a single character. A character class always matches exactly one character, so [^.dat] does not negate the string ".dat"; it matches one character that is not '.', 'd', 'a' or 't'.
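A quick check of that single-character behavior, sketched with Python's re module for illustration:

```python
import re

neg_class = re.compile(r"[^.dat]")

print(bool(neg_class.fullmatch("x")))     # True: one character outside {'.', 'd', 'a', 't'}
print(bool(neg_class.fullmatch("d")))     # False: 'd' is one of the excluded characters
print(bool(neg_class.fullmatch(".dat")))  # False: the class consumes exactly one character

# Every 'd', 'a', 't' and '.' is excluded wherever it appears, not just a final ".dat".
print(neg_class.findall("data.txt"))      # ['x']
```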

This is just a comment on an earlier answer's suggestion:

^((?!\.dat$)[\w\s:\.\\])*$

Inside a character class ([...]), a period is a literal '.' and does not need escaping:

^((?!\.dat$)[\w\s:.\\])*$

I'm sorry to post this as a new answer, but I apparently don't have enough reputation to comment on an answer yet.
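The point about the unescaped period can be checked directly. This sketch uses Python's re module for illustration and hypothetical sample paths; note that the \. inside the lookahead sits outside any character class, so it must stay escaped in both versions:

```python
import re

escaped = re.compile(r"^((?!\.dat$)[\w\s:\.\\])*$")
unescaped = re.compile(r"^((?!\.dat$)[\w\s:.\\])*$")

# Hypothetical sample inputs.
samples = [
    r"C:\ScanSnap\382893.pdf",
    r"C:\ScanSnap\382893.dat",
    r"C:\ScanSnap\382893.pd~",
    r"C:\data\report.dat.pdf",
]

for s in samples:
    # The two versions agree on every sample.
    assert bool(escaped.search(s)) == bool(unescaped.search(s))

# Inside a class '.' is literal: [.] matches a period and nothing else.
print(bool(re.fullmatch(r"[.]", ".")))  # True
print(bool(re.fullmatch(r"[.]", "x")))  # False
```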

I believe you are looking for this:

[\w\s:\.\\]*([^~]|[^\.dat])$

which finds, as before, all word characters, whitespace, colons, periods (.) and backslashes, then matches either a tilde (~) or '.dat' at the end of the string. You may also want to add a caret (^) at the very beginning if you know that the path should start at the beginning of a line:

^[\w\s:\.\\]*([^~]|[^\.dat])$
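A small sketch of what the caret changes, using Python's re.search (which looks for a match anywhere in the string) and a hypothetical input whose first character is not in [\w\s:\.\\]:

```python
import re

unanchored = re.compile(r"[\w\s:\.\\]*([^~]|[^\.dat])$")
anchored = re.compile(r"^[\w\s:\.\\]*([^~]|[^\.dat])$")

# Hypothetical input: '*' is not in the character class.
text = "*C:\\x"

m = unanchored.search(text)
print(m.start())              # 1: without ^, the match simply starts after the '*'
print(anchored.search(text))  # None: ^ forces the match to start at index 0
```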
Licensed under: CC-BY-SA with attribution