Question

Given the following three example server paths, I am trying to create a skiplist for my FTP client using PCRE regular expressions, but I can't get the desired result.

/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1
/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2
/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3

I want to skip all folders (not paths) that do not end with

-Publisher1

I am trying to build a working pattern with the help of this online help and this regex tester, but I haven't gotten any further than this negative lookahead pattern:

.*-(?!Publisher1)

But with this pattern all lines match, apparently because in each of them the substring before the final hyphen does not contain the pattern:

/subdir/subdir/.../Author1_-_Title1-(1234)  -Publisher1
/subdir/subdir/.../Author2_-_Title2_(5678)  -PUBLiSHER2
/subdir/subdir/.../Author3_-_Title3-4951    -publisher3

What is my mistake, and what would the correct pattern be to match only the second and third lines (the ones to be skipped) while keeping the first line?

EDIT to make it clearer what to highlight and what not.

Everything from the beginning of the path to the last slash must be ignored (allowed). Everything after the last slash that matches the defined regex must be skipped.

see screenshot

EDIT to present an advanced pattern matching only the red part

[^/]*(?<!-Publisher2)$



Solution

The regex which you have used is:

.*-(?!Publisher1)

Here is what is wrong with it.

This regex matches any line that contains a - which is not followed by Publisher1. Notice that your paths contain other hyphens, for example between author and title, or right after the title. Any one of those hyphens satisfies the condition, so every line matches. If instead the hyphen is part of the negated text, so that the assertion tests for -Publisher1 as a whole, the approach can work.
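The effect described above can be reproduced with a small sketch. It uses Python's re module rather than a PCRE engine, on the assumption that both behave the same for this simple pattern; the names PATHS, ORIGINAL and matched_text are made up for illustration.

```python
import re

# The three example paths from the question.
PATHS = [
    "/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1",
    "/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2",
    "/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3",
]

# The pattern from the question: any hyphen NOT followed by "Publisher1".
ORIGINAL = re.compile(r".*-(?!Publisher1)")


def matched_text(path):
    """Return the text matched by the original pattern, or None if no match."""
    m = ORIGINAL.search(path)
    return m.group(0) if m else None


for p in PATHS:
    # Every path prints a non-empty match, including the first one.
    print(matched_text(p))
```

On the first path the engine simply backtracks from the hyphen in front of Publisher1 to the hyphen after Title1, which is followed by "(" and therefore passes the lookahead.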

So you might move the hyphen inside the parentheses and write your regex like this:

^.*(?!-Publisher1)

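But this will not work either. Here .* is greedy and consumes the whole line, so the lookahead is evaluated at the end of the string, where there is nothing left to look at. A negative lookahead always succeeds there, so every line matches again. So we use a negative lookbehind, (?<!...), instead: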

.*(?<!-Publisher1)

What now? This still does not work. Why?

Because a negative lookbehind looks back from the current position and succeeds whenever that position is not preceded by -Publisher1.

This is a bit complex, so bear with me.

Suppose your string is

/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1

We do a negative lookbehind for -Publisher1. At the position after the final 1, i.e. at the end of the string, -Publisher1 is visible when we look back, so the negative lookbehind fails there. The engine then backtracks: .* gives up one character and the position moves one step to the left. From there, looking back shows only "-Publisher", not "-Publisher1". The condition is now satisfied, so the regex still matches, just without the last character of the string.

So it is essential to anchor the lookbehind to the end of the string with $, so that the engine cannot move one character to the left to find a match.
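A minimal sketch of the backtracking behaviour, again using Python's re module as a stand-in for PCRE (an assumption; the patterns are plain enough that both should agree). The variable names are illustrative only.

```python
import re

PATH = "/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1"

# Attempt 1: lookahead after a greedy .* is tested at the end of the string.
lookahead_at_end = re.compile(r"^.*(?!-Publisher1)")

# Attempt 2: lookbehind without an end anchor; .* may give back characters.
unanchored = re.compile(r".*(?<!-Publisher1)")

# Same lookbehind, but the match is forced to end at the end of the string.
anchored = re.compile(r".*(?<!-Publisher1)$")

# The lookahead version matches the whole path.
print(lookahead_at_end.match(PATH).group(0) == PATH)

# The unanchored lookbehind matches everything except the final character.
print(unanchored.match(PATH).group(0) == PATH[:-1])

# With $ there is no position left to retreat to, so nothing matches.
print(anchored.match(PATH) is None)
```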

Final regex:

.*(?<!-Publisher1)$

Demo: http://regex101.com/r/lE1vW2
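As a rough illustration of how the final regex would act as a skiplist entry, here is a sketch in Python. The function should_skip is hypothetical and does not correspond to any FTP client's API; it only assumes that a path is skipped whenever the pattern matches it.

```python
import re

# Final pattern from the answer: the match must end at the end of the line,
# and that end must not be preceded by "-Publisher1".
SKIP = re.compile(r".*(?<!-Publisher1)$")

PATHS = [
    "/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1",
    "/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2",
    "/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3",
]


def should_skip(path):
    """Hypothetical helper: True if the skiplist pattern matches the path."""
    return SKIP.search(path) is not None


for p in PATHS:
    print("skip" if should_skip(p) else "keep", p)
```

Only the first path is kept. The comparison is case-sensitive, which is why -PUBLiSHER2 and -publisher3 are skipped.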

OTHER TIPS

This should suit your needs:

^.*(?<!-Publisher1)$


I want to skip all folders that do not end with -Publisher1

You can use this negative lookahead based regex:

^(?!.*?-Publisher1$).+$
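A short sketch comparing this lookahead form with the anchored lookbehind form, assuming Python's re module behaves like PCRE for these patterns. The names are illustrative only.

```python
import re

# Lookahead form: reject the line up front if it ends with "-Publisher1".
LOOKAHEAD = re.compile(r"^(?!.*?-Publisher1$).+$")

# Lookbehind form from the other answers, for comparison.
LOOKBEHIND = re.compile(r"^.*(?<!-Publisher1)$")

PATHS = [
    "/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1",
    "/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2",
    "/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3",
]

for p in PATHS:
    # Both forms agree on the three example paths.
    print(bool(LOOKAHEAD.search(p)), bool(LOOKBEHIND.search(p)))

# One difference: .+ needs at least one character, .* does not,
# so only the lookbehind form matches an empty line.
print(bool(LOOKAHEAD.search("")), bool(LOOKBEHIND.search("")))
```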


You could use the following regex in order to exclude lines containing Publisher1:

^((?!Publisher1).)*$

Online demo: http://regex101.com/r/gD8jK0
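Note that this pattern tests for Publisher1 anywhere in the line, not only at the end. The sketch below shows the difference using Python's re module (assumed to agree with PCRE here). The path HYPOTHETICAL is invented for illustration and does not come from the question.

```python
import re

# Rejects any line that contains "Publisher1" anywhere.
CONTAINS = re.compile(r"^((?!Publisher1).)*$")

# Rejects only lines that END with "-Publisher1".
ENDS_WITH = re.compile(r"^.*(?<!-Publisher1)$")

FROM_QUESTION = "/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1"

# Invented example: contains "Publisher1" but does not end with it.
HYPOTHETICAL = "/subdir/subdir/.../Author4_-_Title4-Publisher1-sample"

for p in (FROM_QUESTION, HYPOTHETICAL):
    # A match means "skip this path" when the pattern is used as a skiplist entry.
    print(bool(CONTAINS.search(p)), bool(ENDS_WITH.search(p)), p)
```

For the paths in the question both patterns give the same result; they only diverge for names such as the invented one, where Publisher1 appears in the middle.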

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow