Question

I'm performing regex matching in .NET against strings that look like this:

1;#Lists/General Discussion/Waffles Win
2;#Lists/General Discussion/Waffles Win/2_.000
3;#Lists/General Discussion/Waffles Win/3_.000

I need to match the URL portion without the numbers at the end, so that I get this:

Lists/General Discussion/Waffles Win

This is the regex I'm trying:

(?:\d+;#)(?<url>.+)(?:/\d+_.\d+)*

The problem is that the last group is being included as part of the middle group's match. I've also tried without the * at the end but then only the first string above matches and not the rest.

I have the multi-line option enabled. Any ideas?


Solution

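A few different alternatives (in the first two, [^/] can also match a newline, so either match one line at a time or use [^/\r\n] when all the lines are in a single string):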

@"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$"

This matches as few path segments as possible, followed by an optional last part, and the end of the line.

@"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)"

This matches as many path segments as possible, as long as it is not the digit-part at the end of the line.

@"^\d+;#(.*?)(?:/\d+_\.\d+)?$"

This matches as few characters as possible, followed by an optional last part, and the end of the line.
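As a rough check, the three patterns above can be run against the sample lines from the question. This is only a sketch: it assumes each line is matched as its own string (so no RegexOptions.Multiline is passed), and the class name UrlPatternDemo and the helper ExtractUrl are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlPatternDemo
{
    // The three patterns from the answer, copied verbatim.
    public static readonly string[] Patterns =
    {
        @"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$",
        @"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)",
        @"^\d+;#(.*?)(?:/\d+_\.\d+)?$",
    };

    // Hypothetical helper: returns capture group 1, or null if the line does not match.
    public static string ExtractUrl(string pattern, string line)
    {
        Match match = Regex.Match(line, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Sample lines from the question, each treated as a separate string (assumption).
        string[] lines =
        {
            "1;#Lists/General Discussion/Waffles Win",
            "2;#Lists/General Discussion/Waffles Win/2_.000",
            "3;#Lists/General Discussion/Waffles Win/3_.000",
        };

        foreach (string pattern in Patterns)
        {
            foreach (string line in lines)
            {
                Console.WriteLine(ExtractUrl(pattern, line) ?? "(no match)");
            }
        }
    }
}
```

Under that assumption, every pattern should yield Lists/General Discussion/Waffles Win for every line. If the lines instead arrive as one string and RegexOptions.Multiline is used, keep in mind that $ in .NET matches only before \n, so input with \r\n line endings typically needs \r?$ in place of $.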

OTHER TIPS

You could try

^(\d+;#)([^/]+(/[^\d][^/]*)*)

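and take the second group. The first group matches the 1;# prefix. The second group matches the first part of the URL (any run of characters other than /), followed by any number of segments that each consist of a /, a non-digit, and then anything other than /. Because the numeric suffix starts with a digit, it is left out; the same would happen to any genuine path segment that begins with a digit.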

I tested it on this site and it appears to do what you want. Give it a try with some more samples.
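One sample worth trying is a path with a real segment that starts with a digit. The sketch below runs the pattern above on one line from the question and on one made-up line; the class name LeadingDigitDemo, the helper ExtractUrl, and the "2010 Archive" path are hypothetical and only there to illustrate the behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingDigitDemo
{
    // Pattern from the tip above; the URL is captured in group 2.
    public const string Pattern = @"^(\d+;#)([^/]+(/[^\d][^/]*)*)";

    // Hypothetical helper: returns capture group 2, or null if the line does not match.
    public static string ExtractUrl(string line)
    {
        Match match = Regex.Match(line, Pattern);
        return match.Success ? match.Groups[2].Value : null;
    }

    public static void Main()
    {
        // Sample from the question: the numeric suffix is dropped.
        Console.WriteLine(ExtractUrl("2;#Lists/General Discussion/Waffles Win/2_.000"));
        // prints "Lists/General Discussion/Waffles Win"

        // Made-up input: "2010 Archive" is a genuine segment that starts with a digit,
        // so the match stops right before it.
        Console.WriteLine(ExtractUrl("4;#Lists/2010 Archive/Waffles Win"));
        // prints "Lists"
    }
}
```

If such paths can occur in your data, a pattern that tests for the specific digits_.digits suffix is likely the safer choice.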

Licensed under: CC-BY-SA with attribution