Question

I'm trying to match URL requests that have literal components and variable components in their path against a list of predefined regex rules, similar to the Python Routes library. I'm new to regex, so if you could explain the anchors and control characters used in your regex solution, I would really appreciate it.

Assume I have the following list of rules. Components starting with : are variables and can match any string value.

(rule 1) /user/delete/:key
(rule 2) /user/update/:key
(rule 3) /list/report/:year/:month/:day
(rule 4) /show/:categoryid/something/:key/reports

Here are example test cases showing request URLs and the rules they should match:

/user/delete/222 -> matches rule 1
/user/update/222 -> matches rule 2
/user/update/222/bob -> does not match any rule defined
/user -> does not match any rule defined
/list/report/2004/11/2 -> matches rule 3
/show/44/something/222/reports -> matches rule 4

Can someone help me write the regexes for rules 1, 2, 3, and 4?

Thank you!!


Solution

I'm not sure why you need a regex to do something like that. You can split and count:

if len(url.split("/")) == 4:
    pass  # do something

The length is 4, not 3, because splitting on "/" produces an extra empty string at the beginning, before the leading slash.

Or use something like:

if url.count("/") == 3:
    pass  # do something
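A minimal runnable version of both checks, assuming url holds only the path (no scheme, host, or query string). Note that these only count segments; they do not check the literal words such as user or delete. The helper names are hypothetical.

```python
def has_three_segments_split(url: str) -> bool:
    # "/user/delete/222".split("/") gives ["", "user", "delete", "222"]:
    # the leading "/" produces an empty first element, hence 4, not 3.
    return len(url.split("/")) == 4


def has_three_segments_count(url: str) -> bool:
    # Three segments means exactly three "/" characters.
    return url.count("/") == 3


for url in ["/user/delete/222", "/user/update/222/bob", "/user"]:
    print(url, has_three_segments_split(url), has_three_segments_count(url))
```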

If you really want to use a regex, then maybe you could use something like this:

import re

if re.match(r'^(?:/[^/]*){3}$', url):
    pass  # do something
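A short sketch of what each piece of that regex does. (?:...) is a non-capturing group, [^/]* means "zero or more characters that are not a slash", and {3} repeats the group exactly three times. One caveat worth knowing: because * also allows zero characters, empty segments pass too; using + instead requires each segment to be non-empty. The constant names are only for illustration.

```python
import re

# (?:...) groups "/" + segment so {3} can repeat it, without
# saving the text as a separate capture group.
# [^/]* means "zero or more characters that are not a slash".
THREE_SEGMENTS = re.compile(r'^(?:/[^/]*){3}$')

print(bool(THREE_SEGMENTS.match("/user/delete/222")))      # True
print(bool(THREE_SEGMENTS.match("/user/update/222/bob")))  # False
# Because * allows zero characters, empty segments are accepted too:
print(bool(THREE_SEGMENTS.match("/user//")))               # True

# Using + instead of * requires every segment to be non-empty.
NON_EMPTY = re.compile(r'^(?:/[^/]+){3}$')
print(bool(NON_EMPTY.match("/user//")))                    # False
```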

As per your edit:

You could use this for rule 1:

^/user/delete/[0-9]+$

For rule 2:

^/user/update/[0-9]+$

For rule 3:

^/list/report/[0-9]{4}/[0-9]{1,2}/[0-9]{1,2}$

For rule 4:

^/show/[0-9]+/something/[0-9]+/reports$
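A quick way to check the four patterns against the test cases from the question. The RULES dict and find_rule helper are hypothetical names. Keep in mind these patterns only accept digits in the variable positions, while the question says a variable can match any string value.

```python
import re

# The four patterns from this answer, keyed by rule number.
RULES = {
    1: r'^/user/delete/[0-9]+$',
    2: r'^/user/update/[0-9]+$',
    3: r'^/list/report/[0-9]{4}/[0-9]{1,2}/[0-9]{1,2}$',
    4: r'^/show/[0-9]+/something/[0-9]+/reports$',
}


def find_rule(url):
    """Return the number of the first rule whose pattern matches url, else None."""
    for number, pattern in RULES.items():
        if re.match(pattern, url):
            return number
    return None


for url in [
    "/user/delete/222",
    "/user/update/222",
    "/user/update/222/bob",
    "/user",
    "/list/report/2004/11/2",
    "/show/44/something/222/reports",
]:
    print(url, "->", find_rule(url))

# Without the $ anchor, the extra "/bob" segment is silently ignored:
print(bool(re.match(r'^/user/update/[0-9]+', "/user/update/222/bob")))  # True
```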

^ matches the beginning of the string and $ matches the end of the string. Together, they make sure the whole string fits the pattern, with nothing before or after the 'template'. (In Python, $ also matches just before a single trailing newline; re.fullmatch does not allow that.)

[0-9] matches any single digit.

+ is a quantifier: it allows the character or group just before it to repeat one or more times. [0-9]+ thus means one or more digits.

{4} is a fixed quantifier. It is like +, but it requires exactly 4 repetitions. {1,2} is a variation that means between 1 and 2 repetitions.

All the other characters in the regex above are literal characters and will match themselves.
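Since the question says a :variable component can match any string value, the digit-only classes above may be too strict. Below is a sketch of one way to build the patterns automatically from the rule strings: literal segments go through re.escape so they match themselves, each :name becomes a named group (?P<name>[^/]+) (one or more non-slash characters), and re.fullmatch plays the role of the ^...$ anchors. compile_rule and route are hypothetical helpers, not part of the Routes library's API.

```python
import re


def compile_rule(rule):
    """Turn "/user/delete/:key" into /user/delete/(?P<key>[^/]+); use with fullmatch."""
    parts = []
    for segment in rule.strip("/").split("/"):
        if segment.startswith(":"):
            # Variable component: any run of non-slash characters,
            # captured under its name.
            parts.append("(?P<%s>[^/]+)" % segment[1:])
        else:
            # Literal component: escape it so characters like "." match literally.
            parts.append(re.escape(segment))
    return re.compile("/" + "/".join(parts))


RULES = [
    "/user/delete/:key",
    "/user/update/:key",
    "/list/report/:year/:month/:day",
    "/show/:categoryid/something/:key/reports",
]
COMPILED = [(rule, compile_rule(rule)) for rule in RULES]


def route(url):
    """Return (rule, captured variables) for the first matching rule, or None."""
    for rule, pattern in COMPILED:
        # fullmatch requires the whole string to match, like wrapping in ^...$.
        m = pattern.fullmatch(url)
        if m:
            return rule, m.groupdict()
    return None


print(route("/show/44/something/222/reports"))
```

Unlike the digit-only patterns, this also accepts something like /user/delete/bob, which is what the question's "any string value" seems to ask for.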

OTHER TIPS

You can also specify that you want exactly three path segments, as follows:

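'^((\/(\w+)){3})$' with the g and m flags enabled (these are regex101/JavaScript flag names; in Python, m corresponds to re.MULTILINE, and there is no g flag because re.findall and re.finditer already return every match)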

^ matches the start of the string.

(\/(\w+)){3} matches a forward slash followed by one or more word characters (letters, digits, or underscore), exactly 3 times.

$ matches the end of the string.

The g flag returns every match rather than just the first one. The m flag makes ^ and $ match at the start and end of each line, so each line of text is treated as a separate string rather than as one large multi-line string.
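A sketch of the same regex in Python, assuming several URLs are checked at once, one per line (the sample text is made up). re.MULTILINE stands in for the m flag, and finditer stands in for the g flag. finditer is used instead of findall because, with capture groups in the pattern, findall returns tuples of groups rather than the whole matched text.

```python
import re

# Several URLs, one per line, as you might paste them into regex101.
text = "/user/delete/222\n/user/update/222/bob\n/user/update/222"

# re.MULTILINE plays the role of the m flag: ^ and $ match at each line.
# finditer plays the role of the g flag: it yields every match, not just the first.
pattern = re.compile(r'^((\/(\w+)){3})$', re.MULTILINE)

# group(0) is the whole match, regardless of the nested groups.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['/user/delete/222', '/user/update/222']
```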

Demo:

http://regex101.com/r/xT3xW2

I will assume my_str is a string that is valid according to the regex above. Then, to make a method call with it, you can do:

eval(my_str.split('/')[1]+my_str.split('/')[2].capitalize()+'('+my_str.split('/')[3]+')')

Here is what the expression passed to eval evaluates to:

>>> print(my_str.split('/')[1]+my_str.split('/')[2].capitalize()+'('+my_str.split('/')[3]+')')
userUpdate(222)

OR

userDelete(222)

Then you simply call eval() on that string to perform the method call. That is the best I can do right now.
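eval() runs whatever text it receives, so a URL crafted by a client could execute arbitrary Python. A sketch of an alternative that avoids eval, assuming the handlers are ordinary functions: look the handler up in an explicit dict keyed by the first two path segments and pass the last segment as an argument. user_delete, user_update, HANDLERS, and dispatch are hypothetical names, and the key arrives as a string, so convert it with int() if a number is needed.

```python
def user_delete(key):
    # Hypothetical handler; replace with real logic.
    return "deleted %s" % key


def user_update(key):
    # Hypothetical handler; replace with real logic.
    return "updated %s" % key


# Explicit whitelist: only these (resource, action) pairs can be called.
HANDLERS = {
    ("user", "delete"): user_delete,
    ("user", "update"): user_update,
}


def dispatch(my_str):
    """Call the handler for a path like "/user/update/222", or return None."""
    parts = my_str.split("/")  # ["", "user", "update", "222"]
    if len(parts) != 4:
        return None
    _, resource, action, key = parts
    handler = HANDLERS.get((resource, action))
    if handler is None:
        return None
    return handler(key)


print(dispatch("/user/update/222"))  # updated 222
```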

Licensed under: CC-BY-SA with attribution