Question

I have a regular expression that captures three backreferences though one (the 2nd) may be null.

Given the following string:

http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es&source=web&cd=1&ved=0CC8QFjAA&url=http%3A%2F%2Fjonathonoat.es%2Fbritish-mozcast%2F&ei=MQj9UKejDYeS0QWruIHgDA&usg=AFQjCNHy1cDoWlIAwyj76wjiM6f2Rpd74w&bvm=bv.41248874,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1

I wish to capture the TLD (in this case .co.uk), q param and cd param.

I'm using the following RegEx:

/.*\.google([a-z\.]*).*q=(.*[^&])?.*cd=(\d*).*/i

This works, except that the 2nd backreference includes the other parameters up to the cd param. I currently get this:

["http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajo…,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1 ", ".co.uk", "site%3Ajonathonoat.es&source=web", "1", index: 0, input: "http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajo…,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1"]

The 1st backreference is correct, it's .co.uk and so is the 3rd; it's 1. I want the 2nd backreference to be either null (or undefined or whatever) or just the q param, in this example site%3Ajonathonoat.es. It currently includes the source param too (site%3Ajonathonoat.es&source=web).

Any help would be much appreciated, thanks!

I've added a JSFiddle of the code; look in your browser console for the output. Thanks!

Solution 2

You want the middle group to be:

q=([^&]*)

This captures any run of characters other than an ampersand, so the group stops at the end of the q param. It also allows zero characters, so you can drop the optional quantifier (?) from the group.

Working example: http://rubular.com/r/AJkXxgeX5K
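
A minimal sketch of this fix in JavaScript, run against the string from the question. The names `url`, `pattern`, `match` and `emptyQ` are made up for illustration, and the second URL (with an empty q param) is hypothetical; only the middle group differs from the expression in the question.

```javascript
// The string from the question, copied verbatim.
const url = "http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es&source=web&cd=1&ved=0CC8QFjAA&url=http%3A%2F%2Fjonathonoat.es%2Fbritish-mozcast%2F&ei=MQj9UKejDYeS0QWruIHgDA&usg=AFQjCNHy1cDoWlIAwyj76wjiM6f2Rpd74w&bvm=bv.41248874,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1";

// The question's expression with only the middle group changed to ([^&]*).
const pattern = /.*\.google([a-z\.]*).*q=([^&]*).*cd=(\d*).*/i;

const match = url.match(pattern);
console.log(match[1]); // ".co.uk" (TLD)
console.log(match[2]); // "site%3Ajonathonoat.es" (q param only)
console.log(match[3]); // "1" (cd param)

// Hypothetical URL with an empty q param: the group still matches, as "".
const emptyQ = "http://www.google.com/url?sa=t&q=&cd=2".match(pattern);
console.log(JSON.stringify(emptyQ[2])); // ""
```

Because `[^&]` can never match an ampersand, the group cannot run into the next parameter, whichever quantifier follows it.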

Other tips

When negating character classes, I always add a quantifier to the class itself:

/.*\.google([a-z\.]*).*q=([^&]*).*cd=(\d*).*/i

I also recommend not using * or + on their own, as they are "greedy"; always use *? or +? when the delimiter you are looking for can also occur inside your string. For more on greediness, check Jeffrey Friedl's Mastering Regular Expressions or simply here
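
A small sketch of the difference between greedy and lazy quantifiers, using a shortened, made-up query string (`sample` and the other variable names are illustrative only, not from the question):

```javascript
// Hypothetical, shortened query string for illustration.
const sample = "q=first&source=web&cd=1";

// Greedy: .* takes as much as it can, then backtracks only as far as
// needed for "&" to match, so the group runs up to the LAST ampersand.
const greedy = sample.match(/q=(.*)&/)[1];
console.log(greedy); // "first&source=web"

// Lazy: .*? takes as little as it can, so the group stops at the FIRST ampersand.
const lazy = sample.match(/q=(.*?)&/)[1];
console.log(lazy); // "first"

// Negated class: [^&] cannot match an ampersand at all.
const negated = sample.match(/q=([^&]*)/)[1];
console.log(negated); // "first"
```

The negated class gives the same result as the lazy version here without depending on greediness, since the delimiter is excluded from the group by construction.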

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow