Question

I've searched around quite a bit now, but I can't get any of the suggestions to work in my situation. I've seen others succeed with negative lookahead or other lookaround assertions, but I don't really understand them.

I want to use a regular expression to find URLs in blocks of text, but ignore them when they are quoted. It isn't perfect yet, but I have the following to find URLs:

(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?

I want it to match the following:

www.test.com:50/stuff
http://player.vimeo.com/video/63317960
odd.name.amazone.com/pizza

But not match:

"www.test.com:50/stuff
http://plAyerz.vimeo.com/video/63317960"
"odd.name.amazone.com/pizza"

Edit:

To clarify, I could be passing a full paragraph of text through the expression. Below is a sample paragraph showing what I'd like:

I would like the following link to be found www.example.com. However, this link should be ignored "www.example.com". It would be nice, but not required, to have "www.example.com and www.example.com" ignored as well.

Below is a sample of a different replacement that I already have working. The language is PHP:

$articleEntry = "Hey guys! Check out this cool video on Vimeo: player.vimeo.com/video/63317960";

// First pattern: runs of newlines. Second pattern: Vimeo player URLs, with or without the scheme.
$pattern = array('/\n+/', '/(https?\:\/\/)?(player\.vimeo\.com\/video\/[0-9]+)/');

// $2 refers to the second capture group (the Vimeo address without the scheme).
$replace = array('<br/><br/>',
    '<iframe src="http://$2?color=40cc20" width="500" height="281" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>');

$articleEntry = preg_replace($pattern, $replace, $articleEntry);

The result of the above replaces each run of newlines ("\n") with a double break ("<br/><br/>") and embeds the Vimeo video by replacing the Vimeo address with an iframe that links to it.

Solution

I've found a solution!

(?=(([^"]+"){2})*[^"]*$)((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)

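The first part, from (?= to $), is what makes it work for me. It is a lookahead that only succeeds when an even number of double quotes remains between the current position and the end of the text, which means the position is not inside a quoted span. I found it in an answer to "java Regex - split but ignore text inside quotes?" by anubhava (https://stackoverflow.com/users/548225/anubhava).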

I had read that question before, but I had overlooked his answer because it wasn't the one marked as having "solved" the question. I just changed the single quotes to double quotes and it works for me.
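As a minimal sketch of how the pattern above could be applied, the following runs it over the sample paragraph from the question with preg_match_all. The variable names ($outsideQuotes, $url, $found) are only illustrative, and the sketch assumes the double quotes in the text are balanced.

```php
<?php
// Lookahead: succeeds only when an even number of double quotes remains
// between the current position and the end of the text.
$outsideQuotes = '(?=(([^"]+"){2})*[^"]*$)';

// URL part of the solution; / is escaped because it is the delimiter below.
$url = '((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)';

$pattern = '/' . $outsideQuotes . $url . '/';

$text = 'I would like the following link to be found www.example.com. '
      . 'However this link should be ignored "www.example.com". '
      . 'It would be nice, but not required, to have '
      . '"www.example.com and www.example.com" ignored as well.';

preg_match_all($pattern, $text, $matches);

// Groups 1 and 2 belong to the lookahead, so the whole URL is group 3.
$found = $matches[3];
print_r($found); // only the first, unquoted www.example.com is listed
```

Because the lookahead merely counts quotes, a single stray quote anywhere later in the text flips which URLs are treated as quoted. The [^"]+ inside it also requires at least one character before every quote, so an empty pair of quotes ("") later in the text would make the lookahead fail.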

OTHER TIPS

Add ^ and $ to your regex, so that it only matches when the whole string is a URL:

 ^(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?$

Please note that you might need to escape the slashes after http (that is, https?\:\/\/), for example when / is used as the pattern delimiter.
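To illustrate what the anchors do, here is a small sketch that uses the anchored pattern with / as the delimiter (so every / inside it is escaped). The variable names are only illustrative.

```php
<?php
// The anchored pattern from above, with all slashes escaped for the / delimiter.
$anchored = '/^(https?\:\/\/)?(\w+\.)+\w{2,}(:[0-9])?\/?((\/?\w+)+)?(\.\w+)?$/';

// A string that is nothing but a URL matches.
$wholeUrl = preg_match($anchored, 'odd.name.amazone.com/pizza');
var_dump($wholeUrl);   // int(1)

// A quoted URL does not match, because " cannot be consumed by the pattern.
$quoted = preg_match($anchored, '"odd.name.amazone.com/pizza"');
var_dump($quoted);     // int(0)

// A URL embedded in a longer sentence does not match either.
$inSentence = preg_match($anchored, 'See odd.name.amazone.com/pizza');
var_dump($inSentence); // int(0)
```

This approach therefore suits input where each string is a single candidate URL. It does not find URLs inside a full paragraph of text.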

Update

If you want it to match lowercase letters only, you shouldn't use \w but [a-z]. \w matches all letters (both uppercase and lowercase), digits, and the underscore, so you should be careful when using it.
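The difference between the two character classes can be checked with a short sketch like the one below. The test strings are made up for illustration, and the patterns are used without the i flag.

```php
<?php
// \w accepts uppercase and lowercase letters, digits, and the underscore.
$wordChars = preg_match('/^\w+$/', 'plAyerz_01');
var_dump($wordChars);  // int(1)

// [a-z] accepts lowercase ASCII letters only, so the capital A is rejected.
$mixedCase = preg_match('/^[a-z]+$/', 'plAyerz');
var_dump($mixedCase);  // int(0)

$lowerOnly = preg_match('/^[a-z]+$/', 'player');
var_dump($lowerOnly);  // int(1)
```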

Licensed under: CC-BY-SA with attribution