Regular expression troubles, can't seem to match what I want
Question
I have a load of gibberish data with this somewhere in the middle:
"video_id": "hGosI8rBVe8"
And from this, I want to extract hGosI8rBVe8. Note that what I want to extract can be of any length and can include uppercase and lowercase letters and numbers. This is what I've tried so far:
"video_id": "(.*)"
and:
"video_id": "([a-zA-Z0-9]*)"
But they carry on matching way past the " at the end of what I want returned. I'm pretty sure this is because the * is greedy, but I see no other way to do it because what I want returned will be of variable length.
Any help is appreciated, cheers.
Solution
Make the quantifier non-greedy by appending ? to it:
"video_id": "([a-zA-Z0-9]+?)"
I also changed * to +, because the former matches 0 or more characters and the latter matches 1 or more, which is more appropriate in this case.
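As a rough illustration, here is how that pattern could be applied in Python. The question does not say which language or regex engine is in use, so the use of Python's re module and the sample data string are assumptions made only for this sketch.

```python
import re

# Hypothetical sample input: the target field surrounded by other quoted data.
data = '{"title": "abc", "video_id": "hGosI8rBVe8", "author": "someone"}'

# +? is the non-greedy form of +: it takes as few characters as possible
# while still letting the closing quote match.
pattern = re.compile(r'"video_id": "([a-zA-Z0-9]+?)"')

match = pattern.search(data)
video_id = match.group(1) if match else None
print(video_id)
```

Group 1 holds only the ID, without the surrounding quotes.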
OTHER TIPS
The pattern "video_id": "([a-zA-Z0-9]*)" shouldn't match beyond the closing " simply because that character is not included in the [a-zA-Z0-9] character class. I'm not sure why you think it's doing that.
However, .* will match as many characters as are available, so applying the "(.*)" regex to
My name is "Pax" and yours is "George"
will get you:
Pax" and yours is "George
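The behaviour above can be checked with a short sketch. Python's re module is assumed here purely for illustration, and the lazy variant "(.*?)" is included only for comparison.

```python
import re

text = 'My name is "Pax" and yours is "George"'

# Greedy: .* runs to the end of the string, then backs up to the LAST quote.
greedy = re.search(r'"(.*)"', text).group(1)

# Non-greedy: .*? stops at the FIRST quote that lets the pattern succeed.
lazy = re.search(r'"(.*?)"', text).group(1)

print(greedy)  # Pax" and yours is "George
print(lazy)    # Pax
```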
If you have a regex engine that doesn't support non-greediness, you can use:
"video_id": "([^"]*)"
This matches the opening ", followed by as many non-" characters as possible, followed by the closing ".
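A minimal sketch of the negated character class approach, again assuming Python's re module and a made-up data string. Because [^"] can never consume a quote, the match cannot run past the closing one, whether or not the engine supports non-greedy quantifiers.

```python
import re

# Hypothetical sample input with more quoted data after the target field.
data = '{"video_id": "hGosI8rBVe8", "author": "someone"}'

# [^"]* matches any run of characters that are not a double quote.
match = re.search(r'"video_id": "([^"]*)"', data)
video_id = match.group(1) if match else None
print(video_id)  # hGosI8rBVe8

# The same idea applied to the earlier example sentence.
text = 'My name is "Pax" and yours is "George"'
names = re.findall(r'"([^"]*)"', text)
print(names)  # ['Pax', 'George']
```

Note that, unlike [a-zA-Z0-9], this class also accepts punctuation and whitespace inside the quotes, so it is looser about what counts as a valid ID.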