Question

I have a load of gibberish data with this somewhere in the middle:

"video_id": "hGosI8rBVe8"

And from this, I want to extract hGosI8rBVe8. Note that what I want to extract can be of any length, and can include upper/lowercase letters and numbers. This is what I've tried so far:

"video_id": "(.*)"

and:

"video_id": "([a-zA-Z0-9]*)"

But they carry on matching way past the " at the end of what I want returned. I'm pretty sure this is because of the * (greedy)... but I see no other way to do it because what I want returned will be of variable length.

Any help is appreciated, cheers.


Solution

Make it non-greedy (lazy) by appending ? to the quantifier:

"video_id": "([a-zA-Z0-9]+?)"

I also changed * to +: * means zero or more, while + means one or more, which is more appropriate here because the id should never be empty.
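A minimal Python sketch of the lazy pattern using the standard re module. The sample string `data` is made up for illustration; it only assumes the id sits inside a "video_id": "..." pair somewhere in the text.

```python
import re

# Hypothetical sample input: gibberish with the field somewhere in the middle.
data = 'xx{"title": "abc", "video_id": "hGosI8rBVe8", "views": "42"}yy'

# +? is the lazy form of +: it takes as few characters as possible
# while still allowing the closing quote to match.
pattern = re.compile(r'"video_id": "([a-zA-Z0-9]+?)"')

match = pattern.search(data)
if match:
    print(match.group(1))  # hGosI8rBVe8
```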

OTHER TIPS

The "video_id": "([a-zA-Z0-9]*)" regex shouldn't match beyond the closing " simply because " is not included in the [a-zA-Z0-9] character class. I'm not sure why you think it's doing that.
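A quick Python check of that claim, using the standard re module. The input strings are hypothetical; the second one deliberately contains a character outside the class to show what happens then.

```python
import re

data = '"video_id": "hGosI8rBVe8", "other": "zzz"'

# Greedy *, but the class cannot match '"', so the run it consumes
# ends at the closing quote anyway.
class_match = re.search(r'"video_id": "([a-zA-Z0-9]*)"', data)
print(class_match.group(1))  # hGosI8rBVe8

# Hypothetical value containing '-', which is outside the class: the pattern
# does not over-match, it simply finds no match at all.
no_match = re.search(r'"video_id": "([a-zA-Z0-9]*)"', '"video_id": "ab-cd"')
print(no_match)  # None
```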

However, .* will match as many characters as it can, so applying the "(.*)" regex to My name is "Pax" and yours is "George" will capture:

Pax" and yours is "George
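The same example in Python's re module, with the lazy .*? variant alongside for comparison:

```python
import re

text = 'My name is "Pax" and yours is "George"'

# Greedy: .* runs to the end of the string, then backtracks only as far
# as the LAST quote, so the capture spans both names.
greedy = re.search(r'"(.*)"', text).group(1)
print(greedy)  # Pax" and yours is "George

# Lazy: .*? stops at the FIRST quote that lets the rest of the pattern match.
lazy = re.search(r'"(.*?)"', text).group(1)
print(lazy)  # Pax
```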

If you have a regex engine that doesn't support non-greediness, you can use:

"video_id": "([^"]*)"

which matches " followed by the maximum number of non-" characters, followed by the " again. Because [^"] can never match a quote, the greedy * cannot run past the closing one.
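A short Python sketch of the negated-class approach. Python's re does support lazy quantifiers, so this only demonstrates the pattern; the expression itself uses nothing beyond a bracket expression and *, so it should carry over to engines without lazy support, such as POSIX extended regular expressions. The `data` string is made up for illustration.

```python
import re

text = 'My name is "Pax" and yours is "George"'

# [^"]* can never consume a quote, so each match stops at the next '"'
# even though * is greedy.
names = re.findall(r'"([^"]*)"', text)
print(names)  # ['Pax', 'George']

data = 'junk "video_id": "hGosI8rBVe8" more junk'
video_id = re.search(r'"video_id": "([^"]*)"', data).group(1)
print(video_id)  # hGosI8rBVe8
```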

Licensed under: CC-BY-SA with attribution