How to extract data from URLs using preg_match()?

https://stackoverflow.com/questions/20984564

25-09-2022

Question

I need to extract ASIN numbers (10-character alphanumeric SKU) from Amazon URLs. The URLs are always in these formats:

http://www.amazon.com/gp/product/ASIN
http://www.amazon.com/gp/product/[text]/ASIN
http://www.amazon.com/o/ASIN
http://www.amazon.com/dp/ASIN
http://www.amazon.com/[text]/dp/ASIN
http://www.amazon.com/[text]/dp/[text]/ASIN

There are usually more directories, as well as variables, after the ASIN number in the URL. Here is a full URL as an example:

http://www.amazon.com/Google-Nexus-Tablet-7-Inch-Black/dp/B00DVFLJDS/ref=sr_1_1?ie=UTF8&qid=1387937682&sr=8-1&keywords=nexus+7

I think this might be possible to do using preg_match(), but I'm very new to regex and don't have a clue how to formulate the expression.

Is this possible to do with preg_match()? If not, what would be the best approach to solving this problem?

UPDATE:

I've been reading up on regex and was able to modify the answer to work when the ASIN isn't at the very end of the URL string (which it rarely is):

#\/([A-Za-z0-9]{10})#

I also made it so that there has to be a forward slash before the match.
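As a quick check of the updated pattern, here is a minimal sketch that runs it against the full example URL from above. The helper name extractAsin() and the second test URL are hypothetical, and the stricter pattern is an added assumption rather than part of the original post: it appends a lookahead so that a longer path segment cannot be matched by its first 10 characters.

```php
<?php
// Hypothetical helper: returns the first 10-character alphanumeric run
// that directly follows a slash, or null when nothing matches.
function extractAsin(string $url, string $pattern = '#\/([A-Za-z0-9]{10})#'): ?string
{
    return preg_match($pattern, $url, $matches) === 1 ? $matches[1] : null;
}

$url = 'http://www.amazon.com/Google-Nexus-Tablet-7-Inch-Black/dp/B00DVFLJDS/ref=sr_1_1?ie=UTF8&qid=1387937682&sr=8-1&keywords=nexus+7';
var_dump(extractAsin($url)); // string(10) "B00DVFLJDS"

// Assumed stricter variant: the 10 characters must be followed by
// "/", "?", "#", or the end of the string. "~" is used as the delimiter
// so that "#" can appear inside the pattern.
$strict = '~/([A-Za-z0-9]{10})(?=[/?#]|$)~';

// Made-up URL whose [text] segment is longer than 10 alphanumerics.
$tricky = 'http://www.amazon.com/SomeLongProductName/dp/B00DVFLJDS';
var_dump(extractAsin($tricky));          // string(10) "SomeLongPr"
var_dump(extractAsin($tricky, $strict)); // string(10) "B00DVFLJDS"
```

Whether the lookahead is needed depends on how often the [text] segments contain 10 or more consecutive alphanumeric characters without a hyphen.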

Solution

preg_match('#([A-Za-z0-9]{10})$#', $url, $matches);

In short: [A-Za-z0-9] matches any alphanumeric character, uppercase or lowercase; {10} requires exactly 10 of them in a row; and $ anchors the match to the end of the string. The parentheses ( and ) define which part(s) you get back in $matches, the third argument to preg_match(). Finally, the whole pattern is wrapped in two # characters, which act as the regex delimiters.
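To make the role of $matches concrete, here is a small sketch of the call in use. The wrapper name asinAtEnd() is hypothetical; the pattern is the one shown above.

```php
<?php
// Hypothetical wrapper around the pattern from this answer.
function asinAtEnd(string $url): ?string
{
    // $matches[0] holds the whole match, $matches[1] the first parenthesised group.
    if (preg_match('#([A-Za-z0-9]{10})$#', $url, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// ASIN is the last thing in the string: the pattern matches.
var_dump(asinAtEnd('http://www.amazon.com/dp/B00DVFLJDS')); // string(10) "B00DVFLJDS"

// Anything after the ASIN breaks the $ anchor: no match.
var_dump(asinAtEnd('http://www.amazon.com/dp/B00DVFLJDS/ref=sr_1_1')); // NULL
```

Because of the $ anchor, this only works when the ASIN is the very last thing in the URL.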

Now go read every article in the left sidebar of this page so you can do it yourself next time :)

Other tips

In addition to Niels's answer:

preg_match('#.*/([A-Za-z0-9]{10})/?$#', $url, $matches);

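This covers the case where [text] is itself a 10-character alphanumeric string: the greedy .* makes the pattern capture the last path segment, and /? allows an optional trailing slash.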
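A short sketch of that behaviour follows. The wrapper name lastSegmentAsin() and the first test URL are made up for illustration; only the pattern comes from this tip.

```php
<?php
// Hypothetical wrapper around the pattern from this tip.
function lastSegmentAsin(string $url): ?string
{
    return preg_match('#.*/([A-Za-z0-9]{10})/?$#', $url, $m) === 1 ? $m[1] : null;
}

// Made-up URL in the /dp/[text]/ASIN shape where [text] is also 10 alphanumerics.
$url = 'http://www.amazon.com/dp/ABCDEFGHIJ/B00DVFLJDS/';

// The greedy .* pushes the capture to the last matching segment.
var_dump(lastSegmentAsin($url)); // string(10) "B00DVFLJDS"

// For contrast, the unanchored pattern from the question's update
// stops at the first matching segment.
preg_match('#\/([A-Za-z0-9]{10})#', $url, $first);
var_dump($first[1]); // string(10) "ABCDEFGHIJ"

// The $ anchor still applies, so trailing directories or a query string prevent a match.
var_dump(lastSegmentAsin('http://www.amazon.com/dp/B00DVFLJDS/ref=sr_1_1?ie=UTF8')); // NULL
```

Like the pattern it extends, this one assumes the ASIN is the final path segment, so anything after it would need to be stripped from the URL first.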

License: CC-BY-SA with attribution

Not affiliated with StackOverflow