Crawler4j Regex Pattern for url

https://stackoverflow.com/questions/22262501

11-06-2023

Question

I'm using crawler4j and want to write a pattern that matches only URLs of the following form, but I couldn't work out the regex for it:

http://www.site.com/liste/product_name_changable/productDetails.aspx?productId={id}&categoryId={category_id}

I tried this:

liste\/*\/productDetails:aspx?productId=*&category_id=*

and

private final static Pattern FILTERS = Pattern.compile("^/liste/*/productDetails.aspx?productId=*$");

but neither of them works.

How can I write this as a regex pattern?

Solution

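Your regex has several errors. In a regex, * is a quantifier that repeats the preceding element zero or more times, not a wildcard, so every asterisk should be .+ to match one or more characters. The question mark must be escaped as \?, because an unescaped ? makes the preceding character optional, and the dot in .aspx should be escaped as \. so that it matches a literal dot. category_id should be categoryId, and productDetails:aspx should be productDetails.aspx. With all of these fixes, the regex looks like this: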

liste\/.+\/productDetails\.aspx\?productId=.+&categoryId=.+
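
As a rough illustration of how the corrected regex could be used from Java, here is a minimal sketch. The class name ProductUrlFilter, the method isProductUrl, and the sample URLs are made up for this example and are not part of crawler4j; only java.util.regex is used. Note that each backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a crawler4j class.
class ProductUrlFilter {
    // Same regex as above, with backslashes doubled for the Java string literal.
    static final Pattern PRODUCT_URL = Pattern.compile(
            "liste\\/.+\\/productDetails\\.aspx\\?productId=.+&categoryId=.+");

    // find() succeeds when the pattern occurs anywhere inside the URL.
    static boolean isProductUrl(String url) {
        return PRODUCT_URL.matcher(url).find();
    }

    public static void main(String[] args) {
        // Sample URLs are assumptions modelled on the URL in the question.
        System.out.println(isProductUrl(
                "http://www.site.com/liste/some_product/productDetails.aspx?productId=42&categoryId=7"));
        System.out.println(isProductUrl(
                "http://www.site.com/liste/some_product/other.aspx"));
    }
}
```

A method like this could then be called from wherever the crawler decides which URLs to keep.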

Also, you shouldn't put ^ and $ at the start and end of the regex. They anchor the match to the start and end of the input, so the pattern can't match when you only want a portion of the URL, which you do here: the full URL starts with http://www.site.com/, not with liste.
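
The effect of the anchors can be checked with a small experiment. This is only a sketch: the class AnchorDemo and the sample URL are assumptions for illustration. It also shows a related point about java.util.regex: Matcher.matches() requires the whole input to match, so if the calling code uses matches() instead of find(), the pattern needs a leading .* to cover the scheme and host.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the sample URL are illustrative.
class AnchorDemo {
    static final String BODY =
            "liste/.+/productDetails\\.aspx\\?productId=.+&categoryId=.+";

    static final Pattern UNANCHORED = Pattern.compile(BODY);
    static final Pattern ANCHORED = Pattern.compile("^" + BODY + "$");
    static final Pattern WHOLE_URL = Pattern.compile(".*" + BODY);

    static final String URL =
            "http://www.site.com/liste/some_product/productDetails.aspx?productId=42&categoryId=7";

    public static void main(String[] args) {
        // Pattern found somewhere inside the URL.
        System.out.println(UNANCHORED.matcher(URL).find());    // true
        // ^ forces the match to begin at "http://...", which is not "liste".
        System.out.println(ANCHORED.matcher(URL).find());      // false
        // matches() behaves as if the pattern were anchored at both ends.
        System.out.println(UNANCHORED.matcher(URL).matches()); // false
        // A leading .* lets matches() consume the scheme and host first.
        System.out.println(WHOLE_URL.matcher(URL).matches());  // true
    }
}
```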

Licensed under: CC-BY-SA with attribution
