Question

I have this problem where I need to queue a page link with TaskQueue:

        Queue queue = QueueFactory.getDefaultQueue();
        for (String href : hrefs){
            href = baseUrl + href;
            pageLinks = pageLinks + "\n" + href;
            queue.add(TaskOptions.Builder
                .withUrl("/crawler")
                .param("url", href));
          l("Added to queue url=["+href+"]");               
        }

The problem here is that, I think the URL that gets passed into the queue contains ?'s for Arabic characters. As it keeps on rescheduling.

The String pageLinks however is outputed in the browser through Spring MVC, and I can properly see the Arabic character being displayed. So I'm pretty the links are ok.

If I copy one of the links output on the browser, and paste it to the browser URL it works fine. So I'm pretty sure that the reason that the queue keeps on recheduling because it gets the wrong URL.

What could I be missing here? Do I need to convert the String href before passing it into the queue?

The crawl service looks like this:

@RequestMapping(method = RequestMethod.GET, value = "/crawl",
produces = "application/json; charset=iso-8859-6")
public @ResponseBody String crawl(HttpServletRequest req, HttpServletResponse res,
            @RequestParam(value="url", required = false) String url) {
        l("Processs url:" + url);
}

Also do I need to convert the @QueryParam String url here to Arabic or not?

Was it helpful?

Solution

You must Url-encode the parameters. See this question: Java URL encoding of query string parameters

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top