Question

I'm using Jsoup to get HTML from websites. I'm using

String url="http://www.example.com";
Document doc=Jsoup.connect(url).get();

this code to get the HTML. But when I put Turkish letters in the link, like this:

String url="http://www.example.com/?q=Türkçe";
Document doc=Jsoup.connect(url).get();

Jsoup sends the request like this: "http://www.example.com/?q=Trke"

So I can't get the correct result. How can I solve this problem?


Solution

Working solution: if the server expects UTF-8, simply pass the parameter with data(), which URL-encodes it for you:

Document document = Jsoup.connect("http://www.example.com")
        .data("q", "Türkçe")
        .get();

with the resulting request URL:

URL=http://www.example.com?q=T%C3%BCrk%C3%A7e
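The %C3%BC and %C3%A7 in that URL are the UTF-8 bytes of ü and ç, each written as a %XX escape. A small standalone sketch (the class and method names are made up for illustration) that prints those bytes and then runs the JDK's java.net.URLEncoder, which produces the same query value as above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Escapes {
    // Returns the UTF-8 bytes of s as space-separated uppercase hex pairs
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Türkçe", written with escapes so the source file's encoding doesn't matter
        String value = "T\u00fcrk\u00e7e";
        // ü and ç take two bytes each in UTF-8
        System.out.println(utf8Hex(value)); // 54 C3 BC 72 6B C3 A7 65
        // URLEncoder turns every non-ASCII byte into a %XX escape
        System.out.println(URLEncoder.encode(value, StandardCharsets.UTF_8)); // T%C3%BCrk%C3%A7e
    }
}
```

The URLEncoder.encode(String, Charset) overload needs Java 10 or later; on older JDKs use URLEncoder.encode(value, "UTF-8") and handle UnsupportedEncodingException.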

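For a custom encoding (here ISO-8859-3, which covers the Turkish letters), encode the value yourself and put it directly into the URL:

import java.net.URLEncoder;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Percent-encode only the value, in the charset the server expects:
// "Türkçe" becomes "T%FCrk%E7e"
String query = URLEncoder.encode("Türkçe", "ISO-8859-3");

// Append the encoded value yourself; passing it to .data() would make
// Jsoup encode the '%' signs a second time (%FC -> %25FC)
Document doc = Jsoup.connect("http://www.example.com/?q=" + query).get();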
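Why the charset has to match the server: the same word produces different escapes in ISO-8859-3 and UTF-8, and decoding with the wrong one does not give the word back. A standalone sketch using only the JDK (Java 10+ for the Charset overloads); the class name is hypothetical:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Latin-3 contains the Turkish letters, one byte each
    static final Charset LATIN3 = Charset.forName("ISO-8859-3");

    public static void main(String[] args) {
        String value = "T\u00fcrk\u00e7e"; // "Türkçe"

        String latin3 = URLEncoder.encode(value, LATIN3);               // one escape per letter
        String utf8 = URLEncoder.encode(value, StandardCharsets.UTF_8); // two escapes per letter
        System.out.println(latin3); // T%FCrk%E7e
        System.out.println(utf8);   // T%C3%BCrk%C3%A7e

        // Decoding with the matching charset gives the word back...
        System.out.println(URLDecoder.decode(latin3, LATIN3));
        // ...decoding the Latin-3 bytes as UTF-8 does not: the bytes that are
        // invalid in UTF-8 come out as U+FFFD replacement characters
        System.out.println(URLDecoder.decode(latin3, StandardCharsets.UTF_8));
    }
}
```

So before choosing a charset other than UTF-8, it is worth checking which one the target site actually decodes its query strings with.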

OTHER TIPS

Non-ASCII characters such as ü and ç are not allowed in URLs by the specification (RFC 3986). We're used to seeing them because browsers display them in the address bar, but what actually gets sent to the server is their percent-encoded form.
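java.net.URI draws the same line between what is shown and what is sent: toString() keeps the letters as typed, while toASCIIString() encodes them as UTF-8 and percent-escapes each byte. This only illustrates the JDK's rule; individual browsers may differ in details. The class and method names are made up:

```java
import java.net.URI;

public class DisplayVsWire {
    // Returns {displayForm, asciiForm} for a URL string containing non-ASCII letters
    static String[] forms(String url) {
        URI uri = URI.create(url);
        // toString() keeps non-ASCII characters as they were given;
        // toASCIIString() encodes them as UTF-8 and escapes each byte as %XX
        return new String[] { uri.toString(), uri.toASCIIString() };
    }

    public static void main(String[] args) {
        String[] f = forms("http://www.example.com/?q=T\u00fcrk\u00e7e"); // q=Türkçe
        System.out.println("shown: " + f[0]); // http://www.example.com/?q=Türkçe
        System.out.println("sent:  " + f[1]); // http://www.example.com/?q=T%C3%BCrk%C3%A7e
    }
}
```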

You have to URL-encode the query string before passing it to Jsoup. Jsoup.connect("http://www.example.com").data("q", "Türkçe"), as proposed by MariuszS, does just that.

I found this list of HTML codes for Turkish characters on Google: http://turkishbasics.com/resources/turkish-characters-html-codes.php
Maybe you can add the letters like this:

String url="http://www.example.com/?q=Türk&#231;e";
Document doc=Jsoup.connect(url).get();
Licensed under: CC-BY-SA with attribution