Question

I'm using Jsoup to fetch HTML from websites with this code:

String url="http://www.example.com";
Document doc=Jsoup.connect(url).get();

But when I use Turkish letters in the link, like this:

String url="http://www.example.com/?q=Türkçe";
Document doc=Jsoup.connect(url).get();

Jsoup sends the request to "http://www.example.com/?q=Trke", with the Turkish letters dropped.

So I can't get the correct result. How can I solve this problem?


Solution

A working solution: if the server expects UTF-8, simply pass the parameter through data():

Document document = Jsoup.connect("http://www.example.com")
        .data("q", "Türkçe")
        .get();

which results in this request URL:

URL=http://www.example.com?q=T%C3%BCrk%C3%A7e
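That query value is just the UTF-8 bytes of ü and ç written as %XX escapes. A minimal sketch with the JDK's java.net.URLEncoder (no Jsoup involved; the Charset overload needs Java 10+) produces the same string. The class and method names are made up for illustration, and the literal uses Unicode escapes so it compiles correctly whatever the source file's encoding is:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8QueryDemo {
    // Hypothetical helper: form-encodes one query value as UTF-8.
    // ASCII letters pass through; each byte of a non-ASCII letter becomes %XX.
    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Türkçe" spelled with Unicode escapes, independent of file encoding
        String value = "T\u00fcrk\u00e7e";
        System.out.println("http://www.example.com?q=" + encodeUtf8(value));
    }
}
```

Running it should print the same URL as above, which is a quick way to check what the server ought to receive.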

For a custom encoding, encode just the value with URLEncoder and append it to the URL yourself. Don't pass the already-encoded string to .data(), because Jsoup URL-encodes data() values again (the % signs would turn into %25):

String query = URLEncoder.encode("Türkçe", "ISO-8859-3"); // java.net.URLEncoder

Document doc = Jsoup.connect("http://www.example.com/?q=" + query)
        .get();
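Whichever charset is used, each key and value should be encoded exactly once. A minimal sketch, assuming a server that expects ISO-8859-3 (Latin-3) query strings: the hypothetical helper buildUrl encodes raw, not-yet-encoded values with a chosen Charset and builds the query string by hand, so the result can be passed straight to Jsoup.connect(url) instead of going through .data(). It uses only the JDK (Java 10+ for URLEncoder.encode(String, Charset)); note that ISO-8859-3 is not one of the charsets every JVM is required to provide, although standard JDK builds include it.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CharsetQueryDemo {
    // Hypothetical helper: appends params to base, encoding every key and
    // value with the given charset. Values must NOT be pre-encoded.
    static String buildUrl(String base, Map<String, String> params, Charset charset) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), charset)
                    + "=" + URLEncoder.encode(e.getValue(), charset));
        }
        return params.isEmpty() ? base : base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "T\u00fcrk\u00e7e"); // "Türkçe" via Unicode escapes

        // Latin-3: one byte per letter
        System.out.println(buildUrl("http://www.example.com/", params,
                Charset.forName("ISO-8859-3")));
        // UTF-8: two bytes each for the Turkish letters, for comparison
        System.out.println(buildUrl("http://www.example.com/", params,
                Charset.forName("UTF-8")));
    }
}
```

For this input it should print http://www.example.com/?q=T%FCrk%E7e (Latin-3 stores ü as 0xFC and ç as 0xE7) and then http://www.example.com/?q=T%C3%BCrk%C3%A7e, which is why the client has to use the same charset the server decodes with.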

Other tips

Non-ASCII (Unicode) characters are not allowed in URLs according to the specification (RFC 3986). We're used to seeing them because browsers display them in the address bar, but browsers percent-encode them before sending the request to the server.

You have to URL-encode those characters before passing the URL to Jsoup. Jsoup.connect("http://www.example.com").data("q", "Türkçe"), as proposed by MariuszS, does just that.
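If you already have the whole URL as a string, the JDK can do the browser-style encoding for you. A minimal sketch using java.net.URI (class and method names are illustrative): URI.create accepts non-ASCII characters, and toASCIIString() percent-encodes them as UTF-8 while leaving ?, = and & untouched. This assumes the rest of the URL is already valid; reserved characters inside a value (such as & or #) still have to be encoded separately, for example with URLEncoder.

```java
import java.net.URI;

public class UriAsciiDemo {
    // Hypothetical helper: converts a URL containing non-ASCII characters
    // into its ASCII form by percent-encoding those characters as UTF-8.
    // URI.create throws IllegalArgumentException for syntactically invalid input.
    static String toAsciiUrl(String url) {
        return URI.create(url).toASCIIString();
    }

    public static void main(String[] args) {
        String url = toAsciiUrl("http://www.example.com/?q=T\u00fcrk\u00e7e");
        System.out.println(url);
        // The ASCII URL can then be handed to Jsoup, e.g.
        // Document doc = Jsoup.connect(url).get();
    }
}
```

This should print http://www.example.com/?q=T%C3%BCrk%C3%A7e, the same query MariuszS's data() call produces.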

I found this on Google: http://turkishbasics.com/resources/turkish-characters-html-codes.php (HTML codes for Turkish characters). Maybe you can add it like this:

 String url="http://www.example.com/?q=Türk&#231e";
 Document doc=Jsoup.connect(url).get();
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow