Question

I'm using Jsoup to get the HTML of web pages, with this code:

String url="http://www.example.com";
Document doc=Jsoup.connect(url).get();

This works, but when the link contains Turkish letters, like this:

String url="http://www.example.com/?q=Türkçe";
Document doc=Jsoup.connect(url).get();

Jsoup sends the request like this: "http://www.example.com/?q=Trke"

So I can't get the correct result. How can I solve this problem?


Solution

A working solution: if the encoding is UTF-8, simply pass the value through .data():

Document document = Jsoup.connect("http://www.example.com")
        .data("q", "Türkçe")
        .get();

which results in:

URL=http://www.example.com?q=T%C3%BCrk%C3%A7e
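As a quick check of where that result comes from, the same value can be percent-encoded with the JDK's java.net.URLEncoder. This is a minimal sketch, assuming UTF-8 is the charset the site expects; the class and method names are made up for illustration, and the word is written with Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class Utf8QueryCheck {
    // Percent-encodes one query value as UTF-8: each non-ASCII character
    // becomes its UTF-8 bytes written as %XX.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+00FC is u-umlaut, U+00E7 is c-cedilla
        String word = "T\u00fcrk\u00e7e";
        System.out.println(encodeUtf8(word)); // T%C3%BCrk%C3%A7e
    }
}
```

Note that URLEncoder produces form-style encoding, so a space becomes + rather than %20.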

For a custom encoding, percent-encode the value yourself and append it to the URL. Don't pass the encoded string to .data(): .data() encodes its value again (that is what turned Türkçe into T%C3%BCrk%C3%A7e above), so every % would become %25.

// requires: import java.net.URLEncoder; import org.jsoup.Jsoup; import org.jsoup.nodes.Document;
String query = URLEncoder.encode("Türkçe", "ISO-8859-3"); // "T%FCrk%E7e"
Document doc = Jsoup.connect("http://www.example.com/?q=" + query)
        .get();
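The same idea can be wrapped in a small helper that appends one parameter in whatever charset the server expects. withParam is a hypothetical name, not part of Jsoup; the sketch assumes the server decodes the query in the charset you pass (ISO-8859-3 here, as in the answer), which you would need to confirm for the actual site. ISO-8859-3 is an extended charset, so a stripped-down Java runtime might not include it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CharsetQuery {
    // Hypothetical helper (not a Jsoup API): appends name=value to baseUrl,
    // percent-encoding both parts with the given charset.
    static String withParam(String baseUrl, String name, String value, String charset) {
        try {
            String separator = baseUrl.contains("?") ? "&" : "?";
            return baseUrl + separator
                    + URLEncoder.encode(name, charset) + "="
                    + URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            // Thrown when the runtime does not know the charset name.
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String word = "T\u00fcrk\u00e7e";
        // Same word, different bytes depending on the charset:
        System.out.println(withParam("http://www.example.com/", "q", word, "UTF-8"));
        // http://www.example.com/?q=T%C3%BCrk%C3%A7e
        System.out.println(withParam("http://www.example.com/", "q", word, "ISO-8859-3"));
        // http://www.example.com/?q=T%FCrk%E7e
    }
}
```

The returned string contains only ASCII, so it can be passed straight to Jsoup.connect(...).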

Other suggestions

Non-ASCII characters are not allowed in URLs by the specification (RFC 3986). We're used to seeing them because browsers display them in the address bar, but what the browser actually sends to the server is the percent-encoded form (in UTF-8, ü becomes %C3%BC).

You have to URL-encode the query value before passing it to Jsoup. Jsoup.connect("http://www.example.com").data("q", "Türkçe"), as proposed by MariuszS, does just that.
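Besides .data(), the JDK can produce the encoded form of a whole URL: the multi-argument java.net.URI constructor quotes characters that are illegal in each component, and toASCIIString() percent-encodes the remaining non-ASCII characters as UTF-8. A minimal sketch, with a hypothetical helper name and a plain http scheme assumed:

```java
import java.net.URI;
import java.net.URISyntaxException;

class AsciiUrl {
    // Hypothetical helper: builds http://authority + path + "?" + query and
    // returns it in ASCII form. The constructor quotes characters that are
    // illegal in each component (a space becomes %20); toASCIIString() then
    // encodes non-ASCII characters as UTF-8 %XX escapes.
    static String toAsciiUrl(String authority, String path, String query) {
        try {
            return new URI("http", authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAsciiUrl("www.example.com", "/", "q=T\u00fcrk\u00e7e"));
        // http://www.example.com/?q=T%C3%BCrk%C3%A7e
    }
}
```

This only escapes what is illegal in a query: &, = and + pass through unchanged, so a value that may contain them is still better encoded with URLEncoder (or .data()).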

I found this on Google: http://turkishbasics.com/resources/turkish-characters-html-codes.php
Maybe you can add it like this:

 String url="http://www.example.com/?q=Türk&#231e";
 Document doc=Jsoup.connect(url).get();
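To see how a URL parser reads this character-reference form, java.net.URI can split it into components. Inside a URL, & and # are delimiters rather than parts of an HTML entity, so in this sketch (helper names are hypothetical) the query ends at # and 231e becomes the fragment, which HTTP clients do not send to the server.

```java
import java.net.URI;

class EntityInUrl {
    // Parses the URL and returns its query component (decoded).
    static String queryOf(String url) {
        return URI.create(url).getQuery();
    }

    // Returns the fragment: everything after '#', which stays on the
    // client and is not part of the HTTP request.
    static String fragmentOf(String url) {
        return URI.create(url).getFragment();
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/?q=T\u00fcrk&#231e";
        System.out.println(queryOf(url));    // the query stops before '#'
        System.out.println(fragmentOf(url)); // 231e
    }
}
```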
Licensed under: CC-BY-SA with attribution