Question

I'm currently retrieving the data from a URL using the following code:

Document doc = Jsoup.connect(url).get();

Before I fetch the data I've decided I want to get the content type, so I do that using the following.

Connection.Response res = Jsoup.connect(url).timeout(10*1000).execute();
String contentType = res.contentType(); 

Now I'm wondering, is this making 2 separate connections? Is this not efficient? Is there a way for me to get the content type and the document data in 1 single connection?

Thanks


Solution

Yes, Jsoup.connect(url).get() and Jsoup.connect(url).timeout(10*1000).execute() are two separate connections. Maybe you are looking for something like

Connection.Response resp = Jsoup.connect(url).timeout(10*1000).execute();
String contentType = resp.contentType();

and later parse body of response as a Document

Document doc = resp.parse();

In any case, by default Jsoup only parses text/*, application/xml, or application/xhtml+xml. If the content type is anything else, such as application/pdf, it throws an UnsupportedMimeTypeException, so you don't need to check the content type yourself just to avoid parsing unsupported data.
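To make the content-type rule above concrete, here is a minimal sketch in plain Java. It only mirrors the rule as stated in this answer; the class name ContentTypeRule and the method isParsedByDefault are hypothetical, and this is not Jsoup's own implementation.

```java
import java.util.Locale;

public class ContentTypeRule {

    // Hypothetical helper mirroring the rule quoted in the answer:
    // text/*, application/xml, or application/xhtml+xml. Not Jsoup's own code.
    public static boolean isParsedByDefault(String contentType) {
        if (contentType == null) {
            return false; // assumption of this sketch: no header means "unknown"
        }
        // Drop parameters such as "; charset=UTF-8" and normalise the case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.startsWith("text/")
                || mime.equals("application/xml")
                || mime.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        System.out.println(isParsedByDefault("text/html; charset=UTF-8")); // prints true
        System.out.println(isParsedByDefault("application/pdf"));          // prints false
    }
}
```

A check like this is only needed if you want to decide for yourself before parsing; otherwise catching UnsupportedMimeTypeException covers the same case.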

Other tips

Without looking at the Jsoup internals we can't know. Typically, when you want to obtain just the headers of a file (the content type in your case) without downloading the actual file content, you use the HTTP HEAD method instead of the GET method on the same URL. Perhaps the Jsoup API allows you to set the method; that code doesn't seem to be doing it, so I'd wager it's actually getting the entire file.

The HTTP spec allows clients to reuse a connection for later requests. These are called HTTP persistent connections, and they avoid having to create a new connection for each call to the same server. However, it's up to the client (Jsoup in this case, since you aren't handling the connections in your code) to make sure it's not closing the connection after each request.

I believe the overhead of creating two connections is offset by not downloading the entire file when your code decides, based on the content type, that it doesn't want it.
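The headers-first approach described in this answer can be sketched as follows. This is a rough illustration using the JDK's own java.net.http.HttpClient (Java 11+) rather than Jsoup; the class name HeadThenGet, its method names, the text/html check, and the example.com URL are all assumptions made for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;
import java.util.Optional;

public class HeadThenGet {

    // Builds a HEAD request: the server answers with headers only, no body.
    public static HttpRequest headRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Returns the body only if the HEAD response reports an HTML content type.
    public static Optional<String> fetchIfHtml(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpResponse<Void> head =
                client.send(headRequest(url), HttpResponse.BodyHandlers.discarding());
        String type = head.headers().firstValue("Content-Type").orElse("");
        if (!type.toLowerCase(Locale.ROOT).startsWith("text/html")) {
            return Optional.empty(); // unwanted type: skip the download entirely
        }
        HttpRequest get = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        return Optional.of(client.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }

    public static void main(String[] args) throws Exception {
        // Sharing one client gives the JDK a chance to reuse a persistent connection.
        HttpClient client = HttpClient.newHttpClient();
        String url = args.length > 0 ? args[0] : "https://example.com/"; // placeholder URL
        System.out.println(fetchIfHtml(client, url).map(String::length).orElse(-1));
    }
}
```

Whether the second request actually reuses the first connection depends on both the client and the server, so the saving from skipping an unwanted download is the more dependable benefit.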

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow