Question

I have the URL http://pasca.undiksha.ac.id/e-journal/index.php/jurnal_bahasa/article/view/500 (it does not point at the PDF directly, but leads to a PDF file). I want to parse this PDF file and get its text. I tried using jsoup: `

String url = "http://pasca.undiksha.ac.id/e-journal/index.php/jurnal_ep/article/download/380/172";
File in = new File(url);
Document doc = Jsoup.parse(in, "UTF-8");
System.out.println(doc.toString());`

The output is:

java.io.FileNotFoundException: http:\pasca.undiksha.ac.id\e-journal\index.php\jurnal_ep\article\download\380\172 (The filename, directory name, or volume label syntax is incorrect)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.jsoup.helper.DataUtil.load(DataUtil.java:36)
        at org.jsoup.Jsoup.parse(Jsoup.java:103)

Does anyone have an idea? Thank you.

Solution

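Use URLConnection to open the URL and read the content from its input stream. Note that Jsoup parses HTML, so this works for the article page itself; the PDF that the page leads to is binary and needs a PDF library to extract its text:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

URL url = new URL("http://pasca.undiksha.ac.id/e-journal/index.php/jurnal_bahasa/article/view/500");
URLConnection connection = url.openConnection();

// Jsoup.parse(InputStream, charset, baseUri) takes a base URI to resolve relative links
try (InputStream input = connection.getInputStream()) {
    Document doc = Jsoup.parse(input, "UTF-8", url.toString());
    System.out.println(doc.toString());
}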
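
Since the question is about a PDF rather than an HTML page, it may help to check what the server actually returns before choosing a parser. The following is only a sketch under some assumptions: the class name `PdfFetch` and the helpers `readAll` and `looksLikePdf` are made up for illustration, and the check relies on the fact that PDF files begin with the marker `%PDF-`. It uses only the standard library; extracting the text itself would still need a PDF library (Apache PDFBox is one option), which is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PdfFetch {

    // Reads the whole stream into memory; acceptable for article-sized files.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // PDF files start with the marker "%PDF-"; an HTML page does not.
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfFetch <url>");
            return;
        }
        URLConnection connection = new URL(args[0]).openConnection();
        byte[] body;
        try (InputStream in = connection.getInputStream()) {
            body = readAll(in);
        }
        System.out.println("Content-Type: " + connection.getContentType());
        System.out.println(looksLikePdf(body)
                ? "PDF bytes: hand them to a PDF library"
                : "not a PDF: probably HTML, which Jsoup can parse");
    }
}
```

Run it with the download URL from the question as the argument. If it reports HTML for the /article/view/ address and PDF bytes for the /article/download/ address, then only the second one is worth passing to a PDF library.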

OTHER TIPS

You cannot use File with a URL other than file://, so the error is expected. Use an HTTP client such as commons-http-client to access the file on the web.
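
To illustrate the point, here is a small sketch of how `java.io.File` treats an http address as a local path, which is why the stack trace in the question shows the URL turned into a file name. The class name `FileVersusUrl` and the helper `isHttpUrl` are hypothetical names chosen for this example, and the exact form of the printed path depends on the operating system.

```java
import java.io.File;
import java.net.URI;

public class FileVersusUrl {

    // Returns true only for http/https addresses, which java.io.File cannot open.
    static boolean isHttpUrl(String location) {
        String scheme;
        try {
            scheme = URI.create(location).getScheme();
        } catch (IllegalArgumentException e) {
            return false; // not a valid URI (e.g. a Windows path with backslashes)
        }
        return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        String location =
            "http://pasca.undiksha.ac.id/e-journal/index.php/jurnal_ep/article/download/380/172";

        // File interprets the string as a path on the local disk, not as a web address.
        File asFile = new File(location);
        System.out.println("File sees the path: " + asFile.getPath());
        System.out.println("Exists locally: " + asFile.exists());

        // A scheme check tells you to go through a network API instead.
        System.out.println("Needs an HTTP connection: " + isHttpUrl(location));
    }
}
```

A check like this is mostly useful when the same code has to accept both local paths and web addresses and pick `File` or `URL` accordingly.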

Licensed under: CC-BY-SA with attribution