Question

I am trying to parse an HTML file that is not well-formed (it declares a DTD), which I retrieve through an InputStream, and get all the data in the TD fields. How can I do that with Jsoup? I have already looked at http://jsoup.org/cookbook/, but I need an example to get started.

Thank you in advance.

I already tried a SAX parser, but I can't get the DTD to work.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="nl" lang="nl"> 
<TABLE class=personaltable cellSpacing=0 cellPadding=0> 
 <TBODY> 
  <TR class=alternativerow> 
   <TD>Nieuw beltegoed:</TD> 
   <TD>€ 1,00</TD></TR> 
  <TR> 
   <TD>Tegoed vorige periode:  
   <TD>€ 2,00</TD></TD></TR> 
  <TR class=alternativerow> 
   <TD>Tegoed tot 09-11-2011:  
   <TD>€ 10,00</TD></TD></TR> 
  <TR> 
   <TD> 
   <TD height=25></TD> 
  <TR class=alternativerow> 
   <TD>Verbruik sinds nieuw tegoed:</TD> 
   <TD>€ 0,33</TD></TR> 
  <TR> 
   <TD>Ongebruikt tegoed:</TD> 
   <TD>€ 12,00</TD></TR> 
  <TR class=alternativerow> 
   <TD class=f-Orange>Verbruik boven bundel:</TD> 
   <TD class=f-Orange>€ 0,00</TD></TR> 
  <TR> 
   <TD>Verbruik dat niet in de bundel zit*:</TD> 
   <TD>€ 0,00</TD></TR> 
  </TBODY> 
 </TABLE> 
</html> 

Edit: I am getting a force close; I need to use Jsoup inside my AsyncTask. Here is the Logcat output:

10-20 21:07:36.679: ERROR/AndroidRuntime(1396): FATAL EXCEPTION: main
10-20 21:07:36.679: ERROR/AndroidRuntime(1396): java.lang.NullPointerException
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at   com.sencide.AndroidLogin$MyTask.onPostExecute(AndroidLogin.java:276)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at com.sencide.AndroidLogin$MyTask.onPostExecute(AndroidLogin.java:1)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at android.os.AsyncTask.finish(AsyncTask.java:417)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at android.os.AsyncTask.access$300(AsyncTask.java:127)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at android.os.AsyncTask$InternalHandler.handleMessage(AsyncTask.java:429)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at android.os.Handler.dispatchMessage(Handler.java:99)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at android.os.Looper.loop(Looper.java:130)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at android.app.ActivityThread.main(ActivityThread.java:3835)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at java.lang.reflect.Method.invokeNative(Native Method)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at java.lang.reflect.Method.invoke(Method.java:507)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:847)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:605)
10-20 21:07:36.679: ERROR/AndroidRuntime(1396):     at dalvik.system.NativeStart.main(Native Method)

Here is the AsyncTask code:

public class MyTask extends AsyncTask<String, Integer, String> {
    private Elements tdsFromSecondColumn = null;

    protected String doInBackground(String... params) {
        InputStream inputStreamActivity = response.getEntity().getContent();

        BufferedReader reader = new BufferedReader(new InputStreamReader(inputStreamActivity));
        StringBuilder sb = new StringBuilder();
        String line = null;

        while ((line = reader.readLine()) != null) {
            sb.append(line + "\n");
        }

        /******* CLOSE CONNECTION AND STREAM *******/

        System.out.println(sb);
        inputStreamActivity.close();

        String kpn;
        kpn = sb.toString();

        Document doc = Jsoup.parse(kpn);
        Elements tdsFromSecondColumn = doc.select("table.personaltable td:eq(1)");
    }

    @Override
    protected void onPostExecute(String result) {
        //publishProgress(false);
        TextView tv = (TextView)findViewById(R.id.lbl_top);

        for (Element tdFromSecondColumn : tdsFromSecondColumn) {
            //System.out.println(tdFromSecondColumn.text());
            tv.setText("");
            tv.setText(tdFromSecondColumn.text());
        }
    }
}

Solution

So, you have an InputStream and not a URL? Then you should use the Jsoup#parse() overload that takes an InputStream:

Document document = Jsoup.parse(inputStream, charsetName, baseUri);
// ...

The charsetName should be the charset the document was originally encoded in. You can leave it null to let Jsoup decide or fall back to UTF-8. The baseUri should be the URL from which the HTML was originally served. You can leave it null; the only consequence is that you won't be able to resolve relative links.
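
To illustrate why the charset matters for a page like this one, which contains the € sign, here is a minimal sketch in plain Java (no Jsoup involved). The class name CharsetDemo and the helper decode() are hypothetical, and the sketch assumes the page is served as UTF-8, which the question does not state. It decodes the same bytes once with the right charset and once with the wrong one:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jsoup or of the asker's code.
class CharsetDemo {

    // Reads the whole stream and decodes its bytes with the given charset.
    static String decode(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "\u20AC" is the euro sign; assumption: the server sends UTF-8 bytes.
        String original = "\u20AC 1,00";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);
        String wrong = decode(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false: the euro sign became three characters
    }
}
```

The same consideration applies to the InputStreamReader in the question, which is created without an explicit charset and therefore uses the platform default.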

But if you actually have the original URL, then you could also just use Jsoup#connect():

Document document = Jsoup.connect(url).get();
// ...

Regardless of the way you obtained the Document, you can use CSS selectors to select elements of interest in the document. See also the Jsoup cookbook on that subject. Here's an example which extracts all the data from the 2nd column of the <table> with a class name of personaltable:

Elements tdsFromSecondColumn = document.select("table.personaltable td:eq(1)");

for (Element tdFromSecondColumn : tdsFromSecondColumn) {
    System.out.println(tdFromSecondColumn.text());
}

which results in:

€ 1,00
€ 2,00
€ 10,00

€ 0,33
€ 12,00
€ 0,00
€ 0,00
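
As for the NullPointerException in the edit: judging only from the code as posted, a likely cause is that doInBackground() declares a new local variable (Elements tdsFromSecondColumn = ...) instead of assigning to the field of the same name. The field then stays null, and the for loop in onPostExecute() fails when it tries to iterate over it. Dropping the type name from that assignment, so that it writes to the field, should avoid this. The following plain Java sketch shows the effect in isolation; ShadowingDemo, cells, loadShadowed() and loadIntoField() are hypothetical names that merely stand in for the asker's class, field and methods:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for MyTask; "cells" plays the role of tdsFromSecondColumn.
class ShadowingDemo {
    private List<String> cells = null;

    // Mirrors the posted code: the type name in front declares a NEW local
    // variable that shadows the field, so the field is never assigned.
    void loadShadowed() {
        List<String> cells = Arrays.asList("1,00", "2,00");
    }

    // Without the type name, the assignment targets the field.
    void loadIntoField() {
        cells = Arrays.asList("1,00", "2,00");
    }

    List<String> getCells() {
        return cells;
    }

    public static void main(String[] args) {
        ShadowingDemo demo = new ShadowingDemo();

        demo.loadShadowed();
        System.out.println(demo.getCells() == null); // true: iterating now would throw a NullPointerException

        demo.loadIntoField();
        System.out.println(demo.getCells().size()); // 2
    }
}
```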
License: CC-BY-SA with attribution
Not affiliated with StackOverflow