Question

I was under the impression that you can convert HTML to XHTML using TagSoup. I have the TagSoup jar file saved as tagsoup.jar, and I used the following command:

    wget -O usa_stock.html "http://markets.usatoday.com/custom/usatoday-com/new/html-mktscreener.asp#" | java -jar tagsoup.jar usa_stock.html

When I use this command, it generates both the HTML and the XHTML file, but when I open the XHTML file in Firefox it's empty. I suspect that because of the pipe, TagSoup doesn't know which file I was trying to convert.

Can someone help me out with this one?

Thanks.


Solution

The pipe (|) in your command is definitely wrong; replacing it with && will probably solve your problem.

  1. wget was told to write the retrieved page to a file (-O usa_stock.html), not to stdout, so nothing was piped into TagSoup.
  2. Although you also gave TagSoup the input file name, a pipeline starts both commands at the same time. So when java -jar starts to execute, wget is still running, and the input file you specified for TagSoup isn't ready yet.

So wget needs to exit with status 0 before TagSoup starts, and && serves exactly this purpose: the second command runs only after the first one has finished successfully.
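
As a minimal sketch of the fix, the two approaches could be wrapped like this. The function names fetch_then_convert and stream_convert are made up for illustration, and two things are assumptions rather than something stated in the question: that TagSoup writes the converted document to stdout (hence the > redirect), and that it reads stdin when no file argument is given.

```shell
#!/bin/sh
# Hypothetical helper: download first, convert only if the download succeeded.
# $1 = URL, $2 = name of the local .html file to write.
fetch_then_convert() {
    # && runs java only after wget has exited with status 0,
    # so the .html file is complete before TagSoup opens it.
    # Assumption: TagSoup prints the XHTML to stdout, so it is redirected.
    wget -O "$2" "$1" && java -jar tagsoup.jar "$2" > "${2%.html}.xhtml"
}

# Hypothetical helper: a pipe that actually carries data.
# $1 = URL, $2 = name of the .xhtml file to write.
stream_convert() {
    # "-O -" makes wget write the page to stdout instead of a file.
    # Assumption: TagSoup reads stdin when no input file is named.
    wget -q -O - "$1" | java -jar tagsoup.jar > "$2"
}
```

With the file from the question, the first variant would be called as fetch_then_convert "http://markets.usatoday.com/custom/usatoday-com/new/html-mktscreener.asp#" usa_stock.html, which should leave usa_stock.xhtml next to the downloaded page. The second variant skips the intermediate HTML file entirely.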

Licensed under: CC-BY-SA with attribution