I have a Jsoup Document object created like this:

Document secDoc = Jsoup.connect(a.attr("abs:href")).timeout(30*1000).get();
String txt = secDoc.text();

When I debug this and inspect the value of secDoc, I get the normal page source, which contains this element:

For questions about your order, including anything shipping or billing related, please email <script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>.

If you open the page in a browser, you see this line: For questions about your order, including anything shipping or billing related, please email oatmealsupport@gmail.com. We only do email support at this time.

So the script generates the email address on the page. Inspecting the element in the browser gives:

<p>
                For questions about your order, including anything shipping or billing related, please email <a href="mailto:oatmealsupport@gmail.com">oatmealsupport@gmail.com</a><script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>.
                We only do email support at this time.<br><br>
                Hours of operation: <strong>Monday-Friday 8am - 6pm PT.</strong>
                <br>
                <strong>Shipping Times</strong>:
                We strive to fulfill the orders within 3-5 working days. When we are really busy we may take a day or two longer. 
              We ship orders Monday - Friday, so if your order is placed Friday evening we may not be able to process it until the following Monday. 
                If we are behind, it may be a few days before we respond.  The Oatmeal is an extremely small operation so please be patient. 
                <br>
                <a href="http://shop.theoatmeal.com/pages/shipping">More Shipping Info</a><br><br>
                Questions about shirt sizes? <a href="http://shop.theoatmeal.com/pages/shipping#shirts">Shirt Sizing Info</a>
            </p>

So the anchor <a href="mailto:oatmealsupport@gmail.com">oatmealsupport@gmail.com</a> is generated by the script.

Is there any way I can get this anchor using Jsoup (or any other means)?


Solution

For this specific site, the user and domain parts of the address are in the script tag, so select the script tag, get its contents, parse them with a regular expression, and concatenate the user and domain with an @ in between. Your selector might just be script:contains(write_email), assuming write_email isn't used elsewhere on the page. One caveat: Jsoup treats the contents of a script element as data rather than text, so you may need script:containsData(write_email) and data() instead of :contains and text(). This only works because the address is exposed in the page source, even if it's in two pieces.
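The regular-expression step could be sketched roughly as below. This is only an illustration, not the site's or Jsoup's own logic: ScriptEmailExtractor and extractEmail are hypothetical names, and the pattern assumes the call always looks like write_email('user','domain') with quoted literal arguments.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: rebuilds the address from the
// arguments of a write_email('user','domain') call found in script contents.
final class ScriptEmailExtractor {

    // Assumption: both arguments are quoted string literals (single or
    // double quotes), with optional whitespace around them.
    private static final Pattern WRITE_EMAIL = Pattern.compile(
            "write_email\\(\\s*['\"]([^'\"]+)['\"]\\s*,\\s*['\"]([^'\"]+)['\"]\\s*\\)");

    private ScriptEmailExtractor() {}

    // Returns user@domain for the first write_email call in the given
    // script contents, or an empty Optional if no such call is present.
    static Optional<String> extractEmail(String scriptBody) {
        Matcher m = WRITE_EMAIL.matcher(scriptBody);
        return m.find()
                ? Optional.of(m.group(1) + "@" + m.group(2))
                : Optional.empty();
    }
}
```

The input would be the contents of the matching script element, for example something like secDoc.select("script:containsData(write_email)").first().data(), assuming a Jsoup version that supports the :containsData selector.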

In general, Jsoup is not a JavaScript engine. You might try a browser automation tool like Selenium if you want to see the same page a human using a web browser would see.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow