Question

I have a document object as:

Document secDoc = Jsoup.connect(a.attr("abs:href")).timeout(30*1000).get();
String txt = secDoc.text();

When I debug this and inspect the value of secDoc, I see the raw page source, which contains this element:

For questions about your order, including anything shipping or billing related, please email <script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>.

If you open the page in a browser, you see this line: For questions about your order, including anything shipping or billing related, please email oatmealsupport@gmail.com. We only do email support at this time. So the script generates the email address on the page. Using Inspect Element, I get:

<p>
                For questions about your order, including anything shipping or billing related, please email <a href="mailto:oatmealsupport@gmail.com">oatmealsupport@gmail.com</a><script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>.
                We only do email support at this time.<br><br>
                Hours of operation: <strong>Monday-Friday 8am - 6pm PT.</strong>
                <br>
                <strong>Shipping Times</strong>:
                We strive to fulfill the orders within 3-5 working days. When we are really busy we may take a day or two longer. 
              We ship orders Monday - Friday, so if your order is placed Friday evening we may not be able to process it until the following Monday. 
                If we are behind, it may be a few days before we respond.  The Oatmeal is an extremely small operation so please be patient. 
                <br>
                <a href="http://shop.theoatmeal.com/pages/shipping">More Shipping Info</a><br><br>
                Questions about shirt sizes? <a href="http://shop.theoatmeal.com/pages/shipping#shirts">Shirt Sizing Info</a>
            </p>

So the anchor <a href="mailto:oatmealsupport@gmail.com">oatmealsupport@gmail.com</a> is generated by the script.

Is there any way I can get this anchor using Jsoup (or by any other means)?

Solution

For this specific site, the user and domain parts of the address are both in the script tag. Select the script tag, get its text, parse that text with a regular expression, and concatenate the user and domain with an @ in between. Your selector might be as simple as script:contains(write_email), assuming write_email isn't used elsewhere on the page. This only works because the address is exposed in the page source, even though it is split into two pieces.
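
The parsing step can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name WriteEmailParser and the method extractEmail are hypothetical, and the regular expression assumes the call always has the form write_email('user','domain') with two quoted arguments. The block uses only the standard library so it runs on its own. The comment in main shows one way the script text might be obtained with Jsoup, assuming a version that exposes script contents as data through the :containsData selector and Element.data().

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rebuilds the address from a write_email('user','domain') call.
class WriteEmailParser {
    // Accepts single or double quotes and optional whitespace around the arguments.
    private static final Pattern WRITE_EMAIL = Pattern.compile(
        "write_email\\(\\s*['\"]([^'\"]+)['\"]\\s*,\\s*['\"]([^'\"]+)['\"]\\s*\\)");

    // Returns user@domain, or an empty Optional if the script text has no such call.
    static Optional<String> extractEmail(String scriptText) {
        Matcher m = WRITE_EMAIL.matcher(scriptText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + "@" + m.group(2));
    }

    public static void main(String[] args) {
        // With Jsoup, the script text might come from something like (assumption):
        //   secDoc.select("script:containsData(write_email)").first().data()
        String scriptText = "write_email('oatmealsupport','gmail.com')";
        System.out.println(extractEmail(scriptText).orElse("no address found"));
    }
}
```

Returning an Optional keeps the "no match" case explicit, which matters here because the approach depends entirely on the page continuing to call write_email in this form.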

In general, Jsoup is not a JavaScript engine. You might try a browser automation tool like Selenium if you want to see the same page a human using a web browser would see.

Licensed under: CC-BY-SA with attribution