Question

I have some code (incidentally, it is for Omniture SiteCatalyst) that renders a 1x1 pixel based on some JavaScript object variables I set in the page's source code. The JavaScript eventually creates an img based on the scripting code, but the img src isn't hard-coded into the HTML. How can I figure out what the img src is, given the URL of a page? If I just grab the page, I'll get the pre-rendered JavaScript.

EDIT

For example, let's say I have this code for StackOverflow.html:

<html>
<script type="text/javascript">
a = 2
document.write(a)
</script>
</html>

How can I fetch StackOverflow.html and somehow get the value "2" instead of all of my scripting code?

Thanks!

Solution 4

I think the best way to do this is with Selenium: load the page in a real browser, then inject some JavaScript into the page to either mine the DOM or, if appropriate, read the value from the global window object.

OTHER TIPS

If you're trying to get the value of a after the script has run on the client side (i.e. in the browser), you should be able to retrieve it in the normal way.

Take the following setup:

index.html

This file is your webpage. It contains some content, a tracking script that inserts an image and your own script.

<!doctype html>
<html>
<head><title>My Page</title></head>
<body>
  <p>My Content</p>
  <!-- Start tracking code -->
  <script src="tracking.js"></script>
  <!-- End tracking code -->
  <script src="mycode.js"></script>
</body>
</html>

tracking.js

This is the tracking code, presumably provided by the tracking company.

var id = '1234foobar';
var visitorUserAgent = encodeURIComponent(navigator.userAgent);
document.write(
  '<img src="http://tracking.com/1x1.gif?id='
  + id + '&ua=' + visitorUserAgent + '" />'
);

mycode.js

If you know what variables (if any) the tracking code creates, you should be able to retrieve the variables themselves or at least the src attribute of the img tag that the tracking code creates.

var imgs = document.getElementsByTagName('img');
alert([id, visitorUserAgent, imgs[imgs.length - 1].src].join('\n'));
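Taking the last img on the page works for this setup, but it breaks as soon as anything else adds an image afterwards. A slightly more defensive sketch is to pick images by the host they are served from. The helper name findPixelSrcs is made up for illustration, and tracking.com is simply the host used in tracking.js above:

```javascript
// Hypothetical helper (not part of any tracking library): return the src of
// every image in an array-like collection that is served from trackingHost.
function findPixelSrcs(imgs, trackingHost) {
  const srcs = [];
  for (let i = 0; i < imgs.length; i++) {
    const src = imgs[i].src;
    if (!src) continue; // skip images without a src
    let host;
    try {
      host = new URL(src).hostname;
    } catch (e) {
      continue; // skip anything that is not an absolute URL
    }
    if (host === trackingHost) srcs.push(src);
  }
  return srcs;
}

// Stand-in for document.getElementsByTagName('img') so the sketch runs anywhere.
const sampleImgs = [
  { src: 'http://example.com/logo.png' },
  { src: 'http://tracking.com/1x1.gif?id=1234foobar&ua=Mozilla%2F5.0' }
];
console.log(findPixelSrcs(sampleImgs, 'tracking.com'));
```

In mycode.js you would pass document.getElementsByTagName('img') instead of sampleImgs. Browsers report img.src as an absolute URL, so the hostname comparison should hold there as well.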

Edit:

To answer your restated question:

It seems to me that your problem is figuring out what the page will look like after the JavaScript has run on it.

There is no simple way of doing this that will give you 100% accurate results. For that you need to actually run the JavaScript and see what it produces, which is not easy when you aren't in a browser.

Now, you have several options. You didn't mention what tool you are using to grab the page, so I'll assume you are using a custom-built scraper. If you want to keep using the scraper, you can:

  • Look into using Rhino to evaluate the JS. I am not sure how far this will get you, so it needs some research.
  • If document.write is the only call you care about, parse out the variables it uses and then try to evaluate their values. This requires writing a parser, which is probably difficult.
  • Best of all, use a functional testing tool like Tellurium or Selenium. This gives you access to the page after the JS has already run, and you can use my original answer to get the value you need.
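To make the "evaluate the JS outside a browser" option concrete, here is a minimal sketch that swaps in Node's built-in vm module where the answer mentions Rhino. It runs the inline scripts from the question's StackOverflow.html against a stubbed document.write and collects whatever gets written. The helper names are made up for illustration, and the big assumption is that the scripts only touch document.write: there is no DOM, no navigator and no loading of external scripts, so real tracking code will usually need more than this.

```javascript
// Sketch only: NOT a browser. Scripts run in a sandbox whose single global
// is a stub "document" with a write() method.
const vm = require('vm');

// Hypothetical helper: collect the bodies of inline <script> elements.
// A regex is enough for this toy page; real HTML needs a real parser.
function extractInlineScripts(html) {
  const scripts = [];
  const re = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    if (match[1].trim() !== '') scripts.push(match[1]); // skips <script src=...>
  }
  return scripts;
}

// Hypothetical helper: evaluate the scripts in order, return what they wrote.
function renderDocumentWrites(html) {
  const written = [];
  const sandbox = {
    document: {
      write: function () {
        for (let i = 0; i < arguments.length; i++) {
          written.push(String(arguments[i]));
        }
      }
    }
  };
  vm.createContext(sandbox);
  for (const code of extractInlineScripts(html)) {
    vm.runInContext(code, sandbox, { timeout: 1000 }); // 1 s guard against loops
  }
  return written.join('');
}

const page = [
  '<html>',
  '<script type="text/javascript">',
  'a = 2',
  'document.write(a)',
  '</script>',
  '</html>'
].join('\n');

console.log(renderDocumentWrites(page)); // prints "2"
```

The vm module is not a security boundary, so only run scripts you trust this way. For pages whose scripts depend on a real DOM, the Selenium route is likely the more reliable one.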

I would use the Net panel in Firebug and filter by image requests. You'll see the request go out the moment the image is created. Also, if you're making analytics requests, try installing the Omnibug Firebug plugin to track and break down requests.
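Once you have copied the image request URL out of the Net panel, you can also break it down yourself. The sketch below is not how Omnibug works internally; it just uses the standard URL API to split a pixel URL into host, path and decoded query parameters. The function name breakDownRequest is made up, and the sample URL follows the tracking.js example above:

```javascript
// Hypothetical helper: split a tracking pixel URL into its parts.
function breakDownRequest(pixelUrl) {
  const url = new URL(pixelUrl);
  const params = {};
  // searchParams decodes percent-encoding, so values come back readable.
  url.searchParams.forEach(function (value, key) {
    params[key] = value;
  });
  return { host: url.hostname, path: url.pathname, params: params };
}

const info = breakDownRequest(
  'http://tracking.com/1x1.gif?id=1234foobar&ua=Mozilla%2F5.0'
);
console.log(info.host);      // tracking.com
console.log(info.params.id); // 1234foobar
console.log(info.params.ua); // Mozilla/5.0
```

If a parameter name repeats in the query string, this sketch keeps only the last value.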

Licensed under: CC-BY-SA with attribution