Question

I want to get the id cookie that Google issues when you opt in on the ads settings page (if you already accept targeted advertising, you must opt out first to see the page I am referring to).

I've found that, to get this cookie, you have to perform an HTTP GET to the action URL of the form on this page. The problem is that this URL contains a hash that changes with every new HTTP connection, so I must first fetch the page to obtain the URL and then perform the GET to that URL.

I'm using HttpComponents to fetch http://www.google.com/ads/preferences, but when I parse the contents with jsoup there is only a script and no form to be found.

I'm afraid this happens because the contents are loaded dynamically using some sort of timeout. Does anyone know a workaround for this?

EDIT: by the way, the code I'm currently using is:

        // Uses Apache HttpClient 4.x (org.apache.http.*) and jsoup (org.jsoup.*)
        HttpClient httpclient = new DefaultHttpClient();

        // Create a local instance of cookie store
        CookieStore cookieStore = new BasicCookieStore();
        // Bind custom cookie store to the client
        ((AbstractHttpClient) httpclient).setCookieStore(cookieStore);
        // Cookie spec that skips validation, so every cookie is accepted
        CookieSpecFactory csf = new CookieSpecFactory() {
            public CookieSpec newInstance(HttpParams params) {
                return new BrowserCompatSpec() {
                    @Override
                    public void validate(Cookie cookie, CookieOrigin origin)
                            throws MalformedCookieException {
                        // Allow all cookies
                        System.out.println("Allowed cookie: " + cookie.getName() + " "
                                + cookie.getValue() + " " + cookie.getPath());
                    }
                };
            }
        };
        ((AbstractHttpClient) httpclient).getCookieSpecs().register("EASY", csf);

        // Create local HTTP context
        HttpContext localContext = new BasicHttpContext();
        // Bind custom cookie store to the local context
        localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
        HttpGet httpget = new HttpGet(doubleClickURL);
        // Override the default policy for this request
        httpclient.getParams().setParameter(
                ClientPNames.COOKIE_POLICY, "EASY");

        // Pass local context as a parameter
        HttpResponse response = httpclient.execute(httpget, localContext);

        HttpEntity entity = response.getEntity();

        if (entity != null) {
            // Read the whole body (not just the first line) before the stream
            // is closed; EntityUtils closes it once the content is consumed
            String html = EntityUtils.toString(entity);
            // Find action attribute of form
            Document document = Jsoup.parse(html);
            Element form = document.select("form").first();
            if (form != null) {
                String optinURL = form.attr("action");
                URL connection = new URL(optinURL);
                // ... get id Cookie
            }
        }

Solution 2

Finally I found it! I found the following site describing the doubleclick cookie protocol:

Privacy Advisory

Then it is as easy as setting a cookie in that domain with name id and value A. After that, make an HTTP request to http://www.google.com/ads/preferences and they will set a correct id value.
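The two steps above can be sketched with the JDK's own classes (Java 11+) instead of HttpComponents. This is only a rough illustration: the class name, the helper names, and the `.doubleclick.net` cookie domain are assumptions made for the example, and it cannot confirm whether the endpoint still behaves as described here.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DoubleClickIdSeed {
    // Assumption: the id cookie is scoped to the whole doubleclick.net domain.
    static final String COOKIE_DOMAIN = ".doubleclick.net";
    static final String PREFERENCES_URL = "http://www.google.com/ads/preferences";

    /** Builds a cookie manager pre-loaded with the placeholder cookie id=A. */
    static CookieManager seededCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        HttpCookie seed = new HttpCookie("id", "A");
        seed.setDomain(COOKIE_DOMAIN);
        seed.setPath("/");
        seed.setVersion(0); // Netscape-style cookie, as browsers send it
        manager.getCookieStore().add(URI.create("http://doubleclick.net/"), seed);
        return manager;
    }

    /** Returns the value of the doubleclick id cookie in the store, or null. */
    static String currentId(CookieManager manager) {
        for (HttpCookie cookie : manager.getCookieStore().getCookies()) {
            String domain = cookie.getDomain();
            if ("id".equals(cookie.getName())
                    && domain != null && domain.endsWith("doubleclick.net")) {
                return cookie.getValue();
            }
        }
        return null;
    }

    /** Requests the preferences page and returns whatever id is stored afterwards. */
    static String requestId(CookieManager manager)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(manager)                      // share the seeded store
                .followRedirects(HttpClient.Redirect.NORMAL) // follow any redirects
                .build();
        HttpRequest request =
                HttpRequest.newBuilder(URI.create(PREFERENCES_URL)).GET().build();
        client.send(request, HttpResponse.BodyHandlers.discarding());
        return currentId(manager);
    }
}
```

Calling `DoubleClickIdSeed.requestId(DoubleClickIdSeed.seededCookieManager())` would return the replaced id if the server sets one, or the placeholder A if it does not.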

It is a very specific question, but I hope this helps future viewers.

By the way, I found that amazon.com, for example, is a member of the AdSense network. A script on its main page sends an HTTP request to DoubleClick at:

http://ad.doubleclick.net/adj/amzn.us.gw.atf

There you can find a script that seems to be the actual code that gives you the id cookie. Either way, if you access this URL with the id cookie set to A, it will set the DoubleClick id.

OTHER TIPS

You may have better luck using HtmlUnit, Selenium or jWebUnit for such a task. jsoup does not interpret JavaScript, and the Google page you're pointing to is full of JavaScript that must be executed by a browser to produce what you're seeing.

HtmlUnit is OS-independent and does not need anything else installed, but I've never used it for JavaScript-heavy sites. HtmlUnit can also extract data from the web page like jsoup does, but you can still feed the HTML to jsoup if you prefer using it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow