Question

While scraping a single order page in the admin part of my OpenCart shop (full HTML code can be found here: http://pastebin.com/SaLc5jHu) for the customer's email address, I get the following as the email address value:

[email protected]
/* <![CDATA[ */
(function(){try{var s,a,i,j,r,c,l,b=document.getElementsByTagName("script");l=b[b.length-1].previousSibling;a=l.getAttribute('data-cfemail');if(a){s='';r=parseInt(a.substr(0,2),16);for(j=2;a.length-j;j+=2){c=parseInt(a.substr(j,2),16)^r;s+=String.fromCharCode(c);}s=document.createTextNode(s);l.parentNode.replaceChild(s,l);}}catch(e){}})();
/* ]]> */

Here's the code:

require 'mechanize'

a = Mechanize.new

a.get('http://exampleshop.nl/admin/') do |page|

    # Select the login form
    login_form = page.forms.first

    # Insert the username and password
    login_form.username = 'username'
    login_form.password = 'password'

    # Submit the login information
    dashboard_page = a.submit(login_form, login_form.buttons.first)

    # Check if the login was successful
    puts check_1 = dashboard_page.title == 'Dashboard' ?  "CHECK 1 DASHBOARD SUCCESS" : "CHECK 1 DASHBOARD FAIL"

    # Visit the orders index page to scrape some standard information
    orders_page = a.click(dashboard_page.link_with(:text => /Bestellingen/))

    # pp orders_page # => http://pastebin.com/L3zASer6

    # Check if the visit is successful
    puts check_2 = orders_page.title == 'Bestellingen' ?  "CHECK 2 ORDERS SUCCESS" : "CHECK 2 ORDERS FAIL"

    # Search for all #singleOrder table rows and store them in all_single_orders
    all_single_orders = orders_page.search("#singleOrder") 

    # Scrape the needed information (the actual save to database is omitted)
    all_single_orders.each do |order|
        # Set links for each order
        order_link = order.at_css("a")['href']  # Assuming the first link in the row

        order_id = order.search("#orderId").text                    # => 259    
        order_status = order.search("#orderStatus").text    # => Bestelling ontvangen           
        order_amount = order.search("#orderAmount").text        # => € 41,94

        # Visit a single order page to fetch more detailed information
        single_order_page = orders_page.link_with(:href => order_link).click

        # Fetch more information
        puts first_name = single_order_page.search(".firstName").text
        puts last_name = single_order_page.search(".lastName").text
        puts email = single_order_page.search(".email").text # => [email protected] /* <![CDATA[ */...
        puts postal_code = single_order_page.search(".postalCode").text
        puts address = single_order_page.search(".address").text
        puts product_quantity = single_order_page.search(".orderQuantity").text
    end
end

Any ideas? I'm using Ruby 2.0.0 and Mechanize 2.7.3, and I have CloudFlare set up.

UPDATE

Working now. To get this to work, simply disable the ScrapeShield e-mail obfuscation option in CloudFlare's Apps panel (https://www.cloudflare.com/cloudflare-apps).


Solution

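It was not working because a CloudFlare app called ScrapeShield was activated. Its e-mail obfuscation replaces each address in the HTML with a placeholder plus a small script that decodes it in the browser, and Mechanize does not execute JavaScript, so the scraper only ever sees the placeholder and the script source.

To get this to work, simply disable the ScrapeShield e-mail obfuscation option in the Apps panel (https://www.cloudflare.com/cloudflare-apps).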
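
If disabling the option is not possible, the address could in principle be decoded on the scraper side. The following is only a minimal sketch, based on the inline script quoted in the question: that script reads a hex string from a data-cfemail attribute, takes the first byte as a key, and XORs every following byte with it. The method name decode_cfemail is hypothetical, and the sketch assumes CloudFlare still uses this exact scheme.

```ruby
# Decode a hex string taken from a data-cfemail attribute.
# Assumption: the scheme matches the inline script in the question,
# i.e. the first byte is the XOR key for all remaining bytes.
def decode_cfemail(encoded)
  # Split the hex string into two-character pairs and parse each as a byte.
  bytes = encoded.scan(/../).map { |pair| pair.to_i(16) }
  # The first byte is the key; the rest is the obfuscated address.
  key = bytes.shift
  bytes.map { |byte| (byte ^ key).chr }.join
end

puts decode_cfemail("2a4b6a480449")
```

In the scraper, the hex string would come from the element carrying the attribute, for example something like single_order_page.at(".email [data-cfemail]")["data-cfemail"]; that selector is an assumption about where the obfuscated element sits in the order page.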

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow