Python Selenium handling Timeout Exceptions with a long list of URLs

https://stackoverflow.com/questions/1621806

06-07-2019
|

Question

I am using selenium RC to cycle through a long list of URLs, sequentially writing the HTML from each URL to a csv file. Problem: the program frequently exits at various points in list due to URL "Timed out after 30000ms" exceptions. Instead of stopping the program when it hits a URL time-out, I was trying to have the program simply write a note of the time-out in the CSV file (in the row where the HTML for the URL would have gone) and move on to the next URL in the list. I attempted to add an 'else' clause to my program but it doesnt seem to help (see below) -- ie: the program still stops every time it hits a timeout. I also seem to get 30000ms timeout exceptions even when I open selenium-server with a 60000ms timeout window --eg: "java -jar selenium-server.jar -timeout 600000" ???

Any advice would be much appreciated. Thank you.

from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.MainDomain.com")
        self.selenium.start()

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            sel.open(row[0])
            sel.wait_for_page_to_load("400000")
            time.sleep(5)
            html = sel.get_html_source()
            ofile = open('output4001-5000.csv', 'ab')
            ofile.write(html + '\n')
            ofile.close
        else:
            ofile = open('outputTest.csv', 'ab')
            ofile.write("URL Timeout" + '\n')
            ofile.close

     def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
     unittest.main()

Solution

Try the following:

from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://example.com")
        self.selenium.start()
        self.selenium.set_timeout("60000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            try:
                sel.open(row[0])
            except Exception, e:
                ofile = open('outputTest.csv', 'ab')
                ofile.write("error on %s: %s" % (row[0],e))
            else:
                time.sleep(5)
                html = sel.get_html_source()
                ofile = open('output4001-5000.csv', 'ab')
                ofile.write(html.encode('utf-8') + '\n')
            ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
     unittest.main()

Some comments:

You don't need a wait_for_page_to_load after an open, that will cause you timeouts because once the page is loaded after the opeen, it will start waiting again and the page will not be loading.
Most of the failures you get from selenium (timeouts, object not found) can be caught with try-except statements
You should set the timeout in your tests withing the test itself (using set_timeout), that way it doesn't depend on the way you start the server, it will always wait the time you wanted

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow