Problem

My task is to crawl Google search results using headless WebKit (PyQt4.QtWebKit) in Python. The module crawls the results fine with PyQt4. I have to run this script on Amazon EC2, so I have to use Xvfb (there is no X server on EC2).

The module also has to run in a loop. It works fine for some iterations, but after a while it fails with "xvfb-run: error: Xvfb failed to start".

How can this be solved?

This is my loop:

for i in range(10):
    try:
        query_dict["start"] = i * 10
        url = base_url + ue(query_dict)
        flag = True
        while flag:
            parsed_dict = main(url)
            time.sleep(8.4)
            flag = False
    except:
        pass

The main(url) function:

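import subprocess
import sys

def main(url):
    # Run the scraper under a virtual X server and capture its output.
    cmd = "xvfb-run python /home/shan/temp/hg_intcen/lib/webpage_scrapper.py" + " " + str(url)
    print "Cmd EXE:" + cmd
    proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() reads stdout while waiting, so a full pipe cannot block the child.
    output, _ = proc.communicate()
    sys.stdout.flush()
    result = output.splitlines(True)
    print "crawled: ", result[1]
    return result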

webpage_scrapper fetches the full HTML of the results using PyQt4. How can I keep Xvfb from failing inside the loop?

Solution

You need to add the --auto-servernum option to xvfb-run. Otherwise, it tries to start Xvfb on the same display every time (:99 by default), which fails if an Xvfb is already running there.
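As a rough sketch of how this could look from the calling side, the snippet below builds the xvfb-run command as an argument list with --auto-servernum and runs it once per URL. The names build_command, crawl and SCRAPER_SCRIPT are made up for illustration, and the script path is a placeholder for the one used in the question. It has not been checked against the asker's setup.

```python
import subprocess

# Placeholder path (assumption); the question uses a longer absolute path.
SCRAPER_SCRIPT = "webpage_scrapper.py"


def build_command(url, script=SCRAPER_SCRIPT):
    """Return the argument list for one scraper run under xvfb-run."""
    # --auto-servernum lets xvfb-run look for a free display number
    # instead of always using the default one (:99).
    return ["xvfb-run", "--auto-servernum", "python", script, url]


def crawl(url, script=SCRAPER_SCRIPT):
    """Run the scraper once and return its stdout as a list of lines."""
    # An argument list (no shell=True) keeps characters such as '&'
    # in the URL from being interpreted by the shell.
    proc = subprocess.Popen(build_command(url, script), stdout=subprocess.PIPE)
    # communicate() drains stdout while waiting for the child to exit.
    output, _ = proc.communicate()
    return output.decode("utf-8", "replace").splitlines()
```

Calling crawl(url) inside the loop from the question would then take the place of main(url). Each call starts its own Xvfb on whatever display number xvfb-run finds free.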

Other tips

Run it like this:

xvfb-run --auto-servernum --server-num=1 python webpage_scrapper.py http://google.com