Creating a headless Chrome instance in Python
-
25-05-2021 - |
Question
This question describes my conclusion after researching available options for creating a headless Chrome instance in Python and asks for confirmation or resources that describe a 'better way'.
From what I've seen it seems that the quickest way to get started with a headless instance of Chrome in a Python application is to use CEF (http://code.google.com/p/chromiumembedded/) with CEFPython (http://code.google.com/p/cefpython/). CEFPython seems premature though, so using it would likely mean further customization before I'm able to load a headless Chrome instance that loads web pages (and required files), resolves a completed DOM and then lets me run arbitrary JS against it from Python.
Have I missed any other projects that are more mature or would make this easier for me?
Solution
Any reason you haven't considered Selenium with the Chrome Driver?
OTHER TIPS
This question is 5 years old now and at the time it was a big challenge to run a headless chrome using python, but the good news is:
Starting from version 59, released in June 2017, Chrome comes with a headless driver, meaning we can use it in a non-graphical server environment and run tests without having pages visually rendered etc which saves a lot of time and memory for testing or scraping. Setting Selenium for that is very easy:
(I assume that you have installed selenium and chrome driver):
from selenium import webdriver
#set a headless browser
options = webdriver.ChromeOptions()
options.add_argument('headless')
browser = webdriver.Chrome(chrome_options=options)
and now your chrome will run headlessly, if you take out options from the last line, it will show you the browser.
casperjs is a headless webkit, but it wouldn't give you python bindings that I know of; it seems command-line oriented, but that doesn't mean you couldn't run it from python in such a way that satisfies what you are after. When you run casperjs, you provide a path to the javascript you want to execute; so you would need to emit that from Python.
But all that aside, I bring up casperjs because it seems to satisfy the lightweight, headless requirement very nicely.
I use this to get the driver:
def get_browser(storage_dir, headless=False):
"""
Get the browser (a "driver").
Parameters
----------
storage_dir : str
headless : bool
Results
-------
browser : selenium webdriver object
"""
# find the path with 'which chromedriver'
path_to_chromedriver = '/usr/local/bin/chromedriver'
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
if headless:
chrome_options.add_argument("--headless")
chrome_options.add_experimental_option('prefs', {
"plugins.plugins_list": [{"enabled": False,
"name": "Chrome PDF Viewer"}],
"download": {
"prompt_for_download": False,
"default_directory": storage_dir,
"directory_upgrade": False,
"open_pdf_in_system_reader": False
}
})
browser = webdriver.Chrome(path_to_chromedriver,
chrome_options=chrome_options)
return browser
By switching the headless
parameter you can either watch it or not.