tl;dr
Use the --disable-dev-shm-usage option.
- serverless-chrome(headless-chromium): v1.0.0-55
- ChromeDriver: v2.42
- Selenium for Python: v3.14.1
Error on AWS Lambda
Selenium worked on the local lambci/lambda:build-python3.6 image, but did not work on AWS Lambda.
The code and error message were as follows:
from selenium import webdriver
driver_path = './bin/chromedriver-linux'
options = webdriver.ChromeOptions()
options.binary_location = './bin/headless-chromium-linux'
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--single-process')
driver = webdriver.Chrome(driver_path, chrome_options=options)
Message: unknown error: Chrome failed to start: exited abnormally
(chrome not reachable)
(The process started from chrome location ./bin/headless-chromium-linux is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
(Driver info: chromedriver=2.42.591071 (0b695ff80972cc1a65a5cd643186d2ae582cd4ac),platform=Linux 4.14.67-66.56.amzn1.x86_64 x86_64)
: WebDriverException
Traceback (most recent call last):
File "/var/task/handler.py", line 11, in handler
data = _download()
File "/var/task/handler.py", line 55, in _download
driver = webdriver.Chrome(driver_path, chrome_options=options)
File "/var/task/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
desired_capabilities=desired_capabilities)
File "/var/task/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/var/task/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/var/task/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/var/task/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(chrome not reachable)
(The process started from chrome location ./bin/headless-chromium-linux is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
(Driver info: chromedriver=2.42.591071 (0b695ff80972cc1a65a5cd643186d2ae582cd4ac),platform=Linux 4.14.67-66.56.amzn1.x86_64 x86_64)
So I tried running the bare headless-chromium binary directly on AWS Lambda to debug it.
from subprocess import check_call
print(check_call(["ls", "-l", './bin']))
print(check_call([
'./bin/headless-chromium-linux',
'--headless',
'--no-sandbox',
'--disable-gpu',
'--dump-dom',
'https://api.ipify.org?format=json',
]))
and got the error messages below:
[0927/064610.911069:ERROR:gpu_process_transport_factory.cc(1007)] Lost UI shared context.
[0927/064612.311328:ERROR:platform_shared_memory_region_posix.cc(222)] Creating shared memory in /dev/shm/.org.chromium.Chromium.JwBSnH failed: No such file or directory (2)
[0927/064612.311375:ERROR:platform_shared_memory_region_posix.cc(225)] Unable to access(W_OK|X_OK) /dev/shm: No such file or directory (2)
[0927/064612.311390:FATAL:platform_shared_memory_region_posix.cc(227)] This is frequently caused by incorrect permissions on /dev/shm. Try 'sudo chmod 1777 /dev/shm' to fix.
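The log points at /dev/shm, which does not exist in the Lambda execution environment even though Chromium uses it for shared memory by default. A quick way to confirm that from inside the handler (an extra check I added for illustration, not part of the original debugging):
import os

# /dev/shm is missing on Lambda, which is exactly what the FATAL log above
# complains about; --disable-dev-shm-usage makes Chromium fall back to a
# temporary directory instead.
print(os.path.exists('/dev/shm'))   # expected: False on AWS Lambda
print(os.access('/tmp', os.W_OK))   # expected: True -- /tmp is the writable fallback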
I added the --disable-dev-shm-usage option, which makes Chromium write its shared memory files to a temporary directory instead of /dev/shm, and it finally worked.
from selenium import webdriver
driver_path = './bin/chromedriver-linux'
options = webdriver.ChromeOptions()
options.binary_location = './bin/headless-chromium-linux'
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--single-process')
options.add_argument('--disable-dev-shm-usage') # add this option!
driver = webdriver.Chrome(driver_path, chrome_options=options)
Quick Setup Instructions
Download binaries
# $ cd /path/to/your_serverless_dir
$ mkdir -p bin/
# download chromedriver
$ curl -SL https://chromedriver.storage.googleapis.com/2.42/chromedriver_linux64.zip > chromedriver.zip
$ unzip chromedriver.zip
$ mv chromedriver ./bin/chromedriver-linux
# download headless-chromium
$ curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-55/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
$ unzip headless-chromium.zip
$ mv headless-chromium ./bin/headless-chromium-linux
# clean
$ rm headless-chromium.zip chromedriver.zip
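Depending on how the archives were built, the execute bit can be lost when unzipping; a chmod here is a harmless precaution (my addition, not part of the original steps):
# make sure both binaries are executable
$ chmod 755 ./bin/chromedriver-linux ./bin/headless-chromium-linux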
Python Code
Write simple scraping code that supports both OSX and Linux, and save it as handler.py:
from sys import platform
from selenium import webdriver


def handler(event, context):
    options = webdriver.ChromeOptions()
    if platform == 'darwin':
        # download OSX binary from https://chromedriver.storage.googleapis.com/index.html?path=2.42/
        driver_path = './bin/chromedriver-darwin'
        # If you use Chrome Canary, set binary_location.
        # options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
    elif platform == 'linux':
        driver_path = './bin/chromedriver-linux'
        options.binary_location = './bin/headless-chromium-linux'
        options.add_argument('--disable-dev-shm-usage')
        options.add_argument('--headless')
        options.add_argument('--no-sandbox')
        options.add_argument('--single-process')
    driver = webdriver.Chrome(driver_path, chrome_options=options)
    driver.get('https://api.ipify.org?format=json')
    print(driver.page_source)


if __name__ == '__main__':
    handler(None, None)
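One thing worth considering on Lambda: quit the driver when you are done, so the Chromium process and its temporary files do not linger in a warm container. A small sketch of how I would wrap the driver (my addition, not part of the original handler; scrape is a hypothetical helper name):
from selenium import webdriver


def scrape(driver_path, options, url):
    # start Chromium, fetch one page, and always shut the browser down afterwards
    driver = webdriver.Chrome(driver_path, chrome_options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # frees the Chromium process and its files under /tmp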
Docker
Create a Dockerfile like the one below:
FROM lambci/lambda:build-python3.6
COPY requirements.txt .
RUN python -m pip install \
--trusted-host pypi.org \
--trusted-host files.pythonhosted.org \
-r requirements.txt
ADD . .
Run Locally
# run code on OSX
$ python handler.py
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"ip":"<your ip addr>"}</pre></body></html>
# run code on Docker
$ docker build . -t serverless-selenium-python
$ docker run serverless-selenium-python python handler.py
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"ip":"<your ip addr>"}</pre></body></html>
Deploy code to AWS Lambda
serverless.yml
service: sls-selenium-python

provider:
  name: aws
  region: ap-northeast-1
  runtime: python3.6

functions:
  test-selenium:
    handler: handler.handler
    memorySize: 256
    timeout: 120

custom:
  pythonRequirements:
    dockerizePip: true

plugins:
  - serverless-python-requirements
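Optionally, you can keep local build artifacts out of the deployment package so only the handler and the bin/ binaries get uploaded. This block is my addition and assumes node_modules and the downloaded zip files are still sitting in the project directory:
# optional: exclude local artifacts from the package
package:
  exclude:
    - node_modules/**
    - '*.zip'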
requirements.txt
selenium==3.14.1
# setup serverless command...
# $ npm install -g serverless
# $ npm install --save serverless-python-requirements
$ sls deploy -v
# wait for finishing deployment...
$ sls invoke -f test-selenium
$ sls logs -f test-selenium
START RequestId: 917df9f0-c229-1100-a000-753d2bd38861 Version: $LATEST
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"ip":"<ip addr>"}</pre></body></html>
END RequestId: 917df9f0-c229-1100-a000-753d2bd38861
REPORT RequestId: 917df9f0-c229-11e8-a098-653d2bd3886e Duration: 7636.25 ms Billed Duration: 7700 ms Memory Size: 256 MB Max Memory Used: 168 MB