I'm running a Python Selenium script on Ubuntu 18.04 using Amazon EC2.
I have a list of URLs and am using Selenium to loop through them and get info. Here's a simplified example of my script:
import requests
import selenium
from selenium import webdriver
from selenium import webdriver
from datetime import datetime as dt
import re
import time
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import ElementNotVisibleException
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# set driver options
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1420,1080')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--remote-debugging-port=9222")
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)
chrome_options.add_experimental_option("excludeSwitches", ["disable-popup-blocking"])
chrome_options.binary_location='/usr/bin/google-chrome-stable'
chrome_driver_binary = "/usr/bin/chromedriver"
events = ['https://www.bandsintown.com/e/1024970351-alukah-at-high-noon-saloon?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103265416-bill-roberts-combo-at-come-back-in?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1022530728-chiiild-at-the-sylvee?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103243450-aimee-mann-at-stoughton-opera-house?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1022530781-leon-bridges-at-the-sylvee?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103338969-jonathan-coulton-at-stoughton-opera-house?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1024312079-necronomicon-at-high-noon-saloon?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103395650-jon.-at-jitters-coffeehouse?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1024311602-the-convalescence-at-high-noon-saloon?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1024691276-todd-sheaffer-at-the-bur-oak?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103127087-alec-benjamin-at-the-sylvee?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1024272496-sara-kays-at-the-sylvee?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event']
for event in events:
    print("getting event")
    driver = webdriver.Chrome(executable_path=chrome_driver_binary, chrome_options=chrome_options)
    driver.get(event) #crashes!
    time.sleep(3)
    print(driver.title)
    driver.quit()
I get:
getting event
Alukah Madison Tickets, High Noon Saloon May 02, 2022 | Bandsintown
getting event
Bill Roberts Combo Madison Tickets, Come Back In May 02, 2022 | Bandsintown
getting event
Chiiild Madison Tickets, The Sylvee May 02, 2022 | Bandsintown
getting event
Aimee Mann Stoughton Tickets, Stoughton Opera House May 02, 2022 | Bandsintown
getting event
Leon Bridges Madison Tickets, The Sylvee May 02, 2022 | Bandsintown
getting event
Traceback (most recent call last):
  File "test.py", line 41, in <module>
    driver.get(event) #crashes!
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=101.0.4951.41)
What I've tried:
I've tried increasing the size of my Amazon EC2 instance. It now has 16 GB of memory, so I don't think that's the issue.
Tried countless tweaks to the chrome_options, including the --no-sandbox and --disable-dev-shm-usage options.
Tried stopping/restarting the instance.
Tried running pkill chrome and pkill -f "(chrome)?(--headless)" on the command line to make sure all Chrome processes are killed. No luck.
Ensured Chromedriver and google-chrome are the same version. They are (version 101.0.4951.41).
Ensured the locations of the Chrome/Chromedriver binaries are correct. google-chrome and chromedriver are both in /usr/bin.
Tested it locally. It works locally and only fails on the Ubuntu EC2 instance.
Update
It seems like the issue is with this particular list of urls. For example, if I change the url list to this:
events = ["https://www.facebook.com",
"https://www.reddit.com",
"https://www.linkedin.com",
"https://yahoo.com",
"https://google.com",
"https://quora.com",
"https://sweetwater.com",
"https://amazon.com",
"https://youtube.com",
"https://github.com",
"https://stackoverflow.com",
"https://worpress.com",
"https://medium.com"]
The script seems to run more reliably and I don't get a page crash.
So, what is it about the URLs in my original question that is causing the crash? Could it be that those pages have too much content and are overloading Selenium? If so, is there a way to load those URLs more simply so they don't get bogged down with the map, popups, etc.?
Update 2
I've tried two more solutions:
instead of:
driver = webdriver.Chrome(executable_path=chrome_driver_binary, chrome_options=chrome_options)
I used:
driver = webdriver.Chrome(executable_path=chrome_driver_binary, options=chrome_options)
I also added this argument:
chrome_options.add_experimental_option("prefs", {
    "profile.default_content_setting_values.media_stream_mic": 2,
    "profile.default_content_setting_values.media_stream_camera": 2,
    "profile.default_content_setting_values.geolocation": 2,
    "profile.default_content_setting_values.notifications": 2
})
And it still crashes:
Alukah Madison Tickets, High Noon Saloon May 02, 2022 | Bandsintown
getting event
Bill Roberts Combo Madison Tickets, Come Back In May 02, 2022 | Bandsintown
getting event
Chiiild Madison Tickets, The Sylvee May 02, 2022 | Bandsintown
getting event
Traceback (most recent call last):
  File "test.py", line 160, in <module>
    driver.get(event) #crashes!
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from tab crashed
  (Session info: headless chrome=101.0.4951.41)
There's an error in the code that creates your driver instance:
driver = webdriver.Chrome(executable_path=chrome_driver_binary, chrome_options=chrome_options)
It's supposed to be:
driver = webdriver.Chrome(executable_path=chrome_driver_binary, options=chrome_options)
The keyword argument for options in webdriver.Chrome is "options", not "chrome_options". The latter is the older, deprecated name.
A few points:
You're supposed to create one instance of the driver and run all commands through it. In your code you keep quitting and creating new instances (unless that is intentional for some reason?).
I recommend you use the package "Webdriver-Manager" rather than manually hardcoding paths.
To import the Selenium webdriver, all you need is from selenium import webdriver. The duplicate import and the bare import selenium can be deleted.
So your final code would look like:
import requests
from selenium import webdriver
from datetime import datetime as dt
import re
import time
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import ElementNotVisibleException
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
# set driver options
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1420,1080')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--remote-debugging-port=9222")
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)
chrome_options.add_experimental_option("excludeSwitches", ["disable-popup-blocking"])
events = ['https://www.bandsintown.com/e/1024970351-alukah-at-high-noon-saloon?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103265416-bill-roberts-combo-at-come-back-in?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1022530728-chiiild-at-the-sylvee?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103243450-aimee-mann-at-stoughton-opera-house?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1022530781-leon-bridges-at-the-sylvee?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103338969-jonathan-coulton-at-stoughton-opera-house?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1024312079-necronomicon-at-high-noon-saloon?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103395650-jon.-at-jitters-coffeehouse?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1024311602-the-convalescence-at-high-noon-saloon?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1024691276-todd-sheaffer-at-the-bur-oak?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/103127087-alec-benjamin-at-the-sylvee?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event', 'https://www.bandsintown.com/e/1024272496-sara-kays-at-the-sylvee?came_from=253&utm_medium=web&utm_source=city_page&utm_campaign=event']
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
for event in events:
    print("getting event")
    driver.get(event)
    time.sleep(3)
    print(driver.title)
With the output being:
====== WebDriver manager ======
Current google-chrome version is 101.0.4951
Get LATEST chromedriver version for 101.0.4951 google-chrome
Driver [/Users/bilalakhtar/.wdm/drivers/chromedriver/mac64_m1/101.0.4951.41/chromedriver] found in cache
getting event
Alukah Madison Tickets, High Noon Saloon May 02, 2022 | Bandsintown
getting event
Bill Roberts Combo Madison Tickets, Come Back In May 02, 2022 | Bandsintown
getting event
Chiiild Madison Tickets, The Sylvee May 02, 2022 | Bandsintown
getting event
Aimee Mann Stoughton Tickets, Stoughton Opera House May 02, 2022 | Bandsintown
getting event
Leon Bridges Madison Tickets, The Sylvee May 02, 2022 | Bandsintown
getting event
Jonathan Coulton Stoughton Tickets, Stoughton Opera House May 02, 2022 | Bandsintown
getting event
Necronomicon Madison Tickets, High Noon Saloon May 02, 2022 | Bandsintown
getting event
Jon. Whitewater Tickets, Jitters Coffeehouse May 02, 2022 | Bandsintown
getting event
The Convalescence Madison Tickets, High Noon Saloon May 02, 2022 | Bandsintown
getting event
Todd Sheaffer Madison Tickets, The Bur Oak May 02, 2022 | Bandsintown
getting event
Alec Benjamin Madison Tickets, The Sylvee May 03, 2022 | Bandsintown
getting event
Sara Kays Madison Tickets, The Sylvee May 03, 2022 | Bandsintown
Process finished with exit code 0
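The single-driver advice above can also be sketched as a small helper that guarantees the browser is closed even if one page fails. This is only an illustration, not part of the original answer: the names make_chrome_driver and collect_titles are hypothetical, and the sketch assumes a Selenium version that can locate chromedriver on its own (for example on PATH).

```python
def make_chrome_driver():
    """Build a headless Chrome driver (hypothetical helper, Selenium imported lazily)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    # Assumption: chromedriver is discoverable without an explicit path.
    return webdriver.Chrome(options=options)


def collect_titles(urls, driver_factory=make_chrome_driver):
    """Visit every URL with ONE driver instance and return the page titles."""
    driver = driver_factory()
    titles = []
    try:
        for url in urls:
            driver.get(url)
            titles.append(driver.title)
    finally:
        # Runs even if driver.get() raises, so no Chrome process is left behind.
        driver.quit()
    return titles
```

Calling collect_titles(events) would use the real browser. Passing a different driver_factory makes it possible to exercise the loop without launching Chrome.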
Seems you were pretty close to the root cause of the error...
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
...implies that the loading status cannot be determined.
Deep Dive
It seems you have already taken care of everything mentioned in the discussion "unknown error: session deleted because of page crash from unknown error: cannot determine loading status from tab crashed with ChromeDriver Selenium", so you should have been good to go.
However, it seems that after the first few URLs open, a URL suddenly opens with:
Show notifications popup.
SIGN IN popup.
The first time these popups show up on any of the URLs, you can't perform any operation without handling them, because the page isn't completely loaded. This supports your suspicion: "...is there a way to load those urls more simply so they don't get bogged down with the map, popups, etc...".
You can find a detailed discussion on how to handle the notification in How to allow or deny notification geo-location microphone camera pop up
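As a rough sketch of the notification handling mentioned above, the permission prompts can be denied up front through Chrome's "prefs" experimental option, using the same profile.default_content_setting_values keys the question already tried (where 2 means block). The helper names build_content_prefs and apply_prefs are hypothetical, and since the question reports that these prefs alone did not stop the crash, treat this as one thing to rule out rather than a confirmed fix.

```python
BLOCK = 2  # Chrome content-setting value meaning "block" (1 means "allow")


def build_content_prefs(blocked=("notifications", "geolocation",
                                 "media_stream_mic", "media_stream_camera")):
    """Return a prefs dict that denies the given permission prompts."""
    return {
        f"profile.default_content_setting_values.{name}": BLOCK
        for name in blocked
    }


def apply_prefs(options, prefs):
    """Attach the prefs to a Selenium Options object (or anything with the same method)."""
    options.add_experimental_option("prefs", prefs)
    return options
```

With Selenium, this would be used as apply_prefs(chrome_options, build_content_prefs()) before the driver is created.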