selenium.WebDriverException: unknown error: session deleted because of page crash from tab crashed (Not on docker)


Good morning,

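This is a duplicate of a similar post on StackOverflow, but none of the answers there solved the problem for me.

For the past few days, my Python Selenium script, which uses ChromeDriver 104, has been failing while scrolling down infinite-scroll, dynamically loaded pages. The script scrolls Facebook and performs certain RPA actions such as sending messages (I have attached only the snippets related to the error).

In short, the user enters the number of posts to reach, and the script scrolls until it reaches that number, for example the first 1000 posts, and then performs certain actions (without violating the Facebook TOS).

This script is NOT running in a Docker instance or any kind of container, and it has my computer's full resources available. The script has also been tested on the following devices: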

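1- Windows 11 PC with 16 GB RAM and an i7 processor

2- MacBook - 16 GB

3- Windows Server 2019 - 32 GB RAM, i7 processor

4- Linux Ubuntu 22.0 server - 16 GB RAM (increased /dev/shm to 30 GB on this server)

5- Google Colab kernel (increased /dev/shm)

All of the above produce exactly the same traceback: the session is deleted because of a page crash.

When the script reaches 800-900 posts (the number varies: it once reached 1,200 posts for me and then failed at 400 posts on another run), the page becomes very slow and then crashes. One thing worth noting: I CAN scroll past 1500 posts on my computer normally (i.e. manually), and it definitely DOES NOT crash. So I am fairly sure this is a bug in my script rather than a memory problem (perhaps a memory leak in the script, but not a hardware limitation). When the script crashes, RAM usage is nowhere near 80% of total RAM.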
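Because system RAM stays well below its limit, one way to check whether the tab itself is running out of memory is to log the page's JavaScript heap and DOM size between scrolls. The following is a minimal sketch, assuming a Chrome-based `driver`; `probe_tab_memory` and `format_probe` are hypothetical helper names, and `performance.memory` is a non-standard Chrome API, so its values may be coarse or missing in other browsers.

```python
# JavaScript evaluated inside the page. performance.memory is non-standard
# (Chrome only), so the probe returns None for the heap fields where it
# is not available.
MEMORY_PROBE_JS = """
const mem = window.performance && window.performance.memory;
return {
    usedHeapMB: mem ? mem.usedJSHeapSize / 1048576 : null,
    heapLimitMB: mem ? mem.jsHeapSizeLimit / 1048576 : null,
    domNodes: document.getElementsByTagName('*').length
};
"""


def probe_tab_memory(driver):
    """Return the tab's JS heap usage (MB) and DOM element count as a dict."""
    return driver.execute_script(MEMORY_PROBE_JS)


def format_probe(stats):
    """Render one probe result as a single log line."""
    used = stats.get("usedHeapMB")
    limit = stats.get("heapLimitMB")
    if used is None or limit is None:
        heap = "n/a"
    else:
        heap = f"{used:.0f}/{limit:.0f} MB"
    return f"heap={heap} dom_nodes={stats['domNodes']}"
```

Calling `print(format_probe(probe_tab_memory(driver)))` after each scroll would show whether the heap approaches its limit, or the DOM keeps growing, shortly before the crash.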

If I run the script in headless mode, I get an error message in Chrome saying:

"Oh Snap, Chrome out of memory"

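To save you time: I have read the following posts on Stack Overflow, and they didn't help:

1- unknown error: session deleted because of page crash from unknown error: cannot determine loading status from tab crashed with ChromeDriver Selenium

2- selenium.WebDriverException: unknown error: session deleted because of page crash from tab crashed

3- Python Selenium session deleted because of page crash from unknown error: cannot determine loading status from tab crashed

4- Getting "org.openqa.selenium.WebDriverException: unknown error: session deleted because of page crash" error when executing automation scripts (Java is used there, but I read it anyway)

5- Selenium gives an error when using a simple driver.get() call: session deleted because of page crash from unknown error: cannot determine loading status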

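What I have done to try to solve the problem (without success):

1- Resized the window, as suggested in this post.

2- Used the Chrome options --no-sandbox and --disable-dev-shm-usage.

3- Tried --js-flags (--max_old_space_size=8096).

4- Disabled all notifications, geolocation, and images.

5- Made sure /dev/shm is large enough on Mac and Linux, and likewise the temp folder on Windows.

6- Added long time.sleep() pauses between scrolls.

7- Tried different scrolling methods (jumping to the bottom of the page with JavaScript via driver.execute_script()).

8- Tried Firefox GeckoDriver, as well as Edge and Opera.

9- Used different methods to count the posts on the page (Bs4, LXML). This does not seem to be the problem, because the failure happens in the scrolling part.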
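For reference, items 7 and 9 can be combined into a variant that both scrolls and counts posts with JavaScript, so that only an integer crosses the WebDriver connection on each iteration instead of a list of element references. This is a minimal sketch and is not confirmed to avoid the crash; `count_posts` and `scroll_until` are hypothetical names, the selectors are the ones used in the snippet in this post, and the `sleep` parameter exists only to make the loop easy to test.

```python
import time

# Counts matching posts inside the feed, entirely in the browser.
COUNT_POSTS_JS = """
const feed = document.querySelector("div[role='feed']");
return feed ? feed.querySelectorAll(arguments[0]).length : 0;
"""

# Assumption: same class selector as in the posted snippet; Facebook's
# generated class names may change at any time.
POST_SELECTOR = ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv"


def count_posts(driver, selector=POST_SELECTOR):
    """Return the number of posts currently present in the feed."""
    return int(driver.execute_script(COUNT_POSTS_JS, selector))


def scroll_until(driver, target, max_scrolls=5000, pause=1.0, sleep=time.sleep):
    """Scroll to the bottom until `target` posts are loaded or the cap is hit."""
    count = count_posts(driver)
    scrolls = 0
    while count < target and scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give the page time to load the next batch
        count = count_posts(driver)
        scrolls += 1
    return count
```

The `max_scrolls` cap is there so the loop ends even if the feed stops loading new posts.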

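The snippet that causes the problem (the Chrome options are not shown in the code because I load them from a separate file, but I list them after the code):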

import time

# Start Selenium Imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
# Selenium Imports Finished
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.action_chains import ActionChains

# NOTE: driver, actions and openCredentials() are defined elsewhere in the
# full script and are not part of this snippet.


def login(email, password):
    driver.get('https://www.facebook.com/')
    # Email
    driver.find_element(By.NAME, 'email').send_keys(email)
    # Password
    driver.find_element(By.NAME, 'pass').send_keys(password, Keys.RETURN)
    time.sleep(2)


def reachPosts(noOfPosts=50) -> None:
    posts = driver.find_element(By.XPATH, "//div[@role='feed']").find_elements(By.CSS_SELECTOR, ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv")
    postsNo = len(posts)
    posts = None
    # Keep scrolling until more than noOfPosts posts are loaded.
    while postsNo < noOfPosts + 1:
        scroll_down()
        posts = driver.find_element(By.XPATH, "//div[@role='feed']").find_elements(By.CSS_SELECTOR, ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv")
        time.sleep(1)
        print(len(posts))
        postsNo = len(posts)
        if postsNo >= 1000:
            time.sleep(10)
        posts = None
    posts = None


#----------------Scroll Function!-----------------------------#
def scroll_down():
    """A method for scrolling the page."""
    # Scroll down to the bottom.
    #driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    for i in range(3):
        actions.send_keys(Keys.SPACE).perform()
#-----------------End-----------------------------------------#


def openGroup(facebookUrl, inputDate):
    print("Opening Facebook Link")
    driver.get(f'{facebookUrl}?sorting_setting=CHRONOLOGICAL')
    time.sleep(2)
    reachPosts(creds["Number of posts"])
    posts = driver.find_element(By.XPATH, "//div[@role='feed']").find_elements(By.CSS_SELECTOR, ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv")
    noOfPosts = creds["Number of posts"]


def main():
    global creds
    creds = openCredentials()
    login(creds["email"], creds["password"])
    for group in creds['Facebook Groups']:
        openGroup(group, creds["Date"])
        time.sleep(3)

Chrome Options used:

                    "--disable-extensions",
                    "--disable-application-cache",
                    "--headless",
                    "window-size=600,450",    
                    "--disable-blink-features=AutomationControlled",
                    "--enable-javascript",
                    "disable-infobars",
                    "--js-flags='--max_old_space_size=8196'",
                    "--max_old_space_size=4096",
                    "max_old_space_size=9000",
                    "--disable-dev-shm-usage",
                    "--incognito",
                    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"

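As an illustration of how such a list might be applied, here is a minimal sketch that adds each argument once and passes the V8 heap limit in a single form. It rests on two assumptions that should be verified: that `max_old_space_size` is a V8 option which Chrome only forwards when it is wrapped in `--js-flags`, and that inner quotes are passed through literally because no shell is involved. `CHROME_ARGS` holds a subset of the options listed above, and `apply_args` is a hypothetical helper.

```python
# Subset of the options listed above. The three max_old_space_size variants
# are merged into one --js-flags entry (8196 is the value from that list).
CHROME_ARGS = [
    "--disable-extensions",
    "--disable-application-cache",
    "--headless",
    "--window-size=600,450",
    "--disable-blink-features=AutomationControlled",
    "--disable-dev-shm-usage",
    "--incognito",
    # Assumption: V8 flags reach the JavaScript engine only via --js-flags,
    # and inner quotes are not needed because no shell parses this string.
    "--js-flags=--max_old_space_size=8196",
]


def apply_args(options, args):
    """Add each argument to `options` once, keeping the original order."""
    seen = set()
    for arg in args:
        if arg not in seen:
            options.add_argument(arg)
            seen.add(arg)
    return options
```

With Selenium this could be called as `apply_args(Options(), CHROME_ARGS)`, where `Options` is `selenium.webdriver.chrome.options.Options`.
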
The error

Traceback (most recent call last):
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 313, in <module>
    main()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 302, in main
    openGroup(group, creds["Date"])
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 254, in openGroup
    reachPosts(creds["Number of posts"])
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 84, in reachPosts
    scroll_down()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 104, in scroll_down
    actions.send_keys(Keys.SPACE).perform()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\common\action_chains.py", line 78, in perform
    self.w3c_actions.perform()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\common\actions\action_builder.py", line 88, in perform
    self.driver.execute(Command.W3C_ACTIONS, enc)
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 434, in execute
    self.error_handler.check_response(response)
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: chrome=105.0.5195.102)
Stacktrace:
Backtrace:
        Ordinal0 [0x0024DF13+2219795]
        Ordinal0 [0x001E2841+1779777]
        Ordinal0 [0x000F4100+803072]
        Ordinal0 [0x000E6F18+749336]
        Ordinal0 [0x000E5F94+745364]
        Ordinal0 [0x000E6528+746792]
        Ordinal0 [0x000EF42F+783407]
        Ordinal0 [0x000FA938+829752]
        Ordinal0 [0x0014F3CF+1176527]
        Ordinal0 [0x0013E616+1107478]
        Ordinal0 [0x00117F89+950153]
        Ordinal0 [0x00118F56+954198]
        GetHandleVerifier [0x00542CB2+3040210]
        GetHandleVerifier [0x00532BB4+2974420]
        GetHandleVerifier [0x002E6A0A+565546]
        GetHandleVerifier [0x002E5680+560544]
        Ordinal0 [0x001E9A5C+1808988]
        Ordinal0 [0x001EE3A8+1827752]
        Ordinal0 [0x001EE495+1827989]
        Ordinal0 [0x001F80A4+1867940]
        BaseThreadInitThunk [0x76236739+25]
        RtlGetFullPathName_UEx [0x774D90AF+1215]
        RtlGetFullPathName_UEx [0x774D907D+1165]
        (No symbol) [0x00000000]
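
To record how many posts were loaded when the tab died, the crash can be told apart from other WebDriver errors by its message text. This is a minimal sketch: `is_tab_crash` is a hypothetical helper, and the marker strings are taken from the traceback above rather than from any documented Selenium constant.

```python
# Substrings that appear in the crash message shown in the traceback.
CRASH_MARKERS = ("page crash", "tab crashed")


def is_tab_crash(message):
    """Return True if a WebDriver error message describes a crashed tab."""
    text = (message or "").lower()
    return any(marker in text for marker in CRASH_MARKERS)
```

Inside an `except WebDriverException as exc:` block around `scroll_down()`, checking `is_tab_crash(exc.msg)` would allow the script to log the last known post count before re-raising or restarting the browser.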