Good morning,
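This is close to a duplicate of a similar post on StackOverflow, but that post has no answer that solved the problem for me.
For the past few days, my Python Selenium script, which uses ChromeDriver 104, has been failing while scrolling down an infinite-scroll, dynamically loaded page. The script scrolls through Facebook and performs certain RPA actions such as sending messages (I have attached only the snippets related to the error).
In short, the user enters the number of posts to reach, and the script scrolls until that many posts are loaded, for example the first 1000 posts, and then performs certain actions (without violating the Facebook TOS).
This script is NOT running in a Docker instance or any kind of container, and it has my machine's full resources available. It has also been tested on the following machines: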
1- Windows 11 PC with 16 GB RAM and an i7 processor
2- MacBook - 16 GB
3- Windows Server 2019 - 32 GB RAM, i7 processor
4- Linux Ubuntu 22.0 server - 16 GB RAM (increased dev/shm to 30 GB on this server)
5- Google Colab Kernel (increased dev/shm)
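All of the above produce exactly the same error traceback: the session is deleted because of a page crash.
When the script reaches 800-900 posts (the number is random; it once reached 1.2K posts for me and then failed at 400 posts on another run), the page becomes very slow and then crashes. One thing to note: I CAN scroll past 1500 posts normally on my machine (i.e. manually), and it definitely DOES NOT crash. So I am fairly sure this is a bug in my script rather than a memory problem (perhaps a memory leak in the script, but not a hardware limitation, as I said). When the script crashes, RAM usage is nowhere near 80% of total RAM.
If I run the script in headless mode, I get an error message in Chrome saying: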
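Since system RAM stays well below 80%, one thing that might be worth checking is the memory of the tab itself rather than of the machine. The sketch below is only an assumption about how that could be measured: it reads Chrome's non-standard `performance.memory` object through `driver.execute_script()`. The helper name `js_heap_usage` is hypothetical, `driver` is the existing WebDriver instance, and the figures Chrome reports are coarse.

```python
def js_heap_usage(driver):
    """Return the tab's JS heap figures in MiB, or None if unavailable.

    Assumes a Chromium-based browser: performance.memory is non-standard
    and is missing in other browsers, in which case None is returned.
    """
    stats = driver.execute_script(
        "const m = window.performance && performance.memory;"
        "return m ? {used: m.usedJSHeapSize, total: m.totalJSHeapSize,"
        " limit: m.jsHeapSizeLimit} : null;"
    )
    if not stats:
        return None
    mib = 1024 * 1024
    # Convert the raw byte counts to MiB, rounded to one decimal place.
    return {name: round(value / mib, 1) for name, value in stats.items()}
```

Printing `js_heap_usage(driver)` next to the post count on every scroll iteration would show whether `used` approaches `limit` shortly before the crash.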
"Oh Snap, Chrome out of memory"
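To save you time, I have read the following posts on Stack Overflow, and they didn't help:
1- unknown error: session deleted because of page crash from unknown error: cannot determine loading status from tab crashed with ChromeDriver Selenium
2- selenium.WebDriverException: unknown error: session deleted because of page crash from tab crashed
3- Python Selenium session deleted because of page crash from unknown error: cannot determine loading status from tab crashed
4- Getting "org.openqa.selenium.WebDriverException: unknown error: session deleted because of page crash" error when executing automation scripts (this one uses Java, but I read it anyway)
5- Selenium gives an error with a simple driver.get() call: session deleted because of page crash from unknown error: cannot determine loading status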
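What I have done to try to solve the problem (without success):
1- Resized the window, as suggested in this post.
2- Used the Chrome options --no-sandbox and --disable-dev-shm-usage
3- Tried using --js-flags (--max_old_space_size=8096)
4- Disabled all notifications, geolocation, and images.
5- Made sure dev/shm is large enough on Mac and Linux, and likewise the temp folder on Windows.
6- Added plenty of time.sleep() calls between scrolls.
7- Tried different scrolling methods (jumping to the bottom of the page with JavaScript via 'driver.execute_script()')
8- Used Firefox GeckoDriver, as well as Edge and Opera.
9- Used different methods to count the posts on the page (Bs4, LXML); this does not seem to be the problem, since the failure happens in the scrolling part.
The snippet that causes the problem (the Chrome options are not listed in the code because I load them from a separate file, but I list them after the code):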
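Regarding items 2 and 3: as far as I understand, and this is an assumption rather than something confirmed here, V8 options such as max_old_space_size are only read by Chrome when they are passed inside --js-flags, and Selenium hands each argument to the browser without a shell, so quotes written around a value are passed along literally. The sketch below shows one way to assemble the arguments under that assumption; `build_chrome_args` is a hypothetical helper, not part of Selenium.

```python
def build_chrome_args(switches, v8_flags):
    """Return Chrome arguments with all V8 flags folded into one --js-flags.

    switches: Chrome switches, with or without the leading dashes.
    v8_flags: V8 options such as "max_old_space_size=4096".
    """
    # Normalise every switch to the "--name[=value]" form.
    args = ["--" + switch.lstrip("-") for switch in switches]
    if v8_flags:
        # V8 flags are space-separated inside a single --js-flags value,
        # with no extra quotes around it.
        joined = " ".join("--" + flag.lstrip("-") for flag in v8_flags)
        args.append("--js-flags=" + joined)
    return args
```

Each returned string can then be passed to `Options.add_argument()`.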
import time

# Start Selenium Imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
# Selenium Imports Finished
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.action_chains import ActionChains

# Note: `driver`, `actions` and `openCredentials()` are created elsewhere
# in the script and are not part of this snippet.

def login(email, password):
    driver.get('https://www.facebook.com/')
    # Email
    driver.find_element(By.NAME, 'email').send_keys(email)
    # Password
    driver.find_element(By.NAME, 'pass').send_keys(password, Keys.RETURN)
    time.sleep(2)

def reachPosts(noOfPosts = 50) -> None:
    posts = driver.find_element(By.XPATH,"//div[@role='feed']").find_elements(By.CSS_SELECTOR, ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv")
    postsNo = len(posts)
    posts = None
    while postsNo < noOfPosts+1:
        scroll_down()
        posts = driver.find_element(By.XPATH,"//div[@role='feed']").find_elements(By.CSS_SELECTOR, ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv")
        time.sleep(1)
        print(len(posts))
        postsNo = len(posts)
        if postsNo >= 1000:
            time.sleep(10)
        posts = None
    posts = None

#----------------Scroll Function!-----------------------------#
def scroll_down():
    """A method for scrolling the page."""
    # Scroll down to the bottom.
    #driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    for i in range(3):
        actions.send_keys(Keys.SPACE).perform()
#-----------------End-----------------------------------------#

def openGroup(facebookUrl, inputDate):
    print("Opening Facebook Link")
    driver.get(f'{facebookUrl}?sorting_setting=CHRONOLOGICAL')
    time.sleep(2)
    reachPosts(creds["Number of posts"])
    posts = driver.find_element(By.XPATH,"//div[@role='feed']").find_elements(By.CSS_SELECTOR, ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv")
    noOfPosts = creds["Number of posts"]

def main():
    global creds
    creds = openCredentials()
    login(creds["email"], creds["password"])
    for group in creds['Facebook Groups']:
        openGroup(group, creds["Date"])
        time.sleep(3)

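The loop in reachPosts() asks ChromeDriver for every matching WebElement on each pass and never stops if the feed stops growing. A minimal sketch of an alternative follows, assuming it is enough to know the number of posts while scrolling: it counts the nodes inside the page with `document.querySelectorAll` and gives up after a few passes without growth. The names `count_posts`, `scroll_until`, `max_stalls` and `pause` are hypothetical, and the combined CSS selector matches under any `div[role='feed']` rather than only the first one. I am not certain this removes the crash; it only reduces the work done per iteration.

```python
import time

POST_SELECTOR = "div[role='feed'] .g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv"

def count_posts(driver, selector=POST_SELECTOR):
    """Count matching nodes inside the page; no WebElement objects are returned."""
    return driver.execute_script(
        "return document.querySelectorAll(arguments[0]).length;", selector
    )

def scroll_until(driver, target, scroll, max_stalls=5, pause=1.0):
    """Scroll until `target` posts are present or the count stops growing.

    scroll: a zero-argument callable that performs one scroll step.
    Returns the last observed post count.
    """
    count = count_posts(driver)
    stalls = 0
    while count < target and stalls < max_stalls:
        scroll()
        time.sleep(pause)
        new_count = count_posts(driver)
        # Reset the stall counter whenever new posts appeared.
        stalls = stalls + 1 if new_count <= count else 0
        count = new_count
    return count
```

With the snippet above this would be called as `scroll_until(driver, creds["Number of posts"], scroll_down)`.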
Chrome Options used:
"--disable-extensions",
"--disable-application-cache",
"--headless",
"window-size=600,450",
"--disable-blink-features=AutomationControlled",
"--enable-javascript",
"disable-infobars",
"--js-flags='--max_old_space_size=8196'",
"--max_old_space_size=4096",
"max_old_space_size=9000",
"--disable-dev-shm-usage",
"--incognito",
"--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
The error
Traceback (most recent call last):
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 313, in <module>
    main()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 302, in main
    openGroup(group, creds["Date"])
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 254, in openGroup
    reachPosts(creds["Number of posts"])
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 84, in reachPosts
    scroll_down()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 104, in scroll_down
    actions.send_keys(Keys.SPACE).perform()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\common\action_chains.py", line 78, in perform
    self.w3c_actions.perform()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\common\actions\action_builder.py", line 88, in perform
    self.driver.execute(Command.W3C_ACTIONS, enc)
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 434, in execute
    self.error_handler.check_response(response)
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: chrome=105.0.5195.102)
Stacktrace:
Backtrace:
Ordinal0 [0x0024DF13+2219795]
Ordinal0 [0x001E2841+1779777]
Ordinal0 [0x000F4100+803072]
Ordinal0 [0x000E6F18+749336]
Ordinal0 [0x000E5F94+745364]
Ordinal0 [0x000E6528+746792]
Ordinal0 [0x000EF42F+783407]
Ordinal0 [0x000FA938+829752]
Ordinal0 [0x0014F3CF+1176527]
Ordinal0 [0x0013E616+1107478]
Ordinal0 [0x00117F89+950153]
Ordinal0 [0x00118F56+954198]
GetHandleVerifier [0x00542CB2+3040210]
GetHandleVerifier [0x00532BB4+2974420]
GetHandleVerifier [0x002E6A0A+565546]
GetHandleVerifier [0x002E5680+560544]
Ordinal0 [0x001E9A5C+1808988]
Ordinal0 [0x001EE3A8+1827752]
Ordinal0 [0x001EE495+1827989]
Ordinal0 [0x001F80A4+1867940]
BaseThreadInitThunk [0x76236739+25]
RtlGetFullPathName_UEx [0x774D90AF+1215]
RtlGetFullPathName_UEx [0x774D907D+1165]
(No symbol) [0x00000000]
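One detail visible in the output above: the session info reports chrome=105.0.5195.102, while the script is described as using ChromeDriver 104. I do not know whether this mismatch has anything to do with the crash, but it is easy to check. The sketch below is a small, hypothetical helper (none of these function names come from Selenium) that pulls the browser version out of the error message and compares major versions.

```python
import re

def major_version(version):
    """Extract the leading major number from a string such as '105.0.5195.102'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"no version number in {version!r}")
    return int(match.group(1))

def chrome_version_from_error(message):
    """Return the version in a '(Session info: chrome=...)' line, or None."""
    match = re.search(r"Session info: chrome=([\d.]+)", message)
    return match.group(1) if match else None

def same_major(browser_version, driver_version):
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)
```

For example, `same_major(chrome_version_from_error(str(exc)), "104")` could be evaluated in an `except WebDriverException as exc:` handler to flag a browser that has auto-updated past the driver.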