Hi everyone, the script below uses Selenium, but it is extremely slow and not feasible for a large number of URLs. Can anyone tell me how to convert it into a fast bs4 script, and can Beautiful Soup scrape "Click to show" buttons? Thank you, everyone, for helping me!

from selenium import webdriver
import time
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
chrome_path = r"C:\Users\lenovo\Downloads\chromedriver_win32 (5)\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://www.autotrader.ca/a/ram/1500/hamilton/ontario/19_12052335_/?showcpo=ShowCpo&ncse=no&ursrc=pl&urp=2&urm=8&sprx=-2")
wait = WebDriverWait(driver, 30)
# click the close button first
# find_element(By.XPATH, ...) replaces the deprecated find_element_by_xpath
driver.find_element(By.XPATH, '//button[@class="close-button"]').click()
# wait for the "Click to show" link, scroll it into view and click it
option = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[text()= 'Click to show']")))
driver.execute_script("arguments[0].scrollIntoView(true);", option)
option.click()
time.sleep(10)
Name = driver.find_element(By.XPATH, '//p[@class="hero-title"]')
Number = driver.find_element(By.XPATH, '//div[@class="card-body"]')
print(Name.text, Number.text)

You don't really need Selenium here; you can simply use requests, because the phone number you're looking for is already in the HTML (just not visible).

If you click "view page source" in your browser, you can Ctrl+F for the phone number.

So you don't need to emulate a browser or click any buttons: everything is already there!

Now let's see how we can scrape this data using just requests (or any other HTTP client, such as httpx or aiohttp):

import re
import requests

url = "https://www.autotrader.ca/a/ram/1500/hamilton/ontario/19_12052335_/?showcpo=ShowCpo&ncse=no&ursrc=pl&urp=2&urm=8&sprx=-2"
# we need to pretend that our request is coming from a web browser to get around anti-bot protection,
# which we do by setting the User-Agent header to a web browser's one
# in this case we use a Windows Chrome user agent string (you can find these online)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
# here we make the request for the HTML page
response = requests.get(url, headers=headers)
# now we can use regex patterns to find the phone number
# (raw strings keep \d from being treated as a string escape)
phone_number = re.findall(r'"phoneNumber":"([\d-]+)"', response.text)
# ["905-870-7127"]
description = re.findall(r'"description":"(.+?)"', response.text)
# ['2011 Ram 1500 Sport Crew Cab v8 5.7L - Fully loaded, Crew cab, leather heated/air-conditioned seats, heated leather steering wheel, 5’7 ft box w/ tonneau cover.']

Regex patterns take some work to wrap your head around at first. I suggest googling "regex python tutorial" if you want to learn more, but I can explain the pattern we're using here: we capture the characters that directly follow the string "phoneNumber":" as long as each one is either a digit (written \d) or a dash (written simply -), and the match has to end at the closing double quote.
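To make the pattern explanation concrete, here is a minimal sketch that runs the same two patterns against a made-up string instead of the live page. The sample text and its values are hypothetical and only imitate the kind of JSON found in the page source. It also shows that a group such as ([+]) is a character class holding nothing but a literal plus sign, so it will not capture ordinary text.

```python
import re

# Made-up fragment imitating the JSON embedded in the page source (hypothetical values)
sample = '{"description":"2011 Ram 1500 Sport","phoneNumber":"555-010-0199","price":"19,500"}'

# one or more digits or dashes between the quotes that follow "phoneNumber":
phone_pattern = r'"phoneNumber":"([\d-]+)"'
# as few characters as possible up to the next double quote
description_pattern = r'"description":"(.+?)"'

phones = re.findall(phone_pattern, sample)
descriptions = re.findall(description_pattern, sample)
# [+] only matches a single literal "+" character
plus_only = re.findall(r'"description":"([+])"', sample)

print(phones)        # ['555-010-0199']
print(descriptions)  # ['2011 Ram 1500 Sport']
print(plus_only)     # []
```

The lazy quantifier in (.+?) stops at the first closing quote, so a description that itself contains an escaped double quote would be cut short by this pattern.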

This requests script takes only a few seconds to complete and uses almost no computing resources. However, one thing to watch out for when using an HTTP client instead of Selenium browser emulation is bot blocking, which often takes quite a bit of development work to get around, though the performance gains are really worth it!
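When a request is blocked, the body you get back usually won't contain the fields you expect, so it may help to treat "pattern not found" as a signal instead of letting an empty list pass silently. The sketch below is one possible way to do that, working on HTML text that has already been downloaded. The helper name listing_row and both sample page bodies are hypothetical, not part of any library or of the real site.

```python
import re
from typing import Optional

PHONE_RE = re.compile(r'"phoneNumber":"([\d-]+)"')
DESCRIPTION_RE = re.compile(r'"description":"(.+?)"')


def listing_row(html: str) -> Optional[str]:
    """Return 'description, phone' for one page, or None if either field is missing."""
    phone = PHONE_RE.search(html)
    description = DESCRIPTION_RE.search(html)
    if phone is None or description is None:
        # Missing fields may mean a block page or a changed page layout
        return None
    return f"{description.group(1)}, {phone.group(1)}"


# Hypothetical page bodies standing in for response.text
ok_page = '<script>{"description":"2011 Ram 1500 Sport","phoneNumber":"555-010-0199"}</script>'
blocked_page = '<html><body>Access denied</body></html>'

print(listing_row(ok_page))       # 2011 Ram 1500 Sport, 555-010-0199
print(listing_row(blocked_page))  # None
```

In a loop over many URLs, a None result could be logged together with the URL so that blocked or changed pages can be retried or inspected later.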

Thanks a lot, man, for your help, really appreciate it! One question though: I'm also trying to scrape the name with this strategy, name = re.findall('"description":"([+])"', response.text), but the output is empty. Can we get the output in the form "Name, phoneNumber" in one row? – Alian Nadeem Oct 9, 2021 at 4:17

Your code works really well, but when I change the URL and some of the regex, it doesn't work. You can see my latest post, or the link below. I'm extracting the name and number, same as before. This is the new link: kijijiautos.ca/vip/22686710 – Alian Nadeem Oct 12, 2021 at 0:57
