I tried scraping a website (Futbin) with BeautifulSoup and Requests, but when I run my code I get this error:

InvalidURL: Failed to parse: <Response [403]>

I would appreciate a way to fix this problem. The code that I used:

import requests
url = requests.get("https://www.futbin.com/23/player/50188/bruno-guimaraes")
html_doc = url.content
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, "html.parser")
response = requests.get(url)
htmlText = response.text
part1 = htmlText.split ('class="box_price box_price_ps"')                               
part2 = part1.split ("price_big_right")[1]
part3 = part2.split (">")[2]
part4 = part3.split ("<")[0]
part5 = part4.replace (",",".")
wert = float (part5)

I expected to get the value from the class above as a float.

You cannot use Beautiful Soup by itself here; you can try Selenium. Here is a hint: stackoverflow.com/questions/11047348/… – Jaky Ruby Dec 15, 2022 at 16:36

@GabeJ21 Your first request returns 403, probably because you are not sending headers. What that request gives you back is a response containing some HTML, and you then send a second request to that response instead of to a URL. – ghost21blade Dec 16, 2022 at 8:43
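To illustrate the second comment: `requests.get` returns a `Response` object, so calling `requests.get(url)` again when `url` already holds a `Response` is what produces the `InvalidURL` error. A minimal sketch of the single-request pattern is below. The names `fetch_html`, `PLAYER_URL` and `HEADERS`, the User-Agent value and the timeout are assumptions made up for this example, and sending headers may or may not be enough to get past the 403.

```python
import requests

PLAYER_URL = "https://www.futbin.com/23/player/50188/bruno-guimaraes"
# Hypothetical header value; substitute a real browser User-Agent string.
HEADERS = {"User-Agent": "Mozilla/5.0 (example)"}


def fetch_html(url, headers=None, timeout=10):
    """Request the page once and return its HTML as text."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx (such as the 403 here) instead of
    # silently continuing with an error page.
    response.raise_for_status()
    return response.text


# Usage (needs network access; may still raise HTTPError if the site blocks it):
# html_text = fetch_html(PLAYER_URL, headers=HEADERS)
```

Keeping the URL string and the response in separately named variables makes it much harder to pass the wrong one to `requests.get`.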

You get this response because the page is blocking your request, which is a common defense against web scraping. To avoid it, you have to add headers to your request.

import requests
headers = {'User-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
res = requests.get('https://www.futbin.com/23/player/50188/bruno-guimaraes',headers=headers)

However, while investigating the page I realized that it doesn't send the prices in the response; it waits for the page to load and then fills in the prices through a script. So when you extract the price, you get the character '-'. This is the part of the HTML that Python receives:

<span class="price_big_right">
    <span id="ps-lowest-1">-</span>
</span>
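One way to check for this without a browser is to parse the returned HTML and look at that span. The sketch below runs on the snippet above rather than on a live response. `parse_price` and `SAMPLE_HTML` are hypothetical names, and treating commas as thousands separators is an assumption about how the site formats prices.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<span class="price_big_right">
    <span id="ps-lowest-1">-</span>
</span>
"""


def parse_price(html, element_id="ps-lowest-1"):
    """Return the price as a float, or None if it has not been filled in."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find(id=element_id)
    if span is None:
        return None
    text = span.get_text(strip=True)
    # "-" is the placeholder the server sends before the script runs.
    if text in ("", "-"):
        return None
    # Assumption: commas are thousands separators, e.g. "25,500".
    return float(text.replace(",", ""))


print(parse_price(SAMPLE_HTML))  # None
```

A `None` result here means the price is not in the static HTML at all, so no amount of string splitting on the response text will recover it.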

I recommend using the Selenium library in this case. Here are some links for learning about Selenium:

Introduction and Installation

More About Selenium
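As a rough sketch of the Selenium approach: open the page in a real browser and wait until the script has replaced the '-' placeholder. This assumes Selenium 4 with Chrome installed and reuses the `ps-lowest-1` id from the HTML snippet. The function names and the 15-second timeout are made up for the example, and whether the site lets an automated browser through is not verified here.

```python
def price_is_loaded(text):
    """True once the span holds something other than the '-' placeholder."""
    return text.strip() not in ("", "-")


def fetch_price_text(url, element_id="ps-lowest-1", timeout=15):
    """Open the page in Chrome and return the price text once it is loaded."""
    # Imported here so that price_is_loaded can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll until the script has replaced the placeholder; raises
        # selenium.common.exceptions.TimeoutException if it never does.
        WebDriverWait(driver, timeout).until(
            lambda d: price_is_loaded(d.find_element(By.ID, element_id).text)
        )
        return driver.find_element(By.ID, element_id).text
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Usage (opens a browser window and needs network access):
# text = fetch_price_text("https://www.futbin.com/23/player/50188/bruno-guimaraes")
```

Waiting on the element's text, rather than sleeping for a fixed time, returns as soon as the price appears and fails clearly if it never does.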

I tried sending that site a request that included headers, and it still returned a 403 status code.

While capturing the requests, I saw that the website is protected by Cloudflare.

So it is very difficult to send plain requests to Cloudflare-protected sites.
