parsing - InvalidURL: Failed to parse: <Response [403]>

I tried scraping a website (Futbin) with BeautifulSoup and Requests, but when I run my code I get this error:

InvalidURL: Failed to parse: <Response [403]>

Any help fixing this problem would be appreciated. This is the code I used:

import requests
url = requests.get("https://www.futbin.com/23/player/50188/bruno-guimaraes")
html_doc = url.content
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, "html.parser")
response = requests.get(url)
htmlText = response.text
part1 = htmlText.split ('class="box_price box_price_ps"')                               
part2 = part1.split ("price_big_right")[1]
part3 = part2.split (">")[2]
part4 = part3.split ("<")[0]
part5 = part4.replace (",",".")
wert = float (part5)
I expected to get the value from the class above as a float.
                You can't do this with Beautiful Soup alone; try Selenium. Here is a hint: stackoverflow.com/questions/11047348/…
– Jaky Ruby
                Dec 15, 2022 at 16:36
                @GabeJ21 Your first request returns 403, probably because you are not sending headers. On top of that, you then pass the result of that first request, not a URL, to a second requests.get call.
– ghost21blade
                Dec 16, 2022 at 8:43
You get this response because the site is blocking your request, which is a common defence against web scraping. To avoid it, add headers to your request.
import requests
headers = {'User-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
res = requests.get('https://www.futbin.com/23/player/50188/bruno-guimaraes',headers=headers)
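A minimal sketch of how the two problems fit together, assuming the same URL and User-Agent string as above. The question's second requests.get call receives the Response object from the first call instead of a URL string, which is why the error message contains <Response [403]>. The helper name fetch_html and the timeout value are illustrative assumptions, not part of the original code.

```python
# Same User-Agent header as in the answer above.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    )
}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page once and return its HTML (hypothetical helper)."""
    # Guard against the bug in the question: passing a Response object
    # (or anything that is not a URL string) to requests.get.
    if not isinstance(url, str):
        raise TypeError(f"expected a URL string, got {type(url).__name__}")

    # Imported here so the guard above works even without requests installed.
    import requests

    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx, so a 403 is not parsed by accident.
    response.raise_for_status()
    return response.text
```

Calling fetch_html("https://www.futbin.com/23/player/50188/bruno-guimaraes") makes a single request. It may still raise requests.HTTPError if the site rejects the request despite the header.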
However, on inspecting the page, I found that the server does not send the prices in the initial response; a script loads them after the page has loaded. So when you extract the price from the returned HTML, you get the placeholder character '-'.
This is the relevant part of the HTML that Python receives:
<span class="price_big_right">
    <span id="ps-lowest-1">-</span>
</span>
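To make the placeholder problem concrete, here is a rough sketch that pulls the text out of the ps-lowest-1 element and reports a missing price as None. It uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra installs. The names IdTextParser and extract_price_text are made up for this example, and the parser assumes the target element contains no void tags such as <br>.

```python
from html.parser import HTMLParser
from typing import Optional


class IdTextParser(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self._depth = 0  # greater than 0 while inside the target element
        self.text_parts = []
        self.found = False

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # nested tag inside the target
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text_parts.append(data)


def extract_price_text(html: str, element_id: str = "ps-lowest-1") -> Optional[str]:
    """Return the element's text, or None if it is missing or still '-'."""
    parser = IdTextParser(element_id)
    parser.feed(html)
    parser.close()
    text = "".join(parser.text_parts).strip()
    # "-" is the placeholder present before the page's script fills in a price.
    if not parser.found or text in ("", "-"):
        return None
    return text
```

For the HTML fragment above, extract_price_text returns None, which signals that the price has to come from a rendered page rather than from the raw response.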
I recommend using the Selenium library in this case. Here are some links for learning about Selenium:
Introduction and Installation
More About Selenium
I tried sending that site a request with headers and still got a 403 status code.
While capturing the requests, I saw that the website is protected by Cloudflare.
So plain HTTP requests to a Cloudflare-protected site like this one are very likely to be blocked.
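One way to check for this from code is sketched below. It relies on the Server: cloudflare and CF-RAY response headers that Cloudflare adds to responses it serves. The function name looks_like_cloudflare_block is an assumption for illustration, and the check is only a heuristic: the headers show that the response passed through Cloudflare, not why it was refused.

```python
from typing import Mapping


def looks_like_cloudflare_block(status_code: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: a 403 whose headers indicate it was served via Cloudflare."""
    # Normalise header names, since plain dicts are case-sensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    return status_code == 403 and served_by_cloudflare
```

With a requests response res, this would be called as looks_like_cloudflare_block(res.status_code, res.headers).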