I am trying to scrape this website: https://bartleby.com. I wrote the code with Python requests and it works, but I am trying to convert it to PHP because I want the result printed on my website and my cPanel hosting does not run Python. So I am forced to use cURL, but it does not work. The code below returns:
Not Found
This page you were trying to reach at this address doesn't seem to exist.
What can I do now?
Sign up for your own free account.
So I am wondering how this website blocks cURL in PHP but not Requests in Python. Are there any undetectable alternatives to cURL in PHP? Thanks.
My PHP Code (Not Working):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.bartleby.com/questions-and-answers/1.-a-given-the-lines-l-7-124-tk-13k-1-k-3-and-l-x2-3s-y-1-10s-z-3-5s-determine-the-values-of-k-if-po/b88e3e3d-bfd6-4158-8335-6a3ca420430e');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($ch, CURLOPT_HTTPHEADER, [
'authority' => 'www.bartleby.com',
'accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-language' => 'en-US;q=0.6',
'cache-control' => 'max-age=0',
'sec-fetch-dest' => 'document',
'sec-fetch-mode' => 'navigate',
'sec-fetch-site' => 'same-origin',
'sec-fetch-user' => '?1',
'sec-gpc' => '1',
'upgrade-insecure-requests' => '1',
'user-agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.115 Safari/537.36',
'Accept-Encoding' => 'gzip',
]);
curl_setopt($ch, CURLOPT_COOKIE, 'G_ENABLED_IDPS=google; refreshToken=330bb387263aa6673c3e39e975d729f723b38002; userId=4c28bc2c-1eec-4d2c-b44d-7bfa78216ba3; userStatus=A1; promotionId=; sku=bb999_bookstore; endCycleWhenQuestionsRemainingWasClosed=2022-06-19T07:00:00.000Z; btbHomeDashboardTooltipAnimationCount=0; isNoQuestionAskedModalClosed=true; accessToken=34ceed9609a07bd0238a74b5650d5c5362990498; bartlebyRefreshTokenExpiresAt=2022-07-16T12:37:57.217Z; btbHomeDashboardAnimationTriggerDate=2022-06-17T12:39:25.907Z; OptanonConsent=isGpcEnabled=1&datestamp=Thu+Jun+16+2022+20%3A39%3A43+GMT%2B0800+(China+Standard+Time)&version=6.32.0&isIABGlobal=false&hosts=&consentId=9432e357-0639-4883-9f99-39bed0bb5cd9&interactionCount=0&landingPath=NotLandingPage&groups=C0001%3A1%2CC0003%3A1%2CBG142%3A0%2CC0002%3A0%2CC0005%3A0%2CC0004%3A0&AwaitingReconsent=false');
$response = curl_exec($ch);
echo $response;
curl_close($ch);
I also tried to use file_get_contents() but it returns an error: Warning: file_get_contents(https://bartleby.com): Failed to open stream: HTTP request failed! HTTP/1.1 503 Service Temporarily Unavailable in D:\xampp\htdocs\bartleby\index.php on line 11
Line 11 is $response = file_get_contents($url, false, stream_context_create($arrContextOptions));
Full code (Not Working):
$url= 'https://bartleby.com';
$arrContextOptions=array(
"ssl"=>array(
"verify_peer"=>false,
"verify_peer_name"=>false,
),
);
$response = file_get_contents($url, false, stream_context_create($arrContextOptions));
echo $response;
My Python Code (Working):
import requests
cookies = {
'G_ENABLED_IDPS': 'google',
'refreshToken': '330bb387263aa6673c3e39e975d729f723b38002',
'userId': '4c28bc2c-1eec-4d2c-b44d-7bfa78216ba3',
'userStatus': 'A1',
'promotionId': '',
'sku': 'bb999_bookstore',
'endCycleWhenQuestionsRemainingWasClosed': '2022-06-19T07:00:00.000Z',
'btbHomeDashboardTooltipAnimationCount': '0',
'isNoQuestionAskedModalClosed': 'true',
'accessToken': '34ceed9609a07bd0238a74b5650d5c5362990498',
'bartlebyRefreshTokenExpiresAt': '2022-07-16T12:37:57.217Z',
'btbHomeDashboardAnimationTriggerDate': '2022-06-17T12:39:25.907Z',
'OptanonConsent': 'isGpcEnabled=1&datestamp=Thu+Jun+16+2022+20%3A39%3A43+GMT%2B0800+(China+Standard+Time)&version=6.32.0&isIABGlobal=false&hosts=&consentId=9432e357-0639-4883-9f99-39bed0bb5cd9&interactionCount=0&landingPath=NotLandingPage&groups=C0001%3A1%2CC0003%3A1%2CBG142%3A0%2CC0002%3A0%2CC0005%3A0%2CC0004%3A0&AwaitingReconsent=false',
}
headers = {
'authority': 'www.bartleby.com',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-language': 'en-US;q=0.6',
'cache-control': 'max-age=0',
# Requests sorts cookies= alphabetically
# 'cookie': 'G_ENABLED_IDPS=google; refreshToken=330bb387263aa6673c3e39e975d729f723b38002; userId=4c28bc2c-1eec-4d2c-b44d-7bfa78216ba3; userStatus=A1; promotionId=; sku=bb999_bookstore; endCycleWhenQuestionsRemainingWasClosed=2022-06-19T07:00:00.000Z; btbHomeDashboardTooltipAnimationCount=0; isNoQuestionAskedModalClosed=true; accessToken=34ceed9609a07bd0238a74b5650d5c5362990498; bartlebyRefreshTokenExpiresAt=2022-07-16T12:37:57.217Z; btbHomeDashboardAnimationTriggerDate=2022-06-17T12:39:25.907Z; OptanonConsent=isGpcEnabled=1&datestamp=Thu+Jun+16+2022+20%3A39%3A43+GMT%2B0800+(China+Standard+Time)&version=6.32.0&isIABGlobal=false&hosts=&consentId=9432e357-0639-4883-9f99-39bed0bb5cd9&interactionCount=0&landingPath=NotLandingPage&groups=C0001%3A1%2CC0003%3A1%2CBG142%3A0%2CC0002%3A0%2CC0005%3A0%2CC0004%3A0&AwaitingReconsent=false',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'same-origin',
'sec-fetch-user': '?1',
'sec-gpc': '1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.115 Safari/537.36',
}
response = requests.get('https://www.bartleby.com/questions-and-answers/1.-a-given-the-lines-l-7-124-tk-13k-1-k-3-and-l-x2-3s-y-1-10s-z-3-5s-determine-the-values-of-k-if-po/b88e3e3d-bfd6-4158-8335-6a3ca420430e', cookies=cookies, headers=headers)
print(response.text)
Your request did not send a user agent.
It looks like that website requires a user agent from a real browser, such as Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0.
Here is my code, which works.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.bartleby.com/questions-and-answers/1.-a-given-the-lines-l-7-124-tk-13k-1-k-3-and-l-x2-3s-y-1-10s-z-3-5s-determine-the-values-of-k-if-po/b88e3e3d-bfd6-4158-8335-6a3ca420430e');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// A user agent is required. This example forwards the visiting browser's
// user agent; you may replace it with any real browser string.
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'headerFunction'); // only for debugging response headers
$response = curl_exec($ch);
if (curl_errno($ch)) {
    echo 'cURL error: ' . curl_error($ch);
    echo '<br>';
    exit();
}
echo '<hr>' . PHP_EOL;
echo '<h4>cURL response body</h4>' . PHP_EOL;
echo $response;
curl_close($ch);
unset($ch, $response);

/**
 * Header function for debugging
 */
function headerFunction($ch, $header)
{
    echo $header;
    echo '<br>';
    // cURL expects the number of bytes handled, so use strlen(), not mb_strlen().
    return strlen($header);
}
Your code sets the request headers using the wrong array format.
curl_setopt($ch, CURLOPT_HTTPHEADER, [
'authority' => 'www.bartleby.com',
//...
]);
This is WRONG! CURLOPT_HTTPHEADER expects a list of 'Name: value' strings, not key/value pairs.
It should be...
curl_setopt($ch, CURLOPT_HTTPHEADER, [
'authority: www.bartleby.com',
//...
]);
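If the headers already exist as a key/value array, as in the question, they can be converted rather than retyped. The following is only a minimal sketch: `toCurlHeaderLines` is a hypothetical helper name, not part of the cURL extension, and the header values are copied from the question.

```php
<?php
/**
 * Convert ['name' => 'value'] pairs into the list of "name: value" strings
 * that CURLOPT_HTTPHEADER expects. (Hypothetical helper, not part of cURL.)
 */
function toCurlHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Header values taken from the question's associative array.
$headers = [
    'accept-language' => 'en-US;q=0.6',
    'cache-control'   => 'max-age=0',
    'user-agent'      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.115 Safari/537.36',
];

$lines = toCurlHeaderLines($headers);
print_r($lines);
// Pass the result to cURL: curl_setopt($ch, CURLOPT_HTTPHEADER, $lines);
```

Each element of `$lines` is then a single string such as `cache-control: max-age=0`, which is the format cURL sends as-is.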
You can use $reqHeaders = curl_getinfo($ch, CURLINFO_HEADER_OUT); to debug the request headers. For this to work, call curl_setopt($ch, CURLINFO_HEADER_OUT, true); before curl_exec().
Your current code did not send a user-agent at all, which is why it doesn't work.
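To check what was actually sent, the request-header dump can be wrapped in a small function. This is a sketch under assumptions: `fetchWithRequestDump` and the shape of its return array are made up for illustration, while the cURL options and `curl_getinfo` constants are the standard ones.

```php
<?php
/**
 * Fetch a URL and return the body together with the request headers cURL sent.
 * (Hypothetical helper name; the cURL options themselves are standard.)
 */
function fetchWithRequestDump(string $url, array $headerLines): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headerLines); // list of "name: value" strings
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);        // record the outgoing request headers

    $body = curl_exec($ch);

    return [
        'body'            => $body,
        'error'           => $body === false ? curl_error($ch) : null,
        'status'          => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'request_headers' => curl_getinfo($ch, CURLINFO_HEADER_OUT),
    ];
}

// Example call (performs a real HTTP request, so it is left commented out):
// $result = fetchWithRequestDump('https://www.bartleby.com/', [
//     'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0',
// ]);
// echo $result['request_headers'];
```

If the `user-agent` line is missing from `request_headers`, the header array was most likely passed in the key/value format and not sent.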