I am trying to scrape this website: https://bartleby.com. I wrote a script using Python requests and it works. However, I want to convert it to PHP because I want the result printed on my website and my cPanel hosting does not run Python, so I have to use cURL. The PHP version does not work; the code below returns:

Not Found
This page you were trying to reach at this address doesn't seem to exist.
What can I do now?
Sign up for your own free account.

So I am wondering how this website blocks cURL in PHP but not Requests in Python. Are there any undetectable alternatives to cURL in PHP? Thanks.

My PHP Code (Not Working):

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.bartleby.com/questions-and-answers/1.-a-given-the-lines-l-7-124-tk-13k-1-k-3-and-l-x2-3s-y-1-10s-z-3-5s-determine-the-values-of-k-if-po/b88e3e3d-bfd6-4158-8335-6a3ca420430e');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'authority' => 'www.bartleby.com',
    'accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language' => 'en-US;q=0.6',
    'cache-control' => 'max-age=0',
    'sec-fetch-dest' => 'document',
    'sec-fetch-mode' => 'navigate',
    'sec-fetch-site' => 'same-origin',
    'sec-fetch-user' => '?1',
    'sec-gpc' => '1',
    'upgrade-insecure-requests' => '1',
    'user-agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.115 Safari/537.36',
    'Accept-Encoding' => 'gzip',
]);
curl_setopt($ch, CURLOPT_COOKIE, 'G_ENABLED_IDPS=google; refreshToken=330bb387263aa6673c3e39e975d729f723b38002; userId=4c28bc2c-1eec-4d2c-b44d-7bfa78216ba3; userStatus=A1; promotionId=; sku=bb999_bookstore; endCycleWhenQuestionsRemainingWasClosed=2022-06-19T07:00:00.000Z; btbHomeDashboardTooltipAnimationCount=0; isNoQuestionAskedModalClosed=true; accessToken=34ceed9609a07bd0238a74b5650d5c5362990498; bartlebyRefreshTokenExpiresAt=2022-07-16T12:37:57.217Z; btbHomeDashboardAnimationTriggerDate=2022-06-17T12:39:25.907Z; OptanonConsent=isGpcEnabled=1&datestamp=Thu+Jun+16+2022+20%3A39%3A43+GMT%2B0800+(China+Standard+Time)&version=6.32.0&isIABGlobal=false&hosts=&consentId=9432e357-0639-4883-9f99-39bed0bb5cd9&interactionCount=0&landingPath=NotLandingPage&groups=C0001%3A1%2CC0003%3A1%2CBG142%3A0%2CC0002%3A0%2CC0005%3A0%2CC0004%3A0&AwaitingReconsent=false');
$response = curl_exec($ch);
echo $response;
curl_close($ch);

I also tried to use file_get_contents() but it returns an error: Warning: file_get_contents(https://bartleby.com): Failed to open stream: HTTP request failed! HTTP/1.1 503 Service Temporarily Unavailable in D:\xampp\htdocs\bartleby\index.php on line 11

Line 11 is $response = file_get_contents($url, false, stream_context_create($arrContextOptions));

Full code (Not Working):

$url = 'https://bartleby.com';
$arrContextOptions = array(
    "ssl" => array(
        "verify_peer" => false,
        "verify_peer_name" => false,
    ),
);
$response = file_get_contents($url, false, stream_context_create($arrContextOptions));
echo $response;

My Python Code (Working):

import requests
cookies = {
    'G_ENABLED_IDPS': 'google',
    'refreshToken': '330bb387263aa6673c3e39e975d729f723b38002',
    'userId': '4c28bc2c-1eec-4d2c-b44d-7bfa78216ba3',
    'userStatus': 'A1',
    'promotionId': '',
    'sku': 'bb999_bookstore',
    'endCycleWhenQuestionsRemainingWasClosed': '2022-06-19T07:00:00.000Z',
    'btbHomeDashboardTooltipAnimationCount': '0',
    'isNoQuestionAskedModalClosed': 'true',
    'accessToken': '34ceed9609a07bd0238a74b5650d5c5362990498',
    'bartlebyRefreshTokenExpiresAt': '2022-07-16T12:37:57.217Z',
    'btbHomeDashboardAnimationTriggerDate': '2022-06-17T12:39:25.907Z',
    'OptanonConsent': 'isGpcEnabled=1&datestamp=Thu+Jun+16+2022+20%3A39%3A43+GMT%2B0800+(China+Standard+Time)&version=6.32.0&isIABGlobal=false&hosts=&consentId=9432e357-0639-4883-9f99-39bed0bb5cd9&interactionCount=0&landingPath=NotLandingPage&groups=C0001%3A1%2CC0003%3A1%2CBG142%3A0%2CC0002%3A0%2CC0005%3A0%2CC0004%3A0&AwaitingReconsent=false',
}

headers = {
    'authority': 'www.bartleby.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'en-US;q=0.6',
    'cache-control': 'max-age=0',
    # Requests sorts cookies= alphabetically
    # 'cookie': 'G_ENABLED_IDPS=google; refreshToken=330bb387263aa6673c3e39e975d729f723b38002; userId=4c28bc2c-1eec-4d2c-b44d-7bfa78216ba3; userStatus=A1; promotionId=; sku=bb999_bookstore; endCycleWhenQuestionsRemainingWasClosed=2022-06-19T07:00:00.000Z; btbHomeDashboardTooltipAnimationCount=0; isNoQuestionAskedModalClosed=true; accessToken=34ceed9609a07bd0238a74b5650d5c5362990498; bartlebyRefreshTokenExpiresAt=2022-07-16T12:37:57.217Z; btbHomeDashboardAnimationTriggerDate=2022-06-17T12:39:25.907Z; OptanonConsent=isGpcEnabled=1&datestamp=Thu+Jun+16+2022+20%3A39%3A43+GMT%2B0800+(China+Standard+Time)&version=6.32.0&isIABGlobal=false&hosts=&consentId=9432e357-0639-4883-9f99-39bed0bb5cd9&interactionCount=0&landingPath=NotLandingPage&groups=C0001%3A1%2CC0003%3A1%2CBG142%3A0%2CC0002%3A0%2CC0005%3A0%2CC0004%3A0&AwaitingReconsent=false',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'sec-gpc': '1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.115 Safari/537.36',
}

response = requests.get('https://www.bartleby.com/questions-and-answers/1.-a-given-the-lines-l-7-124-tk-13k-1-k-3-and-l-x2-3s-y-1-10s-z-3-5s-determine-the-values-of-k-if-po/b88e3e3d-bfd6-4158-8335-6a3ca420430e', cookies=cookies, headers=headers)
print(response.text)
                It would be easier to tell if you posted your Python code as well. It can happen that the only difference is in the requests.
– Adrian
                Jun 16, 2022 at 12:38
                I'm not sure if Curl accepts an associative array for request headers - it needs to just be an array of plain strings in the format header-name: header-value
– iainn
                Jun 16, 2022 at 13:00
                Even if I remove this line: curl_setopt($ch, CURLOPT_HTTPHEADER, $header);  it is still not working.
– Ppap
                Jun 16, 2022 at 13:07
                It seems that whenever I make an HTTP request on that specific website using PHP curl or file_get_contents(), it returns a 503 Service Temporarily Unavailable header which then redirects to their error page. I also tried to use proxies but still did not work.
– Ppap
                Jun 16, 2022 at 13:11

Your request does not send a user agent.

It looks like that website requires a user agent from a real browser, such as Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0.

Here is my code, which works.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.bartleby.com/questions-and-answers/1.-a-given-the-lines-l-7-124-tk-13k-1-k-3-and-l-x2-3s-y-1-10s-z-3-5s-determine-the-values-of-k-if-po/b88e3e3d-bfd6-4158-8335-6a3ca420430e');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); // This is required.
// This example forwards the user agent of the browser that requested this page.
// You may replace it with any other real browser user agent string.
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'headerFunction');// for debug response headers only.
$response = curl_exec($ch);
if (curl_errno($ch)) {
    echo 'cURL error: ' . curl_error($ch);
    echo '<br>';
    exit();
}
echo '<hr>' . PHP_EOL;
echo '<h4>cURL response body</h4>' . PHP_EOL;
echo $response;
curl_close($ch);
unset($ch, $response);

/**
 * Header function for debugging.
 */
function headerFunction($ch, $header)
{
    echo $header;
    echo '<br>';
    // cURL expects the number of bytes handled, so use strlen(), not mb_strlen().
    return strlen($header);
}

Your code sets the request headers using the wrong array format.

curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'authority' => 'www.bartleby.com',
    //...
]);

This is WRONG!
It should be...

curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'authority: www.bartleby.com',
    //...
]);
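
If you would rather keep the headers as a name => value array (as in the question), a small conversion step can produce the plain strings that CURLOPT_HTTPHEADER expects. This is only a minimal sketch: buildHeaderLines is a hypothetical helper name of my own, not a cURL function, and only two of the question's headers are shown.

```php
<?php
// Hypothetical helper (not part of cURL): turns ['name' => 'value'] pairs
// into the list of "name: value" strings that CURLOPT_HTTPHEADER expects.
function buildHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Two headers taken from the question, written in the associative style.
$headerLines = buildHeaderLines([
    'accept-language' => 'en-US;q=0.6',
    'cache-control'   => 'max-age=0',
]);
// $headerLines is now:
// ['accept-language: en-US;q=0.6', 'cache-control: max-age=0']
```

The result can then be passed as curl_setopt($ch, CURLOPT_HTTPHEADER, $headerLines);.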

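You can use $reqHeaders = curl_getinfo($ch, CURLINFO_HEADER_OUT); after curl_exec() to debug the request headers. It only returns data if you enable it first with curl_setopt($ch, CURLINFO_HEADER_OUT, true);.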
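
As a rough sketch of that debugging step, the helper below performs a request and returns the request headers cURL actually sent, so you can check whether a User-Agent line is present. The function name sentRequestHeaders and its parameters are my own assumptions for illustration; the cURL options themselves are standard.

```php
<?php
// Hypothetical helper: performs a GET request and returns the raw request
// headers that cURL sent, or an empty string if nothing was sent.
function sentRequestHeaders(string $url, array $headerLines, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,           // do not hang forever
        CURLOPT_HTTPHEADER     => $headerLines, // plain "name: value" strings
        CURLOPT_USERAGENT      => $userAgent,
        CURLINFO_HEADER_OUT    => true,         // must be set before curl_exec()
    ]);
    curl_exec($ch);
    $sent = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    // The handle is released when $ch goes out of scope.
    return is_string($sent) ? $sent : '';
}

// Example usage (commented out because it performs a real network request):
// echo '<pre>' . htmlspecialchars(sentRequestHeaders(
//     'https://www.bartleby.com/',
//     ['accept-language: en-US;q=0.6'],
//     'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0'
// )) . '</pre>';
```

If the returned text has no User-Agent line, the header is not being sent.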

Your current code did not send a user-agent header at all, which is why it doesn't work.
