I have a list of URLs, and I'm scraping the title of each page by looping over the whole list.

The problem is that whenever a URL in the list is invalid, the code breaks. I'm trying to use try and except to skip the error, but the try/except is not working.

Below is the code I'm using (please correct me if I'm missing something here):

    import requests
    from bs4 import BeautifulSoup as BS

    url_list = ['http://www.aurecongroup.com',
                'http://www.bendigoadelaide.com.au',
                'http://www.burrell.com.au',
                'http://www.dsdbi.vic.gov.au',
                'http://www.energyaustralia.com.au',
                'http://www.executiveboard.com',
                'http://www.mallesons.com',
                'https://www.minterellison.com',
                'http://www.mta.org.nz',
                'http://www.services.nsw.gov.au']

    for link in url_list:
        try:
            r = requests.get(link)
            r.encoding = 'utf-8'
            html_content = r.text
            soup = BS(html_content, 'lxml')
            df = soup.title.string
            print(df)
        except IOError:
            pass

Executing the above code gives me AttributeError: 'NoneType' object has no attribute 'string'. Can someone help me with this?

Might be just a typo, but try is all lowercase, and you have an indentation problem in your for loop. Apparently it's not a typo, since fixing that works. – Ignacio Vergara Kausel Oct 26, 2017 at 13:10

Also, if you want processing to continue when one element fails, you would need to move your try...except inside the for block, wrapping the loop body. – ryachza Oct 26, 2017 at 13:12

@ryancha could you please help me with the code to proceed further when one element fails in the loop? – PieSquare Oct 26, 2017 at 13:25

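Move your try/except into the loop if you want only the erroneous iteration skipped. Catch AttributeError as well, since soup.title is None for a page without a title.

    for link in url_list:
        try:
            r = requests.get(link)
            print(BS(r.text, 'lxml').title.string)
        except (IOError, AttributeError):
            pass  # skip only this URL and move on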
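
As a rough illustration of why the position of the try/except matters, here is a minimal sketch that runs without network access. The names `collect_titles` and `fake_fetch` are hypothetical and not part of the original answer; `fake_fetch` merely stands in for the requests.get plus BeautifulSoup step and raises the same two exception types.

```python
def collect_titles(urls, fetch_title):
    """Return (titles, failures); one bad URL never stops the loop."""
    titles, failures = {}, {}
    for url in urls:
        try:
            titles[url] = fetch_title(url)
        except (IOError, AttributeError) as exc:
            # Only this iteration is abandoned; the loop carries on.
            # In Python 3, IOError is an alias of OSError.
            failures[url] = type(exc).__name__
    return titles, failures


def fake_fetch(url):
    """Hypothetical stand-in for requests.get + BeautifulSoup."""
    if "unreachable" in url:
        raise IOError("connection failed")
    if "untitled" in url:
        # Mimics soup.title being None on a page without <title>.
        return None.string
    return "Title of " + url


titles, failures = collect_titles(
    [
        "http://ok.example",
        "http://unreachable.example",
        "http://untitled.example",
    ],
    fake_fetch,
)
print(titles)
print(failures)
```

With the try/except wrapped around the whole loop instead, the first failing URL would end the loop and the remaining URLs would never be visited.
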
Hi @COLDSPEED, I'm getting this error in the middle while I'm running the code: "AttributeError: 'NoneType' object has no attribute 'string'" – PieSquare Oct 26, 2017 at 13:42

@MaheshVarma Made an edit. Any further questions go into a new post. If this helped, don't forget to vote on, and accept the answer. Thanks. – cs95 Oct 26, 2017 at 13:44

@MaheshVarma if you're going to unmark my answer, I'd appreciate an explanation as to why. – cs95 Oct 29, 2017 at 17:38

    import requests
    from bs4 import BeautifulSoup

    url_list = [
        'http://www.aurecongroup.com',
        'http://www.bendigoadelaide.com.au',
        'http://www.burrell.com.au',
        'http://www.dsdbi.vic.gov.au',
        'http://www.energyaustralia.com.au',
        'http://www.executiveboard.com',
        'http://www.mallesons.com',
        'https://www.minterellison.com',
        'http://www.mta.org.nz',
        'http://www.services.nsw.gov.au'
    ]

    for link in url_list:
        try:
            res = requests.get(link)
            soup = BeautifulSoup(res.text, 'lxml')
            try:
                # soup.title (or its .string) is None when there is no usable <title>
                df = soup.title.string.strip()
            except (AttributeError, KeyError):
                df = ""
            print(df)
        except IOError:
            # requests' exceptions derive from IOError; skip unreachable URLs
            pass

Partial output, including the blank line printed where the title is None:

    Aurecon – A global engineering and infrastructure advisory company
                                             #### It gives the None value
    Stockbroking & Superannuation Brisbane | Burrell
    Home | Economic Development
    Electricity Providers - Gas Suppliers | EnergyAustralia
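
The blank entry can also be produced with explicit None checks instead of catching AttributeError. The following is only a sketch: `title_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mimic a parsed page so the example runs without bs4 or a network connection. With a real BeautifulSoup object, `soup.title` is None when the page has no title tag, and `.string` is None when the tag is empty.

```python
from types import SimpleNamespace


def title_text(soup):
    """Return the stripped page title, or "" when there is none."""
    tag = soup.title  # None when the page has no <title> tag
    if tag is None or tag.string is None:  # .string is None for an empty <title>
        return ""
    return tag.string.strip()


# Hypothetical stand-ins for the three shapes a parsed page can take.
no_title = SimpleNamespace(title=None)
empty_title = SimpleNamespace(title=SimpleNamespace(string=None))
normal = SimpleNamespace(
    title=SimpleNamespace(string="  Home | Economic Development \n")
)

for page in (no_title, empty_title, normal):
    print(repr(title_text(page)))
```

The explicit checks keep unrelated AttributeErrors (for example, a typo in the loop body) from being silently swallowed, at the cost of one extra line.
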

    import requests
    from bs4 import BeautifulSoup as BS
    url_list = ['Http://www.aurecongroup.com',
    'Http://www.burrell.com.au',
    'Http://www.dsdbi.vic.gov.au',
    'Http://www.energyaustralia.com.au',
    'Http://www.executiveboard.com',
    'Http://www.mallesons.com',
    'Https://www.minterellison.com',
    'Http://www.mta.org.nz',
    'Http://www.services.nsw.gov.au']
    Try:
        for link in url_list:
            r = requests.get(link)
            r.encoding = 'utf-8'
            html_content = r.text
            soup = BS(html_content, 'lxml')
            df = soup.title.string
            print(df)
    except IOError:
        pass

Try: should be lowercase try:, and the indentation after for link in url_list: is missing.

    import requests
    from bs4 import BeautifulSoup as BS

    url_list = ['Http://www.aurecongroup.com',
                'Http://www.burrell.com.au',
                'Http://www.dsdbi.vic.gov.au',
                'Http://www.energyaustralia.com.au',
                'Http://www.executiveboard.com',
                'Http://www.mallesons.com',
                'Https://www.minterellison.com',
                'Http://www.mta.org.nz',
                'Http://www.services.nsw.gov.au']

    try:
        for link in url_list:
            r = requests.get(link)
            r.encoding = 'utf-8'
            html_content = r.text
            soup = BS(html_content, 'lxml')
            df = soup.title.string
            print(df)
    except IOError:
        # The try wraps the whole loop, so the first IOError ends it
        pass
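
To check the point about lowercase try, the short sketch below uses only the standard library. The helper name `compiles` is hypothetical; it simply reports whether a snippet of source passes Python's parser.

```python
import keyword


def compiles(source):
    """Return True if the source parses, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Python keywords are case-sensitive: only the lowercase form is reserved.
print(keyword.iskeyword("try"))
print(keyword.iskeyword("Try"))

good = "try:\n    pass\nexcept IOError:\n    pass\n"
bad = "Try:\n    pass\nexcept IOError:\n    pass\n"
print(compiles(good))
print(compiles(bad))
```

Because a capitalised Try is treated as an ordinary name, the parser rejects the block before any request is made.
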
No additional contribution compared to the answer provided earlier. You're invited to contribute in improving the previous answer. – Ignacio Vergara Kausel Oct 26, 2017 at 13:17

    import requests
    from bs4 import BeautifulSoup as BS

    url_list = [
        'Http://www.aurecongroup.com',
        'Http://www.burrell.com.au',
        'Http://www.dsdbi.vic.gov.au',
        'Http://www.energyaustralia.com.au',
        'Http://www.executiveboard.com',
        'Http://www.mallesons.com',
        'Https://www.minterellison.com',
        'Http://www.mta.org.nz',
        'Http://www.services.nsw.gov.au'
    ]

    try:
        for link in url_list:
            r = requests.get(link)
            r.encoding = 'utf-8'
            html_content = r.text
            soup = BS(html_content, 'lxml')
            df = soup.title.string
            print(df)
    except IOError:
        pass
This does not appear to fix anything. The only difference I see between this and the original is a newline after the imports. – ryachza Oct 26, 2017 at 13:14

@ryancha could you please help me with the code to proceed further when one element fails in the loop? – PieSquare Oct 26, 2017 at 13:27
        
