I have a list of URLs, and I'm scraping the title of each page by looping over the list.
The problem is that whenever a URL in the list is invalid, the code breaks, so I'm trying to use try and except to skip the error; however, try and except are not working.
Below is the code I'm using (please correct me if I'm missing something here):
import requests
from bs4 import BeautifulSoup as BS

url_list = ['http://www.aurecongroup.com',
            'http://www.bendigoadelaide.com.au',
            'http://www.burrell.com.au',
            'http://www.dsdbi.vic.gov.au',
            'http://www.energyaustralia.com.au',
            'http://www.executiveboard.com',
            'http://www.mallesons.com',
            'https://www.minterellison.com',
            'http://www.mta.org.nz',
            'http://www.services.nsw.gov.au']

Try:
    for link in url_list:
        r = requests.get(link)
        r.encoding = 'utf-8'
        html_content = r.text
        soup = BS(html_content, 'lxml')
        df = soup.title.string
        print(df)
except IOError:
    pass
Executing the above code gives me AttributeError: 'NoneType' object has no attribute 'string'.
Can someone help me with this?
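For reference, that AttributeError comes from a page that has no title tag at all: soup.title is then None, and None.string blows up. A minimal reproduction (using the stdlib html.parser instead of lxml, but lxml behaves the same way):

```python
from bs4 import BeautifulSoup

# A page with no <title> tag: soup.title is None,
# so soup.title.string raises AttributeError.
html = "<html><body><p>no title here</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
print(soup.title)  # prints: None
```
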
Move your try/except into the loop if you want only the erroneous iteration skipped:

for link in url_list:
    try:
        r = requests.get(link)
        # ... rest of the scraping code ...
    except (IOError, AttributeError):
        continue
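Note that catching AttributeError as well matters here: AttributeError is not a subclass of IOError, so a bare except IOError lets the missing-title error propagate. A minimal illustration:

```python
# What soup.title.string does when soup.title is None:
# an AttributeError, which `except IOError:` does not catch.
try:
    title = None.string
except IOError:
    caught = "IOError"
except AttributeError:
    caught = "AttributeError"
print(caught)  # prints: AttributeError
```
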
import requests
from bs4 import BeautifulSoup

url_list = ['http://www.aurecongroup.com',
            'http://www.bendigoadelaide.com.au',
            'http://www.burrell.com.au',
            'http://www.dsdbi.vic.gov.au',
            'http://www.energyaustralia.com.au',
            'http://www.executiveboard.com',
            'http://www.mallesons.com',
            'https://www.minterellison.com',
            'http://www.mta.org.nz',
            'http://www.services.nsw.gov.au']

for link in url_list:
    try:
        res = requests.get(link)
        soup = BeautifulSoup(res.text, 'lxml')
        df = soup.title.string.strip()
    except (AttributeError, KeyError):
        df = ""
    except IOError:
        df = ""
    print(df)
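As a variant on the approach above, the AttributeError can be avoided entirely by checking soup.title before dereferencing it. A sketch with a hypothetical get_title helper (the name is mine, not from the thread):

```python
from bs4 import BeautifulSoup

def get_title(html):
    # Return the stripped page title, or "" when the page has no
    # <title> tag (soup.title is None) or the tag has no string.
    soup = BeautifulSoup(html, "html.parser")
    if soup.title and soup.title.string:
        return soup.title.string.strip()
    return ""
```

In the loop, print(get_title(res.text)) then never raises for a missing title.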
Partial output, including the None value:

Aurecon – A global engineering and infrastructure advisory company
None                <- this is where it gives the None value
Stockbroking & Superannuation Brisbane | Burrell
Home | Economic Development
Electricity Providers - Gas Suppliers | EnergyAustralia
Try: should be lowercase try:, and the indentation after for link in url_list: is missing.
import requests
from bs4 import BeautifulSoup as BS

url_list = ['Http://www.aurecongroup.com',
            'Http://www.burrell.com.au',
            'Http://www.dsdbi.vic.gov.au',
            'Http://www.energyaustralia.com.au',
            'Http://www.executiveboard.com',
            'Http://www.mallesons.com',
            'Https://www.minterellison.com',
            'Http://www.mta.org.nz',
            'Http://www.services.nsw.gov.au']

try:
    for link in url_list:
        r = requests.get(link)
        r.encoding = 'utf-8'
        html_content = r.text
        soup = BS(html_content, 'lxml')
        df = soup.title.string
        print(df)
except IOError:
    pass
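Note that where the try sits changes behaviour: with try outside the loop (as above), the first failure abandons all remaining URLs; with try inside the loop, only the failing iteration is skipped. A toy illustration, with ZeroDivisionError standing in for a failed request:

```python
items = [1, 0, 2]

# try OUTSIDE the loop: the first failure ends the whole loop
outside = []
try:
    for x in items:
        outside.append(10 // x)
except ZeroDivisionError:
    pass

# try INSIDE the loop: only the failing iteration is skipped
inside = []
for x in items:
    try:
        inside.append(10 // x)
    except ZeroDivisionError:
        continue

print(outside)  # prints: [10]
print(inside)   # prints: [10, 5]
```
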