Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
Ask Question
I'm writing a simple web scraper to get data from the Texas Commission on Environmental Quality (TCEQ) website. The info I need is inside 'td' tags. I'm scraping the appropriate 'td' by referencing the preceding 'th', which all have the same text used to ID. I'm using find_next_sibling to scrape the data into a variable.
Here is my code:
import requests
from bs4 import BeautifulSoup
URL = "https://www2.tceq.texas.gov/oce/eer/index.cfm?fuseaction=main.getDetails&target=323191"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
###This one works
report = soup.find("th", text="Incident Tracking Number:").find_next_sibling("td").text
###This one doesn't
owner = soup.find("th", text="Name of Owner or Operator:").find_next_sibling("td").text
I'm getting this error: AttributeError: 'NoneType' object has no attribute 'find_next_sibling'. This code has several lines like the two above, and, like them, some of them work and some of them don't. I've looked into the HTML to see if there's another tag, but I'm not seeing it if it's there. Please and thank you for any help!
–
–
When using the text
parameter, you should make sure you provide the text exactly. In your case, there's a space at the end.
soup.find('th', text='Name of Owner or Operator: ').find_next_sibling('td').text
This prints:
\n \n \n \n \n PHILLIPS 66 COMPANY\n \n \n
–
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.