Why does find_next_sibling in bs4 work on one line of code but not another, very similar, line of code?

I'm writing a simple web scraper to get data from the Texas Commission on Environmental Quality (TCEQ) website. The data I need is inside 'td' tags. I locate each 'td' through the 'th' that precedes it, since each 'th' holds a fixed label I can match on, and then use find_next_sibling to read the value into a variable.

Here is my code:

import requests
from bs4 import BeautifulSoup
URL = "https://www2.tceq.texas.gov/oce/eer/index.cfm?fuseaction=main.getDetails&target=323191"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
# This one works
report = soup.find("th", text="Incident Tracking Number:").find_next_sibling("td").text
# This one doesn't
owner = soup.find("th", text="Name of Owner or Operator:").find_next_sibling("td").text

I'm getting this error: AttributeError: 'NoneType' object has no attribute 'find_next_sibling'. This code has several lines like the two above, and, like them, some of them work and some of them don't. I've looked into the HTML to see if there's another tag, but I'm not seeing it if it's there. Please and thank you for any help!

Are you really sure that the soup.find is actually finding a tag? If not, then that would explain why it throws the error.
– Hampus Larsson
Oct 23, 2019 at 15:59

It's finding it in the first variable and some others I had written. The HTML is set up the exact same way for the second one, with the 'td' containing the text I need, while it's preceded by the 'th' with the identifier. Not sure why it finds it in one and not the other.
– bclark
Oct 23, 2019 at 16:05
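One way to check whether soup.find is matching anything is to print the repr() of each 'th' label, which makes stray whitespace visible. The following is only a sketch: SAMPLE_HTML is a made-up stand-in for the TCEQ page (the real markup may differ), the value 12345 is a placeholder, and string= is the name newer Beautiful Soup releases (4.4.0 and later) use for the text= argument.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Incident Tracking Number:</th><td>12345</td></tr>
  <tr><th>Name of Owner or Operator: </th><td>PHILLIPS 66 COMPANY</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Collect the label text of every <th>; repr() shows leading/trailing spaces.
labels = [th.string for th in soup.find_all("th")]
for label in labels:
    print(repr(label))

# find() returns None when no tag matches the string exactly, and calling
# .find_next_sibling on None is what raises the AttributeError.
missing = soup.find("th", string="Name of Owner or Operator:")
print(missing)
```

If a label shows up with a trailing space in the repr() output, an exact-match search without that space will come back as None.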

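When using the text parameter with a plain string, the string has to match the tag's text exactly, whitespace included. In your case, the 'th' text has a space at the end, after the colon, so the search string needs one too: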

soup.find('th', text='Name of Owner or Operator: ').find_next_sibling('td').text

This prints:

\n      \n      \n      \n        \n        PHILLIPS 66 COMPANY\n        \n      \n    
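If hard-coding the trailing space feels fragile, one possible alternative is to match the label with a regular expression that tolerates surrounding whitespace, and to collapse the whitespace in the result. This is a sketch under assumptions, not something from the original answer: cell_after_label is a hypothetical helper, SAMPLE_HTML is an invented stand-in for the real page, 12345 is a placeholder value, and string= is the newer name for text= (Beautiful Soup 4.4.0 and later).

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Incident Tracking Number:</th><td>12345</td></tr>
  <tr><th>Name of Owner or Operator: </th><td>
      <span>
        PHILLIPS 66 COMPANY
      </span>
  </td></tr>
</table>
"""


def cell_after_label(soup, label):
    """Return the whitespace-normalized text of the <td> that follows the
    <th> whose text equals `label` once surrounding whitespace is ignored.
    Returns None if no such <th> or <td> exists."""
    # re.escape keeps characters such as ":" or "." in the label literal.
    pattern = re.compile(r"^\s*" + re.escape(label) + r"\s*$")
    th = soup.find("th", string=pattern)
    if th is None:
        return None
    td = th.find_next_sibling("td")
    if td is None:
        return None
    # split() + join collapses the newlines and runs of spaces in the cell.
    return " ".join(td.get_text().split())


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
owner = cell_after_label(soup, "Name of Owner or Operator:")
report = cell_after_label(soup, "Incident Tracking Number:")
print(owner)
print(report)
```

Returning None instead of chaining calls also avoids the AttributeError when a label is missing, at the cost of having to check the result before using it.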
This worked...thanks! I knew it would be something simple. Those random spaces make a huge difference. Good reminder to comb over every little thing. Thanks again!
– bclark
Oct 23, 2019 at 17:23
