Why do I get an AttributeError when trying to use BeautifulSoup's `.find` to find text in a page? [duplicate]

I am trying to scrape a website with BeautifulSoup but am running into a problem. I was following a tutorial written for Python 2.7 that used exactly the same code and had no problems.

import urllib.request
from bs4 import *
htmlfile = urllib.request.urlopen("http://en.wikipedia.org/wiki/Steve_Jobs")
htmltext = htmlfile.read()
soup = BeautifulSoup(htmltext)
title = (soup.title.text)
body = soup.find("Born").findNext('td')
print (body.text)

When I run the program, I get:

Traceback (most recent call last):
  File "C:\Users\USER\Documents\Python Programs\World Population.py", line 13, in <module>
    body = soup.find("Born").findNext('p')
AttributeError: 'NoneType' object has no attribute 'findNext'

Is this a problem with Python 3, or am I just too naive?

The find and find_all methods do not search for arbitrary text in the document; they search for HTML tags. The documentation makes that clear (my italics):

Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don’t match. This is the simplest usage:

soup.find_all("title")
# [<title>The Dormouse's story</title>]

That's why your soup.find("Born") is returning None and hence why it complains about NoneType (the type of None) having no findNext() method.
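As a minimal sketch of the difference, the snippet below runs against a small hand-written HTML fragment (a hypothetical stand-in for the infobox row, not the real page source). It shows that a tag-name search for "Born" finds nothing, while a search for a th tag whose text is "Born" succeeds, and it checks for None before chaining further calls:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only; the real page markup may differ.
html = """
<table>
  <tr>
    <th scope="row">Born</th>
    <td><span class="nickname">Steven Paul Jobs</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find("Born") looks for a <Born> tag, which does not exist, so it returns None.
by_tag_name = soup.find("Born")

# string="Born" restricts the match to <th> tags whose text is exactly "Born".
header = soup.find("th", string="Born")

# Check for None before chaining, so a missing match does not raise AttributeError.
cell = header.find_next("td") if header is not None else None
born_text = cell.get_text(strip=True) if cell is not None else None

print(by_tag_name)
print(born_text)
```

The None checks are the part that carries over to real pages: any find call can come back empty if the markup changes.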

The page you reference contains (at the time this answer was written) eight copies of the word "born", none of which are tags.

Looking at the HTML source for that page, you'll find the best option may be to look for the correct span (formatted for readability):

<th scope="row" style="text-align: left;">Born</th>
    <span class="nickname">Steven Paul Jobs</span><br />
    <span style="display: none;">(<span class="bday">1955-02-24</span>)</span>February 24, 1955<br />

@user391339, though the original question didn't call for it, you can just use the regular Python string search functionality (e.g., the string find or regex search) on the stringified soup doc, either pretty or non-pretty: crummy.com/software/BeautifulSoup/bs4/doc/#pretty-printing – paxdiablo, Apr 4, 2016 at 1:26
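A rough sketch of what that comment describes, using a made-up two-element fragment rather than the real page: the soup is turned back into a string and searched with str.find and a regex. The last line shows a related option the comment does not mention, passing a compiled pattern as the string argument of find_all, which returns the matching text nodes instead of raw offsets.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = "<p>Jobs was born in San Francisco.</p><th>Born</th>"
soup = BeautifulSoup(html, "html.parser")

# Plain string search on the stringified document.
text = str(soup)
first = text.find("Born")  # index of the first exact match, or -1 if absent
count = len(re.findall(r"born", text, flags=re.IGNORECASE))

# Alternative: let Beautiful Soup return the text nodes that match a pattern.
nodes = soup.find_all(string=re.compile("born", re.IGNORECASE))

print(first != -1, count, len(nodes))
```

Searching the stringified document also matches text inside attributes and tag names, so the find_all variant is usually the safer choice when only visible text matters.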

The find method looks for tags, not text. To find the name, birthday and birthplace, you would have to look up the span elements with the corresponding class name, and access the text attribute of that item:

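import urllib.request
from bs4 import BeautifulSoup

# urlopen returns a file-like object, which BeautifulSoup can read directly.
# Naming the parser explicitly avoids the "no parser was explicitly specified" warning.
page = urllib.request.urlopen("http://en.wikipedia.org/wiki/Steve_Jobs")
soup = BeautifulSoup(page, "html.parser")
title = soup.title.text

# Each value sits in a span identified by its class name.
name = soup.find('span', {'class': 'nickname'}).text
bday = soup.find('span', {'class': 'bday'}).text
birthplace = soup.find('span', {'class': 'birthplace'}).text
print(name)
print(bday)
print(birthplace)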

Output:

Steven Paul Jobs
1955-02-24
San Francisco, California, US

PS: You don't have to call read on the result of urlopen; BeautifulSoup accepts file-like objects.
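To illustrate the file-like point without touching the network, here is a small sketch that uses io.BytesIO as a stand-in for the object urlopen returns. The markup and the variable name fake_response are invented for the example.

```python
import io
from bs4 import BeautifulSoup

# io.BytesIO stands in for the response object returned by urlopen;
# both expose a read() method, which is all BeautifulSoup needs.
fake_response = io.BytesIO(
    b"<html><head><title>Steve Jobs - Wikipedia</title></head></html>"
)

# No explicit .read() call: the constructor reads from the object itself.
soup = BeautifulSoup(fake_response, "html.parser")
title = soup.title.text

print(title)
```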