Python BeautifulSoup: looping through elements to strip whitespace in a function


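I am trying to write a reusable function that strips whitespace from scraped elements. I am scraping h2, li and p tags; they are currently returned as <tag> string </tag>, and I want to strip the whitespace and save the content using *.get_text(strip=True).

h_content = soup.select('h2') stores all h2 tags found.

p_content = soup.select('p') stores all p tags found.

And so on.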

I have been trying this but am not sure how to return the items to the original location, that is to say, return them here --> *_content

def remove_whitespace(tags):
    for item in tags:
        item.get_text(strip=True)
        return item
Ideally I would end up with a function that I can reuse:
remove_whitespace(*_content)


4 comments


           
What output are you currently getting?


           
When I place the return inside the loop and run p_content = remove_whitespace(p_content), I can see that the function works, but only on the first item. When I place the return outside the loop and run it again, I get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in remove_whitespace
AttributeError: 'unicode' object has no attribute 'get_text'


           
Martin Evans: Are you trying to modify the HTML and save a version with the whitespace removed? Could you edit your question to include some working examples?


           
Yes, @MartinEvans, that is what I am trying to do. I have edited my question; I hope it is a bit clearer now.


         python


         beautifulsoup


2 answers

Accepted

0 votes


          
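The error you got, AttributeError: 'unicode' object has no attribute 'get_text', comes from an element of the collection passed to the function that is a plain unicode string rather than a BeautifulSoup Tag or NavigableString object. A plain string has no get_text method.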
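One way this error can arise is sketched below, under stated assumptions: a version of the function returns the stripped text of the first tag, that string is assigned back to p_content, and the next call then iterates over the characters of a plain string. The function name first_text and the sample HTML are hypothetical and not taken from the question. On Python 3 the message names str rather than unicode.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only for illustration.
html = "<p>  first  </p><p>  second  </p>"
soup = BeautifulSoup(html, "html.parser")
p_content = soup.select("p")


def first_text(tags):
    """Hypothetical variant that returns during the first iteration."""
    for item in tags:
        return item.get_text(strip=True)


# After this call p_content is no longer a list of tags but a plain string.
p_content = first_text(p_content)

error_message = None
try:
    # Iterating over a string yields single characters, which are strings too.
    first_text(p_content)
except AttributeError as exc:
    error_message = str(exc)

print(p_content)
print(error_message)
```

The debug print of type(t) suggested in this answer is a quick way to confirm whether something similar happened in your session.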
          
          
See also the documentation: Miscellaneous common errors.
          
          
I would suggest using one of the string generators, such as stripped_strings, or simply the text attribute.
          
def remove_whitespace(tags):
    texts = []
    for t in tags:
        print(t, type(t))  # debug print to see the type
        texts.append(t.text.strip())
    return texts
Related questions:
'unicode' object has no attribute 'prettify'
How to make BeautifulSoup 'replace_with' attribute work with a 'unicode' object?
BeautifulSoup : TypeError: 'unicode' object is not callable
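The answer's code uses the text attribute; the stripped_strings generator it also mentions could be used along the following lines. This is a minimal sketch, not the answerer's code: the sample HTML is hypothetical, and joining the fragments with a single space is an assumption about the desired output.

```python
from bs4 import BeautifulSoup


def remove_whitespace(tags):
    """Return one cleaned string per tag.

    stripped_strings yields each text fragment inside the tag with
    leading and trailing whitespace removed; joining with a space is
    an illustrative choice.
    """
    return [" ".join(tag.stripped_strings) for tag in tags]


# Hypothetical sample markup, only for illustration.
html = "<h2>  Title  </h2><p>  Some <b> bold </b> text  </p>"
soup = BeautifulSoup(html, "html.parser")

h_content = remove_whitespace(soup.select("h2"))
p_content = remove_whitespace(soup.select("p"))

print(h_content)
print(p_content)
```

For tags with nested markup, get_text(strip=True) joins the stripped fragments with no separator, so an explicit join like the one above, or get_text(" ", strip=True), keeps the words apart.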


           
            
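This works, and it also helped me find the cause of my AttributeError. One thing I can't quite understand is how texts = [] stores the results inside the function, yet when I do print(p_content) I see the cleaned results. How does the list res map back to p_content?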
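A minimal sketch of what is probably going on, assuming the call is written as p_content = remove_whitespace(p_content) as in the earlier comment: the list built inside the function is handed back through return, and the assignment rebinds the name p_content to that new list. The original tags are not modified. The sample HTML and the name original are hypothetical.

```python
from bs4 import BeautifulSoup


def remove_whitespace(tags):
    texts = []  # a new list, local to this call
    for t in tags:
        texts.append(t.text.strip())
    return texts  # the list object is handed back to the caller


# Hypothetical sample markup, only for illustration.
soup = BeautifulSoup("<p>  one  </p><p>  two  </p>", "html.parser")

original = soup.select("p")  # keep a second reference to the tags
p_content = original

# The assignment, not the function body, makes p_content point at the
# cleaned list; the local name used inside the function does not matter.
p_content = remove_whitespace(p_content)

print(p_content)
print(repr(original[0].text))  # the tag itself still holds the padded text
```

Without the assignment, the returned list would simply be discarded and p_content would still refer to the tags.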


          
           
Using return inside the loop exits the function after the first iteration. You need to do something like this to prevent that:

def remove_whitespace(tags):