[Solved] 'NoneType' object has no attribute 'findAll' while using bs4

    soup = BeautifulSoup(urls.text, "html5lib")
    # print(soup.prettify())
    content = soup.find("div", {"class": "tt_article_useless_p_margin"})
    images = content.findAll('img')
    for img in images:
        img_url = img['src'] + "?original"
        print(img_url, file=im_link)

def get_links():
    count = 1
    for line in tw_link:
        print(line, count)
        count += 1
        get_images(line)

get_links()

What I have tried:
The code seems to work fine when using a single link, but when I pass the URLs to the function I'm getting the following error.

AttributeError                            Traceback (most recent call last)
in ()
     23         count+=1
     24         get_images(line)
---> 25 get_links()

1 frames
in get_links()
     22         print(line,count)
     23         count+=1
---> 24         get_images(line)
     25 get_links()

in get_images(urli)
     12     print(soup.prettify())
     13     content = soup.find("div", {"class": "tt_article_useless_p_margin"})
---> 14     images = content.findAll('img')
     15     for img in images:
     16         img_url = img['src']+"?original"

AttributeError: 'NoneType' object has no attribute 'findAll'

My guess is that I'm triggering some sort of bot detection (because when passing a single link, a different page is loaded, not the one that's being loaded currently). Is there any way to bypass that? I've tried using time.sleep(5), but that didn't work either.

import time

import requests
from bs4 import BeautifulSoup

tw_link = open("TW_Links.txt", "r", encoding='utf-8')
im_link = open("DCDN_Links.txt", "w+")
kak_link = open("KCDN_Links.txt", "w+")

def get_images(urlset):
    for x in urlset:
        rs = requests.Session()
        urls = rs.get(x)
        soup = BeautifulSoup(urls.text, "html5lib")
        content = soup.find("div", {"class": "tt_article_useless_p_margin"})
        images = content.findAll('img')
        for img in images:
            img_url = img['src'] + "?original"
            if "blog" in img_url:
                print(img_url, file=kak_link)
                print(img_url)
            print(img_url, file=im_link)
            print(img_url)
        # print(x)
        time.sleep(2)

def get_links():
    count = 1
    linklist = []
    for line in tw_link:
        # Drop the trailing newline so it is not sent as part of the URL
        line = line.replace("\n", "")
        linklist.append(line)
    get_images(linklist)

get_links()

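The AttributeError reported in the question means that soup.find(...) returned None, which is what BeautifulSoup does when no element matches. A minimal sketch of a guard for that case follows. The function name extract_image_urls is hypothetical, and the sketch uses the built-in "html.parser" backend instead of "html5lib" only so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup


def extract_image_urls(html, css_class="tt_article_useless_p_margin"):
    """Return image URLs from the article div, or [] if the div is absent.

    Hypothetical helper for illustration; not part of the original post.
    """
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", {"class": css_class})
    if content is None:
        # find() returns None when nothing matches, for example when the
        # server answered with an error page instead of the article.
        return []
    return [
        img["src"] + "?original"
        for img in content.find_all("img")
        if img.get("src")  # skip <img> tags that have no src attribute
    ]
```

With a guard like this, an unexpected page yields an empty list rather than a crash, which makes it easier to log the offending URL and inspect what the server actually sent back.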
For those waiting for a solution, it was pretty simple. I was doubtful of the requests module, so I intercepted the program's traffic through a proxy, and voilà: the request included the end-of-line character as well. Each line read from the file still ended in "\n", and that newline was sent as part of the URL. This might have worked with most sites, but this particular site redirected to its 404 page, which has no matching div, so soup.find() returned None. Simply removing "\n" from the lines read did the trick.
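The fix can be sketched as a small cleaning step applied before any request is made. This is only an illustration under a few assumptions: clean_urls is a hypothetical helper name, the example.com URLs are placeholders, and str.strip() is used in place of replace("\n", "") because it also removes "\r" and stray spaces.

```python
def clean_urls(lines):
    """Strip surrounding whitespace (including newlines) and skip blank lines."""
    urls = []
    for line in lines:
        url = line.strip()  # removes a trailing newline, "\r", and spaces
        if url:             # ignore lines that were empty after stripping
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Stand-in for iterating over an open file such as TW_Links.txt;
    # lines read from a file keep their trailing newline.
    sample = [
        "https://example.com/post/1\n",
        "\n",
        "https://example.com/post/2\r\n",
    ]
    print(clean_urls(sample))
```

Passing the cleaned list to get_images() keeps the newline out of every request URL.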
