[Python error] NLTK stopwords resource not found (Avasla's blog)

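Unzip the downloaded file into one of the directories that NLTK searches, creating any folders that do not exist yet.
In my case the directory did not exist when I went to unzip, so I located the /Users/mac/opt/anaconda3 folder,
created a new nltk_data folder inside it,
then created a corpora folder inside nltk_data and dragged the stopwords folder into it. The result, /Users/mac/opt/anaconda3/nltk_data/corpora/stopwords, sits under one of the paths listed after "Searched in:" in the error.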
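The manual steps above can also be scripted. The following is a minimal sketch, not the author's method: `install_stopwords_zip` is a hypothetical helper, and it assumes the downloaded archive contains a top-level `stopwords/` folder.

```python
import zipfile
from pathlib import Path


def install_stopwords_zip(zip_path, nltk_data_dir):
    """Unpack a manually downloaded stopwords archive into <nltk_data_dir>/corpora.

    Both arguments are placeholders: zip_path is wherever the archive was
    saved, and nltk_data_dir should be one of the directories listed under
    "Searched in:" in the LookupError.
    """
    corpora_dir = Path(nltk_data_dir) / "corpora"
    # Create nltk_data/corpora if either folder is missing.
    corpora_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive has a top-level "stopwords/" folder.
        archive.extractall(corpora_dir)
    return corpora_dir / "stopwords"
```

For the setup described here, the call would look like `install_stopwords_zip("stopwords.zip", "/Users/mac/opt/anaconda3/nltk_data")`, where the archive location is whatever path the file was saved to.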
Here I record how I worked through the problem.
from nltk.corpus import stopwords
stop = stopwords.words('english')
Error output:
---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
~/opt/anaconda3/lib/python3.7/site-packages/nltk/corpus/util.py in __load(self)
     85                 try:
---> 86                     root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
     87                 except LookupError:
~/opt/anaconda3/lib/python3.7/site-packages/nltk/data.py in find(resource_name, paths)
    700     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 701     raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('stopwords')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/stopwords.zip/stopwords/
  Searched in:
    - '/Users/mac/nltk_data'
    - '/Users/mac/opt/anaconda3/nltk_data'
    - '/Users/mac/opt/anaconda3/share/nltk_data'
    - '/Users/mac/opt/anaconda3/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
During handling of the above exception, another exception occurred:
LookupError                               Traceback (most recent call last)
<ipython-input-18-a8339b7b1fb5> in <module>
      1 from nltk.corpus import stopwords
----> 2 stop = stopwords.words('english')
      3 train['stopwords']=train['tweet'].apply(lambda x: len([x for x in x.split() if x in stop]))
      4 train[['tweet','stopwords']].head()
~/opt/anaconda3/lib/python3.7/site-packages/nltk/corpus/util.py in __getattr__(self, attr)
    121             raise AttributeError("LazyCorpusLoader object has no attribute '__bases__'")
--> 123         self.__load()
    124         # This looks circular, but its not, since __load() changes our
    125         # __class__ to something new:
~/opt/anaconda3/lib/python3.7/site-packages/nltk/corpus/util.py in __load(self)
     86                     root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
     87                 except LookupError:
---> 88                     raise e
     90         # Load the corpus.
~/opt/anaconda3/lib/python3.7/site-packages/nltk/corpus/util.py in __load(self)
     81         else:
     82             try:
---> 83                 root = nltk.data.find('{}/{}'.format(self.subdir, self.__name))
     84             except LookupError as e:
     85                 try:
~/opt/anaconda3/lib/python3.7/site-packages/nltk/data.py in find(resource_name, paths)
    699     sep = '*' * 70
    700     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 701     raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('stopwords')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/stopwords
  Searched in:
    - '/Users/mac/nltk_data'
    - '/Users/mac/opt/anaconda3/nltk_data'
    - '/Users/mac/opt/anaconda3/share/nltk_data'
    - '/Users/mac/opt/anaconda3/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
Following the hint in the error message, I ran the download code, but it still failed:
import nltk
nltk.download('stopwords')
I then tried downloading directly from the website: http://www.nltk.org/nltk_data/
The page could not be found; the site may be blocked by the firewall.
Problem description: when using NLTK, the stopwords resource cannot be found.
Solution: find a publicly available copy of the resource online, download it, and unzip it into a directory that NLTK searches.
To list the search paths:
from nltk import data
print(data.path)
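To confirm that the files ended up where NLTK will look, the search paths can be checked directly. This is a rough sketch under stated assumptions: `locate_corpus` is a hypothetical helper, not part of NLTK, and it only mirrors the two locations named in the error message (`corpora/stopwords` and `corpora/stopwords.zip`).

```python
from pathlib import Path


def locate_corpus(search_paths, name="stopwords"):
    """Return the search-path entries that hold corpora/<name> or corpora/<name>.zip."""
    hits = []
    for base in search_paths:
        corpora = Path(base) / "corpora"
        # NLTK reported trying both the unpacked folder and the zip archive.
        if (corpora / name).is_dir() or (corpora / f"{name}.zip").is_file():
            hits.append(str(base))
    return hits
```

Passing `data.path` from `from nltk import data` as `search_paths` should return an empty list before the manual install and the chosen nltk_data directory afterwards.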