pandas file read error: UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data

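Reading a txt file with pandas (with `os` and `pandas as pd` imported, and `project_path` pointing at the project root):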

    data = pd.read_table(os.path.join(project_path, 'src/data/corpus.txt'), sep='\n')

This raises the following error:

    UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data

The cause of this error:

you cannot randomly partition the bytes you've received and then ask UTF-8 to decode it. UTF-8 is a multibyte encoding, meaning you can have anywhere from 1 to 6 bytes to represent one character. If you chop that in half, and ask Python to decode it, it will throw you the unexpected end of data error.

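In other words, UTF-8 is a variable-length multibyte encoding: a single character takes 1 to 4 bytes (the original design allowed sequences of up to 6 bytes, but RFC 3629 limits UTF-8 to 4). If the bytes are split at an arbitrary point and each piece is handed to Python to decode separately, a cut can land in the middle of a character, and the decoder reports the incomplete sequence as "unexpected end of data".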
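
The mechanism can be reproduced with the standard library alone. The sketch below cuts the UTF-8 bytes of a two-character string in the middle of the first character, which produces exactly the message above, and then shows that an incremental decoder from `codecs` (which buffers an incomplete trailing sequence until the next chunk arrives) reassembles the same chunks correctly. The sample string and helper names are illustrative only.

```python
import codecs


def decode_each_chunk(chunks):
    """Naive approach: decode every chunk on its own (breaks on split characters)."""
    return "".join(chunk.decode("utf-8") for chunk in chunks)


def decode_incrementally(chunks):
    """Buffer incomplete multibyte sequences between chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush: raises if an incomplete sequence is still left at the very end
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


data = "中文".encode("utf-8")  # each character takes 3 bytes, 6 bytes in total
chunks = [data[:2], data[2:4], data[4:]]  # every boundary falls inside a character

try:
    decode_each_chunk(chunks)
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data

print(decode_incrementally(chunks))  # 中文
```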

Solutions:

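If the file contains Chinese characters and the problem is an encoding mismatch, pass the `encoding` parameter (the name is `encoding`, not `encodings`). pandas already decodes as `'utf-8'` by default, so this only helps when you set it to the file's actual encoding, for example `'gbk'`:

    encoding='utf-8'  # the default; pass the file's real encoding instead, e.g. 'gbk'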
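
A minimal sketch of what an encoding mismatch looks like, using a made-up sample held in memory. GBK is assumed here only as an example of a non-UTF-8 encoding commonly used for Chinese text; substitute whatever encoding your file was actually saved in.

```python
import io

import pandas as pd

# Made-up sample: header "句子" and two rows, encoded as GBK instead of UTF-8
raw = "句子\n你好\n世界\n".encode("gbk")

try:
    pd.read_table(io.BytesIO(raw))  # pandas decodes as UTF-8 by default
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

# Telling pandas the real encoding fixes the read
df = pd.read_table(io.BytesIO(raw), encoding="gbk")
print(df["句子"].tolist())  # ['你好', '世界']
```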

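If the file path contains Chinese characters, move or rename the file so that the full path consists only of English letters.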
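
If renaming directories is inconvenient, one workaround (an alternative to the advice above, not something from the GitHub discussion) is to open the file with Python's built-in `open()` and pass the file object to pandas, so pandas never handles the path string itself. A minimal sketch; `read_table_via_handle` and the sample file are hypothetical:

```python
import os
import tempfile

import pandas as pd


def read_table_via_handle(path, encoding="utf-8"):
    """Open the file with Python itself, then hand pandas the open text stream."""
    with open(path, encoding=encoding) as f:
        return pd.read_table(f)


# Usage with a made-up file inside a directory whose name is Chinese
folder = os.path.join(tempfile.mkdtemp(), "数据")
os.makedirs(folder)
path = os.path.join(folder, "语料.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("text\n你好\n")

print(read_table_via_handle(path)["text"].tolist())  # ['你好']
```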

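If neither applies, then according to this GitHub discussion: https://github.com/pandas-dev/pandas/issues/43540, you can add the `encoding_errors` parameter (available since pandas 1.3.0). Keep in mind that `'ignore'` silently drops every byte that cannot be decoded, while `'replace'` leaves a visible U+FFFD placeholder in its place.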

    # encoding_errors requires pandas >= 1.3.0; recent versions may reject sep='\n' with a ValueError
    data = pd.read_table(os.path.join(project_path, 'src/data/corpus.txt'), sep='\n', encoding_errors='ignore')
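
To see what `encoding_errors` does, the sketch below feeds pandas a small made-up in-memory file whose single row contains the first two bytes of a three-byte character (a truncated sequence like the one in the error above) and compares `'ignore'` with `'replace'`. It assumes pandas 1.3.0 or later; `read_text_column` is a hypothetical helper.

```python
import io

import pandas as pd

# Made-up sample: header "text", then one row with a truncated UTF-8 character
broken = "中".encode("utf-8")[:2]  # 2 of the 3 bytes of "中"
raw = b"text\nab" + broken + b"c\n"


def read_text_column(errors):
    """Read the sample with the given encoding_errors handler."""
    df = pd.read_table(io.BytesIO(raw), encoding="utf-8", encoding_errors=errors)
    return df["text"].tolist()


print(read_text_column("ignore"))   # ['abc']: the broken bytes are silently dropped
print(read_text_column("replace"))  # the broken bytes become U+FFFD
# The default, "strict", raises UnicodeDecodeError instead
```

`'replace'` is often the more useful choice while investigating, because the U+FFFD markers show where the damaged bytes were.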
