File "/usr/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 805: invalid start byte

I get this exception. How do I catch it and continue reading my files when it occurs?

My program has a loop that reads a text file line by line and tries to do some processing. However, some files I encounter may not be text files, or may contain lines that are not properly encoded (foreign-language text, etc.). I want to ignore those lines.

The following is not working:

import re
import sys

# searchstuff is a regex pattern defined elsewhere in the program
for line in sys.stdin:
    try:
        if line != "":
            matched = re.match(searchstuff, line, re.IGNORECASE)
            print(matched)
    except (UnicodeDecodeError, UnicodeEncodeError):
        continue
There's an entire CHAPTER in the Python tutorial dedicated to errors and exceptions. Try there. docs.python.org/tutorial/errors.html
– Ignacio Vazquez-Abrams
Dec 29, 2010 at 12:59
Yeah, I get it. I'm not asking whether Python has features related to errors and exceptions. I am using try/except statements, but these codec decode errors are not being caught, resulting in failed jobs.
– Deepak
Dec 29, 2010 at 13:01

Look at http://docs.python.org/py3k/library/codecs.html. When you open the codecs stream, you probably want to pass the additional argument errors='ignore'.

In Python 3, sys.stdin is by default opened as a text stream (see http://docs.python.org/py3k/library/sys.html), and has strict error checking.
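To illustrate why a try/except inside the loop body never fires, here is a minimal sketch. It assumes an in-memory `io.BytesIO` wrapped in a strict `io.TextIOWrapper` as a stand-in for `sys.stdin`; the function names `read_lines_strict` and `demo` are hypothetical. The decoding is done by the stream while the `for` statement fetches the next line, so the error is raised outside the body.

```python
import io


def read_lines_strict(raw):
    """Iterate a strict UTF-8 text stream; the try/except wraps only the body."""
    # errors defaults to 'strict', the behavior described for sys.stdin above
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
    seen = []
    for line in stream:  # decoding happens here, while fetching the line
        try:
            seen.append(line)
        except UnicodeDecodeError:
            continue  # not reached: the body itself does no decoding
    return seen


def demo():
    """Show that the decode error escapes the loop instead of being skipped."""
    try:
        read_lines_strict(b"good line\nbad \x92 line\n")
    except UnicodeDecodeError as exc:
        return "escaped the loop: " + exc.reason
    return "no error"
```

Calling `demo()` reports that the exception escaped the loop, which matches the failed jobs described in the question.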

You need to reopen it as an error-tolerant utf-8 stream. Something like this will work:

sys.stdin = codecs.getreader('utf8')(sys.stdin.detach(), errors='ignore')
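Note that errors='ignore' drops the undecodable bytes but keeps the rest of the line. If the goal is to skip such lines entirely, as the question asks, one possible alternative is to read bytes and decode each line inside the try block. This is a sketch under assumptions, not the answer's method: `matching_lines` is a hypothetical helper, and it would be called with `sys.stdin.buffer` (the binary layer under `sys.stdin`) and the asker's `searchstuff` pattern.

```python
import re


def matching_lines(binary_stream, pattern):
    """Yield re.match results for lines that decode as UTF-8; skip the rest."""
    regex = re.compile(pattern, re.IGNORECASE)
    for raw in binary_stream:  # each item is a bytes line, not yet decoded
        try:
            line = raw.decode("utf-8")  # decoding now happens inside the try
        except UnicodeDecodeError:
            continue  # skip the whole undecodable line
        if line != "":
            yield regex.match(line)
```

Usage would look like `for matched in matching_lines(sys.stdin.buffer, searchstuff): print(matched)`. Any iterable of bytes lines works, which makes the helper easy to try on sample data.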
Not... quite. codecs.open() isn't really needed in 3.x, since its capabilities are now a part of open().
– Ignacio Vazquez-Abrams
Dec 29, 2010 at 13:11
I'm not calling codecs.open() or any function in the codecs module. I think it's getting called by re.search()
– Deepak
Dec 29, 2010 at 13:13
Thanks Ignacio, the problem arises because sys.stdin is opened as an error-intolerant utf-8 stream by default in py3k. I've patched my answer.
– user97370
Dec 29, 2010 at 13:30
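As a follow-up to the comment about open() in 3.x, here is a short sketch of the built-in equivalents. It assumes UTF-8 input; `read_text_tolerant` and `tolerant_wrapper` are hypothetical helper names. Both `open()` and `io.TextIOWrapper` accept `encoding` and `errors` arguments, so the codecs module is not strictly required.

```python
import io


def read_text_tolerant(path):
    """Read a file as UTF-8, silently dropping bytes that do not decode."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return handle.read()


def tolerant_wrapper(binary_stream):
    """Wrap a binary stream (e.g. sys.stdin.buffer) in a tolerant text layer."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", errors="ignore")
```

For standard input this would be used as `sys.stdin = tolerant_wrapper(sys.stdin.buffer)`, which plays the same role as the codecs.getreader line in the answer.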
        
