I have a very simple CSV file with the following data, compressed inside a tar.gz file. I need to read it into a DataFrame using pandas.read_csv.

0  1  4
1  2  5
2  3  6

import pandas as pd
pd.read_csv("sample.tar.gz",compression='gzip')

However, I get the following error:

CParserError: Error tokenizing data. C error: Expected 1 fields in line 440, saw 2

Here are the read_csv variants I tried and the error each one produces:

pd.read_csv("sample.tar.gz",compression='gzip', engine='python')
Error: line contains NULL byte

pd.read_csv("sample.tar.gz",compression='gzip', header=0)
CParserError: Error tokenizing data. C error: Expected 1 fields in line 440, saw 2

pd.read_csv("sample.tar.gz",compression='gzip', header=0, sep=" ")
CParserError: Error tokenizing data. C error: Expected 2 fields in line 94, saw 14

pd.read_csv("sample.tar.gz",compression='gzip', header=0, sep=" ", engine='python')
Error: line contains NULL byte

What's going wrong here? How can I fix this?

Answers

df = pd.read_csv('sample.tar.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)

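Note: error_bad_lines=False makes the parser skip the offending rows instead of raising. It was deprecated in pandas 1.3 in favor of on_bad_lines='skip'. Be aware that skipping rows only hides the symptom: compression='gzip' removes the gzip layer but not the tar container, so the parser still sees the tar header and its NULL-byte padding as if they were CSV content. That is most likely where both the "NULL byte" and the "Expected N fields" errors come from.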
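
If the goal is to read the CSV itself rather than skip the rows that fail to parse, one option is to open the archive with the standard-library tarfile module and hand the extracted member to read_csv. The following is a minimal sketch, not a definitive fix: read_csv_from_tar_gz is a hypothetical helper name, and it assumes that the CSV is the first regular file in the archive unless a member name is passed explicitly.

```python
import tarfile

import pandas as pd


def read_csv_from_tar_gz(archive_path, member_name=None, **read_csv_kwargs):
    """Read one CSV member of a .tar.gz archive into a DataFrame.

    Hypothetical helper, not part of pandas. If member_name is None,
    the first regular file found in the archive is used (an assumption).
    """
    # Mode "r:gz" removes the gzip layer and also parses the tar container.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        if member_name is None:
            # Ignore directories and other non-file entries.
            files = [m for m in archive.getmembers() if m.isfile()]
            if not files:
                raise ValueError(f"no regular files in {archive_path}")
            member = files[0]
        else:
            member = archive.getmember(member_name)
        # extractfile returns a binary file-like object that read_csv accepts.
        with archive.extractfile(member) as handle:
            return pd.read_csv(handle, **read_csv_kwargs)
```

Usage would look like df = read_csv_from_tar_gz("sample.tar.gz", header=0), adding sep=" " only if the file really is space-delimited. As far as I know, pandas 1.5 and later can also read a single-file tar archive directly through read_csv (compression='tar', or inferred from the .tar.gz extension), so check the documentation for your version before adding a helper like this.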
