Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
I'm trying to write a program that looks at a .CSV file (input.csv) and rewrites only the rows that begin with a certain element (corrected.csv), as listed in a text file (output.txt).
This is what my program looks like right now:
import csv
lines = []
with open('output.txt','r') as f:
for line in f.readlines():
lines.append(line[:-1])
with open('corrected.csv','w') as correct:
writer = csv.writer(correct, dialect = 'excel')
with open('input.csv', 'r') as mycsv:
reader = csv.reader(mycsv)
for row in reader:
if row[0] not in lines:
writer.writerow(row)
Unfortunately, I keep getting this error, and I have no clue what it's about.
Traceback (most recent call last):
File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
for row in reader:
_csv.Error: line contains NULL byte
Credit to all the people here to even to get me to this point.
–
–
–
–
–
I'm guessing you have a NUL byte in input.csv. You can test that with
if '\0' in open('input.csv').read():
print "you have null bytes in your input file"
else:
print "you don't"
if you do,
reader = csv.reader(x.replace('\0', '') for x in mycsv)
may get you around that. Or it may indicate you have utf16 or something 'interesting' in the .csv file.
–
–
–
–
–
–
–
–
–
–
–
–
You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.
See the (line.replace('\0','') for line in f)
below, also you'll want to probably open that file up using mode rb
.
import csv
lines = []
with open('output.txt','r') as f:
for line in f.readlines():
lines.append(line[:-1])
with open('corrected.csv','w') as correct:
writer = csv.writer(correct, dialect = 'excel')
with open('input.csv', 'rb') as mycsv:
reader = csv.reader( (line.replace('\0','') for line in mycsv) )
for row in reader:
if row[0] not in lines:
writer.writerow(row)
–
–
with open('corrected.csv','w') as correct:
writer = csv.writer(correct, dialect = 'excel')
with open('input.csv', 'r') as mycsv:
reader = csv.reader(mycsv)
for i, row in enumerate(reader):
if row[0] not in lines:
writer.writerow(row)
except csv.Error:
print('csv choked on line %s' % (i+1))
raise
Perhaps this from daniweb would be helpful:
I'm getting this error when reading from a csv file: "Runtime Error!
line contains NULL byte". Any idea about the root cause of this error?
Ok, I got it and thought I'd post the solution. Simply yet caused me
grief... Used file was saved in a .xls format instead of a .csv Didn't
catch this because the file name itself had the .csv extension while
the type was still .xls
–
–
–
–
If you develop under Lunux, you can use all the power of sed:
from subprocess import check_call, CalledProcessError
PATH_TO_FILE = '/home/user/some/path/to/file.csv'
check_call("sed -i -e 's|\\x0||g' {}".format(PATH_TO_FILE), shell=True)
except CalledProcessError as err:
print(err)
The most efficient solution for huge files.
Checked for Python3, Kubuntu
with open(csv_file, 'r', encoding = "utf-8") as f:
reader = csv.reader(fix_nulls(f))
for line in reader:
#do something
this way works for me
Turning my linux environment into a clean complete UTF-8 environment made the trick for me.
Try the following in your command line:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
–
This is long settled, but I ran across this answer because I was experiencing an unexpected error while reading a CSV to process as training data in Keras and TensorFlow.
In my case, the issue was much simpler, and is worth being conscious of. The data being produced into the CSV wasn't consistent, resulting in some columns being completely missing, which seems to end up throwing this error as well.
The lesson: If you're seeing this error, verify that your data looks the way that you think it does!
pandas.read_csv now handles the different UTF encoding when reading/writing and therefore can deal directly with null bytes
data = pd.read_csv(file, encoding='utf-16')
see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
It is very simple.
don't make a csv file by "create new excel" or save as ".csv" from window.
simply import csv module, write a dummy csv file, and then paste your data in that.
csv made by python csv module itself will no longer show you encoding or blank line error.
–
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.