I am having trouble reading a CSV file with R.
x=read.csv("LorenzoFerrone.csv",header=T)
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>N'
I can open the file in LibreOffice with no problems.
I cannot upload the file because it is full of sensitive information.
What can I do?
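Setting the encoding seems to be the solution to the problem. The <ff><fe> bytes in the error message are the byte order mark of a little-endian UTF-16 file, which is why "UCS-2LE" works: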
> x=read.csv("LorenzoFerrone.csv",fileEncoding = "UCS-2LE")
> x[2,1]
[1] Adriano Caruso
100 Levels: Ada Adriano Caruso adriano diaz Adriano Diaz alberto ferrone Alexey ... Zia Tina
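If you are not sure which encoding to pass, one option is to look at the first bytes of the file for a byte order mark. The following is only a minimal sketch, not a reliable detector: guess_bom_encoding is a hypothetical helper name, it only knows three common BOMs, and it returns NA for a file without one. The temporary file stands in for the real CSV.

```r
# Hypothetical helper: guess a fileEncoding value from the byte order mark.
guess_bom_encoding <- function(path) {
  bytes <- readBin(path, what = "raw", n = 3)
  starts_with <- function(bom) {
    length(bytes) >= length(bom) && all(bytes[seq_along(bom)] == bom)
  }
  if (starts_with(as.raw(c(0xef, 0xbb, 0xbf)))) return("UTF-8-BOM")
  # Note: a UTF-32LE file also starts with ff fe; this sketch ignores that case.
  if (starts_with(as.raw(c(0xff, 0xfe)))) return("UTF-16LE")
  if (starts_with(as.raw(c(0xfe, 0xff)))) return("UTF-16BE")
  NA_character_  # no BOM recognised
}

# Stand-in for the real file: the BOM ff fe followed by "N" in UTF-16LE.
tmp <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xff, 0xfe, 0x4e, 0x00)), tmp)
enc <- guess_bom_encoding(tmp)
print(enc)
```

The result could then be tried as the fileEncoding argument of read.csv. Whether a given encoding name is accepted depends on the iconv implementation on your platform.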
This reads the column names as-is, so make.names is never called and the error above is not raised:
x = read.csv("LorenzoFerrone.csv", check.names = F)
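To remove or replace troublesome characters in the column names, convert them and assign the result back (sub is the replacement string; "" drops the characters):
names(x) = iconv(names(x), to = "ASCII", sub = "")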
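As a rough illustration of the clean-up step, here is a sketch on a toy data frame. The data, the column names, and the fallback name "X" are assumptions made up for the example. It also guards against a side effect of stripping characters: names can end up empty or duplicated.

```r
# Toy data frame standing in for x; the second name contains a non-ASCII character.
x <- data.frame(a = 1:2, b = 3:4)
names(x) <- c("Nome", "Et\u00e0")

# Drop anything that cannot be represented in ASCII.
clean <- iconv(names(x), from = "UTF-8", to = "ASCII", sub = "")

# Stripping characters can create duplicates or empty names, so repair them.
names(x) <- make.unique(ifelse(clean == "", "X", clean))
print(names(x))
```

What exactly replaces a non-ASCII character can differ between iconv implementations, so it is worth checking the printed names before relying on them.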
You can also try the "Latin1" encoding when reading the csv:
x = read.csv("LorenzoFerrone.csv", fileEncoding = "Latin1", check.names = F)
I am adding check.names = F so that spaces in your header are not replaced by dots.
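To see what check.names does to a header, here is a small sketch using inline text instead of a file. The column names and rows are invented for the example.

```r
csv_text <- "First Name,Home Town\nAda,Roma\nZia Tina,Napoli"

# Default: make.names() turns the headers into syntactically valid names.
checked <- read.csv(text = csv_text)
print(names(checked))

# check.names = FALSE keeps the headers exactly as written in the file.
unchecked <- read.csv(text = csv_text, check.names = FALSE)
print(names(unchecked))
```

With the unchecked names, columns that contain spaces have to be accessed with unchecked[["First Name"]] or with backticks.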
This is typically an encoding issue. You can try changing the encoding, or else delete the offending character (open the file in your favorite editor and replace all instances). In some cases R will report the position of the offending character, for example:
invalid multibyte string 1847
That should make the offending character easier to find.
Also note that you may need to repeat this process several times (deleting all offending characters or trying several encodings).
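Rather than hunting by hand, you can ask R which lines are the problem. This sketch assumes the file is supposed to be UTF-8 and that the stray bytes are Latin-1. Both are assumptions, and the temporary file with one bad byte is made up for the example. It will not help with a UTF-16 file like the one in the question, where the whole file needs fileEncoding instead.

```r
# Build a small test file: valid ASCII except for one Latin-1 byte (0xec) on line 3.
tmp <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("name,city\nAda,Roma\nZia Tina,Forl"), as.raw(0xec), charToRaw("\n")),
  tmp
)

# Read the raw lines and flag those that are not valid UTF-8.
lines <- readLines(tmp, warn = FALSE)
bad <- which(!validUTF8(lines))
print(bad)

# Assumption: the offending lines are Latin-1, so re-encode only those.
lines[bad] <- iconv(lines[bad], from = "latin1", to = "UTF-8")
fixed <- read.csv(text = lines, check.names = FALSE)
```

If the re-encoded text still looks wrong, the Latin-1 guess was probably incorrect and another encoding is worth trying on the flagged lines.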