I am having trouble reading a CSV file in R.

 x=read.csv("LorenzoFerrone.csv",header=T)
Error in make.names(col.names, unique = TRUE) : 
      invalid multibyte string at '<ff><fe>N'

I can open the file in LibreOffice with no problems.

I cannot upload the file because it contains sensitive information.

What can I do?

Setting the encoding seems to solve the problem:

> x=read.csv("LorenzoFerrone.csv",fileEncoding = "UCS-2LE")
> x[2,1]
[1] Adriano Caruso
100 Levels:  Ada Adriano Caruso adriano diaz Adriano Diaz alberto ferrone Alexey ... Zia Tina
I have never had this error before, but as far as I can tell from the error message, you might have two columns with the same name in your file.
– Error404
Aug 26, 2013 at 13:05
Hey, you are right, fileEncoding = "UCS-2LE" seems to work. I will wait a bit before closing the question, just to be sure.
– Donbeo
Aug 26, 2013 at 13:21
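
The '<ff><fe>' at the start of the error message matches the byte-order mark (BOM) of a UTF-16 little-endian file, which would explain why fileEncoding = "UCS-2LE" works. As a minimal sketch, the first bytes of a file can be inspected before choosing an encoding. The helper name guess_bom_encoding is hypothetical (it is not part of base R), and the demo writes its own small temporary file rather than using the asker's data:

```r
# Hypothetical helper (not a base R function): guess an encoding from the BOM.
# Limitation: a UTF-32LE BOM (ff fe 00 00) would be reported as UTF-16LE here.
guess_bom_encoding <- function(path) {
  bytes <- readBin(path, what = "raw", n = 3L)  # read at most the first 3 bytes
  if (identical(head(bytes, 3), as.raw(c(0xef, 0xbb, 0xbf)))) {
    "UTF-8-BOM"
  } else if (identical(head(bytes, 2), as.raw(c(0xff, 0xfe)))) {
    "UTF-16LE"
  } else if (identical(head(bytes, 2), as.raw(c(0xfe, 0xff)))) {
    "UTF-16BE"
  } else {
    NA_character_  # no BOM found; the encoding has to be determined another way
  }
}

# Demo: a temporary file containing a UTF-16LE BOM followed by "N" and a newline
tmp <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xff, 0xfe, 0x4e, 0x00, 0x0a, 0x00)), tmp)
enc <- guess_bom_encoding(tmp)
print(enc)  # "UTF-16LE"
```

If the helper returns a value, it could then be passed on as read.csv(path, fileEncoding = enc). A file without a BOM returns NA, in which case this check tells you nothing about the encoding.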

This reads the column names as-is, without passing them through make.names, so that error is not raised:

x = read.csv("LorenzoFerrone.csv", check.names = FALSE)

To remove or replace troublesome characters in the column names, assign the converted names back:

names(x) = iconv(names(x), to = "ASCII", sub = "")
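
To illustrate what that conversion does, here is a small sketch on a made-up vector of names (the names are invented for the example, and from = "UTF-8" is stated explicitly because the sample strings are UTF-8). With sub = "", every byte that cannot be represented in ASCII is dropped:

```r
# Made-up column names containing non-ASCII characters (a with grave accent)
nms <- c("Name", "Citt\u00e0", "Et\u00e0 (anni)")

# sub = "" removes each byte that has no ASCII equivalent
cleaned <- iconv(nms, from = "UTF-8", to = "ASCII", sub = "")
print(cleaned)  # "Name" "Citt" "Et (anni)"
```

Note that the characters are deleted, not transliterated, so two different names can collapse into the same cleaned name.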

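You can try the "Latin1" encoding when reading the CSV. Latin-1 is a single-byte encoding, so the multibyte error does not occur, but the text will be garbled if the file is actually in a different encoding:

 x = read.csv("LorenzoFerrone.csv", fileEncoding = "Latin1", check.names = FALSE)

I added check.names = FALSE so that spaces in your header are not replaced by dots.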
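
The effect of check.names on a header is easiest to see with a small inline example. The CSV text below is made up for the illustration:

```r
# Made-up CSV with a space in the first column name
csv_text <- "First Name,Age\nAda,36\n"

# Default (check.names = TRUE): names are made syntactically valid
default_names <- names(read.csv(text = csv_text))
print(default_names)  # "First.Name" "Age"

# check.names = FALSE: names are kept exactly as written in the header
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))
print(raw_names)      # "First Name" "Age"
```

Columns with such names then have to be accessed with backticks or double brackets, for example x[["First Name"]].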

This is typically an encoding issue. You can try changing the encoding, or else delete the offending character (use your favorite editor and replace all instances). In some cases R reports the position of the character, for example:

invalid multibyte string 1847

That makes the offending character easier to find. Note that you may have to repeat this process several times (deleting further offending characters or trying several encodings).
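
When R does not report a position, one possible way to locate the offending lines is to read the file as plain lines and test each one. This is only a sketch, assuming the file is expected to be UTF-8; the temporary file and its contents are made up for the demo, and validUTF8 needs R 3.3.0 or later:

```r
# Demo file: the third line ends in a lone 0xe0 byte, which is not valid UTF-8
tmp <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("name,city\nAda,Roma\nLuca,Citt"), as.raw(0xe0), charToRaw("\n")),
  tmp
)

# Read the lines without interpreting them, then flag the invalid ones
lines <- readLines(tmp, warn = FALSE)
bad <- which(!validUTF8(lines))
print(bad)  # 3
```

The line numbers in bad show where to look in an editor. If nearly every line is flagged, the file is more likely in a different encoding altogether than damaged in a few places.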