
I have an API controller, in a Rails app backed by PostgreSQL, that receives a media file's path and ID3 tags and saves them to an Active Record instance.

Sometimes, however, users send strings such as:

"genre"=>"Hip-Hop\u0000Hip-Hop/Rap"

and Rails/Postgres aren't exactly happy about that when the record is saved:

An ArgumentError occurred in internals#receive:
 string contains null byte
 activerecord (3.2.21) lib/active_record/connection_adapters/postgresql_adapter.rb:1172:in `send_query_prepared'

How can I clean this string in Ruby to completely remove null bytes?

It looks like you're receiving text in a UTF-16 encoding. Instead of trying to "clean it up", I'd recommend confirming this with the sender and, if that's the case, using Ruby's Encoding to convert the text to UTF-8. Stripping the character willy-nilly won't help if the problem also affects diacritics and other non-ASCII characters. Also, both Ruby and PostgreSQL can be upset if you try to store a string encoded one way into a field defined for another encoding, so you'll need to be thorough. – the Tin Man Mar 28, 2015 at 17:51

Unfortunately, users are all around the world and I cannot ask them to change this, so the fix must be server-side, not least because this data is sent by both our own and third-party applications. – John Smith Mar 28, 2015 at 17:57

It's possible to check a string to see whether its encoding can be determined. Sometimes you get lucky and receive a string that is entirely in one encoding, which makes it easy to get where you're going. Sometimes you get a string that mixes multiple encodings, and then you have to code for that, but how is left for you to figure out. Asking people to change for you isn't likely to happen unless they need your API/service badly. This is a very gnarly rabbit hole to fall into, and it can devolve into a very tricky situation. – the Tin Man Mar 30, 2015 at 19:41
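To illustrate the encoding point, here is a minimal sketch, assuming the client sent UTF-16LE bytes that ended up labeled as UTF-8. Relabeling the bytes with force_encoding and then transcoding with String#encode recovers the text, whereas stripping nulls would only mask the mismatch. Note that the string in the question has a single null between two readable runs rather than a null after every character, so it does not actually fit this pattern.

```ruby
# Assumption: the client sent UTF-16LE text, but the bytes arrived labeled as UTF-8.
original = "Hip-Hop"

# Simulate the mix-up: genuine UTF-16LE bytes tagged with the wrong encoding.
mislabeled = original.encode("UTF-16LE").force_encoding("UTF-8")
puts mislabeled.include?("\u0000") # => true (each ASCII char is followed by a 0 byte)
puts mislabeled.bytesize           # => 14

# Fix: relabel the bytes with their real encoding, then transcode to UTF-8.
repaired = mislabeled.dup.force_encoding("UTF-16LE").encode("UTF-8")
puts repaired                      # => Hip-Hop
```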

The gsub method on String is probably suitable. You can just do string.gsub("\u0000", '') to get rid of them.

http://ruby-doc.org/core-2.1.1/String.html#method-i-gsub

That would work fine too, as would string.tr. In fact, tr and delete are both more appropriate than gsub in this case. – tpbowden Jan 18, 2018 at 8:20
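A quick check that gsub, tr and delete agree when the goal is simply to remove the null character. One detail worth knowing: tr and delete treat their argument as a character set (so characters such as - and ^ can be special), while gsub with a string argument matches the literal substring; for a lone "\u0000" the results are identical.

```ruby
genre = "Hip-Hop\u0000Hip-Hop/Rap"

# Three ways to drop every null character; each returns a new string.
via_gsub   = genre.gsub("\u0000", "")
via_tr     = genre.tr("\u0000", "")   # an empty replacement makes tr delete
via_delete = genre.delete("\u0000")

puts via_delete                                # => Hip-HopHip-Hop/Rap
puts [via_gsub, via_tr, via_delete].uniq.size  # => 1 (all identical)
```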

The String#delete method is a better choice because it performs better than both String#tr and String#gsub; as long as you are only removing characters (replacing them with an empty string), there is no reason to choose tr or gsub over delete.

"Hip-Hop\u0000Hip-Hop/Rap".delete("\u0000")
# => "Hip-HopHip-Hop/Rap"

However, in this case the string does not make sense with the null character simply deleted; it should probably be replaced with a space (or another delimiting character), which calls for tr (or gsub) rather than delete. That said, since "\u0000" is invisible, deleting it is usually the sensible default.

"Hip-Hop\u0000Hip-Hop/Rap".tr("\u0000", " ")
# => "Hip-Hop Hip-Hop/Rap"
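Since the question involves an API controller receiving nested data, here is a minimal plain-Ruby sketch that walks a params-like structure of hashes, arrays and strings and replaces every null character. The helper name scrub_nulls and the sample payload are hypothetical, and hooking the helper into the controller or a model callback is left out. Because it uses tr with a one-character source, only the first character of the replacement is used; pass an empty string to delete the nulls instead.

```ruby
# Hypothetical helper (not part of Rails): returns a copy of a params-like
# structure with every null character in its strings replaced.
# Only the first character of replacement is used; pass "" to delete nulls.
def scrub_nulls(value, replacement = " ")
  case value
  when String then value.tr("\u0000", replacement)
  when Hash   then value.transform_values { |v| scrub_nulls(v, replacement) }
  when Array  then value.map { |v| scrub_nulls(v, replacement) }
  else value # numbers, nil and booleans pass through unchanged
  end
end

# Hypothetical payload shaped like the one described in the question.
params = {
  "path" => "/music/track01.mp3",
  "tags" => { "genre" => "Hip-Hop\u0000Hip-Hop/Rap", "year" => 2015 }
}

cleaned = scrub_nulls(params)
p cleaned["tags"]["genre"]            # => "Hip-Hop Hip-Hop/Rap"
p scrub_nulls("Hip-Hop\u0000Rap", "") # => "Hip-HopRap"
```

Returning a copy rather than mutating in place keeps the raw input available if you later want to log or inspect what the client actually sent.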

AppSignal has a nice blog post detailing and profiling various string replacement methods.
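If you want to check the performance claim on your own Ruby version, here is a minimal sketch using the standard benchmark library. The iteration count is arbitrary, and timings vary by machine and Ruby version, so no numbers are given here.

```ruby
require "benchmark"

genre = "Hip-Hop\u0000Hip-Hop/Rap"
iterations = 100_000 # arbitrary; raise it for steadier numbers

# Each report times the same null removal done three different ways.
results = Benchmark.bm(8) do |x|
  x.report("gsub")   { iterations.times { genre.gsub("\u0000", "") } }
  x.report("tr")     { iterations.times { genre.tr("\u0000", "") } }
  x.report("delete") { iterations.times { genre.delete("\u0000") } }
end
```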
