Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
I get the following error when inserting data from mysql into postgres.
Do I have to manually remove all null characters from my input data?
Is there a way to get postgres to do this for me?
ERROR: invalid byte sequence for encoding "UTF8": 0x00
PostgreSQL doesn't support storing NULL (\0x00) characters in text fields (this is obviously different from the database NULL value, which is fully supported).
Source: http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE
If you need to store the NULL character, you must use a bytea
field - which should store anything you want, but won't support text operations on it.
Given that PostgreSQL doesn't support it in text values, there's no good way to get it to remove it. You could import your data into bytea and later convert it to text using a special function (in perl or something, maybe?), but it's likely going to be easier to do that in preprocessing before you load it.
–
If you are using Java, you could just replace the x00 characters before the insert like following:
myValue.replaceAll("\u0000", "")
The solution was provided and explained by Csaba in following post:
https://www.postgresql.org/message-id/1171970019.3101.328.camel%40coppola.muc.ecircle.de
Respectively:
in Java you can actually have a "0x0" character in your string, and
that's valid unicode. So that's translated to the character 0x0 in
UTF8, which in turn is not accepted because the server uses null
terminated strings... so the only way is to make sure your strings
don't contain the character '\u0000'.
–
–
–
–
You can first insert data into blob field and then copy to text field with the folloing function
CREATE OR REPLACE FUNCTION blob2text() RETURNS void AS $$
Declare
ref record;
i integer;
Begin
FOR ref IN SELECT id, blob_field FROM table LOOP
-- find 0x00 and replace with space
i := position(E'\\000'::bytea in ref.blob_field);
WHILE i > 0 LOOP
ref.bob_field := set_byte(ref.blob_field, i-1, 20);
i := position(E'\\000'::bytea in ref.blobl_field);
END LOOP
UPDATE table SET field = encode(ref.blob_field, 'escape') WHERE id = ref.id;
END LOOP;
End; $$ LANGUAGE plpgsql;
SELECT blob2text();
If you need to store null characters in text fields and don't want to change your data type other than text then you can follow my solution too:
Before insert:
myValue = myValue.replaceAll("\u0000", "SomeVerySpecialText")
After select:
myValue = myValue.replaceAll("SomeVerySpecialText","\u0000")
I've used "null" as my SomeVerySpecialText which I am sure that there will be no any "null" string in my values at all.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.