This simple code segment shows an issue I am having with JSON::XS encoding in Perl:
#!/usr/bin/perl
use strict;
use warnings;
use JSON::XS;
use utf8;
binmode STDOUT, ":encoding(utf8)";
my (%data);
$data{code} = "Gewürztraminer";
print "data{code} = " . $data{code} . "\n";
my $json_text = encode_json \%data;
print $json_text . "\n";
The output this yields is:
johnnyb@boogie:~/Projects/repos > ./jsontest.pl
data{code} = Gewürztraminer
{"code":"Gewürztraminer"}
Now if I comment out the binmode line above I get:
johnnyb@boogie:~/Projects/repos > ./jsontest.pl
data{code} = Gew�rztraminer
{"code":"Gewürztraminer"}
What is happening here? Note that I am trying to fix this behavior in a Perl CGI script in which binmode cannot be used, yet I always get the "ü" characters shown above in the returned JSON stream. How do I debug this? What am I missing?
encode_json (short for JSON::XS->new->utf8->encode) encodes using UTF-8, and then you re-encode its result by printing it to STDOUT, to which you've added an encoding layer. Effectively, you are doing encode_utf8(encode_utf8($uncoded_json)).
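The double encoding described above can be reproduced without any JSON module. The following is a minimal sketch using only the core Encode module; the variable names are illustrative, and it assumes the source file itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                                   # source literals are UTF-8
use Encode qw( encode_utf8 decode_utf8 );

my $chars = "Gewürztraminer";               # character string, 14 characters
my $once  = encode_utf8($chars);            # roughly what encode_json hands back
my $twice = encode_utf8($once);             # what an :encoding(utf8) layer then adds

# "ü" is 1 character, 2 bytes after one encoding, 4 bytes after two.
printf "chars=%d once=%d twice=%d\n",
    length($chars), length($once), length($twice);

# Decoding the double-encoded bytes only once leaves the text seen in the question.
my $mojibake = decode_utf8($twice);
binmode STDOUT, ':encoding(UTF-8)';
print "$mojibake\n";
```

The first line printed should read chars=14 once=15 twice=17, and the second shows "ü" in place of "ü".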
Solution 1
use open ':std', ':encoding(utf8)'; # Defaults
binmode STDOUT; # Override defaults
print encode_json(\%data);
Solution 2
use open ':std', ':encoding(utf8)'; # Defaults
print JSON::XS->new->encode(\%data); # Or to_json from JSON.pm
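The difference between Solutions 1 and 2 comes down to whether the encoder returns octets or characters. Here is a rough sketch of that distinction; it uses the core module JSON::PP as a stand-in, on the assumption that it mirrors JSON::XS for these particular calls, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                                   # source literals are UTF-8
use JSON::PP ();                            # stand-in for JSON::XS (assumption)

my %data = ( code => "Gewürztraminer" );

my $json_chars = JSON::PP->new->encode(\%data);         # character string
my $json_bytes = JSON::PP->new->utf8->encode(\%data);   # UTF-8 octets

# Characters belong on a handle with an encoding layer ...
binmode STDOUT, ':encoding(UTF-8)';
print $json_chars, "\n";

# ... and octets belong on a raw handle.
binmode STDOUT;
print $json_bytes, "\n";
```

Both print statements should emit the same bytes; mixing them up (octets through the encoding layer) is what produces the double encoding.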
Solution 3
The following works with any encoding on STDOUT by using \u escapes for non-ASCII characters:
print JSON::XS->new->ascii->encode(\%data);
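To see what the ascii option produces, here is a small sketch. It again uses the core JSON::PP module as a stand-in for JSON::XS (an assumption made for portability), and the variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;                                   # source literals are UTF-8
use JSON::PP ();                            # stand-in for JSON::XS (assumption)

my %data = ( code => "Gewürztraminer" );

# With ->ascii, every non-ASCII character becomes a \uXXXX escape,
# so the result is safe on a handle with any encoding layer or none.
my $json_ascii = JSON::PP->new->ascii->encode(\%data);
print $json_ascii, "\n";

# A JSON parser turns the escape back into the original character.
my $round_trip = JSON::PP->new->decode($json_ascii);
```

The trade-off is size: each escaped character takes six bytes instead of two or three, which rarely matters for mostly-ASCII data.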
For a CGI script, where STDOUT should stay binary:
use utf8; # Encoding of source code.
use open ':encoding(UTF-8)'; # Default encoding of file handles.
BEGIN {
    binmode STDIN; # Usually does nothing on non-Windows.
    binmode STDOUT; # Usually does nothing on non-Windows.
    binmode STDERR, ':encoding(UTF-8)'; # For text sent to the log file.
}
use CGI qw( -utf8 );
use JSON::XS qw( encode_json ); # encode_json must be imported to be called below.
my $cgi = CGI->new();
my $data = { code => "Gewürztraminer" };
print $cgi->header('application/json');
print encode_json($data); # Already UTF-8 octets; STDOUT has no encoding layer.
encode_json from JSON::XS returns octets: the UTF-8 encoded bytes of the JSON text, not a string of Unicode characters. For more details see perlunicode. In short, the content of $json_text is ready to be written as-is through a file handle in binary mode. When you assign $data{code} under use utf8; you get a scalar holding a string of Unicode characters. (It is stored internally as UTF-8, but that is an implementation detail you should not rely on. The use utf8; pragma means only that the source code is encoded as UTF-8, nothing else.) If you want to print both scalars through a file handle that has a UTF-8 encoding layer, you first have to decode $json_text back into a string of Unicode characters:
use strict;
use warnings;
use JSON::XS;
use utf8;
binmode STDOUT, ":encoding(utf8)";
my (%data);
$data{code} = "Gewürztraminer";
print "data{code} = " . $data{code} . "\n";
my $json_text = encode_json \%data;
utf8::decode($json_text);
print $json_text . "\n";
Or use it as intended, and print the encoded string through a file handle in binary mode:
my $json_text = encode_json \%data;
binmode STDOUT;
print $json_text . "\n";
To see what is inside, try:
print( (utf8::is_utf8($json_text) ? "UTF8" : "OCTETS") . "\n" );
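The effect of utf8::decode on the encoder's output can also be checked by comparing lengths, which is less tied to Perl internals than the flag alone. The sketch below uses encode_json from the core JSON::PP module as a stand-in for the JSON::XS function (an assumption); the helper variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;                                   # source literals are UTF-8
use JSON::PP qw( encode_json );             # stand-in for JSON::XS (assumption)

my $json_text = encode_json({ code => "Gewürztraminer" });

# Before decoding: one element per byte, so "ü" counts twice.
my $len_before = length $json_text;

# Decode in place from UTF-8 octets to a character string.
utf8::decode($json_text);

# After decoding: one element per character.
my $len_after = length $json_text;
my $after     = utf8::is_utf8($json_text) ? "UTF8" : "OCTETS";

print "length $len_before -> $len_after, now $after\n";
```

The length should drop by one, because the two bytes of "ü" collapse into a single character.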