I am trying to extract text from a PDF and write it to a JSON file. When the text contains Unicode character references, every & in the JSON output is converted to \u0026. For example, my actual string is &#1588; (which represents ش). It prints correctly to a .txt file, to the console, etc. But when I write this string to a JSON file, it shows up as \u0026#1588;.

I am using Java, and the code is:

Gson gson = new Gson();
String json = gson.toJson(pdfDoc);

Note: pdfDoc is an object that contains all the details (position, color, font, etc.) of the characters in the input PDF document. I am using gson-2.2.1.jar.

That's actually a valid (but not required) encoding. Any character may be encoded using the unicode escape in JSON and any valid JSON parsing library must be able to interpret those escapes.
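To illustrate why the escape is harmless, here is a minimal Java sketch of what a conforming parser does with such an escape. The class and method names (JsonUnicodeEscapes, decode) are made up for this example, and it only handles the backslash-u form, ignoring other JSON escapes such as \n or \\. In practice you would let a real JSON parser do this for you.

```java
// Illustrative helper with hypothetical names; not part of Gson or any JSON library.
final class JsonUnicodeEscapes {

    // Replaces each backslash-u escape (four hex digits) with the UTF-16 code unit it denotes.
    // Simplification: all other JSON escapes are copied through unchanged.
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length() && isHex4(s, i + 2)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if the four characters starting at index 'from' are all hexadecimal digits.
    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The escaped form from the question, as it appears in the JSON file.
        System.out.println(decode("\\u0026#1588;"));
    }
}
```

Running it prints &#1588;, which is the original string from the question, so nothing is lost as long as the file is read back with a JSON parser.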

& is not one of the characters that must be escaped (see the definition of string at json.org), but a few JSON libraries are quite "aggressive" in their encoding. That's not usually a problem, unless you process the resulting JSON with something other than a conforming JSON parser.
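As a rough picture of what such "aggressive" encoding amounts to, here is a small Java sketch. It is not Gson's actual implementation: the class name HtmlSafeJsonEscaper and the set of characters it escapes (&, < and >) are assumptions chosen for illustration only.

```java
// Hypothetical sketch of HTML-safe escaping; not taken from any library's source.
final class HtmlSafeJsonEscaper {

    // Characters treated as HTML-sensitive here (an assumed set, for illustration).
    private static final String HTML_SENSITIVE = "&<>";

    // Rewrites each HTML-sensitive character as a backslash-u escape with four hex digits.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (HTML_SENSITIVE.indexOf(c) >= 0) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The string from the question before it is written to JSON.
        System.out.println(escape("&#1588;"));
    }
}
```

Running it prints \u0026#1588;, the same form the question reports. The escaped text is still valid JSON; it is only harder to read in the raw file.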

GsonBuilder.disableHtmlEscaping() will help you turn that feature off if you absolutely need to.

Thanks. It worked. I changed the code to Gson gson = new GsonBuilder().disableHtmlEscaping().create(); – Neeraj Oct 3, 2012 at 10:05

One of these "aggressive" libraries is unsplash.com. I use the following code to decode it in Swift: extension String { func utf8DecodedString() -> String { let data = self.data(using: .utf8) let message = String(data: data!, encoding: .nonLossyASCII) ?? "" return message } func utf8EncodedString() -> String { let messageData = self.data(using: .nonLossyASCII) let text = String(data: messageData!, encoding: .utf8) ?? "" return text } } – JeanNicolas Jan 4, 2022 at 22:23

I use the following code to decode \u0026 from an unsplash.com JSON file in Swift:

import Foundation

extension String {
    /// Decodes escape sequences in the string via the .nonLossyASCII encoding.
    func utf8DecodedString() -> String {
        guard let data = self.data(using: .utf8) else { return "" }
        return String(data: data, encoding: .nonLossyASCII) ?? ""
    }

    /// Re-encodes the string via .nonLossyASCII; plain ASCII characters such as & stay as they are.
    func utf8EncodedString() -> String {
        guard let data = self.data(using: .nonLossyASCII) else { return "" }
        return String(data: data, encoding: .utf8) ?? ""
    }
}

// Raw string literal, so the backslash escapes reach the decoder untouched.
let jsonOriginal = #"Let\u2019s not be na\357ve \u0026 dumb!"#
print(jsonOriginal)
print("----")
let jsonDecoded = jsonOriginal.utf8DecodedString()
print(jsonDecoded)
let jsonEncoded = jsonDecoded.utf8EncodedString()
print(jsonEncoded)

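Curiously, re-encoding leaves & unchanged rather than converting it back to \u0026, presumably because & is plain ASCII and .nonLossyASCII only escapes non-ASCII characters.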
