I'm trying to parse the following JSON and I keep getting a JsonParseException:

"episodes":{ "description":"Episode 3 – Oprah's Surprise Patrol from 1\/20\/04\nTake a trip down memory lane and hear all your favorite episodes of The Oprah Winfrey Show from the last 25 seasons -- everyday on your radio!"

It also fails on this JSON:

"episodes":{ "description":"After 20 years in sports talk…he’s still the top dog! Catch Christopher “Mad Dog” Russo weekday afternoons on Mad Dog Radio as he tells it like it is…Give the Doggie a call at 888-623-3646."

Exception:

org.codehaus.jackson.JsonParseException: Invalid UTF-8 start byte 0x96
 at [Source: C:\Json Test Files\episodes.txt; line: 3, column: 33]
    at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
    at org.codehaus.jackson.impl.Utf8StreamParser._reportInvalidInitial(Utf8StreamParser.java:2236)
    at org.codehaus.jackson.impl.Utf8StreamParser._reportInvalidChar(Utf8StreamParser.java:2230)
    at org.codehaus.jackson.impl.Utf8StreamParser._finishString2(Utf8StreamParser.java:1467)
    at org.codehaus.jackson.impl.Utf8StreamParser._finishString(Utf8StreamParser.java:1394)
    at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:113)
    at com.niveus.jackson.Main.parseEpisodes(Main.java:37)
    at com.niveus.jackson.Main.main(Main.java:13)

Code:

    public static void main(String [] args) {
        parseEpisodes("C:\\Json Test Files\\episodes.txt");
    }

    public static void parseEpisodes(String filename) {
        JsonFactory factory = new JsonFactory();
        JsonParser parser = null;
        String nameField = null;
        try {
            parser = factory.createJsonParser(new File(filename));
            parser.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true);
            parser.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, true);
            JsonToken token = parser.nextToken();
            nameField = parser.getText();
            String desc = null;
            while (token != JsonToken.END_OBJECT) {
                if (nameField.equals("episodes")) {
                    while (token != JsonToken.END_OBJECT) {
                        if (nameField.equals("description")) {
                            parser.nextToken();
                            desc = parser.getText();
                        }
                        token = parser.nextToken();
                        nameField = parser.getText();
                    }
                }
                token = parser.nextToken();
                nameField = parser.getText();
            }
            System.out.println(desc);
        } catch (JsonParseException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

The character at column 33 is – (an en dash), and it shows up as the byte 0x96 because the file is physically encoded as Windows-1252. You need to save the file as UTF-8; Windows-1252 is not a valid encoding for JSON. How to do this depends on which text editor you are using.

See JSON RFC:

  • Encoding

    JSON text SHALL be encoded in Unicode. The default encoding is
    UTF-8.
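If re-saving in a text editor is not convenient, the same conversion can be done in code. The following is a minimal sketch, assuming the file really is Windows-1252. The class and method names are made up for illustration, and the `windows-1252` charset, although shipped with common JDKs, is not one of the charsets every Java platform is required to support.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Cp1252ToUtf8 {
    // Assumption: the source bytes are Windows-1252, as diagnosed in the answer.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes the bytes as Windows-1252 and re-encodes the text as UTF-8. */
    static byte[] transcode(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, CP1252);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Reads the source file, transcodes it, and writes the result to target. */
    static void convertFile(Path source, Path target) throws IOException {
        Files.write(target, transcode(Files.readAllBytes(source)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Hypothetical usage: java Cp1252ToUtf8 episodes.txt episodes-utf8.txt
            convertFile(Paths.get(args[0]), Paths.get(args[1]));
            return;
        }
        // Without arguments, show what happens to the offending byte 0x96.
        byte[] utf8 = transcode(new byte[] {(byte) 0x96});
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // E2 80 93, the UTF-8 form of U+2013
    }
}
```

The single byte 0x96 becomes the three-byte UTF-8 sequence for the en dash, which a UTF-8 JSON parser accepts.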

  • I download a JSON string response as UTF-8 and write the contents to a .txt file (UTF-8) onto my Android SD storage and then copy that file to my desktop. – Android Noob Dec 11, 2012 at 23:59

  • Somewhere in the process, what you believe is UTF-8 is actually Windows-1252. My guess is the original download is mislabeled. BTW, 0x96 is the character "en-dash", which is what MS Word uses for a hyphen. In the second example, the culprit is the ellipsis character, also a MS Word idiosyncrasy. – Jim Garrison Dec 12, 2012 at 0:19

  • @AndroidNoob the byte 0x96 will never appear alone in a UTF-8 encoded file since it's a continuation byte. Open the file in Notepad, press Save As, and select UTF-8 from the Encoding menu. – Esailija Dec 12, 2012 at 0:20
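Esailija's point about continuation bytes can be checked with the JDK alone. This is a rough sketch with hypothetical class and method names: a UTF-8 continuation byte has the bit pattern 10xxxxxx, and a strict decoder rejects one that appears without a lead byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8ByteCheck {
    /** True if the byte has the form 10xxxxxx, i.e. it can only continue a sequence. */
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /** True if the whole array decodes as UTF-8 without any malformed input. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] cp1252Dash = {'3', ' ', (byte) 0x96, ' ', 'O'};
        byte[] utf8Dash = {'3', ' ', (byte) 0xE2, (byte) 0x80, (byte) 0x93, ' ', 'O'};
        System.out.println(isContinuationByte((byte) 0x96)); // true
        System.out.println(isValidUtf8(cp1252Dash));         // false
        System.out.println(isValidUtf8(utf8Dash));           // true
    }
}
```

Running `isValidUtf8` over the bytes of a downloaded file is one way to find out where in the pipeline the encoding stops being UTF-8.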

    I know this question is old, but I would like to share something that works for me. It is possible to ignore the character in the following way.

  • Define a charset decoder

    CharsetDecoder CHARSET_DECODER = StandardCharsets.UTF_8.newDecoder().onMalformedInput(CodingErrorAction.IGNORE);

  • Use it to read the InputStream

    InputStreamReader stream = new InputStreamReader(resource.getInputStream(), CHARSET_DECODER);

  • Use the Jackson CSV mapper to read the content

    new CsvMapper().readerFor(Map.class).readValues(stream);

    The key element here is the charset decoder configured with CodingErrorAction.IGNORE for malformed input.
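    To see what this does to the data from the question, here is a small sketch that uses only the JDK (no Jackson); the class and method names are hypothetical. Note that IGNORE silently drops the offending byte, so the en dash is lost rather than recovered, while REPLACE substitutes U+FFFD so the damage stays visible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class LenientUtf8Demo {
    /** Reads the stream as UTF-8, applying the given action to malformed bytes. */
    static String readAll(InputStream in, CodingErrorAction action) throws IOException {
        // A fresh decoder per call: CharsetDecoder instances are stateful.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, decoder)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "3 ", a Windows-1252 en dash (0x96), then " O", as in the question's data.
        byte[] bytes = {'3', ' ', (byte) 0x96, ' ', 'O'};
        String ignored = readAll(new ByteArrayInputStream(bytes), CodingErrorAction.IGNORE);
        String replaced = readAll(new ByteArrayInputStream(bytes), CodingErrorAction.REPLACE);
        System.out.println(ignored.length());  // 4: the stray byte was dropped
        System.out.println(replaced.length()); // 5: the stray byte became U+FFFD
    }
}
```

    If the dash matters, decoding the stream as Windows-1252 or converting the file to UTF-8 keeps the character; ignoring malformed input only stops the exception.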
