I'm not sure what this is called, so I'm having trouble searching for it. How can I decode a string with Unicode escapes, from http\u00253A\u00252F\u00252Fexample.com to http://example.com, with JavaScript? I tried unescape, decodeURI, and decodeURIComponent, so I guess the only thing left is string replace.

EDIT: The string is not typed, but rather a substring from another piece of code. So to solve the problem you have to start with something like this:

var s = 'http\\u00253A\\u00252F\\u00252Fexample.com';

I hope that shows why unescape() doesn't work.
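To illustrate the problem, here is a small check (a sketch that can be run in Node or a browser console; the variable names other than s are only for illustration). The URI-decoding functions only understand percent-escapes, so they return the input unchanged:

```javascript
// The string as it arrives: a literal backslash followed by "u0025", not a real "%" character.
var s = 'http\\u00253A\\u00252F\\u00252Fexample.com';

// Neither function interprets "\uXXXX", so the input comes back untouched.
var viaDecodeURI = decodeURI(s);
var viaDecodeURIComponent = decodeURIComponent(s);

console.log(viaDecodeURI === s);          // true
console.log(viaDecodeURIComponent === s); // true
```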

@Cameron: The string is from a script which I called innerHTML on to get. This is why alex's answer doesn't work. – styfle Oct 25, 2011 at 5:46

Edit (2017-10-12):

@MechaLynx and @Kevin-Weber note that unescape() is deprecated, is not guaranteed to exist in non-browser environments, and does not exist in TypeScript. In this case decodeURIComponent is a drop-in replacement. For broader compatibility, use the below instead:

decodeURIComponent(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
> 'http://example.com'

Original answer:

unescape(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
> 'http://example.com'

You can offload all the work to JSON.parse
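The one-liner does two separate things, which may be easier to follow when split apart. A minimal sketch (the intermediate variable names are only for illustration):

```javascript
var s = 'http\\u00253A\\u00252F\\u00252Fexample.com';

// Step 1: wrapping the text in double quotes turns it into a JSON string literal,
// so JSON.parse resolves each "\u0025" escape to a "%" character.
var percentEncoded = JSON.parse('"' + s + '"');
console.log(percentEncoded); // http%3A%2F%2Fexample.com

// Step 2: the result is an ordinary percent-encoded string.
var url = decodeURIComponent(percentEncoded);
console.log(url); // http://example.com
```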

Interesting. I did have to add quotes around it: unescape(JSON.parse('"' + s + '"')); What is the reason for the extra quotes? Does that make it valid JSON? – styfle Nov 7, 2012 at 1:46

Note that this appears to be significantly faster than the fromCharCode approach: jsperf.com/unicode-func-vs-json-parse – nrabinowitz Apr 1, 2014 at 19:45

Important note about @styfle's answer: Don't use JSON.parse('"' + s + '"') when dealing with untrusted data; use JSON.parse('"' + s.replace('"', '\\"') + '"') instead, otherwise your code will break when the input contains quotes. – ntninja Sep 13, 2014 at 17:37

Great answer @alexander255, but you would actually want to use JSON.parse('"' + str.replace(/"/g, '\\"') + '"') to replace ALL occurrences of that character throughout the string, rather than just the first one. – C. S. May 23, 2016 at 18:01

For those who come across this and are worried because unescape() has been deprecated, decodeURIComponent() works identically to unescape() in this case, so just replace it with that and you're good. – mechalynx Oct 12, 2017 at 16:29

This is a Unicode-escaped string. The string was first percent-encoded (escaped), and then each % was written as the Unicode escape \u0025. To convert back to normal:

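var x = "http\\u00253A\\u00252F\\u00252Fexample.com";
// Match a literal "\u" followed by exactly four hex digits (captured as a group).
var r = /\\u([0-9a-f]{4})/gi;
x = x.replace(r, function (match, grp) {
    // Convert the captured hex digits to the character they encode.
    return String.fromCharCode(parseInt(grp, 16));
});
console.log(x);  // http%3A%2F%2Fexample.com
x = unescape(x); // decodeURIComponent(x) gives the same result here
console.log(x);  // http://example.com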

To explain: I use a regular expression to look for \u0025. However, since I need only a part of this string for my replace operation, I use parentheses to isolate the part I'm going to reuse, 0025. This isolated part is called a group.

The gi part at the end of the expression denotes it should match all instances in the string, not just the first one, and that the matching should be case insensitive. This might look unnecessary given the example, but it adds versatility.

Now, to convert from one string to the next, I need to execute some steps on each group of each match, and I can't do that by simply transforming the string. Helpfully, the String.replace operation can accept a function, which will be executed for each match. The return of that function will replace the match itself in the string.

I use the second parameter this function accepts, which is the group I need, and transform it to the character with that code (via String.fromCharCode). I then use the built-in unescape function to decode the resulting percent-encoded string to its proper form.
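To make the callback mechanics concrete, the sketch below records what String.replace passes to the function on each match. The sample input and the calls array are only for illustration, and the third callback argument (the match offset) is part of the standard replace signature rather than something the answer uses:

```javascript
var calls = [];

// Input text is: a\u0041b\u0042 (with literal backslashes).
var out = 'a\\u0041b\\u0042'.replace(/\\u([0-9a-f]{4})/gi, function (match, grp, offset) {
    // match: the whole matched text, grp: the captured hex digits, offset: where the match starts.
    calls.push({ match: match, grp: grp, offset: offset });
    // Whatever is returned here replaces the matched text.
    return String.fromCharCode(parseInt(grp, 16));
});

console.log(calls.length); // 2
console.log(calls[0]);     // { match: '\\u0041', grp: '0041', offset: 1 }
console.log(out);          // aAbB
```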

Thanks. Could you explain a little bit about what you're doing? It looks like the regex is looking for a \u prefix and then a 4 character hex number (letters or numbers). How does the function in the replace method work? – styfle Oct 26, 2011 at 1:42

Great solution. In my case, I am encoding all international (non-ascii) characters being sent from the server as escaped unicode, then using your function in the browser to decode the characters to the correct UTF-8 characters. I found that I had to update the following regex in order to catch characters from all languages (i.e. Thai): var r = /\\u([\d\w]{1,})/gi; – Nathan Hanna Mar 4, 2014 at 21:43

Note that this appears to be significantly slower than the JSON.parse approach: jsperf.com/unicode-func-vs-json-parse – nrabinowitz Apr 1, 2014 at 19:45

@IoannisKaradimas There most certainly is such a thing as deprecation in Javascript. To claim that and then support it by stating that older browsers must always be supported is a completely ahistorical perspective. In any case, anyone who wants to use this and also wants to avoid unescape() can use decodeURIComponent() instead. It works identically in this case. I would recommend radicand's approach however, as it is simpler, just as supported and faster to execute, with the same results (make sure to read the comments however). – mechalynx Oct 12, 2017 at 16:31

Note that the use of unescape() is deprecated and doesn't work with the TypeScript compiler, for example.

Based on radicand's answer and the comments section below, here's an updated solution:

var string = "http\\u00253A\\u00252F\\u00252Fexample.com";
decodeURIComponent(JSON.parse('"' + string.replace(/\"/g, '\\"') + '"'));

http://example.com

This doesn't work for some strings, as quotes can break the JSON string and result in JSON parsing errors. I used the other answer (stackoverflow.com/a/7885499/249327) in these cases. – nickdos Sep 4, 2019 at 2:22
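One way to cope with inputs that break JSON.parse is to fall back to the regex approach when parsing throws. This is only a sketch: decodeEscaped is a hypothetical helper name, and decodeURIComponent can still throw on malformed percent-sequences, which is not handled here.

```javascript
// Hypothetical helper: try JSON.parse first, fall back to a regex replace on failure.
function decodeEscaped(input) {
    var unicodeDecoded;
    try {
        // Escape double quotes so they do not terminate the JSON string early.
        unicodeDecoded = JSON.parse('"' + input.replace(/"/g, '\\"') + '"');
    } catch (e) {
        // JSON.parse rejected the input (raw newline, \x escape, ...):
        // decode only the \uXXXX sequences and leave everything else as-is.
        unicodeDecoded = input.replace(/\\u([0-9a-f]{4})/gi, function (match, hex) {
            return String.fromCharCode(parseInt(hex, 16));
        });
    }
    return decodeURIComponent(unicodeDecoded);
}

console.log(decodeEscaped('http\\u00253A\\u00252F\\u00252Fexample.com')); // http://example.com
console.log(decodeEscaped('say "hi" \\u0041')); // say "hi" A
```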

Using JSON.parse for this comes with significant drawbacks that you must be aware of:

  • You must wrap the string in double quotes
  • Many characters are not supported and must be escaped themselves. For example, passing any of the following to JSON.parse (after wrapping them in double quotes) will error even though these are all valid: \\n, \n, \\0, a"a
  • It does not support hexadecimal escapes: \\x45
  • It does not support Unicode code point sequences: \\u{045}
  • There are other caveats as well.

Essentially, using JSON.parse for this purpose is a hack and doesn't work the way you might always expect. You should stick with using the JSON library to handle JSON, not for string operations.
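Some of these limitations are easy to confirm with JSON.parse. A small sketch, where tryJsonDecode is a hypothetical helper that reports failure instead of throwing:

```javascript
// Hypothetical helper: wrap the raw text in double quotes and report whether JSON.parse accepts it.
function tryJsonDecode(raw) {
    try {
        return { ok: true, value: JSON.parse('"' + raw + '"') };
    } catch (e) {
        return { ok: false, error: e.name };
    }
}

console.log(tryJsonDecode('\\u0045'));  // { ok: true, value: 'E' }
console.log(tryJsonDecode('\\x45'));    // { ok: false, error: 'SyntaxError' }  (hex escape)
console.log(tryJsonDecode('\\u{45}'));  // { ok: false, error: 'SyntaxError' }  (code point escape)
console.log(tryJsonDecode('a"a'));      // { ok: false, error: 'SyntaxError' }  (unescaped quote)
console.log(tryJsonDecode('line\nbreak')); // { ok: false, error: 'SyntaxError' }  (raw newline)
```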

    I recently ran into this issue myself and wanted a robust decoder, so I ended up writing one myself. It's complete and thoroughly tested and is available here: https://github.com/iansan5653/unraw. It mimics the JavaScript standard as closely as possible.

    Explanation:

    The source is about 250 lines so I won't include it all here, but essentially it uses the following Regex to find all escape sequences and then parses them using parseInt(string, 16) to decode the base-16 numbers and then String.fromCodePoint(number) to get the corresponding character:

    /\\(?:(\\)|x([\s\S]{0,2})|u(\{[^}]*\}?)|u([\s\S]{4})\\u([^{][\s\S]{0,3})|u([\s\S]{0,4})|([0-3]?[0-7]{1,2})|([\s\S])|$)/g
    

    Commented (NOTE: This regex matches all escape sequences, including invalid ones. If the string would throw an error in JS, it throws an error in my library [ie, '\x!!' will error]):

    \\                                      # All escape sequences start with a backslash
    (?:                                     # Starts a group of 'or' statements
      (\\)                                  # If a second backslash is encountered, stop there (it's an escaped slash)
      |                                     # or
      x([\s\S]{0,2})                        # Match valid hexadecimal sequences
      |                                     # or
      u(\{[^}]*\}?)                         # Match valid code point sequences
      |                                     # or
      u([\s\S]{4})\\u([^{][\s\S]{0,3})      # Match surrogate code points which get parsed together
      |                                     # or
      u([\s\S]{0,4})                        # Match non-surrogate Unicode sequences
      |                                     # or
      ([0-3]?[0-7]{1,2})                    # Match deprecated octal sequences
      |                                     # or
      ([\s\S])                              # Match anything else ('.' doesn't match newlines)
      |                                     # or
      $                                     # Match the end of the string
    )                                       # End the group of 'or' statements
    /g                                      # Match as many instances as there are
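For readers who only want the general idea, here is a much smaller decoder built on the same parseInt plus String.fromCodePoint technique. It is not the unraw source: decodeBasicEscapes is a hypothetical name, it handles only \\, \xHH, \uHHHH and \u{H...}, and it silently leaves anything else (including malformed escapes) untouched instead of throwing.

```javascript
// Simplified sketch, not the library's implementation.
function decodeBasicEscapes(raw) {
    var pattern = /\\(?:(\\)|x([0-9a-f]{2})|u\{([0-9a-f]{1,6})\}|u([0-9a-f]{4}))/gi;
    return raw.replace(pattern, function (match, backslash, hex, codePoint, unicode) {
        if (backslash !== undefined) return '\\';                                // escaped backslash
        if (hex !== undefined) return String.fromCharCode(parseInt(hex, 16));    // \xHH
        // \u{H...}: String.fromCodePoint throws a RangeError above 0x10FFFF.
        if (codePoint !== undefined) return String.fromCodePoint(parseInt(codePoint, 16));
        return String.fromCharCode(parseInt(unicode, 16));                       // \uHHHH
    });
}

console.log(decodeBasicEscapes('\\x45'));       // E
console.log(decodeBasicEscapes('\\u0045'));     // E
console.log(decodeBasicEscapes('\\u{1F600}'));  // 😀
console.log(decodeBasicEscapes('\\\\u0045'));   // \u0045 (the escaped backslash protects it)
```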

    Example

    Using that library:

    import unraw from "unraw";
    let step1 = unraw('http\\u00253A\\u00252F\\u00252Fexample.com');
    // yields "http%3A%2F%2Fexample.com"
    // Then you can use decodeURIComponent to further decode it:
    let step2 = decodeURIComponent(step1);
    // yields http://example.com
    

    I don't have enough rep to put this under comments to the existing answers:

    unescape is only deprecated for working with URIs (or any encoded UTF-8), which is probably what most people need. encodeURIComponent converts a JS string to escaped UTF-8, and decodeURIComponent only works on escaped UTF-8 bytes. It throws an error for something like decodeURIComponent('%a9'), because a lone 0xA9 byte isn't valid UTF-8 (even though U+00A9 is a valid Unicode value), whereas unescape('%a9') returns ©. So you need to know your data when using decodeURIComponent.

    decodeURIComponent won't work on "%C2" or any lone byte over 0x7f, because in UTF-8 such a byte is only valid as part of a multi-byte sequence. However, decodeURIComponent("%C2%A9") gives you ©. unescape wouldn't work properly on that (it returns Â©), and it wouldn't throw an error, so unescape can lead to buggy code if you don't know your data.
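The difference between the two functions can be checked directly. In this sketch, safeDecodeURIComponent is a hypothetical wrapper that returns null instead of throwing a URIError:

```javascript
// Hypothetical wrapper: return null when the input is not valid percent-encoded UTF-8.
function safeDecodeURIComponent(encoded) {
    try {
        return decodeURIComponent(encoded);
    } catch (e) {
        return null;
    }
}

console.log(safeDecodeURIComponent('%C2%A9')); // ©   (valid two-byte UTF-8 sequence)
console.log(safeDecodeURIComponent('%A9'));    // null (lone byte over 0x7f, URIError)
console.log(safeDecodeURIComponent('%C2'));    // null (incomplete sequence, URIError)

// unescape treats each %XX as one character code and never throws here.
console.log(unescape('%A9'));    // ©
console.log(unescape('%C2%A9')); // Â©
```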

    This is not an answer to this exact question, but for those who are hitting this page via a search result and who are trying to (like I was) construct a single Unicode character given a sequence of escaped codepoints, note that you can pass multiple arguments to String.fromCodePoint() like so:

    String.fromCodePoint(parseInt("1F469", 16), parseInt("200D", 16), parseInt("1F4BC", 16)) // 👩‍💼
    

    You can of course parse your string to extract the hex codepoint strings and then do something like:

    let codePoints = hexCodePointStrings.map(s => parseInt(s, 16));
    let str = String.fromCodePoint(...codePoints);
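As one possible way to do the extraction step, the sketch below assumes the input uses the \u{...} form for every code point; the sample string and the regex are illustrative, not taken from the answer:

```javascript
// Assumed input shape: a run of \u{HEX} escapes with literal backslashes.
const escaped = '\\u{1F469}\\u{200D}\\u{1F4BC}';

// Collect the hex digits captured from each \u{...} escape.
const hexCodePointStrings = Array.from(
    escaped.matchAll(/\\u\{([0-9a-f]+)\}/gi),
    (m) => m[1]
);

const codePoints = hexCodePointStrings.map((s) => parseInt(s, 16));
const str = String.fromCodePoint(...codePoints);

console.log(hexCodePointStrings); // [ '1F469', '200D', '1F4BC' ]
console.log(str);                 // 👩‍💼
```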
    

    In my case, I was trying to unescape an HTML file that looked something like this:

    "\u003Cdiv id=\u0022app\u0022\u003E\r\n    \u003Cdiv data-v-269b6c0d\u003E\r\n        \u003Cdiv data-v-269b6c0d class=\u0022menu\u0022\u003E\r\n    \u003Cdiv data-v-269b6c0d class=\u0022faux_column\u0022\u003E\r\n        \u003Cdiv data-v-269b6c0d class=\u0022row\u0022\u003E\r\n            \u003Cdiv data-v-269b6c0d class=\u0022col-md-12\u0022\u003E\r\n"  
    
    <div id="app">
        <div data-v-269b6c0d>
            <div data-v-269b6c0d class="menu">
        <div data-v-269b6c0d class="faux_column">
            <div data-v-269b6c0d class="row">
                <div data-v-269b6c0d class="col-md-12">
    

    The following works in my case:

    // Escape single quotes so the data can sit inside a single-quoted JS string.
    const jsEscape = (str: string) => {
      return str.replace(new RegExp("'", 'g'),"\\'");
    };

    export const decodeUnicodeEntities = (data: any) => {
      return unescape(jsEscape(data));
    };

    // Use it
    const data = ".....";
    const unescaped = decodeUnicodeEntities(data); // Unescaped html
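If the escaped HTML arrives as a double-quoted literal like the sample above, it may already be a valid JSON string, in which case JSON.parse alone resolves the \uXXXX, \r and \n sequences. A minimal sketch under that assumption (rawPayload is a shortened, illustrative version of the sample, and the variable names are hypothetical):

```javascript
// Assumption: the payload is a complete JSON string literal, including its surrounding double quotes.
const rawPayload = '"\\u003Cdiv id=\\u0022app\\u0022\\u003E\\r\\n    \\u003Cdiv data-v-269b6c0d\\u003E"';

// JSON.parse resolves \u003C, \u0022, \u003E, \r and \n in a single pass.
const decodedHtml = JSON.parse(rawPayload);

console.log(decodedHtml);
```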
            
