How to convert a string to utf-8 in Python

I have a browser which sends UTF-8 characters to my Python server, but when I retrieve them from the query string, the encoding that Python returns is ASCII. How can I convert the plain string to UTF-8?

NOTE: The string passed from the web is already UTF-8 encoded; I just want to make Python treat it as UTF-8, not ASCII.

In 2018, with Python 3, if you get an ASCII decode error, do:


    "some_string".encode('utf-8').decode('utf-8')

– devssh Sep 26, 2018 at 8:40

>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)

^ This is the difference between a byte string (plain_string) and a unicode string.

>>> s = "Hello!"
>>> u = unicode(s, "utf-8")
^ Converting to unicode and specifying the encoding.
In Python 3
All strings are unicode. The unicode function does not exist anymore. See answer from @Noumenon
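For comparison, here is a minimal Python 3 sketch of the same byte string versus text distinction. The variable names are illustrative only and do not come from the answer above.

```python
# Python 3: text is str (always Unicode), raw data is bytes.
byte_string = b"Hi!"   # bytes literal
text_string = "Hi!"    # str literal, already Unicode

print(type(byte_string), type(text_string))

# Decoding turns bytes into str; there is no unicode() call to make.
decoded = byte_string.decode("utf-8")
print(decoded == text_string)
```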
I am getting the following error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 2: invalid start byte. This is my code:
    ret = []
    for line in csvReader:
        cline = []
        for elm in line:
            unicodestr = unicode(elm, 'utf-8')
            cline.append(unicodestr)
        ret.append(cline)
– Gopakumar N G
                Oct 22, 2013 at 6:56
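An "invalid start byte" error like the one in this comment usually means the data is not UTF-8 at all. The sketch below is only an illustration in Python 3: the sample bytes are made up, and the guess that the file is really cp1252 is an assumption that cannot be confirmed from the comment.

```python
# Hypothetical sample data: 0xb0 is the degree sign in cp1252 / Latin-1,
# but it is not a valid start byte in UTF-8.
raw = b"25\xb0C"

try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print(exc)

# Assumption: the data was actually written as cp1252.
text = raw.decode("cp1252")
print(text)
```

If the real encoding is unknown, it has to be found out from whatever produced the file; decoding with the wrong codec may succeed but yield the wrong characters.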
                This code will only work as long as the text does not contain non-ascii characters; a simple accented character on the string will make it fail.
– Haroldo_OK
                Feb 16, 2018 at 10:31
                Hi, if you have "2340" in a string variable, and you want to print the unicode character U+2340 (⍀), is there any way to do that?
– Sha2b
                Nov 5, 2019 at 3:36
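One possible way to do what this comment asks, assuming Python 3 and assuming the string holds the hexadecimal digits of the code point (the variable names are made up for illustration):

```python
code_point_text = "2340"  # hexadecimal digits of the code point

# int(..., 16) parses the hex digits; chr() maps the number to a character.
character = chr(int(code_point_text, 16))
print(character)
```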
If the methods above don't work, you can also tell Python to ignore portions of a string that it can't convert to utf-8:
stringnamehere.decode('utf-8', 'ignore')
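A small Python 3 sketch of that error-handler argument, using made-up input bytes. Note that 'ignore' silently drops data, so 'replace' is shown next to it as an alternative that at least leaves a visible marker.

```python
# Hypothetical input: valid UTF-8 for "café", then a stray 0xff byte,
# which can never appear in well-formed UTF-8.
raw = b"caf\xc3\xa9 \xff ok"

# 'ignore' drops the undecodable byte without any trace.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD (the replacement character) instead.
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```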
                @saran3h it sounds like you're using Python 3, in which case Python should handle encoding issues for you. Have you tried reading your document without specifying an encoding?
– duhaime
                Aug 6, 2018 at 14:56
                Python by default picks the system encoding. On Windows 10 it's cp1252, which is different from UTF-8. I wasted a few hours on it while using codecs.open() in Python 3.8.
– Vishesh Mangla
                Jul 1, 2020 at 15:15
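To avoid depending on the system default described in this comment, the encoding can be passed explicitly when opening a file. A minimal Python 3 sketch, where the file name and sample text are placeholders:

```python
import os
import tempfile

text = "Téstão ñ"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # placeholder file name

    # Name the encoding on both write and read instead of relying on
    # the platform default.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with open(path, "r", encoding="utf-8") as handle:
        round_tripped = handle.read()

print(round_tripped == text)
```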
Might be a bit overkill, but when I work with ASCII and Unicode in the same files, repeating decode can be a pain, so this is what I use (Python 2):
def make_unicode(inp):
    if type(inp) != unicode:
        inp = inp.decode('utf-8')
    return inp
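The helper above relies on the Python 2 unicode type. A rough Python 3 counterpart might look like the following; the name make_text and the encoding parameter are my own choices, not part of the answer.

```python
def make_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes-like."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


print(make_text(b"caf\xc3\xa9"))  # bytes are decoded
print(make_text("already text"))  # str is returned unchanged
```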
                It is not what OP asks. But avoid such string literals anyway. It creates Unicode string in Python 3 (good) but it is a bytestring in Python 2 (bad). Either add from __future__ import unicode_literals at the top or use u'' prefix. Don't use non-ascii characters in bytes literals. To get utf-8 bytes, you could utf8bytes = unicode_text.encode('utf-8') later if it is necessary.
– jfs
                Apr 26, 2015 at 1:26
                @jfs how will from __future__ import unicode_literals help me to convert a string with non-ascii characters to utf-8?
– Ortal Turgeman
                Nov 29, 2018 at 17:30
                @OrtalTurgeman I'm not answering the question. Look, it is a comment, not an answer. My comment addresses the issue with the code in the answer. It tries to create a bytestring with non-ascii characters on Python 2 (it is a SyntaxError on Python 3 — bytes literals forbid that).
– jfs
                Nov 29, 2018 at 17:34
If I understand you correctly, you have a utf-8 encoded byte-string in your code.
Converting a byte-string to a unicode string is known as decoding (unicode -> byte-string is encoding).
You do that by using the unicode function or the decode method. Either:
unicodestr = unicode(bytestr, encoding)
unicodestr = unicode(bytestr, "utf-8")
unicodestr = bytestr.decode(encoding)
unicodestr = bytestr.decode("utf-8")
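The calls above are Python 2. As a sketch of the Python 3 equivalents, where str() takes over the role of unicode(), with sample bytes chosen purely for illustration:

```python
bytestr = b"\xe2\x88\x9a25"  # UTF-8 bytes for a square-root sign followed by "25"

# Decoding: bytes -> str. Both spellings give the same result.
text_a = bytestr.decode("utf-8")
text_b = str(bytestr, "utf-8")

# Encoding: str -> bytes, the inverse operation.
encoded = text_a.encode("utf-8")

print(text_a, text_a == text_b, encoded == bytestr)
```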
In Python 3.6 (as in all of Python 3), there is no built-in unicode() function.
Strings are already stored as Unicode by default, so no conversion is required. Example:
my_str = "\u221a25"
print(my_str)
Translate with ord() and unichr().
Every Unicode character has a number associated with it, something like an index. Python has a few functions to translate between a character and its number. Below is an example with ñ (Python 2). Hope it helps.
>>> C = 'ñ'
>>> U = C.decode('utf8')
>>> U
u'\xf1'
>>> ord(U)
241
>>> unichr(241)
u'\xf1'
>>> print unichr(241).encode('utf8')
ñ
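In Python 3, unichr() is gone and chr() covers the whole Unicode range. A minimal sketch of the same round trip, with variable names of my own:

```python
char = "ñ"

code_point = ord(char)        # character -> number
same_char = chr(code_point)   # number -> character

# Encoding to UTF-8 gives the two bytes that represent this character.
utf8_bytes = same_char.encode("utf-8")

print(code_point, same_char, utf8_bytes)
```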
First, str in Python 3 holds Unicode text.
Second, UTF-8 is an encoding: a standard for converting a Unicode string to bytes. There are many other encodings out there (e.g. UTF-16, ASCII, Shift-JIS, etc.).
When the client sends data to your server using UTF-8, it is sending a bunch of bytes, not a str.
You received a str because the "library" or "framework" that you are using has implicitly converted those bytes to str.
Under the hood, there is just a bunch of bytes. You just need to ask the "library" to give you the request content as bytes, and then handle the decoding yourself (if the library can't give you the bytes, it is doing black magic and you shouldn't use it).
Decode UTF-8 encoded bytes to str: bs.decode('utf-8')
Encode str to UTF-8 bytes: s.encode('utf-8')
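To illustrate why the codec used for decoding matters, here is a small sketch (not tied to any particular framework) where the same UTF-8 bytes are decoded once correctly and once with the wrong codec:

```python
text = "é"

# One character becomes two bytes in UTF-8.
utf8_bytes = text.encode("utf-8")

# Decoding with the matching codec restores the original text.
correct = utf8_bytes.decode("utf-8")

# Decoding with a different codec "works" but yields garbage (mojibake).
mojibake = utf8_bytes.decode("latin-1")

print(utf8_bytes, correct, mojibake)
```

This kind of silent mis-decoding is what an implicit bytes-to-str conversion can produce when it guesses the wrong encoding.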
The URL is percent-encoded into ASCII, and to the Python server it is just a Unicode string, e.g.:
"T%C3%A9st%C3%A3o"
Python sees "é" and "ã" as the literal sequences %C3%A9 and %C3%A3.
You can decode a URL like this (Python 3):
import urllib.parse
url = "T%C3%A9st%C3%A3o"
print(urllib.parse.unquote(url))
>> Téstão
See https://www.adamsmith.haus/python/answers/how-to-decode-a-utf-8-url-in-python for details.
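Since the original question is about a query string, a sketch using urllib.parse.parse_qs may be closer to the actual use case. The query string and parameter names below are made up for illustration.

```python
from urllib.parse import parse_qs, quote, unquote

query_string = "q=T%C3%A9st%C3%A3o&lang=pt"  # hypothetical query string

# parse_qs splits the parameters and percent-decodes the values,
# assuming UTF-8 by default.
params = parse_qs(query_string)
name = params["q"][0]

# quote() is the inverse: it percent-encodes the UTF-8 bytes again.
re_encoded = quote(name)

print(name, re_encoded, unquote(re_encoded) == name)
```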
                What is unidecode? Is it this pypi.org/project/Unidecode? Please provide info if it's a 3rd-party package, and how to install/use it.
– Gino Mempin
                Jul 19, 2021 at 23:27