I have a string that I got by reading the HTML source of a webpage. The page has a bulleted list, so the string contains bullet symbols like "•". The source was read with Python 2.7's urllib2, via urllib2.urlopen(webaddress).read().

I know the Unicode code point for the bullet character is U+2022, but how do I actually replace that character with something else?

I tried str.replace("•", "something"), but it does not appear to work. How do I do this?

I'm sorry, I'm not going to download a webpage using urllib2 now. What is the type? str or unicode? – Fred Foo Oct 26, 2012 at 20:16

If your Python code contains UTF-8 characters, you should use the 'magic comment' # coding=utf8 in the first or the second line of your code. – Kinjal Dixit Oct 15, 2013 at 12:09

  1. Decode the string to Unicode, assuming it is UTF-8-encoded:

    str.decode("utf-8")

  2. Call the replace method and be sure to pass it a Unicode string as its first argument:

    str.decode("utf-8").replace(u"\u2022", "*")

  3. Encode back to UTF-8, if needed:

    str.decode("utf-8").replace(u"\u2022", "*").encode("utf-8")


(Fortunately, Python 3 puts a stop to this mess. Step 3 should really only be performed just prior to I/O. Also, note that calling a string str shadows the built-in type str.)
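
For readers on Python 3, where str is already Unicode text and bytes is a separate type, the same steps might look roughly like the sketch below. The function name replace_bullet and the sample bytes are made up for illustration, and UTF-8 is only an assumption about the page's encoding.

```python
def replace_bullet(raw: bytes, replacement: str = "*", encoding: str = "utf-8") -> str:
    """Decode raw bytes, then replace U+2022 (BULLET) in the resulting text."""
    text = raw.decode(encoding)                 # bytes -> str (Unicode text)
    return text.replace("\u2022", replacement)  # plain str.replace on text


# Hypothetical sample standing in for bytes read from a web page.
page = "\u2022 first item".encode("utf-8")
cleaned = replace_bullet(page)
print(cleaned)

# Only just before I/O: encode back to bytes if needed.
encoded = cleaned.encode("utf-8")
```

Because the literal "\u2022" is already a Unicode string in Python 3, no u prefix is needed, and mixing bytes with str raises an error instead of failing silently.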

Could you please elaborate in what way "Python 3 puts a stop to this mess"? How would I do this in Python 3 then? – cryanbhu Apr 9, 2019 at 10:48

When trying: re.sub(u'2022', varcontainingstring, ''), it makes the string empty with nothing in it. – Rolando Oct 26, 2012 at 20:25

@AntonTeodor Regex is less efficient than a simple string search and replace. It will work though. – NullUserException Oct 3, 2018 at 16:17

Try this one. It gives you the output as a normal string:

    str.encode().decode('unicode-escape')
    

After that, you can perform any replacement:

    str.replace('•','something')
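
As a rough illustration in Python 3 (the sample strings are made up), this decode step helps when the text holds a literal backslash escape, that is the six characters \u2022, rather than the bullet character itself. If the text already contains real non-ASCII characters, the round trip garbles them, because unicode-escape reads the encoded bytes as Latin-1.

```python
# Text holding a literal backslash-u escape, not a real bullet character.
escaped = "\\u2022 item"
decoded = escaped.encode().decode("unicode-escape")
print(decoded.replace("\u2022", "something"))

# Text that already contains a real bullet: the round trip changes it.
real = "\u2022 item"
garbled = real.encode().decode("unicode-escape")
print(garbled == real)  # the bullet does not survive intact
```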
    
    str1 = "This is Python\u500cPool"
    

Encode the string to ASCII, replacing every non-ASCII character with '?':

    str1 = str1.encode("ascii", "replace")
    

Decode the bytes back to a string:

    str1 = str1.decode(encoding="utf-8", errors="ignore")
    

Replace the question mark with the desired character:

    str1 = str1.replace("?"," ")
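
One drawback of the steps above is that question marks already present in the text get replaced as well. A possible alternative, sketched here for Python 3, substitutes non-ASCII characters directly with re.sub; the helper name replace_non_ascii and the sample sentence are hypothetical.

```python
import re


def replace_non_ascii(text: str, replacement: str = " ") -> str:
    """Replace each character outside the ASCII range with `replacement`."""
    return re.sub(r"[^\x00-\x7f]", replacement, text)


str1 = "This is Python\u500cPool, right?"
print(replace_non_ascii(str1))  # the genuine "?" is left alone
```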
    

If you want to remove all \u characters (that is, every non-ASCII character), the code below does that:

    def replace_unicode_character(content: str) -> str:
        # Encoding to ASCII with errors="ignore" silently drops every
        # character that has no ASCII representation (e.g. "\u2022").
        ascii_bytes = content.encode("ascii", "ignore")
        # Decode the remaining bytes back into a regular string.
        return ascii_bytes.decode("ascii")
            
