python - DeprecationWarning: invalid escape sequence - what to use instead of \d?

I've run into a problem with the re module in Python 3.6.5. I have this pattern in my regular expression:

'\\nRevision: (\d+)\\n'
But when I run it, I'm getting a DeprecationWarning.
I searched SO for the problem but haven't found an answer. What should I use instead of \d+? Just [0-9]+, or something else?
Python processes backslash escape sequences in ordinary string literals, and \d is not a recognized escape sequence, which is what triggers the warning.
Declare your RegEx pattern as a raw string instead by prepending r, as below:
r'\nRevision: (\d+)\n'
This also means you can drop the extra backslashes in front of \n, since re itself parses \n as a newline character.
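As a minimal sketch of why the doubled backslashes are unnecessary: in a raw string, \n stays as two characters (a backslash and an n), and re then interprets that pair as a newline on its own. The variable names are only illustrative.

```python
import re

# In a raw string, \n is two characters: a backslash followed by "n".
raw_newline = r'\n'
assert len(raw_newline) == 2

# re interprets that two-character sequence as "match one newline character".
newline_match = re.fullmatch(raw_newline, '\n')
print(newline_match is not None)
```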
To be a little bit more precise, \d is treated as an unrecognized escape sequence and as such is left unchanged. A DeprecationWarning has been given since Python 3.6. In some future version of Python it will be a SyntaxError. Details are in "2.4.1. String and Bytes literals" in the docs.
– VPfB
Apr 6, 2019 at 8:23
@VPfB the thread is old, but I was looking for answers on the same problem. If \d is treated as an escaped Unicode character, how do I distinguish d (alphabetical character) from \d (any digit) without treating the regex pattern as raw string? (Same question applies to \w, \W etc...)
– giulia_dnt
Jan 21, 2020 at 16:15
@theggg If I understand your question correctly - escape your backslash, so the string will read '\\d'.
– ACascarino
Jan 21, 2020 at 18:10
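To illustrate the distinction discussed in the last two comments, here is a small sketch using a made-up sample string: a bare d matches the letter, while '\\d' (a backslash followed by d once Python has processed the literal) matches any digit.

```python
import re

sample = 'd1d2'  # hypothetical input mixing the letter d with digits

# '\\d' is a two-character string: a backslash and the letter d.
digit_pattern = '\\d'
assert digit_pattern == r'\d'

letters = re.findall('d', sample)           # the literal letter d
digits = re.findall(digit_pattern, sample)  # any digit

print(letters, digits)
```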
'\\nRevision: (\d+)\\n'
This pattern triggers the warning because Python interprets \d as an invalid escape sequence. As written, Python leaves that substring unchanged, but it has warned about it since version 3.6:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.
Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In a future Python version they will be a SyntaxWarning and eventually a SyntaxError.
(source: "String and Bytes literals" in the Python documentation)
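One way to observe this behavior without cluttering your own script with warnings is to compile a small piece of source code and record what Python reports. The sketch below is an illustration only: the helper name compile_and_report is invented here, and the exact warning category depends on the Python version, as the quoted documentation describes.

```python
import warnings

# Source code containing a non-raw literal with the unrecognized escape \d.
# It is held in a raw string so that this example itself stays warning-free.
source = r"pattern = '(\d+)'"

def compile_and_report(src):
    """Compile src; return (warning category names, resulting pattern value)."""
    namespace = {}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            code = compile(src, "<example>", "exec")
        except SyntaxError:
            # Possible on a future Python that rejects the escape outright.
            return ["SyntaxError"], None
        exec(code, namespace)
    return [w.category.__name__ for w in caught], namespace["pattern"]

categories, value = compile_and_report(source)
print(categories)  # category name(s) vary by Python version
print(value)       # the backslash is left in the string unchanged
```

When the code compiles, the recorded pattern still contains the backslash, which is why the regular expression keeps working despite the warning.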
Thus, you can fix this warning either by escaping that backslash properly or by using a raw string.
That is, either escape every backslash:
'\\nRevision: (\\d+)\\n'
Or, use a raw string literal (where \ doesn't start an escape sequence):
r'\nRevision: (\d+)\n'
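As a quick check that the two fixes are interchangeable, the sketch below compares both spellings and runs the pattern against a made-up sample text (the sample is an assumption for illustration, not taken from the question).

```python
import re

escaped = '\\nRevision: (\\d+)\\n'  # every backslash doubled
raw = r'\nRevision: (\d+)\n'        # raw string literal

# Both literals produce exactly the same string, so re sees the same pattern.
assert escaped == raw

# Hypothetical input containing real newline characters.
sample = 'Path: trunk\nRevision: 1234\nNode Kind: directory\n'

match = re.search(raw, sample)
revision = match.group(1) if match else None
print(revision)
```

The raw string form is usually easier to read, since the pattern appears exactly as re will see it.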