I saw the pattern [^\x20-\x7E] used in a regular expression whose goal was to remove non-ASCII
characters from a string. What does it mean?
It says something like: all characters that are not (^) in the range \x20-\x7E (hex 0x20 to 0x7E).
According to http://www.asciitable.com/, those are the characters from space to ~.
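As a rough illustration of the reading above, here is a minimal Python sketch that uses the same character class to strip everything outside the space-to-tilde range. The function name strip_non_printable_ascii is made up for this example; only the pattern comes from the question.

```python
import re

# Negated character class: any single character NOT in the range
# space (0x20) through tilde (0x7E).
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]")


def strip_non_printable_ascii(text: str) -> str:
    """Remove every character that is not printable ASCII (hypothetical helper)."""
    return NON_PRINTABLE_ASCII.sub("", text)


# The accented e, the tab and the newline all fall outside the range.
print(strip_non_printable_ascii("caf\u00e9\tmenu\n"))  # prints "cafmenu"
```

Note that the tab and newline are removed too, not just the accented letter, which may or may not be what you want.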
It means: match any character that is not a printing ASCII character.
Printing characters include a to z, A to Z, 0 to 9, and symbols such as ",;$#% etc.
^ not
\x20 hex code for space character
\x7e hex code for ~ (tilde) character
All the ASCII printing characters fall between these two.
The pattern therefore matches non-ASCII characters as well as ASCII control (non-printing) characters such as bell, tab, null and others.
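To make that concrete, the sketch below tests a handful of single characters against the pattern. The sample names and the choice of samples are only for illustration.

```python
import re

pattern = re.compile(r"[^\x20-\x7E]")

# A few single characters to classify; the labels are illustrative only.
samples = {
    "bell": "\x07",
    "tab": "\t",
    "null": "\x00",
    "delete": "\x7f",
    "e-acute": "\u00e9",
    "space": " ",
    "tilde": "~",
    "letter A": "A",
}

# True means the pattern matches the character, so it would be removed.
matched = {name: bool(pattern.fullmatch(ch)) for name, ch in samples.items()}

for name, hit in matched.items():
    print(f"{name:10s} {'matched' if hit else 'not matched'}")
```

The control characters, delete (0x7F) and the accented letter are matched. Space, tilde and the letter A are not.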
Look at man ascii on a Unix system to see which characters it matches.
In Perl, you could also write this as [^ -~]. A related class is [[:^cntrl:]], which is slightly different: it matches any non-control character, including extended ASCII (e.g. accented characters) and Unicode, so it describes the characters you would keep rather than the ones you would remove.
You may not want to restrict yourself to just ASCII, since non-US locales often use valid printing characters outside this small range, e.g. øüéåç...
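If the aim is to drop control characters while keeping accented and other non-ASCII printing characters, one possible approach in Python is sketched below. Python's re module does not support POSIX bracket classes such as [[:cntrl:]], so this sketch assumes that "control character" means Unicode general category Cc, which only approximates the Perl class. The function name strip_control_chars is hypothetical.

```python
import unicodedata


def strip_control_chars(text: str) -> str:
    """Drop characters in Unicode category Cc (control); keep everything else.

    Assumption: category Cc stands in for Perl's [[:cntrl:]].
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


# Bell, tab and null are removed; the accented letters survive.
print(strip_control_chars("\x07øüéåç\t ok\x00"))  # prints "øüéåç ok"
```

Tab, newline and carriage return are also in category Cc, so they are removed as well. Exclude them explicitly if they need to stay.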