Related:
How can I pretty-print JSON in (unix) shell script?
Is there a (unix) shell script to format XML in human-readable form?
Basically, I want it to transform the following:
<root><foo a="b">lorem</foo><bar value="ipsum" /></root>
... into something like this:
<root>
  <foo a="b">lorem</foo>
  <bar value="ipsum" />
</root>
xmllint
This utility comes with libxml2-utils:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
xmllint --format -
Perl's XML::Twig
This command comes with the XML::Twig Perl module (sometimes packaged as xml-twig-tools):
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
xml_pp
xmlstarlet
This command comes with xmlstarlet:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
xmlstarlet format --indent-tab
Check the tidy package:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
tidy -xml -i -
Python
Python's xml.dom.minidom can format XML (this also works on legacy Python 2):
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
python -c 'import sys; import xml.dom.minidom; s=sys.stdin.read(); print(xml.dom.minidom.parseString(s).toprettyxml())'
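The one-liner above can also be written as a small reusable function. The following is a minimal sketch, not a definitive implementation: the function name `pretty_xml` and the two-space default indent are assumptions made for illustration. It also drops the whitespace-only lines that `toprettyxml()` emits when the input was already indented, which means blank lines inside multi-line text content would be removed as well.

```python
import xml.dom.minidom

SAMPLE = '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'


def pretty_xml(text, indent="  "):
    """Return `text` re-indented with xml.dom.minidom (hypothetical helper)."""
    dom = xml.dom.minidom.parseString(text)
    pretty = dom.toprettyxml(indent=indent)
    # Whitespace-only text nodes from already-indented input come out as
    # blank lines; filter them so formatting twice gives the same result.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


print(pretty_xml(SAMPLE))
```

To use it on standard input like the shell examples, replace `SAMPLE` with `sys.stdin.read()` (after `import sys`).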
saxon-lint
You need saxon-lint:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
saxon-lint --indent --xpath '/' -
saxon-HE
You need saxon-HE:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
java -cp /usr/share/java/saxon/saxon9he.jar net.sf.saxon.Query \
-s:- -qs:/ '!indent=yes'
xidel
You need xidel:
echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
xidel -s - -se . --output-node-format=xml --output-node-indent
(Credit to Reino)
Output for all commands:
<root>
  <foo a="b">lorem</foo>
  <bar value="ipsum"/>
</root>
xmllint --format yourxmlfile.xml
xmllint is a command-line XML tool and is included in libxml2 (http://xmlsoft.org/).
================================================
Note: If you don't have libxml2 installed, you can install it as follows:
CentOS
cd /tmp
wget ftp://xmlsoft.org/libxml2/libxml2-2.8.0.tar.gz
tar xzf libxml2-2.8.0.tar.gz
cd libxml2-2.8.0/
./configure
sudo make install
Ubuntu
sudo apt-get install libxml2-utils
Cygwin
apt-cyg install libxml2
MacOS
To install this on MacOS with Homebrew just do:
brew install libxml2
Also available on Git if you want the code:
git clone git://git.gnome.org/libxml2
You can also use tidy, which may need to be installed first (e.g. on Ubuntu: sudo apt-get install tidy).
For this, you would issue something like the following:
tidy -xml -i your-file.xml > output.xml
Note: tidy has many additional readability flags, but its word-wrap behavior is a bit annoying to untangle (http://tidy.sourceforge.net/docs/quickref.html).
You didn't mention a file, so I assume you want to provide the XML string as standard input on the command line. In that case, do the following:
$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | xmllint --format -
I think
xmllint -o tst.xml --format tst.xml
should be safe as the parser will fully load the input into a tree
before opening the output to serialize it.
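The reasoning (parse everything first, only then open the output) can be illustrated in Python. This is a sketch under stated assumptions, not a description of how xmllint is implemented: the helper name `format_in_place` is hypothetical, and the temporary file plus `os.replace` is an extra precaution added here so that a failure while writing does not leave a truncated file behind.

```python
import os
import tempfile
import xml.dom.minidom


def format_in_place(path, indent="  "):
    """Re-indent the XML file at `path`, overwriting it (hypothetical helper)."""
    # Step 1: read and parse the whole document into memory.
    with open(path, "rb") as source:
        dom = xml.dom.minidom.parse(source)
    pretty = dom.toprettyxml(indent=indent, encoding="utf-8")  # returns bytes

    # Step 2: only now write the result, via a temporary file in the same
    # directory that replaces the original in a single rename.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as target:
        target.write(pretty)
    os.replace(tmp_path, path)
```

Calling `format_in_place("tst.xml")` would then play the role of the xmllint command above.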
The indent level is controlled by the XMLLINT_INDENT environment variable, which defaults to 2 spaces. For example, to change the indent to 4 spaces:
XMLLINT_INDENT='    ' xmllint -o out.xml --format in.xml
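For comparison, the same idea of a configurable indent string can be sketched with Python's standard library. This assumes Python 3.9 or later, where `xml.etree.ElementTree.indent` is available; the function name `reindent` and its `width` parameter are illustrative, not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET

SAMPLE = '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'


def reindent(text, width=4):
    """Return `text` indented with `width` spaces per level (hypothetical helper)."""
    root = ET.fromstring(text)
    # ET.indent() rewrites the whitespace between elements in place.
    ET.indent(root, space=" " * width)
    return ET.tostring(root, encoding="unicode")


print(reindent(SAMPLE, width=4))
```

Passing a different `width` (or building a tab string instead) changes the indent in the same way that setting XMLLINT_INDENT does for xmllint.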
You may have luck with the --recover option when your XML documents are broken. Or try the lenient HTML parser with strict XML output:
xmllint --html --xmlout <in.xml >out.xml
--nsclean, --nonet, --nocdata, --noblanks, etc. may be useful. Read the man page.
apt-get install libxml2-utils
dnf install libxml2
apt-cyg install libxml2
brew install libxml2
This simple(st) solution doesn't provide indentation, but it is nevertheless much easier on the human eye. Also it allows the xml to be handled more easily by simple tools like grep, head, awk, etc.
Use sed to replace '<' with itself preceded by a newline.
And as mentioned by Gilles, it's probably not a good idea to use this in production.
# check you are getting more than one line out
sed 's/</\n</g' sample.xml | wc -l
# check the output looks generally ok
sed 's/</\n</g' sample.xml | head
# capture the pretty xml in a different file
sed 's/</\n</g' sample.xml > prettySample.xml
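The same newline-before-every-tag trick can be sketched in Python, which may help where the local sed does not interpret \n in the replacement text (BSD/macOS sed has historically been one such case). The function name `break_before_tags` is made up for this example, and the same caveat applies: this is a quick readability aid, not a real XML formatter.

```python
SAMPLE = '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'


def break_before_tags(text):
    """Insert a newline before every '<', like sed 's/</\\n</g' (hypothetical helper)."""
    return text.replace("<", "\n<")


# Each tag now starts its own line, so line-oriented tools can work on it.
for line in break_before_tags(SAMPLE).splitlines():
    print(line)
```

As with the sed version, the output starts with an empty line because the very first character of the input is a '<'.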
It took me forever to find something that works on my Mac. Here's what worked for me:
brew install xmlformat
cat unformatted.html | xmlformat
With xidel:
$ xidel -s input.xml -e . --output-node-format=xml --output-node-indent
$ xidel -s input.xml -e 'serialize(.,{"indent":true()})'
$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | \
xidel -se . --output-node-format=xml --output-node-indent
$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | \
xidel -se 'serialize(.,{"indent":true()})'
Edit:
Disclaimer: you should usually prefer installing a mature tool like xmllint to do a job like this. XML/HTML can be a horribly mutilated mess. However, there are valid situations where using existing tooling is preferable to manually installing new tools, and where it is also a safe bet that the XML's source is valid (enough). I've written this script for one of those cases, but they are rare, so proceed with caution.
I'd like to add a pure Bash solution, as it is not 'that' difficult to just do it by hand, and sometimes you won't want to install an extra tool to do the job.
#!/bin/bash
declare -i currentIndent=0
declare -i nextIncrement=0
while IFS= read -r line ; do
  currentIndent+=$nextIncrement
  nextIncrement=0
  if [[ "$line" == "</"* ]]; then # line is a closing tag, just decrease the indent
    currentIndent+=-1
  else
    # extract the tag name: cut at the first '>', then at the first space, then drop the '<'
    dirtyStartTag="${line%%>*}"
    dirtyTagName="${dirtyStartTag%% *}"
    tagName="${dirtyTagName//</}"
    # increase indent unless line contains closing tag or closes itself
    if [[ ! "$line" =~ "</$tagName>" && ! "$line" == *"/>" ]]; then
      nextIncrement+=1
    fi
  fi
  # print the spaces for the indent count, then the line itself
  printf '%*s%s\n' "$(( currentIndent * 2 ))" '' "$line"
done <<< "$(sed 's/></>\n</g')" # separate >< with a newline
Paste it in a script file, and pipe in the xml.
This assumes the xml is all on one line, and there are no extra spaces anywhere. One could easily add some extra \s* to the regexes to fix that.
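For readers who find the indentation logic easier to follow outside of Bash, here is a rough Python port of the same heuristic. It is a sketch with the same limitations as the script above (well-formed input, no comments or CDATA handling); the name `indent_xml` is hypothetical, and two things were added for illustration: the `\s*` tolerance between tags mentioned above, and a check that skips `<?...?>` and `<!...>` lines when counting depth.

```python
import re

SAMPLE = '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'


def indent_xml(text, width=2):
    """Indent XML using the one-tag-per-line heuristic (hypothetical helper)."""
    # Put a newline between adjacent tags, tolerating whitespace between them.
    lines = re.sub(r">\s*<", ">\n<", text.strip()).split("\n")
    depth = 0
    out = []
    for line in lines:
        if line.startswith("</"):
            depth = max(depth - 1, 0)  # closing tag: dedent before printing
        out.append(" " * (width * depth) + line)
        if line.startswith(("</", "<?", "<!")):
            continue  # closers, declarations and doctypes never open a level
        tag = re.match(r"<([^\s>/]+)", line)
        # Indent the following lines unless this one closes itself.
        if tag and not line.endswith("/>") and f"</{tag.group(1)}>" not in line:
            depth += 1
    return "\n".join(out)


print(indent_xml(SAMPLE))
```

The structure mirrors the script: split on `><`, track a depth counter, dedent on closing tags, and indent after any tag that neither self-closes nor contains its own closing tag.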
nicholas@mordor:~/flwor$ basex
BaseX 9.0.1 [Standalone]
Try 'help' to get more information.
> create database pretty
Database 'pretty' created in 231.32 ms.
> open pretty
Database 'pretty' was opened in 0.05 ms.
> set parser xml
PARSER: xml
> add ugly.xml
Resource(s) added in 161.88 ms.
> xquery .
<root>
  <foo a="b">lorem</foo>
  <bar value="ipsum"/>
</root>
Query executed in 179.04 ms.
Have fun.
nicholas@mordor:~/flwor$
if only because then it's "in" a database, and not "just" a file. Easier to work with, to my mind.
Subscribing to the belief that others have worked this problem out already. If you prefer, no doubt eXist might even be "better" at formatting xml, or as good.
You can always query the data various different ways, of course. I kept it as simple as possible. You can just use a GUI, too, but you specified console.