Related: How can I pretty-print JSON in (unix) shell script?

Is there a (unix) shell script to format XML in human-readable form?

Basically, I want it to transform the following:

<root><foo a="b">lorem</foo><bar value="ipsum" /></root>

... into something like this:

<root>
    <foo a="b">lorem</foo>
    <bar value="ipsum" />
</root>

To have xmllint available on Debian systems, you need to install the package libxml2-utils (libxml2 does not provide this tool, at least not on Debian 5.0 "Lenny" and 6.0 "Squeeze"). – twonkeys Sep 20, 2013 at 13:03

Web browsers (e.g. Firefox / Chrome) tend to do a good job of pretty-printing XML documents these days. (Posting as a comment because this isn't a CLI, but a very convenient alternative.) – Sam Mason Mar 29, 2022 at 10:02

xmllint

This utility comes with libxml2-utils:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
    xmllint --format -

Perl's XML::Twig

This command comes with the XML::Twig module, sometimes packaged as xml-twig-tools:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
    xml_pp

xmlstarlet

This command comes with xmlstarlet:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
    xmlstarlet format --indent-tab

tidy

Check the tidy package:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
    tidy -xml -i -

Python

Python's xml.dom.minidom can format XML (works also on legacy python2):

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
    python -c 'import sys; import xml.dom.minidom; s=sys.stdin.read(); print(xml.dom.minidom.parseString(s).toprettyxml())'
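The one-liner above can also be kept as a small reusable function. What follows is only a minimal sketch, assuming Python 3; the name pretty_xml is illustrative and not part of the original answer. It accepts bytes as well as text, so the XML parser (rather than the locale) decides how to decode the input, which should sidestep the kind of UnicodeDecodeError that shows up when non-ASCII input is piped through an ASCII-only terminal encoding.

```python
import xml.dom.minidom


def pretty_xml(data, indent="  "):
    """Return an indented rendering of an XML document given as str or bytes."""
    dom = xml.dom.minidom.parseString(data)
    # toprettyxml() emits an XML declaration first, then the indented tree.
    return dom.toprettyxml(indent=indent)


# Same sample document as in the shell examples, passed as raw bytes.
sample = b'<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'
print(pretty_xml(sample))
```

To use it as a filter, one could feed it sys.stdin.buffer.read() and write the result to standard output.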

saxon-lint

You need saxon-lint:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
    saxon-lint --indent --xpath '/' -

saxon-HE

You need saxon-HE:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
    java -cp /usr/share/java/saxon/saxon9he.jar net.sf.saxon.Query \
    -s:- -qs:/ '!indent=yes'

xidel

You need xidel:

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
    xidel -s - -se . --output-node-format=xml --output-node-indent

(Credit to Reino)

Output for all commands:

<root>
  <foo a="b">lorem</foo>
  <bar value="ipsum"/>
</root>

Good, quick answer. The first option seems like it'll be more ubiquitous on modern *nix installs. A minor point; but can it be called without working through an intermediate file? I.e., echo '<xml .. />' | xmllint --some-read-from-stdn-option? – svidgen Apr 18, 2013 at 19:08

Note that the "cat data.xml | xmllint --format - | tee data.xml" does not work. On my system it sometimes worked for small files, but always truncated huge files. If you really want to do anything in place read backreference.org/2011/01/29/in-place-editing-of-files – user1346466 Dec 3, 2014 at 18:55

To solve UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 805: ordinal not in range(128) in python version you want to define PYTHONIOENCODING="UTF-8": cat some.xml | PYTHONIOENCODING="UTF-8" python -c 'import sys;import xml.dom.minidom;s=sys.stdin.read();print xml.dom.minidom.parseString(s).toprettyxml()' > pretty.xml – FelikZ Nov 2, 2016 at 11:16

Note that tidy can also format xml with no root element. This is useful to format through a pipe, xml sections (e.g. extracted from logs). echo '<x></x><y></y>' | tidy -xml -iq – Marinos An Oct 9, 2019 at 11:49

didn't find any coloring options? any hints? for now I use vim to get coloring, but then I have to create a newly formatted xml to have good readability again – Markus Dec 9, 2019 at 9:32

xmllint --format yourxmlfile.xml

xmllint is a command line XML tool and is included in libxml2 (http://xmlsoft.org/).

================================================

Note: If you don't have libxml2 installed you can install it by doing the following:

CentOS

cd /tmp
wget ftp://xmlsoft.org/libxml2/libxml2-2.8.0.tar.gz
tar xzf libxml2-2.8.0.tar.gz
cd libxml2-2.8.0/
./configure
sudo make install

Ubuntu

sudo apt-get install libxml2-utils

Cygwin

apt-cyg install libxml2

MacOS

To install this on MacOS with Homebrew just do: brew install libxml2

Also available on Git if you want the code: git clone git://git.gnome.org/libxml2

sputnick's answer contains this information, but crmpicco's answer is the most useful answer here to the general question about how to pretty print XML. – Seth Difley Nov 26, 2014 at 18:08

we can write out that formatted xml output to some other xml file and use that.. eg xmllint --format yourxmlfile.xml >> new-file.xml – LearnToLive Jan 13, 2016 at 15:53

This works on Windows too; git for Windows download even installs a recent version of xmllint. Example: "C:\Program Files\Git\usr\bin\xmllint.exe" --format QCScaper.test@borland.com.cds.xml > QCScaper.test@borland.com.pretty-printed.cds.xml – Jeroen Wiert Pluimers Dec 21, 2017 at 7:46

From MacOS with libxml2 installed via brew. To unminify an xml and save it to a new file for me it worked this command xmllint --format in.xml > out.xml – Ax_ Jul 5, 2021 at 20:34

You can also use tidy, which may need to be installed first (e.g. on Ubuntu: sudo apt-get install tidy).

For this, you would issue something like the following:

tidy -xml -i your-file.xml > output.xml

Note: tidy has many additional readability flags, but its word-wrap behavior is a bit annoying to untangle (http://tidy.sourceforge.net/docs/quickref.html).

Helpful, because I couldn't get xmllint to add linebreaks to a single line xml file. Thanks! – xlttj Nov 12, 2014 at 16:00

BTW, here are some options that I have found useful: tidy --indent yes --indent-spaces 4 --indent-attributes yes --wrap-attributes yes --input-xml yes --output-xml yes < InFile.xml > OutFile.xml. – Victor Yarema Feb 19, 2016 at 10:02

Great tip @VictorYarema. I combined it with pygmentize and added it to my .bashrc: alias prettyxml='tidy --indent yes --indent-spaces 4 --indent-attributes yes --wrap-attributes yes --input-xml yes --output-xml yes | pygmentize -l xml' and then can curl url | prettyxml – Net Wolf Nov 12, 2017 at 23:45

You don't need the cat step: tidy -xml -iq filename.xml. Also, you can even do tidy -xml -iq filename.xml using the -m option to modify the original file... – janniks Mar 3, 2020 at 8:36

You didn't mention a file, so I assume you want to provide the XML string as standard input on the command line. In that case, do the following:

$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | xmllint --format -

I think xmllint -o tst.xml --format tst.xml should be safe as the parser will fully load the input into a tree before opening the output to serialize it.
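The same "parse everything first, write afterwards" idea can be made explicit in a script. This is a rough sketch, assuming Python 3 and the standard-library minidom rather than xmllint; format_in_place is a hypothetical helper name. It parses the whole file before touching it, writes the result to a temporary file in the same directory, and only then renames that file over the original, so a parse error or an interrupted write should leave the input intact.

```python
import os
import tempfile
import xml.dom.minidom


def format_in_place(path, indent="  "):
    """Pretty-print the XML file at `path`, replacing it only after a full parse."""
    # Parse the whole document first; a parse error leaves the file untouched.
    with open(path, "rb") as src:
        dom = xml.dom.minidom.parse(src)
    pretty = dom.toprettyxml(indent=indent, encoding="utf-8")  # bytes

    # Write next to the original, then rename over it in a single step.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(pretty)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Two caveats with this sketch: the replaced file gets the permissions of the temporary file, and running minidom's toprettyxml() on a file that is already indented tends to add blank lines, so it is best suited to single-line input.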

The indent level is controlled by the XMLLINT_INDENT environment variable, which defaults to 2 spaces. For example, to change the indent to 4 spaces:

XMLLINT_INDENT='    '  xmllint -o out.xml --format in.xml
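For comparison, a configurable indent width is also available from a script. The sketch below is not xmllint; it assumes Python 3.9 or later, where xml.etree.ElementTree.indent() exists, and reindent is a made-up helper name. The spaces parameter plays the role that XMLLINT_INDENT plays above.

```python
import xml.etree.ElementTree as ET


def reindent(xml_text, spaces=4):
    """Return xml_text re-serialized with `spaces` spaces per nesting level."""
    root = ET.fromstring(xml_text)
    # indent() inserts the whitespace into the tree in place (Python 3.9+).
    ET.indent(root, space=" " * spaces)
    return ET.tostring(root, encoding="unicode")


print(reindent('<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'))
```

Unlike xmllint --format, this does not emit an XML declaration, and comments in the input are dropped by the default ElementTree parser.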

You may have luck with the --recover option when your XML documents are broken. Or try the lenient HTML parser with strict XML output:

xmllint --html --xmlout <in.xml >out.xml

--nsclean, --nonet, --nocdata, --noblanks, etc. may also be useful; see the man page.

apt-get install libxml2-utils
dnf install libxml2
apt-cyg install libxml2
brew install libxml2

This simple(st) solution doesn't provide indentation, but it is nevertheless much easier on the human eye. Also it allows the xml to be handled more easily by simple tools like grep, head, awk, etc.

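Use sed to replace each '<' with itself preceded by a newline (the \n in the replacement is understood by GNU sed; BSD sed, as shipped with macOS, does not treat it as a newline).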

And as mentioned by Gilles, it's probably not a good idea to use this in production.

# check you are getting more than one line out
sed 's/</\n</g' sample.xml | wc -l
# check the output looks generally ok
sed 's/</\n</g' sample.xml | head
# capture the pretty xml in a different file
sed 's/</\n</g' sample.xml > prettySample.xml

It took me forever to find something that works on my Mac. Here's what worked for me:

brew install xmlformat
cat unformatted.html | xmlformat

there is also yq -P but I tried it and looks like not really working. Just yq --input-format xml --output-format xml produced a well formatted XML – Sergey Ponomarev Mar 18 at 16:39

With xidel:

$ xidel -s input.xml -e . --output-node-format=xml --output-node-indent
$ xidel -s input.xml -e 'serialize(.,{"indent":true()})'
$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | \
  xidel -se . --output-node-format=xml --output-node-indent
$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | \
  xidel -se 'serialize(.,{"indent":true()})'

Edit:

Disclaimer: you should usually prefer installing a mature tool like xmllint to do a job like this. XML/HTML can be a horribly mutilated mess. However, there are valid situations where using existing tooling is preferable to manually installing new tools, and where it is also a safe bet that the XML's source is valid (enough). I've written this script for one of those cases, but they are rare, so proceed with caution.

I'd like to add a pure Bash solution, as it is not 'that' difficult to just do it by hand, and sometimes you won't want to install an extra tool to do the job.

#!/bin/bash
declare -i currentIndent=0
declare -i nextIncrement=0
while read -r line ; do
  currentIndent+=$nextIncrement
  nextIncrement=0
  if [[ "$line" == "</"* ]]; then # line starts with a closer, just decrease the indent
    currentIndent+=-1
  else
    dirtyStartTag="${line%%>*}"
    dirtyTagName="${dirtyStartTag%% *}"
    tagName="${dirtyTagName//</}"
    # increase indent unless line contains closing tag or closes itself
    if [[ ! "$line" =~ "</$tagName>" && ! "$line" == *"/>" ]]; then
      nextIncrement+=1
    fi
  fi
  # print with indent: pad an empty string to the indent width, then the line
  printf '%*s%s\n' "$(( currentIndent * 2 ))" '' "$line"
done <<< "$(cat - | sed 's/></>\n</g')" # separate >< with a newline (needs GNU sed)

Paste it in a script file, and pipe in the xml. This assumes the xml is all on one line, and there are no extra spaces anywhere. One could easily add some extra \s* to the regexes to fix that.
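For readers who would rather not depend on GNU sed, the same line-based idea can be sketched in Python. This is a loose port of the script above, not a parser, and it carries the same caveats; naive_indent is a hypothetical name. It does include the extra \s* mentioned above, so whitespace between adjacent tags is tolerated.

```python
import re


def naive_indent(xml_text, width=2):
    """Indent well-behaved XML line by line, without a real parser."""
    # Put every tag that directly follows another tag on its own line,
    # tolerating whitespace between them.
    lines = re.sub(r">\s*<", ">\n<", xml_text.strip()).split("\n")
    depth = 0
    out = []
    for line in lines:
        if line.startswith("</"):
            # A closing tag on its own line: step back out one level first.
            depth = max(depth - 1, 0)
            out.append(" " * (depth * width) + line)
            continue
        out.append(" " * (depth * width) + line)
        tag = re.match(r"<([^\s>/]+)", line)
        # Declarations, processing instructions and comments open nothing.
        opens = tag and not line.startswith(("<?", "<!"))
        closes_itself = line.endswith("/>") or (tag and f"</{tag.group(1)}>" in line)
        if opens and not closes_itself:
            depth += 1
    return "\n".join(out)


print(naive_indent('<root><foo a="b">lorem</foo>  <bar value="ipsum" /></root>'))
```

Like the Bash version, this will misbehave on CDATA sections, multi-line tags, or text that itself contains "><", so it is only reasonable for XML whose shape you control.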

Because parsing XML/HTML with anything else than a real parser is (or will be soon) plain buggy. If it's a small personal script on a personal computer, up to you, but for production, no way. It will break ! – Gilles Quénot Jun 19, 2020 at 13:13

I agree XML/HTML can be horribly mutilated, but it does depend on the source. I wrote this for some XML we generate ourselves, so it is a pretty safe bet there. – Leon S. Jun 19, 2020 at 14:06

nicholas@mordor:~/flwor$ basex
BaseX 9.0.1 [Standalone]
Try 'help' to get more information.
> create database pretty
Database 'pretty' created in 231.32 ms.
> open pretty
Database 'pretty' was opened in 0.05 ms.
> set parser xml
PARSER: xml
> add ugly.xml
Resource(s) added in 161.88 ms.
> xquery .
<root>
  <foo a="b">lorem</foo>
  <bar value="ipsum"/>
</root>
Query executed in 179.04 ms.
Have fun.
nicholas@mordor:~/flwor$

if only because then it's "in" a database, and not "just" a file. Easier to work with, to my mind.

Subscribing to the belief that others have worked this problem out already. If you prefer, no doubt eXist might even be "better" at formatting xml, or as good.

You can always query the data various different ways, of course. I kept it as simple as possible. You can just use a GUI, too, but you specified console.
