How to read the XML header in Python

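How do I read the header of an XML document in Python 3?

Ideally I would use the defusedxml module, since the documentation says it is safer, but at this point (after several hours of trying) I would settle for any parser.

For example, I have a file (it actually comes from an exercise) that looks like this: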

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"> <!-- this is root -->
    <!-- CONTENTS -->
</plist>
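I am wondering how to access everything that comes before the root node.
This seems like such a common question that I expected to find an answer online easily, but I guess I was wrong. The closest thing I found is this question on Stack Overflow, which was not much help (I looked into xml.sax but could not find anything relevant).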
Tags: python, xml, python-3.x, xml-parsing
Asked by Ratler on 2018-02-23
3 answers
Answered by qwermike on 2018-02-23
Accepted
0 upvotes

I tried minidom, which is vulnerable to billion laughs and quadratic blowup attacks according to the link you provided. Here is my code:
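from xml.dom.minidom import parse

dom = parse('file.xml')
# The XML declaration is not exposed as a node, so rebuild it from the document attributes.
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))
# Three ways to reach the DOCTYPE node in this file:
print(dom.doctype.toxml())
print(dom.getElementsByTagName('plist')[0].previousSibling.toxml())
print(dom.childNodes[0].toxml())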
Output:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
You can use minidom from defusedxml. I downloaded that package and simply replaced the import with from defusedxml.minidom import parse; the code worked and produced the same output.
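If the prolog holds more than a single DOCTYPE (for instance a comment before it), one option is to walk the document's top-level children until the root element is reached. The following is only a sketch and not part of the original answer: the SAMPLE string and the prolog_nodes helper are made up for illustration, and it uses the standard-library minidom. Swapping the import for defusedxml.minidom should presumably work the same way, but that is an assumption and was not checked here.

```python
from xml.dom.minidom import parseString

# Hypothetical sample input, modelled on the plist file from the question,
# with an extra comment placed before the DOCTYPE.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!-- a comment before the DOCTYPE -->
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"></plist>
"""


def prolog_nodes(dom):
    """Return the top-level nodes that come before the root element."""
    nodes = []
    for node in dom.childNodes:
        if node is dom.documentElement:
            break  # reached the root element, stop collecting
        nodes.append(node)
    return nodes


dom = parseString(SAMPLE)
# The XML declaration is stored as attributes on the document, not as a node.
print(dom.version, dom.encoding)
for node in prolog_nodes(dom):
    print(node.nodeType, node.toxml())
```

The helper stops at dom.documentElement, so anything after the root element (trailing comments, for example) is left out on purpose.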
    
Ratler:
Brilliant! This is exactly what I was looking for. The third option (childNodes[0]) seems to be the most general way to get all of the headers.
qwermike:
I'm glad I could help :-)
Answered by mzjn on 2018-02-23
0 upvotes

With the lxml library, you can get this information through a DocInfo object.
from lxml import etree
tree = etree.parse('input.xml')
info = tree.docinfo
v, e, d = info.xml_version, info.encoding, info.doctype
print('<?xml version="{}" encoding="{}"?>'.format(v, e))
print(d)
Output:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    
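Ratler:
Thanks! This works very well, but I have accepted @mike-kaskun's answer because (a) it works with defusedxml and (b) minidom seems to be a default package (at least on my system), whereas lxml is something I had to install.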
Answered by Usman on 2018-02-23
0 upvotes

Try this code!
I am assuming the temporary XML is in the variable 's'.
I declare a class MyParser with a method XmlDecl that prints the XML header. The second method, Parse, parses the XML: it first creates the parser with the ParserCreate() function defined in xml.parsers.expat and registers XmlDecl as the declaration handler.
Then create an object of the MyParser class, 'parser', and call its Parse method.
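from xml.parsers import expat

s = """<?xml version='1.0' encoding='iso-8859-1'?>
       <book>
           <title>Title</title>
           <chapter>Chapter 1</chapter>
       </book>"""

class MyParser(object):
    def XmlDecl(self, version, encoding, standalone):
        # Called by expat when it encounters the XML declaration.
        print("XmlDecl", version, encoding, standalone)

    def Parse(self, data):
        Parser = expat.ParserCreate()
        Parser.XmlDeclHandler = self.XmlDecl
        # Second argument marks this as the final chunk of input.
        Parser.Parse(data, 1)

parser = MyParser()
parser.Parse(s)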
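Since the question also asks about the DOCTYPE, the same expat approach can presumably be extended with a second handler. The sketch below is not from the original answer: the SAMPLE string and the read_prolog function are hypothetical names introduced for illustration. It relies on the XmlDeclHandler and StartDoctypeDeclHandler attributes of the standard-library expat parser.

```python
from xml.parsers import expat

# Hypothetical sample input, modelled on the plist file from the question.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"></plist>
"""


def read_prolog(data):
    """Collect the XML declaration and DOCTYPE fields reported by expat."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # standalone is -1 when the declaration has no standalone clause.
        found["xml_decl"] = (version, encoding, standalone)

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat passes the system identifier before the public identifier.
        found["doctype"] = (name, public_id, system_id)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # True marks the final chunk of input
    return found


info = read_prolog(SAMPLE)
print(info["xml_decl"])
print(info["doctype"])
```

Unlike the DOM-based answers, this reports the individual fields and does not reproduce the original header text verbatim.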
