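Problem parsing the children of a node with HtmlAgilityPack

I'm having a problem parsing the input tags that are children of a form in HTML. I can parse them from the root using //input[@type], but not as children of a specific node.
Here is some code that illustrates the problem: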
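// The markup inside these string literals did not survive; the tags below are an
// approximate reconstruction consistent with the output shown further down
// (form1 with two typed inputs, form2 with one). Attribute values and the
// position of the "Someplace" link are placeholders.
private const string HTML_CONTENT =
    "<html>" +
    "<head><title>Test Page</title></head>" +
    "<body>" +
    "<form id='form1'>" +
    "<input type='text' />" +
    "<input type='submit' />" +
    "<a href='#'>Someplace</a>" +
    "</form>" +
    "<form id='form2'>" +
    "<input type='text' />" +
    "</form>" +
    "</body>" +
    "</html>";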
// Requires: using System.Diagnostics; using System.IO; using System.Text; using HtmlAgilityPack;
public void Parser_Test()
{
    var htmlDoc = new HtmlDocument
    {
        OptionFixNestedTags = true,
        OptionUseIdAttribute = true,
        OptionAutoCloseOnEnd = true,
        OptionAddDebuggingAttributes = true
    };

    byte[] byteArray = Encoding.UTF8.GetBytes(HTML_CONTENT);
    var stream = new MemoryStream(byteArray);
    htmlDoc.Load(stream, Encoding.UTF8, true);

    var nodeCollection = htmlDoc.DocumentNode.SelectNodes("//form");
    if (nodeCollection != null && nodeCollection.Count > 0)
    {
        foreach (var form in nodeCollection)
        {
            var id = form.GetAttributeValue("id", string.Empty);
            if (!form.HasChildNodes)
                Debug.WriteLine(string.Format("Form {0} has no children", id));

            // Relative XPath: direct <input> children of this form that have a type attribute
            var childCollection = form.SelectNodes("input[@type]");
            if (childCollection != null && childCollection.Count > 0)
                Debug.WriteLine("Got some child nodes");
            else
                Debug.WriteLine("Unable to find input nodes as children of Form");
        }

        // Absolute XPath: every <input> anywhere in the document
        var inputNodes = htmlDoc.DocumentNode.SelectNodes("//input");
        if (inputNodes != null && inputNodes.Count > 0)
            Debug.WriteLine(string.Format("Found {0} input nodes when parsed from root", inputNodes.Count));
    }
    else
    {
        Debug.WriteLine("Found no forms");
    }
}
This is the output:
Form form1 has no children
Unable to find input nodes as children of Form
Form form2 has no children
Unable to find input nodes as children of Form
Found 3 input nodes when parsed from root
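I would have expected form1 and form2 to both have children, and input[@type] to find 2 nodes for form1 and 1 node for form2.
Is there a specific configuration setting or method that I should be using and am not? Any ideas?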
2010-06-23
SteveG