lmxl 获取所有节点
返回一个列表每个元素都是Element类型,所有节点都包含在其中。
from lxml import etree
html=etree.parse('test',etree.HTMLParser())
result=html.xpath('//*') #//代表获取子孙节点,*代表获取所有
print(type(html))
print(type(result))
print(result)
#
<class 'lxml.etree._ElementTree'>
<class 'list'>
[<Element html at 0x754b210048>, <Element body at 0x754b210108>, <Element div at 0x754b210148>, <Element ul at 0x754b210188>, <Element li at 0x754b2101c8>, <Element a at 0x754b210248>, <Element li at 0x754b210288>, <Element a at 0x754b2102c8>, <Element li at 0x754b210308>, <Element a at 0x754b210208>, <Element li at 0x754b210348>, <Element a at 0x754b210388>, <Element li at 0x754b2103c8>, <Element a at 0x754b210408>]
如要获取li节点,可以使用//后面加上节点名称,然后调用xpath()方法
html.xpath('//li') #获取所有子孙节点的li节点