Documentation
Overview
Warning
Current implementation is under development, consider to avoid using it in production until 1.0.0
md.html component provides clear high-level,
strict type-hinted lightweight API wrapped
around lxml
libraries
to reduces internal calls number becoming more readable, simpler predictable and
Architecture overview
Installation
Usage example
Search elements with css selector
#!/usr/bin/env python3
import md.html
if __name__ == '__main__':
element = md.html.Element(content=
'''
<div>
<div></div>
<div>
<span class="span">example 1</span>
<span class="span">example 2</span>
<span class="span">example 3</span>
</div>
</div>'
'''
)
for searchable_item in element.css('div:nth-child(2) > :not(span.span:nth-child(1))'):
print(searchable_item.get_text()) # example 2, example 3
Retrieve text content
#!/usr/bin/env python3
import md.html
if __name__ == '__main__':
element = md.html.Element(content='<div>42</div>')
print(element.get_text()) # 42
Attributes
List attribute keys
#!/usr/bin/env python3
import md.html
if __name__ == '__main__':
element = md.html.Element(content='<div id="example-id" data-content="example content"/>')
print(element.list_attributes()) # ['id', 'data-content']
Map attributes
#!/usr/bin/env python3
import md.html
if __name__ == '__main__':
element = md.html.Element(content='<div id="example-id" data-content="example content"/>')
print(element.map_attributes()) # [('id', 'example-id'), ('data-content', 'example content')]
Retrieve attribute value
#!/usr/bin/env python3
import md.html
if __name__ == '__main__':
element = md.html.Element(content='<div id="example-id" data-content="example content"/>')
print(element.get_attribute(attribute='id')) # example-id
print(element.get_attribute(attribute='data-content')) # example content
Initialize from lxml model
#!/usr/bin/env python3
import lxml.html
import md.html
if __name__ == '__main__':
model = lxml.html.Element()
element = md.html.Element(model=model)
No wrapper for low-level method
Consider to use low-level API instead, eg: