Skip to content

Documentation

Overview

Warning

Current implementation is under development, consider to avoid using it in production until 1.0.0

md.html component provides clear high-level, strict type-hinted lightweight API wrapped around lxml libraries

to reduces internal calls number becoming more readable, simpler predictable and

Architecture overview

Installation

pip install md.html --extra-index-url https://source.md.land/python/

Usage example

Search elements with css selector

#!/usr/bin/env python3
import md.html

if __name__ == '__main__':
    element = md.html.Element(content=
      '''
      <div>
        <div></div>
        <div>
            <span class="span">example 1</span>
            <span class="span">example 2</span>
            <span class="span">example 3</span>
        </div>
      </div>'
      '''
    )

    for searchable_item in element.css('div:nth-child(2) > :not(span.span:nth-child(1))'):
        print(searchable_item.get_text())  # example 2, example 3

Retrieve text content

#!/usr/bin/env python3
import md.html

if __name__ == '__main__':
    element = md.html.Element(content='<div>42</div>')
    print(element.get_text())  # 42

Attributes

List attribute keys

#!/usr/bin/env python3
import md.html

if __name__ == '__main__':
    element = md.html.Element(content='<div id="example-id" data-content="example content"/>')
    print(element.list_attributes())  # ['id', 'data-content']

Map attributes

#!/usr/bin/env python3
import md.html

if __name__ == '__main__':
    element = md.html.Element(content='<div id="example-id" data-content="example content"/>')
    print(element.map_attributes())  # [('id', 'example-id'), ('data-content', 'example content')]

Retrieve attribute value

#!/usr/bin/env python3
import md.html

if __name__ == '__main__':
    element = md.html.Element(content='<div id="example-id" data-content="example content"/>')
    print(element.get_attribute(attribute='id'))  # example-id
    print(element.get_attribute(attribute='data-content'))  # example content

Initialize from lxml model

#!/usr/bin/env python3
import lxml.html
import md.html

if __name__ == '__main__':
    model = lxml.html.Element()
    element = md.html.Element(model=model)

No wrapper for low-level method

Consider to use low-level API instead, eg:

#!/usr/bin/env python3
import md.html

if __name__ == '__main__':
    element = md.html.Element(content=
    '''
        <ol>
            <li>item 1</li>
            <li>item 2</li>
            <li>item 3</li>
        </ol>
    ''')

    for child_model in element.model.getchildren():
        child = md.html.Element(model=child_model)
        # some payload ...