- License: MIT
- Languages: Python, Ruby
- Operating system: Cross-platform
- Homepage: http://code.google.com/p/html5lib/
- Documentation: http://code.google.com/p/html5lib/wiki/UserDocumentation
Overview
html5lib is a library for Ruby and Python that parses HTML documents. It supports HTML 5 and aims for maximum compatibility with desktop browsers.
Key features include:
- Parses valid and invalid HTML documents to a tree
- Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup and custom simpletree output formats
- DOM to SAX converter
- Reports parse errors
- Character encoding detection
- XML mode for working with ill-formed XML, e.g. feeds
- Filtering and serializing of trees
- HTML+CSS sanitizer
- Many unit tests
- Faster than before :)
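
The first few features (parsing invalid HTML to a tree, ElementTree output, parse error reporting) can be sketched in Python as below. This is a minimal illustration, assuming the html5lib package currently published on PyPI; its API may differ from the release described on this page. The helper names `parse_with_errors` and `tag_names` are hypothetical and not part of the library.

```python
def parse_with_errors(markup):
    """Parse possibly invalid HTML and return (root element, parse errors).

    Assumption: the current PyPI html5lib API (HTMLParser, getTreeBuilder).
    """
    # Third-party package; imported here so the sketch loads without it.
    import html5lib

    parser = html5lib.HTMLParser(
        tree=html5lib.getTreeBuilder("etree"),  # build an ElementTree tree
        namespaceHTMLElements=False,            # plain tag names such as "p"
    )
    root = parser.parse(markup)  # the <html> element of the repaired tree
    return root, parser.errors   # errors collected while parsing


def tag_names(root):
    """List element tag names in document order, skipping comment nodes."""
    return [el.tag for el in root.iter() if isinstance(el.tag, str)]
```

Calling `parse_with_errors("<p>Hello<b>world")` should return a complete tree in which the parser has supplied the missing `html`, `head` and `body` elements, together with a non-empty error list, since the input lacks a doctype and leaves tags unclosed.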