HTML Parsers

jsoup is a Java library that simplifies working with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods and CSS and XPath selectors.
Last Release on Apr 20, 2026
Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Last Release on Mar 13, 2026
NekoHtml is the HTML parser used by HtmlUnit.
Last Release on Dec 28, 2025
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML.
Last Release on Sep 11, 2024

Relocated → net.sf.jtidy » jtidy
HtmlCleaner is an HTML parser written in Java. It transforms dirty HTML into well-formed XML, following the same rules that most web browsers use.
Last Release on Jun 19, 2023
Apache Tika HTML Parser Module
Last Release on Mar 23, 2026
Java HTML/XML parsers suite
Last Release on Jun 8, 2022
An HTML parser and tag balancer.
Last Release on Apr 17, 2015
Powerful, fast and easy to use HTML and XML parser for Java
Last Release on Jul 30, 2023
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML.
Last Release on Jul 20, 2010