HTML Parsers
jsoup is a Java library that simplifies working with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS selectors, and XPath selectors.
Last Release on Apr 20, 2026
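As a rough illustration of the parsing and CSS-selector extraction described in the jsoup entry, here is a minimal sketch. It assumes jsoup is on the classpath; the class name `JsoupLinkSketch`, the helper `extractLinks`, and the sample markup are hypothetical and only serve the example. It parses from a string rather than fetching a URL so that it runs offline.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupLinkSketch {

    // Hypothetical helper: returns the raw href value of every <a> that has one.
    static List<String> extractLinks(String html) {
        // Jsoup.parse tolerates malformed markup and builds a DOM-like tree.
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        // "a[href]" is a CSS selector: anchor elements carrying an href attribute.
        for (Element anchor : doc.select("a[href]")) {
            links.add(anchor.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample markup (an assumption for this sketch); the <p> is left unclosed on purpose.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>See <a href=\"https://example.com/docs\">the docs</a>"
                + " and <a href=\"/local\">a local page</a>.</body></html>";

        System.out.println(Jsoup.parse(html).title());
        System.out.println(extractLinks(html));
    }
}
```

For a live page, `Jsoup.connect(url).get()` returns the same kind of `Document`, so the selector code would stay unchanged.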
2. OWASP Java HTML Sanitizer (173 usages)
com.googlecode.owasp-java-html-sanitizer » owasp-java-html-sanitizer Apache
Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Last Release on Mar 13, 2026
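The sanitizer entry describes turning untrusted HTML into markup that is safe to embed. A minimal sketch of that idea follows, assuming the OWASP Java HTML Sanitizer is on the classpath. The class name `SanitizerSketch`, the helper `clean`, and the choice of policy (basic formatting plus links) are assumptions made for illustration, not a recommended configuration.

```java
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class SanitizerSketch {

    // Assumed policy for this sketch: allow simple formatting tags and links,
    // and drop everything else (including <script>).
    static final PolicyFactory POLICY = Sanitizers.FORMATTING.and(Sanitizers.LINKS);

    // Hypothetical helper: returns a sanitized copy of the untrusted input.
    static String clean(String untrustedHtml) {
        return POLICY.sanitize(untrustedHtml);
    }

    public static void main(String[] args) {
        String untrusted = "<b>Hello</b><script>alert(1)</script>";
        System.out.println(clean(untrusted));
    }
}
```

Prebuilt policies from `Sanitizers` can be combined with `and(...)`; applications with stricter needs would typically define their own allow-list instead.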
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML.
Last Release on Sep 11, 2024
Relocated → net.sf.jtidy » jtidy
HtmlCleaner is an HTML parser written in Java. It transforms dirty HTML to well-formed XML following the same rules that most web browsers use.
Last Release on Jun 19, 2023
9. ATTOPARSER (20 usages)
org.attoparser » attoparser Apache
Powerful, fast, and easy-to-use HTML and XML parser for Java.
Last Release on Jul 30, 2023
