HTML Parsers

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML.
Last Release on Sep 11, 2024

Relocated → net.sf.jtidy » jtidy
An HTML parser and tag balancer.
Last Release on Apr 17, 2015
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to ...
Last Release on Aug 22, 2011
HTML Parser is the high-level syntactical analyzer.
Last Release on Apr 24, 2011
Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
Last Release on Oct 25, 2015
Neko HTML
Last Release on Mar 23, 2008

The Validator.nu HTML Parser is an implementation of the HTML5 parsing algorithm in Java for applications. The parser is designed to work as a drop-in replacement for the XML parser in applications that already support XHTML 1.x content with an XML ...
Last Release on Jun 7, 2012

Relocated → nu.validator » htmlparser
HTML Lexer is the low-level lexical analyzer.
Last Release on Apr 24, 2011