This tool, provided by the W3C, is driven by an XSLT stylesheet that tries to extract information from the supplied semantically rich HTML document. It uses only information that is available through good use of the semantics defined in HTML.
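To give a rough idea of what extracting information from HTML semantics can look like, here is a minimal sketch. It is written in Python with the standard library's `html.parser` rather than XSLT, so it is not how the W3C tool is implemented. The names `SemanticExtractor` and `extract`, and the choice of elements (title, headings, `abbr`), are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class SemanticExtractor(HTMLParser):
    """Hypothetical helper: collects title, headings, and abbreviations."""

    CAPTURED = {"title", "abbr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []       # list of (level, text) in document order
        self.abbreviations = {}  # abbreviation text -> expansion from title=""
        self._open = []          # stack of (tag, attrs, text parts)

    def handle_starttag(self, tag, attrs):
        if tag in self.CAPTURED:
            self._open.append((tag, dict(attrs), []))

    def handle_data(self, data):
        # Text belongs to every captured element that is currently open,
        # so an <abbr> inside a heading contributes to both.
        for _, _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if not self._open or self._open[-1][0] != tag:
            return
        _, attrs, parts = self._open.pop()
        text = " ".join("".join(parts).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        elif tag == "abbr":
            if attrs.get("title"):
                self.abbreviations[text] = attrs["title"]
        else:
            self.headings.append((int(tag[1]), text))


def extract(html):
    """Return the semantic information found in an HTML string."""
    parser = SemanticExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "headings": parser.headings,
        "abbreviations": parser.abbreviations,
    }


if __name__ == "__main__":
    sample = (
        "<html><head><title>Demo page</title></head><body>"
        '<h1>About <abbr title="HyperText Markup Language">HTML</abbr></h1>'
        "<p>Some text.</p><h2>Details</h2></body></html>"
    )
    print(extract(sample))
```

A document that marks up its headings with `div` elements and its abbreviations with plain text would yield almost nothing here, which is the point of the description above: only well-used semantics can be extracted.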