HTML Purifier is an HTML filter that takes an arbitrary snippet of HTML and rigorously tests, validates and filters it into a version that is safe to output on web pages. It achieves this by:
- Lexing (parsing into tokens) the document,
- Executing various strategies on the tokens:
  - Removing all elements not in the whitelist,
  - Making the tokens well-formed,
  - Fixing the nesting of the nodes, and
  - Validating attributes of the nodes; and
- Generating HTML from the purified tokens.
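
The stages listed above can be illustrated with a toy pipeline. This is only a rough sketch under stated assumptions, not HTML Purifier's actual code, and it is not safe for untrusted input: every `toy_*` function and the `$allowed` whitelist are hypothetical names invented for illustration, the lexer is a simple regular expression, attribute values are not checked, and the nesting-correction strategy is left out for brevity.

```php
<?php
// Toy lex -> strategies -> generate pipeline. Illustration only:
// NOT HTML Purifier's implementation and NOT safe for real input.

// Step 1: lex the input into start, end, and text tokens.
function toy_lex(string $html): array
{
    $tokens = [];
    $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
    foreach (preg_split('/(<[^>]*>)/', $html, -1, $flags) as $part) {
        if (preg_match('/^<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>$/', $part, $m)) {
            $attrs = [];
            preg_match_all('/([a-zA-Z-]+)\s*=\s*"([^"]*)"/', $m[3], $pairs, PREG_SET_ORDER);
            foreach ($pairs as $pair) {
                $attrs[strtolower($pair[1])] = $pair[2];
            }
            $type = $m[1] === '/' ? 'end' : 'start';
            $tokens[] = ['type' => $type, 'name' => strtolower($m[2]), 'attrs' => $attrs];
        } else {
            $tokens[] = ['type' => 'text', 'data' => $part];
        }
    }
    return $tokens;
}

// Strategy: drop tags that are not in the whitelist (their text is kept).
function toy_remove_foreign(array $tokens, array $allowed): array
{
    return array_values(array_filter($tokens, function ($t) use ($allowed) {
        return $t['type'] === 'text' || isset($allowed[$t['name']]);
    }));
}

// Strategy: make the stream well-formed using a stack of open elements.
function toy_make_well_formed(array $tokens): array
{
    $out = [];
    $open = [];
    foreach ($tokens as $t) {
        if ($t['type'] === 'start') {
            $open[] = $t['name'];
            $out[] = $t;
        } elseif ($t['type'] === 'end') {
            if (!in_array($t['name'], $open, true)) {
                continue; // stray end tag: drop it
            }
            do { // close elements still open inside the one being closed
                $name = array_pop($open);
                $out[] = ['type' => 'end', 'name' => $name, 'attrs' => []];
            } while ($name !== $t['name']);
        } else {
            $out[] = $t;
        }
    }
    while ($open) { // close whatever is still open at the end
        $out[] = ['type' => 'end', 'name' => array_pop($open), 'attrs' => []];
    }
    return $out;
}

// Strategy: keep only whitelisted attribute names (values are not checked here).
function toy_validate_attributes(array $tokens, array $allowed): array
{
    foreach ($tokens as &$t) {
        if ($t['type'] === 'start') {
            $t['attrs'] = array_intersect_key($t['attrs'], array_flip($allowed[$t['name']]));
        }
    }
    unset($t);
    return $tokens;
}

// Step 3: generate HTML, escaping text and attribute values.
function toy_generate(array $tokens): string
{
    $html = '';
    foreach ($tokens as $t) {
        if ($t['type'] === 'text') {
            $html .= htmlspecialchars($t['data'], ENT_QUOTES, 'UTF-8', false);
        } elseif ($t['type'] === 'end') {
            $html .= '</' . $t['name'] . '>';
        } else {
            $html .= '<' . $t['name'];
            foreach ($t['attrs'] as $k => $v) {
                $html .= ' ' . $k . '="' . htmlspecialchars($v, ENT_QUOTES, 'UTF-8', false) . '"';
            }
            $html .= '>';
        }
    }
    return $html;
}

function toy_purify(string $html, array $allowed): string
{
    $tokens = toy_remove_foreign(toy_lex($html), $allowed);
    $tokens = toy_validate_attributes(toy_make_well_formed($tokens), $allowed);
    return toy_generate($tokens);
}

// Hypothetical whitelist: element name => allowed attribute names.
$allowed = ['b' => [], 'i' => [], 'p' => [], 'a' => ['href']];
echo toy_purify('<p onclick="x()">Hi <b>there<blink>!</blink></p>', $allowed), "\n";
// prints: <p>Hi <b>there!</b></p>
```

In this run the non-whitelisted `blink` tags are dropped, the unclosed `b` element is closed before its parent, and the `onclick` attribute is removed. A real filter has to do considerably more at every stage, which is why the sketch should not be used in place of the library.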
However, most users will only need to interface with the HTMLPurifier and HTMLPurifier_Config classes.
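
As a minimal sketch of what that interaction typically looks like, the following uses the library's documented entry points (`HTMLPurifier_Config::createDefault()` and `HTMLPurifier::purify()`). The include path and the sample input are assumptions made for illustration; adjust the path to match your installation.

```php
<?php
// Assumption: replace this placeholder path with the location of your install.
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

// Build a configuration object with the default settings.
$config = HTMLPurifier_Config::createDefault();

// Create the purifier from that configuration.
$purifier = new HTMLPurifier($config);

// Hypothetical untrusted input, e.g. from a comment form.
$dirty_html = '<p onclick="steal()">Hello <b>world</p><script>alert(1)</script>';

// purify() runs the whole pipeline and returns the filtered HTML.
$clean_html = $purifier->purify($dirty_html);
echo $clean_html;
```

Creating the purifier once and reusing it for several `purify()` calls avoids rebuilding the configuration for every snippet.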