HTML Purifier 4.0.0 features a new configuration naming system that allows arbitrary nesting of namespaces. While there are certain cases in which using two namespaces is obviously better (the canonical example is where we were using AutoFormatParam to contain directives for AutoFormat parameters), it is unclear whether or not a general migration to highly namespaced directives is a good idea or not.
== Case studies ==
=== Attr.* ===
We have a dead duck HTML.Attr.Name.UseCDATA which migrated before we decided to think this out thoroughly.
We currently have a large number of directives in the Attr.* namespace. These directives tweak the behavior of some HTML attributes. They have the properties:
While they apply to only one attribute at a time, the attribute can span over multiple elements (not necessarily all attributes, either). The information of which elements it impacts is either omitted or informally stated (EnableID applies to all elements, DefaultImageAlt applies to tags, AllowedRev doesn’t say but only applies to a tags).
There is a certain degree of clustering that could be applied, especially to the ID directives. The clustering could be done with respect to what element/attribute was used, i.e.
*.id -> EnableID, IDBlacklistRegexp, IDBlacklist, IDPrefixLocal, IDPrefix img.src -> DefaultInvalidImage img.alt -> DefaultImageAlt, DefaultInvalidImageAlt bdo.dir -> DefaultTextDir a.rel -> AllowedRel a.rev -> AllowedRev a.target -> AllowedFrameTargets a.name -> Name.UseCDATA
The directives often reference generic attribute types that were specified in the DTD/specification. However, some of the behavior specifically relies on the fact that other use cases of the attribute are not, at current, supported by HTML Purifier.
AllowedRel, AllowedRev -> heavily specific; if ends up being allowed, we will also have to give users specificity there (we also want to preserve generality) DTD %Linktypes, HTML5 distinguishes between and / AllowedFrameTargets -> heavily specific, but also used by and
== AutoFormat.* ==
These have the fairly normal pluggable architecture that lends itself to large amounts of namespaces (pluggability may be the key to figuring out when gratuitous namespacing is good.) Properties:
The AutoFormat string is a bit long, but is the only bit of repeated context.
== Core.* ==
Core is the potpourri of directives, mostly regarding some minor behavioral tweaks for HTML handling abilities.
AggressivelyFixLt ConvertDocumentToFragment DirectLexLineNumberSyncInterval LexerImpl MaintainLineNumbers Lexer CollectErrors Language Error handling (Language is ostensibly a little more general, but it's only used for error handling right now) ColorKeywords CSS and HTML Encoding EscapeNonASCIICharacters Character encoding EscapeInvalidChildren EscapeInvalidTags HiddenElements RemoveInvalidImg Lexing/Output RemoveScriptContents Deprecated
== HTML.* ==
AllowedAttributes AllowedElements AllowedModules Allowed ForbiddenAttributes ForbiddenElements Element set tuning BlockWrapper Child def advanced twiddle CoreModules CustomDoctype Advanced HTMLModuleManager twiddles DefinitionID DefinitionRev Caching Doctype Parent Strict XHTML Global environment MaxImgLength Attribute twiddle? (applies to two attributes) Proprietary SafeEmbed SafeObject Trusted Extra functionality/tagsets TidyAdd TidyLevel TidyRemove Tidy
== Output.* ==
These directly affect the output of Generator. These are all advanced twiddles.
== URI.* ==
AllowedSchemes OverrideAllowedSchemes Scheme tuning Base DefaultScheme Host Global environment DefinitionID DefinitionRev Caching DisableExternalResources DisableExternal DisableResources Disable Contextual/authority tuning HostBlacklist Authority tuning MakeAbsolute MungeResources MungeSecretKey Munge Transformation behavior (munge can be grouped)