While looking for a way to remove the <meta name="ROBOTS" content="NONE"%/> meta tag from some of the pages produced by Geneweb, I stumbled upon a relatively new tool with interesting potential – mod_publisher:

Mod_publisher turns the URL mapping of mod_proxy_html into a general-purpose text search and replace. Whereas mod_proxy_html applies rewrites to HTML URLs, and in version 2 extends that to other contexts where a link might occur, mod_publisher extends it further to allow parsing of text wherever it can occur.

Unlike mod_proxy_html there is no presumption of the rewrites serving any particular purpose – this is entirely up to the user. This means we are potentially parsing all text in a document, which is a significantly higher overhead than mod_proxy_html. To deal with this, we provide fine-grained control over what is or isn’t parsed, replacing the simple ProxyHTMLExtended with a more general MLRewriteOptions directive.

My feeling is that the authors are considerably understating how much CPU this thing is going to cost. Production-minded people will certainly cringe at the thought while reading the description, but I foresee immense power for hacks of last resort.
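
To make the use case concrete, the rewrite I am after boils down to a plain search and replace on the response body. The following is only a Python sketch of that idea, not mod_publisher configuration: the `strip_robots_meta` function, the regular expression and the sample page are all assumptions made up for illustration, and the pattern only handles the simple case where `name` is the first attribute of the tag.

```python
import re

# Hypothetical pattern: a <meta> tag whose first attribute is name="robots",
# matched case-insensitively, together with any whitespace that follows it.
ROBOTS_META = re.compile(
    r'<meta\s+name\s*=\s*["\']robots["\'][^>]*>\s*',
    re.IGNORECASE,
)


def strip_robots_meta(html: str) -> str:
    """Return the HTML with every matching robots meta tag removed."""
    return ROBOTS_META.sub("", html)


# Made-up sample page, standing in for what a Geneweb page might contain.
page = (
    '<html><head><meta name="ROBOTS" content="NONE" />'
    "<title>Geneweb</title></head></html>"
)
print(strip_robots_meta(page))
```

Running a regular expression like this over every byte of every response is exactly the kind of overhead described above, which is why being able to restrict what gets parsed matters.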