PolyglotMan takes man pages from most of the popular flavors of UNIX and transforms them into any of a number of text source formats. PolyglotMan was formerly known as RosettaMan. The binary is still named rman, for scripts that depend on that name; mnemonically, just think "reverse man". Previously PolyglotMan required pages to be formatted by nroff before processing. Since version 3.0 it prefers [tn]roff source, from which it usually produces even better results, and source processing is the only way to translate tables. Source translation is not as mature as formatted translation, however, so try formatted translation as a backup.
In parsing [tn]roff source, one could implement an arbitrarily large subset of [tn]roff, which I did not and will not do, so the results can be off. I did, however, implement a significant subset of the features used in man pages, including tbl (but not eqn), if tests, and general macro definitions, so the results usually look great. If they don't, format the page with nroff before sending it to PolyglotMan. If PolyglotMan doesn't recognize a key macro used by a large class of pages, e-mail me the source and a uuencoded nroff-formatted page and I'll see what I can do. When running PolyglotMan on man page source that includes or redirects to other [tn]roff source using the .so (source or inclusion) macro, run it from the parent of the page's section directory, since pages are written with this assumption. For example, if you are translating /usr/man/man1/ls.1, first cd into /usr/man.
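A small shell wrapper can take care of the directory change. This is only a sketch: the function name rman_from_root is hypothetical, and it assumes the common layout in which the page sits in a man<N> directory directly below the root that .so paths are relative to.

```shell
# Hypothetical helper: run rman on a man page source file from the root of
# its man tree (e.g. /usr/man for /usr/man/man1/ls.1), so that .so lines
# such as ".so man1/other.1" resolve. Extra arguments go to rman before
# the page name, so put the filter (-f ...) first.
rman_from_root() {
  page=$1
  shift
  # Two levels up: /usr/man/man1/ls.1 -> /usr/man
  root=$(dirname -- "$(dirname -- "$page")")
  # Subshell keeps the caller's working directory unchanged; the page is
  # passed relative to the root, e.g. man1/ls.1.
  ( cd -- "$root" && rman "$@" "${page#"$root"/}" )
}
```

For example, rman_from_root /usr/man/man1/ls.1 -f html > ls.1.html behaves like cd /usr/man followed by rman -f html man1/ls.1.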
PolyglotMan accepts formatted man pages from:
SunOS, Sun Solaris, Hewlett-Packard HP-UX, AT&T System V, OSF/1 aka Digital UNIX, DEC Ultrix, SGI IRIX, Linux, FreeBSD, SCO.
Man page source processing works for:
SunOS, Sun Solaris, Hewlett-Packard HP-UX, AT&T System V, OSF/1 aka Digital UNIX, DEC Ultrix.
It can produce:
printable ASCII-only (control characters stripped), section headers-only, Tk, TkMan, [tn]roff (traditional man page source), partial DocBook XML, HTML, MIME, LaTeX, LaTeX2e, RTF, Perl 5 POD.
A modular architecture permits easy addition of further output formats.
The latest version of PolyglotMan is available via http://polyglotman.sourceforge.net/.
The following options should not be combined with any others; they make PolyglotMan exit without processing any input.
You should specify the filter first, as this sets a number of parameters, and then specify other options.
The following options apply only when formatted pages are given as input. With source input they either do not apply or are always handled correctly.
Some flavors of UNIX ship man pages without [tn]roff source, making one's laser printer little more than a laser-powered daisy wheel. This filter tries to intuit the original [tn]roff directives, which can then be recompiled by [tn]roff.
TkMan, a hypertext man page browser, uses PolyglotMan to show man pages without the (usually) useless headers and footers on each page. It also collects section and (optionally) subsection heads for direct access from a pulldown menu. TkMan and Tcl/Tk, the toolkit in which it's written, are available via anonymous ftp from ftp://ftp.smli.com/pub/tcl/
This option outputs the text as a series of Tcl lists consisting of text-tag pairs, where tag names roughly correspond to HTML. This output can be inserted into a Tk text widget by doing eval <textwidget> insert end <text>. Other programs that want both the text and the tags should find this format relatively easy to parse. Also see ASCII.
When printed on a line printer, man pages produce special text effects by overstriking characters with themselves (to produce bold) and with underscores (to produce underlining). Other text processing software, such as text editors, searchers, and indexers, must counteract this. The ASCII filter strips away this formatting. Piping nroff output through col -b also strips away this formatting, but it leaves behind unsightly page headers and footers. Also see Tk.
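To make the overstriking concrete, here is a sketch that strips it with sed, roughly as col -b does. It only illustrates the encoding and is not how PolyglotMan is implemented. The function name strip_overstrike is hypothetical, it assumes the usual X<BS>X (bold) and _<BS>X (underline) order, and like col -b it leaves page headers and footers in place.

```shell
# Hypothetical illustration (not PolyglotMan's code): remove nroff
# overstriking from standard input. Deleting every character that
# precedes a backspace, together with the backspace, leaves the
# character that was printed on top of it.
strip_overstrike() {
  bs=$(printf '\b')     # a literal backspace character
  sed "s/.$bs//g"
}
```

For example, strip_overstrike < /usr/man/cat1/ls.1 prints the page as plain text, headers and footers included.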
Dumps section and (optionally) subsection titles. This might be useful for another program that processes man pages.
With a simple extension to an HTTP server for Mosaic or other World Wide Web browsers, PolyglotMan can produce high-quality HTML on the fly. Several such extensions and pointers to several others are included in PolyglotMan's contrib directory.
This is approaching the DocBook DTD, but I'm hoping that someone with a real interest in this will polish the generated tags. Try it to see how close the tags are now.
Improved by Aaron Hawley, who still notes:
Output requires human intervention to become proper DocBook format. This is a result of the fundamental nature of nroff and DocBook XML: one is marked up for formatting, the other for semantics (defining what the content is rather than what it should look like). For instance, italics and bold formatting are converted to the emphasis and command DocBook elements respectively, even though they should probably be marked up as command, option, literal, arg, and other possible DocBook elements.
MIME (Multipurpose Internet Mail Extensions) text/enriched as defined by RFC 1563, good for consumption by MIME-aware e-mailers or as Emacs (>=19.29) enriched documents.
Use the output on a Mac, a NeXT, or whatever. It might be possible to take arbitrary man pages and integrate them better with NeXT's documentation system; NeXT may have its own man page macros that do this.
To produce PostScript, use groff or psroff. To produce FrameMaker MIF, use FrameMaker's built-in filter. In both cases you need [tn]roff source, so if you only have a formatted version of the manual page, use PolyglotMan's roff filter first.
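The two-step route from a formatted page to PostScript can be wrapped in one function. A sketch, assuming groff and its man macro package are installed; cat_to_ps is a hypothetical name.

```shell
# Hypothetical helper: formatted (cat) man page -> [tn]roff source via
# PolyglotMan's roff filter -> PostScript via groff with the man macros.
# The PostScript goes to standard output.
cat_to_ps() {
  rman -f roff "$1" | groff -man -Tps
}
```

For example, cat_to_ps /usr/local/man/cat1/ls.1 > ls.ps. Since the roff output is only a reconstruction, check the result before printing a large batch.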
To convert the formatted man page named ls.1 back into [tn]roff source form:
rman -f roff /usr/local/man/cat1/ls.1 > /usr/local/man/man1/ls.1
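The same command can be applied to a whole directory of formatted pages. This is a sketch: the function name is hypothetical, it assumes uncompressed pages, and it skips any page that already has a file in the target directory so that genuine source is never overwritten.

```shell
# Hypothetical helper: rebuild [tn]roff source for every formatted page in
# a cat directory, writing each result under the same name in a man
# directory. Existing files are left untouched.
cat_dir_to_roff() {
  catdir=$1
  mandir=$2
  mkdir -p -- "$mandir" || return
  for page in "$catdir"/*; do
    [ -f "$page" ] || continue        # skip subdirectories and an unmatched glob
    out=$mandir/${page##*/}            # e.g. /usr/local/man/man1/ls.1
    [ -e "$out" ] && continue          # keep source that already exists
    rman -f roff "$page" > "$out" || return
  done
}
```

For example, cat_dir_to_roff /usr/local/man/cat1 /usr/local/man/man1.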
Long man pages are often compressed to conserve space (compression is especially effective on formatted man pages, as many of the characters are spaces). The automount page is a long one, so it probably has subsections, which we try to separate out (some macro sets don't distinguish subsections well enough for PolyglotMan to detect them). Let's convert it to LaTeX format:
pcat /usr/catman/a_man/cat1/automount.z | rman -b -n automount -s 1 -f latex > automount.man
man 1 automount | rman -b -n automount -s 1 -f latex > automount.man
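Which decompressor to use depends on the file's suffix. The sketch below picks one and pipes the text into rman. The function name is hypothetical, and the suffix conventions it assumes (.z for pack, .Z for compress, .gz for gzip) are common but vary between systems, so adjust them to yours.

```shell
# Hypothetical helper: decompress a man page according to its suffix and
# pipe it into rman; the remaining arguments are passed to rman unchanged.
rman_compressed() {
  page=$1
  shift
  case $page in
    *.gz) gzip -dc -- "$page" ;;       # gzip
    *.Z)  uncompress -c "$page" ;;     # compress(1)
    *.z)  pcat "$page" ;;              # pack(1), as with pcat above
    *)    cat -- "$page" ;;            # not compressed
  esac | rman "$@"
}
```

For example, rman_compressed /usr/catman/a_man/cat1/automount.z -f latex -b -n automount -s 1 > automount.man.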
For HTML/Mosaic users, PolyglotMan can, without modification of the source code, produce HTML links that point to other HTML man pages, either pregenerated or generated on the fly. First, assume that pregenerated HTML versions of the man pages are stored in /usr/man/html. Generate these one by one with the following form:
rman -f html -r 'http:/usr/man/html/%s.%s.html' /usr/man/cat1/ls.1 > /usr/man/html/ls.1.html
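Rather than typing that command for each page, a loop can generate the whole directory. A sketch under the same layout assumptions; cat_dir_to_html is a hypothetical name, and the -r template is built from the HTML directory so that links point at sibling files.

```shell
# Hypothetical helper: pregenerate HTML for every formatted page in a cat
# directory, e.g. /usr/man/cat1/ls.1 -> /usr/man/html/ls.1.html, with
# cross-reference links resolved inside the same HTML directory.
cat_dir_to_html() {
  catdir=$1
  htmldir=$2
  mkdir -p -- "$htmldir" || return
  for page in "$catdir"/*; do
    [ -f "$page" ] || continue
    rman -f html -r "http:$htmldir/%s.%s.html" "$page" \
      > "$htmldir/${page##*/}.html" || return
  done
}
```

For example, cat_dir_to_html /usr/man/cat1 /usr/man/html, repeated for each section directory.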
If you've extended your HTML client to generate HTML on the fly, you should use
rman -f html -r 'http:~/bin/man2html?%s:%s' /usr/man/cat1/ls.1
when generating HTML.
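On the receiving end of that -r template sits a program that gets the page name and section and runs rman. Below is a minimal CGI-style sketch. The name man2html comes from the example URL; everything else is an assumption: QUERY_STRING arrives as name:section (matching %s:%s), man produces the formatted page, and the input check is a precaution I added. Some systems want man -s 1 rather than man 1.

```shell
# Hypothetical CGI handler body (e.g. for ~/bin/man2html). Expects
# QUERY_STRING of the form name:section, formats the page with man and
# converts it with rman, emitting links that point back at this script.
man2html_cgi() {
  name=${QUERY_STRING%%:*}
  section=${QUERY_STRING#*:}
  # Reject a missing colon, empty fields, a leading dash, or unusual
  # characters, so nothing unexpected reaches man or rman.
  case $QUERY_STRING in *:*) ;; *) name= ;; esac
  case $name in ''|-*|*[!A-Za-z0-9._+-]*) name= ;; esac
  case $section in ''|*[!A-Za-z0-9]*) name= ;; esac
  if [ -z "$name" ]; then
    printf 'Status: 400 Bad Request\r\n\r\n'
    return 1
  fi
  printf 'Content-Type: text/html\r\n\r\n'
  man "$section" "$name" | rman -f html -n "$name" -s "$section" \
    -r 'http:~/bin/man2html?%s:%s'
}
```

An executable script containing this function and ending with a call to man2html_cgi could then serve requests such as man2html?ls:1.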
PolyglotMan is not perfect in all cases, but it usually does a good job, and in any case reduces the problem of converting man pages to light editing.
Tables in formatted pages, especially HP's, aren't handled very well. To have tables recognized, pass in the page's source.
The man pager woman applies its own idea of formatting for man pages, which can confuse PolyglotMan. Bypass woman by passing the formatted manual page text directly into PolyglotMan.
The [tn]roff output format uses \fB to turn on boldface. If your macro set requires .B, you'll have to postprocess the PolyglotMan output.
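One possible postprocessing step is sketched below with sed. It assumes bold runs are written as \fB...\fR or \fB...\fP and converts only lines that are entirely bold into .B requests. Mixed lines are left for hand editing. The function name is hypothetical.

```shell
# Hypothetical postprocessor: rewrite lines consisting solely of one bold
# run, such as \fBls\fR or \fBls\fP, as ".B ls". Lines with other text or
# several font changes pass through unchanged.
bold_lines_to_B() {
  sed 's/^\\fB\([^\\]*\)\\f[RP]$/.B \1/'
}
```

For example, rman -f roff /usr/local/man/cat1/ls.1 | bold_lines_to_B > ls.1.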
GNU groff can now output to HTML.
Copyright (c) 1994-2003 T.A. Phelps
developed at the
University of California, Berkeley
Computer Science Division
Manual page last updated on $Date: 2003/03/29 08:09:13 $