pdftohtml.cat (xpdf-4.03) | : | pdftohtml.cat (xpdf-4.04) | ||
---|---|---|---|---|
pdftohtml(1) General Commands Manual pdftohtml(1) | pdftohtml(1) General Commands Manual pdftohtml(1) | |||
NAME | NAME | |||
pdftohtml - Portable Document Format (PDF) to HTML converter (version | pdftohtml - Portable Document Format (PDF) to HTML converter (version | |||
4.03) | 4.04) | |||
SYNOPSIS | SYNOPSIS | |||
pdftohtml [options] PDF-file HTML-dir | pdftohtml [options] PDF-file HTML-dir | |||
DESCRIPTION | DESCRIPTION | |||
Pdftohtml converts Portable Document Format (PDF) files to HTML. | Pdftohtml converts Portable Document Format (PDF) files to HTML. | |||
Pdftohtml reads the PDF file, PDF-file, and places an HTML file for | Pdftohtml reads the PDF file, PDF-file, and places an HTML file for | |||
each page, along with auxiliary images in the directory, HTML-dir. The | each page, along with auxiliary images in the directory, HTML-dir. The | |||
HTML directory will be created; if it already exists, pdftohtml will | HTML directory will be created; if it already exists, pdftohtml will | |||
report an error. | report an error. | |||
CONFIGURATION FILE | CONFIGURATION FILE | |||
Pdftohtml reads a configuration file at startup. It first tries to | Pdftohtml reads a configuration file at startup. It first tries to | |||
find the user's private config file, ~/.xpdfrc. If that doesn't exist, | find the user's private config file, ~/.xpdfrc. If that doesn't exist, | |||
it looks for a system-wide config file, typically /usr/local/etc/xpdfrc | it looks for a system-wide config file, typically /etc/xpdfrc (but this | |||
(but this location can be changed when pdftohtml is built). See the | location can be changed when pdftohtml is built). See the xpdfrc(5) | |||
xpdfrc(5) man page for details. | man page for details. | |||
OPTIONS | OPTIONS | |||
Many of the following options can be set with configuration file com- | Many of the following options can be set with configuration file com- | |||
mands. These are listed in square brackets with the description of the | mands. These are listed in square brackets with the description of the | |||
corresponding command line option. | corresponding command line option. | |||
-f number | -f number | |||
Specifies the first page to convert. | Specifies the first page to convert. | |||
-l number | -l number | |||
skipping to change at line 49 | skipping to change at line 49 | |||
the HTML. Using '-z 1.5', for example, will make the initial | the HTML. Using '-z 1.5', for example, will make the initial | |||
view 50% larger. | view 50% larger. | |||
-r number | -r number | |||
Specifies the resolution, in DPI, for background images. This | Specifies the resolution, in DPI, for background images. This | |||
controls the pixel size of the background image files. The ini- | controls the pixel size of the background image files. The ini- | |||
tial zoom level is controlled by the '-z' option. Specifying a | tial zoom level is controlled by the '-z' option. Specifying a | |||
larger '-r' value will allow the viewer to zoom in farther with- | larger '-r' value will allow the viewer to zoom in farther with- | |||
out upscaling artifacts in the background. | out upscaling artifacts in the background. | |||
-vstretch number | ||||
Specifies a vertical stretch factor. Setting this to a value | ||||
greater than 1.0 will stretch each page vertically, spreading | ||||
out the lines. This also stretches the background image to | ||||
match. | ||||
-embedbackground | ||||
Embeds the background image as base64-encoded data directly in | ||||
the HTML file, rather than storing it as a separate file. | ||||
-nofonts | -nofonts | |||
Disable extraction of embedded fonts. By default, pdftohtml | Disable extraction of embedded fonts. By default, pdftohtml | |||
extracts TrueType and OpenType fonts. Disabling extraction can | extracts TrueType and OpenType fonts. Disabling extraction can | |||
work around problems with buggy fonts. | work around problems with buggy fonts. | |||
-embedfonts | ||||
Embeds any extracted fonts as base64-encoded data directly in | ||||
the HTML file, rather than storing them as separate files. | ||||
-skipinvisible | -skipinvisible | |||
Don't draw invisible text. By default, invisible text (commonly | Don't draw invisible text. By default, invisible text (commonly | |||
used in OCR'ed PDF files) is drawn as transparent (alpha=0) HTML | used in OCR'ed PDF files) is drawn as transparent (alpha=0) HTML | |||
text. This option tells pdftohtml to discard invisible text | text. This option tells pdftohtml to discard invisible text | |||
entirely. | entirely. | |||
-allinvisible | -allinvisible | |||
Treat all text as invisible. By default, regular (non-invisi- | Treat all text as invisible. By default, regular (non-invisi- | |||
ble) text is not drawn in the background image, and is instead | ble) text is not drawn in the background image, and is instead | |||
drawn with HTML on top of the image. This option tells pdfto- | drawn with HTML on top of the image. This option tells pdfto- | |||
html to include the regular text in the background image, and | html to include the regular text in the background image, and | |||
then draw it as transparent (alpha=0) HTML text. | then draw it as transparent (alpha=0) HTML text. | |||
-formfields | ||||
Convert AcroForm text and checkbox fields to HTML input ele- | ||||
ments. This also removes text (e.g., underscore characters) and | ||||
erases background image content (e.g., lines or boxes) in the | ||||
field areas. | ||||
-table Use table mode when performing the underlying text extraction. | ||||
This will generally produce better output when the PDF content | ||||
is a full-page table. NB: This does not generate HTML tables; | ||||
it just changes the way text is split up. | ||||
-opw password | -opw password | |||
Specify the owner password for the PDF file. Providing this | Specify the owner password for the PDF file. Providing this | |||
will bypass all security restrictions. | will bypass all security restrictions. | |||
-upw password | -upw password | |||
Specify the user password for the PDF file. | Specify the user password for the PDF file. | |||
-verbose | ||||
Print a status message (to stdout) before processing each page. | ||||
[config file: printStatusInfo] | ||||
-q Don't print any messages or errors. [config file: errQuiet] | -q Don't print any messages or errors. [config file: errQuiet] | |||
-cfg config-file | -cfg config-file | |||
Read config-file in place of ~/.xpdfrc or the system-wide config | Read config-file in place of ~/.xpdfrc or the system-wide config | |||
file. | file. | |||
-v Print copyright and version information. | -v Print copyright and version information. | |||
-h Print usage information. (-help and --help are equivalent.) | -h Print usage information. (-help and --help are equivalent.) | |||
skipping to change at line 103 | skipping to change at line 132 | |||
1 Error opening a PDF file. | 1 Error opening a PDF file. | |||
2 Error opening an output file. | 2 Error opening an output file. | |||
3 Error related to PDF permissions. | 3 Error related to PDF permissions. | |||
99 Other error. | 99 Other error. | |||
AUTHOR | AUTHOR | |||
The pdftohtml software and documentation are copyright 1996-2021 Glyph | The pdftohtml software and documentation are copyright 1996-2022 Glyph | |||
& Cog, LLC. | & Cog, LLC. | |||
SEE ALSO | SEE ALSO | |||
xpdf(1), pdftops(1), pdftotext(1), pdfinfo(1), pdffonts(1), pdfde- | xpdf(1), pdftops(1), pdftotext(1), pdfinfo(1), pdffonts(1), pdfde- | |||
tach(1), pdftoppm(1), pdftopng(1), pdfimages(1), xpdfrc(5) | tach(1), pdftoppm(1), pdftopng(1), pdfimages(1), xpdfrc(5) | |||
http://www.xpdfreader.com/ | http://www.xpdfreader.com/ | |||
28 Jan 2021 pdftohtml(1) | 18 Apr 2022 pdftohtml(1) | |||
End of changes. 11 change blocks. | ||||
11 lines changed or deleted | 40 lines changed or added |
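
The options that appear only in the 4.04 column (-vstretch, -embedbackground, -embedfonts, -formfields, -table, -verbose) can be illustrated by a small sketch that assembles a command line. This is a minimal example, not part of xpdf: the helper names `build_pdftohtml_args`, `output_dir_is_free`, and `describe_exit_code` are hypothetical, and the exit code table covers only the codes visible in the excerpt above. It assumes a pdftohtml 4.04 binary would be on the PATH when the arguments are actually run.

```python
from pathlib import Path

# Exit codes visible in the excerpt above; other codes may exist in the
# full man page and are reported as unlisted here.
EXIT_CODES = {
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def build_pdftohtml_args(pdf_file, html_dir, *, vstretch=None,
                         embed_background=False, embed_fonts=False,
                         form_fields=False, table=False, verbose=False):
    """Hypothetical helper: build an argv list using the 4.04-only options."""
    args = ["pdftohtml"]
    if vstretch is not None:
        # Values greater than 1.0 stretch each page vertically.
        args += ["-vstretch", str(vstretch)]
    if embed_background:
        args.append("-embedbackground")  # background image as base64 in the HTML
    if embed_fonts:
        args.append("-embedfonts")       # extracted fonts as base64 in the HTML
    if form_fields:
        args.append("-formfields")       # AcroForm text/checkbox -> HTML inputs
    if table:
        args.append("-table")            # table mode for text extraction
    if verbose:
        args.append("-verbose")          # status message before each page
    # Synopsis: pdftohtml [options] PDF-file HTML-dir
    args += [str(pdf_file), str(html_dir)]
    return args


def output_dir_is_free(html_dir):
    """pdftohtml reports an error if HTML-dir already exists, so check first."""
    return not Path(html_dir).exists()


def describe_exit_code(code):
    """Map a return code to the text shown in the excerpt, if listed there."""
    return EXIT_CODES.get(code, f"Exit code {code} (not listed in the excerpt)")
```

The resulting list could be passed to `subprocess.run(args).returncode`, with the return code fed to `describe_exit_code`. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.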