libextractor  1.11
About: GNU libextractor is a library used to extract meta-data from files of arbitrary type.
  Fossies Dox: libextractor-1.11.tar.gz  ("unofficial" and yet experimental doxygen-generated source code documentation)  

pdf_extractor.c File Reference

plugin to support PDF files

#include "platform.h"
#include <extractor.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

Data Structures

struct  Matches
 

Functions

static void process_stdout (FILE *fout, EXTRACTOR_MetaDataProcessor proc, void *proc_cls)
 
void EXTRACTOR_pdf_extract_method (struct EXTRACTOR_ExtractContext *ec)
 

Variables

static struct Matches tmap []
 

Detailed Description

plugin to support PDF files

Author
Christian Grothoff

PDF libraries today are a nightmare (TM). So instead of doing the fast thing and calling some library functions to parse the PDF, we execute 'pdfinfo' and parse the output. Because that's 21st century plumbing: nobody writes reasonable code anymore.

Definition in file pdf_extractor.c.

Function Documentation

◆ EXTRACTOR_pdf_extract_method()

void EXTRACTOR_pdf_extract_method (struct EXTRACTOR_ExtractContext *ec)

Main entry method for the PDF extraction plugin.

Parameters
ec  extraction context provided to the plugin

Definition at line 133 of file pdf_extractor.c.

References EXTRACTOR_ExtractContext::cls, EXTRACTOR_ExtractContext::get_size, NULL, EXTRACTOR_ExtractContext::proc, process_stdout(), EXTRACTOR_ExtractContext::read, and EXTRACTOR_ExtractContext::seek.
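
The description above says the plugin executes 'pdfinfo' and parses its output instead of linking a PDF library. A minimal sketch of that general pattern follows: fork a child, connect two pipes to its stdin and stdout, feed it the document bytes, and collect what it prints. The helper name `run_filter` and its signature are hypothetical and not part of libextractor; whether the installed pdfinfo accepts `-` to read from stdin is also an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from libextractor): run argv[0] with `data`
   on its stdin and copy at most out_size-1 bytes of its stdout into
   `out` (always 0-terminated).  Returns 0 if the child exited with
   status 0, -1 otherwise.  The input is written in full before any
   output is read, so this is only suitable for small inputs; a child
   that exits without reading would also raise SIGPIPE in the caller. */
static int
run_filter (char *const argv[], const char *data, size_t data_len,
            char *out, size_t out_size)
{
  int to_child[2];
  int from_child[2];
  pid_t pid;
  size_t used = 0;
  ssize_t n;
  int status;

  assert (out_size > 0);
  if (0 != pipe (to_child))
    return -1;
  if (0 != pipe (from_child))
  {
    close (to_child[0]);
    close (to_child[1]);
    return -1;
  }
  pid = fork ();
  if (-1 == pid)
  {
    close (to_child[0]);
    close (to_child[1]);
    close (from_child[0]);
    close (from_child[1]);
    return -1;
  }
  if (0 == pid)
  {
    /* child: wire the pipes to stdin/stdout, then replace the process */
    dup2 (to_child[0], STDIN_FILENO);
    dup2 (from_child[1], STDOUT_FILENO);
    close (to_child[0]);
    close (to_child[1]);
    close (from_child[0]);
    close (from_child[1]);
    execvp (argv[0], argv);
    _exit (127);                /* only reached if exec failed */
  }
  /* parent: keep only the write end of stdin and the read end of stdout */
  close (to_child[0]);
  close (from_child[1]);
  while (data_len > 0)
  {
    n = write (to_child[1], data, data_len);
    if (n <= 0)
      break;
    data += n;
    data_len -= (size_t) n;
  }
  close (to_child[1]);          /* signals end-of-file to the child */
  while ( (used + 1 < out_size) &&
          ((n = read (from_child[0], out + used,
                      out_size - 1 - used)) > 0) )
    used += (size_t) n;
  out[used] = '\0';
  close (from_child[0]);
  if (pid != waitpid (pid, &status, 0))
    return -1;
  return (WIFEXITED (status) && (0 == WEXITSTATUS (status))) ? 0 : -1;
}
```

Under the stated assumption, a caller could pass `{ "pdfinfo", "-", NULL }` as `argv` and hand the captured text to a line parser. Reading stdout through a `FILE *` (for example via `fdopen`) instead of a fixed buffer would fit the `process_stdout()` signature documented below.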

◆ process_stdout()

static void process_stdout (FILE *fout, EXTRACTOR_MetaDataProcessor proc, void *proc_cls)

Process the "stdout" file from pdfinfo.

Parameters
fout      stdout of pdfinfo
proc      function to call with meta data
proc_cls  closure for proc

Definition at line 82 of file pdf_extractor.c.

References EXTRACTOR_METAFORMAT_UTF8, NULL, Matches::text, tmap, and type.

Referenced by EXTRACTOR_pdf_extract_method().
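
pdfinfo prints one `Key:   value` pair per line, so processing its stdout amounts to splitting each line at the first colon and handing the pair to a callback. The sketch below illustrates that idea only; it is not the code at line 82. The callback type `kv_callback`, the function `parse_key_value_lines`, and the collector `store_if_wanted` are hypothetical names, and the real function passes results to an `EXTRACTOR_MetaDataProcessor` instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: receives one key/value pair.  A non-zero
   return value stops the parser. */
typedef int (*kv_callback) (void *cls, const char *key, const char *value);

/* Read "Key:   value" lines from `fout` and report each pair to `cb`.
   Lines without a colon or with an empty value are skipped. */
static void
parse_key_value_lines (FILE *fout, kv_callback cb, void *cb_cls)
{
  char line[1024];
  char *colon;
  char *value;
  size_t len;

  assert (NULL != fout);
  while (NULL != fgets (line, sizeof (line), fout))
  {
    /* strip the trailing line terminator */
    len = strlen (line);
    while ( (len > 0) &&
            ( ('\n' == line[len - 1]) || ('\r' == line[len - 1]) ) )
      line[--len] = '\0';
    colon = strchr (line, ':');
    if (NULL == colon)
      continue;
    *colon = '\0';              /* `line` now holds just the key */
    value = colon + 1;
    while (isspace ((unsigned char) *value))
      value++;                  /* skip the padding after the colon */
    if ('\0' == *value)
      continue;
    if (0 != cb (cb_cls, line, value))
      break;
  }
}

/* Example closure: remembers the value of one wanted key. */
struct wanted_key
{
  const char *key;
  char value[128];
};

/* Example callback: store the value and stop once the key matches. */
static int
store_if_wanted (void *cls, const char *key, const char *value)
{
  struct wanted_key *w = cls;

  if (0 != strcmp (key, w->key))
    return 0;
  snprintf (w->value, sizeof (w->value), "%s", value);
  return 1;
}
```

Splitting at the first colon keeps values that themselves contain colons, such as timestamps, intact.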

Variable Documentation

◆ tmap

static struct Matches tmap[]
Initial value:
= {
{NULL, 0}
}

Symbols referenced by the initializer:
NULL (macro)  getopt1.c:60
EXTRACTOR_METATYPE_PRODUCED_BY_SOFTWARE  extractor.h:258
EXTRACTOR_METATYPE_AUTHOR_NAME  extractor.h:143
EXTRACTOR_METATYPE_ENCODER_VERSION  extractor.h:350
EXTRACTOR_METATYPE_TITLE  extractor.h:134
EXTRACTOR_METATYPE_CREATOR  extractor.h:189
EXTRACTOR_METATYPE_CREATION_DATE  extractor.h:196
EXTRACTOR_METATYPE_KEYWORDS  extractor.h:185
EXTRACTOR_METATYPE_MODIFICATION_DATE  extractor.h:197
EXTRACTOR_METATYPE_PAGE_COUNT  extractor.h:141
EXTRACTOR_METATYPE_SUBJECT  extractor.h:188

Map from pdf-control entries to LE types.

See output of 'pdfinfo'.

Definition at line 1 of file pdf_extractor.c.

Referenced by process_stdout().
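
The `{NULL, 0}` entry shown in the initial value acts as a sentinel that terminates the table, so a lookup can walk the array without knowing its length. The sketch below shows that pattern with stand-in types. The names `demo_type`, `demo_match`, `demo_map`, and `lookup_type` are hypothetical, the two entries are illustrative rather than the actual contents of `tmap`, and the exact-match comparison is an assumption about how keys are compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the library's meta type enum; values are illustrative. */
enum demo_type
{
  DEMO_NONE = 0,
  DEMO_TITLE,
  DEMO_PAGE_COUNT
};

/* Same shape as a text-to-type mapping entry: a key and its type. */
struct demo_match
{
  const char *text;
  enum demo_type type;
};

/* Illustrative table; the final entry with a NULL key is the sentinel. */
static const struct demo_match demo_map[] = {
  { "Title", DEMO_TITLE },
  { "Pages", DEMO_PAGE_COUNT },
  { NULL, DEMO_NONE }
};

/* Walk `map` until the sentinel and return the type for `key`,
   or DEMO_NONE if the key is not listed. */
static enum demo_type
lookup_type (const struct demo_match *map, const char *key)
{
  size_t i;

  assert (NULL != map);
  assert (NULL != key);
  for (i = 0; NULL != map[i].text; i++)
    if (0 == strcmp (map[i].text, key))
      return map[i].type;
  return DEMO_NONE;
}
```

With a sentinel, adding a new mapping only requires inserting a row above the last entry; no separate element count has to be kept in sync.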