recoll  1.26.3
About: Recoll is a personal full text search tool based on Xapian as back-end (with Qt GUI).
  Fossies Dox: recoll-1.26.3.tar.gz  ("unofficial" and yet experimental doxygen-generated source code documentation)  

About filters

Overview

Before a document can be processed either for indexing or previewing, it must be translated into an internal common format.

The MimeHandler class defines the virtual interface for filters. There are derived classes for text, HTML (MimeHandlerHtml), and mail folders (MimeHandlerMail).

There is also a derived class (MimeHandlerExec) that executes an external program to translate the document to simple HTML, which is then further processed by MimeHandlerHtml.

To extend Recoll for a new document type, you may either subclass the MimeHandler class (look at one of the existing subclasses), or write an external filter, which will probably be the simpler solution in most cases.

External filters

Filters are programs (usually shell scripts) that will turn a document of foreign type into something that Recoll can understand. HTML was chosen as a pivot format for its ability to carry structured information.

Recoll currently uses the following meta-information tags:

 - title
 - charset 
 - keywords 
 - description

For an example, see the rclsoff filter, which translates OpenOffice documents.

The filter is executed with the input file name as a parameter and should output the result to stdout.
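
The calling convention above (input file name as the parameter, HTML on stdout) can be sketched as follows. This is only an illustration, not one of the filters shipped with Recoll: the function name to_html, the choice of plain UTF-8 text as input, and the exact way the charset is declared (an http-equiv Content-Type meta tag) are all assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical external filter sketch: plain text in, simple HTML on stdout."""
import html
import os
import sys


def to_html(path, keywords="", description=""):
    """Return a simple HTML document for the text file at `path`.

    The head carries the four meta-information items listed above:
    title, charset, keywords and description.
    """
    # Assumption: the input is UTF-8 text; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace") as f:
        body = f.read()
    # Assumption: the file name is used as the title for lack of anything better.
    title = html.escape(os.path.basename(path))
    return (
        "<html><head>\n"
        f"<title>{title}</title>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f'<meta name="keywords" content="{html.escape(keywords)}">\n'
        f'<meta name="description" content="{html.escape(description)}">\n'
        "</head><body>\n"
        f"<pre>{html.escape(body)}</pre>\n"
        "</body></html>\n"
    )


if __name__ == "__main__" and len(sys.argv) == 2:
    # The input file name arrives as the single command-line parameter,
    # and the translated document goes to stdout.
    sys.stdout.write(to_html(sys.argv[1]))
```

Escaping the body and the attribute values keeps characters such as < and & in the source document from being read as markup by the HTML handler.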

Associating a filter with a MIME type

This is done in the mimeconf configuration file. The format is self-explanatory; take a look at the file itself.
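
As a rough illustration, an association might look like the fragment below. The MIME type application/x-example and the filter name rclexample are hypothetical, and the [index] section name and exec keyword are assumed to match the mimeconf distributed with your version; check them against the actual file before relying on this.

```text
[index]
application/x-example = exec rclexample
```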
