"Fossies" - the Fresh Open Source Software Archive

Member "xapian-core-1.4.14/docs/synonyms.html" (23 Nov 2019, 12593 Bytes) of package /linux/www/xapian-core-1.4.14.tar.xz:


<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.15.2: http://docutils.sourceforge.net/" />
<title>Xapian Synonym Support</title>
<style type="text/css">

/*
:Author: David Goodger (goodger@python.org)
:Id: $Id: html4css1.css 7952 2016-07-26 18:15:59Z milde $
:Copyright: This stylesheet has been placed in the public domain.

Default cascading style sheet for the HTML output of Docutils.

See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to
customize this style sheet.
*/

/* used to remove borders from tables and images */
.borderless, table.borderless td, table.borderless th {
  border: 0 }

table.borderless td, table.borderless th {
  /* Override padding for "table.docutils td" with "! important".
     The right padding separates the table cells. */
  padding: 0 0.5em 0 0 ! important }

.first {
  /* Override more specific margin styles with "! important". */
  margin-top: 0 ! important }

.last, .with-subtitle {
  margin-bottom: 0 ! important }

.hidden {
  display: none }

.subscript {
  vertical-align: sub;
  font-size: smaller }

.superscript {
  vertical-align: super;
  font-size: smaller }

a.toc-backref {
  text-decoration: none ;
  color: black }

blockquote.epigraph {
  margin: 2em 5em ; }

dl.docutils dd {
  margin-bottom: 0.5em }

object[type="image/svg+xml"], object[type="application/x-shockwave-flash"] {
  overflow: hidden;
}

/* Uncomment (and remove this text!) to get bold-faced definition list terms
dl.docutils dt {
  font-weight: bold }
*/

div.abstract {
  margin: 2em 5em }

div.abstract p.topic-title {
  font-weight: bold ;
  text-align: center }

div.admonition, div.attention, div.caution, div.danger, div.error,
div.hint, div.important, div.note, div.tip, div.warning {
  margin: 2em ;
  border: medium outset ;
  padding: 1em }

div.admonition p.admonition-title, div.hint p.admonition-title,
div.important p.admonition-title, div.note p.admonition-title,
div.tip p.admonition-title {
  font-weight: bold ;
  font-family: sans-serif }

div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title, .code .error {
  color: red ;
  font-weight: bold ;
  font-family: sans-serif }

/* Uncomment (and remove this text!) to get reduced vertical space in
   compound paragraphs.
div.compound .compound-first, div.compound .compound-middle {
  margin-bottom: 0.5em }

div.compound .compound-last, div.compound .compound-middle {
  margin-top: 0.5em }
*/

div.dedication {
  margin: 2em 5em ;
  text-align: center ;
  font-style: italic }

div.dedication p.topic-title {
  font-weight: bold ;
  font-style: normal }

div.figure {
  margin-left: 2em ;
  margin-right: 2em }

div.footer, div.header {
  clear: both;
  font-size: smaller }

div.line-block {
  display: block ;
  margin-top: 1em ;
  margin-bottom: 1em }

div.line-block div.line-block {
  margin-top: 0 ;
  margin-bottom: 0 ;
  margin-left: 1.5em }

div.sidebar {
  margin: 0 0 0.5em 1em ;
  border: medium outset ;
  padding: 1em ;
  background-color: #ffffee ;
  width: 40% ;
  float: right ;
  clear: right }

div.sidebar p.rubric {
  font-family: sans-serif ;
  font-size: medium }

div.system-messages {
  margin: 5em }

div.system-messages h1 {
  color: red }

div.system-message {
  border: medium outset ;
  padding: 1em }

div.system-message p.system-message-title {
  color: red ;
  font-weight: bold }

div.topic {
  margin: 2em }

h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
  margin-top: 0.4em }

h1.title {
  text-align: center }

h2.subtitle {
  text-align: center }

hr.docutils {
  width: 75% }

img.align-left, .figure.align-left, object.align-left, table.align-left {
  clear: left ;
  float: left ;
  margin-right: 1em }

img.align-right, .figure.align-right, object.align-right, table.align-right {
  clear: right ;
  float: right ;
  margin-left: 1em }

img.align-center, .figure.align-center, object.align-center {
  display: block;
  margin-left: auto;
  margin-right: auto;
}

table.align-center {
  margin-left: auto;
  margin-right: auto;
}

.align-left {
  text-align: left }

.align-center {
  clear: both ;
  text-align: center }

.align-right {
  text-align: right }

/* reset inner alignment in figures */
div.align-right {
  text-align: inherit }

/* div.align-center * { */
/*   text-align: left } */

.align-top    {
  vertical-align: top }

.align-middle {
  vertical-align: middle }

.align-bottom {
  vertical-align: bottom }

ol.simple, ul.simple {
  margin-bottom: 1em }

ol.arabic {
  list-style: decimal }

ol.loweralpha {
  list-style: lower-alpha }

ol.upperalpha {
  list-style: upper-alpha }

ol.lowerroman {
  list-style: lower-roman }

ol.upperroman {
  list-style: upper-roman }

p.attribution {
  text-align: right ;
  margin-left: 50% }

p.caption {
  font-style: italic }

p.credits {
  font-style: italic ;
  font-size: smaller }

p.label {
  white-space: nowrap }

p.rubric {
  font-weight: bold ;
  font-size: larger ;
  color: maroon ;
  text-align: center }

p.sidebar-title {
  font-family: sans-serif ;
  font-weight: bold ;
  font-size: larger }

p.sidebar-subtitle {
  font-family: sans-serif ;
  font-weight: bold }

p.topic-title {
  font-weight: bold }

pre.address {
  margin-bottom: 0 ;
  margin-top: 0 ;
  font: inherit }

pre.literal-block, pre.doctest-block, pre.math, pre.code {
  margin-left: 2em ;
  margin-right: 2em }

pre.code .ln { color: grey; } /* line numbers */
pre.code, code { background-color: #eeeeee }
pre.code .comment, code .comment { color: #5C6576 }
pre.code .keyword, code .keyword { color: #3B0D06; font-weight: bold }
pre.code .literal.string, code .literal.string { color: #0C5404 }
pre.code .name.builtin, code .name.builtin { color: #352B84 }
pre.code .deleted, code .deleted { background-color: #DEB0A1}
pre.code .inserted, code .inserted { background-color: #A3D289}

span.classifier {
  font-family: sans-serif ;
  font-style: oblique }

span.classifier-delimiter {
  font-family: sans-serif ;
  font-weight: bold }

span.interpreted {
  font-family: sans-serif }

span.option {
  white-space: nowrap }

span.pre {
  white-space: pre }

span.problematic {
  color: red }

span.section-subtitle {
  /* font-size relative to parent (h1..h6 element) */
  font-size: 80% }

table.citation {
  border-left: solid 1px gray;
  margin-left: 1px }

table.docinfo {
  margin: 2em 4em }

table.docutils {
  margin-top: 0.5em ;
  margin-bottom: 0.5em }

table.footnote {
  border-left: solid 1px black;
  margin-left: 1px }

table.docutils td, table.docutils th,
table.docinfo td, table.docinfo th {
  padding-left: 0.5em ;
  padding-right: 0.5em ;
  vertical-align: top }

table.docutils th.field-name, table.docinfo th.docinfo-name {
  font-weight: bold ;
  text-align: left ;
  white-space: nowrap ;
  padding-left: 0 }

/* "booktabs" style (no vertical lines) */
table.docutils.booktabs {
  border: 0px;
  border-top: 2px solid;
  border-bottom: 2px solid;
  border-collapse: collapse;
}
table.docutils.booktabs * {
  border: 0px;
}
table.docutils.booktabs th {
  border-bottom: thin solid;
  text-align: left;
}

h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
  font-size: 100% }

ul.auto-toc {
  list-style-type: none }

</style>
</head>
<body>
<div class="document" id="xapian-synonym-support">
<h1 class="title">Xapian Synonym Support</h1>

<!-- Copyright (C) 2007,2008,2011 Olly Betts -->
<div class="contents topic" id="table-of-contents">
<p class="topic-title first">Table of contents</p>
<ul class="simple">
<li><a class="reference internal" href="#introduction" id="id1">Introduction</a></li>
<li><a class="reference internal" href="#model" id="id2">Model</a></li>
<li><a class="reference internal" href="#queryparser-integration" id="id3">QueryParser Integration</a></li>
<li><a class="reference internal" href="#current-limitations" id="id4">Current Limitations</a><ul>
<li><a class="reference internal" href="#explicit-multi-word-synonyms" id="id5">Explicit multi-word synonyms</a></li>
<li><a class="reference internal" href="#backend-support" id="id6">Backend Support</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="introduction">
<h1><a class="toc-backref" href="#id1">Introduction</a></h1>
<p>Xapian provides support for storing a synonym dictionary, or thesaurus.  This
can be used by the Xapian::QueryParser class to expand terms in user query
strings, either automatically, or when requested by the user with an explicit
synonym operator (<tt class="docutils literal">~</tt>).</p>
<p>Note that Xapian doesn't offer automated generation of the synonym dictionary.</p>
</div>
<div class="section" id="model">
<h1><a class="toc-backref" href="#id2">Model</a></h1>
<p>The model for the synonym dictionary is that a term or group of consecutive
terms can have one or more synonym terms.  A group of consecutive terms is
specified in the dictionary by simply joining them with a single space between
each one.</p>
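<p>As a rough illustration of this model, the dictionary can be pictured as a
mapping from a key (one term, or several consecutive terms joined by single
spaces) to a set of synonym terms.  The sketch below is a toy in plain Python,
not Xapian's storage format or API; the <tt class="docutils literal">synonyms</tt> mapping and the
<tt class="docutils literal">add_entry()</tt> helper are hypothetical names used only for illustration.</p>
<pre class="literal-block">
```python
# Toy model of a synonym dictionary (not Xapian's on-disk format).
# Each key is one term, or a group of consecutive terms joined by
# single spaces; each value is the set of synonym terms for that key.
synonyms = {}


def add_entry(key_terms, synonym):
    """Record one synonym for a term or a group of consecutive terms."""
    key = " ".join(key_terms)  # a group is joined with single spaces
    synonyms.setdefault(key, set()).add(synonym)


add_entry(["truck"], "lorry")
add_entry(["truck"], "van")
add_entry(["stock", "market"], "bourse")

print(sorted(synonyms["truck"]))         # ['lorry', 'van']
print(sorted(synonyms["stock market"]))  # ['bourse']
```
</pre>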
</div>
<div class="section" id="queryparser-integration">
<h1><a class="toc-backref" href="#id3">QueryParser Integration</a></h1>
<p>In order for any of the synonym features of the QueryParser to work, you must
call <tt class="docutils literal"><span class="pre">QueryParser::set_database()</span></tt> to specify the database to use.</p>
<p>If <tt class="docutils literal">FLAG_SYNONYM</tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt> then the
QueryParser will recognise <tt class="docutils literal">~</tt> in front of a term as indicating a request for
synonym expansion.  If <tt class="docutils literal">FLAG_LOVEHATE</tt> is also specified, you can use <tt class="docutils literal">+</tt>
and <tt class="docutils literal">-</tt> before the <tt class="docutils literal">~</tt> to indicate that you love or hate the synonym
expanded expression.</p>
<p>A synonym-expanded term becomes the term itself OR-ed with any listed synonyms,
so <tt class="docutils literal">~truck</tt> might expand to <tt class="docutils literal">truck OR lorry OR van</tt>.  A group of terms is
handled in much the same way.</p>
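<p>The expansion described above can be sketched in a few lines.  This is a
plain-Python toy that rewrites a query string, assuming a hypothetical
<tt class="docutils literal">SYNONYMS</tt> table holding the entries from the truck example; the real
QueryParser builds a Xapian::Query object rather than text, and
<tt class="docutils literal">expand()</tt> and <tt class="docutils literal">rewrite()</tt> are illustrative names, not Xapian functions.</p>
<pre class="literal-block">
```python
# Hypothetical dictionary contents, chosen to match the truck example.
SYNONYMS = {"truck": ["lorry", "van"]}


def expand(term):
    """Return the term OR-ed with any synonyms listed for it."""
    return " OR ".join([term] + SYNONYMS.get(term, []))


def rewrite(query):
    """Expand only the terms the user marked with a leading '~'."""
    pieces = []
    for token in query.split():
        if token.startswith("~"):
            pieces.append("(" + expand(token[1:]) + ")")
        else:
            pieces.append(token)
    return " ".join(pieces)


print(expand("truck"))        # truck OR lorry OR van
print(rewrite("red ~truck"))  # red (truck OR lorry OR van)
print(rewrite("red truck"))   # red truck
```
</pre>
<p>A term with no listed synonyms is left as it is, so <tt class="docutils literal"><span class="pre">expand(&quot;bicycle&quot;)</span></tt> in
this toy simply returns <tt class="docutils literal">bicycle</tt>.</p>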
<p>If a term to be synonym expanded will be stemmed by the QueryParser, then
synonyms will be checked for the unstemmed form first, and then for the stemmed
form, so you can provide different synonyms for particular unstemmed forms
if you want to.</p>
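<p>The lookup order can be illustrated with a small plain-Python sketch.  It
assumes a stand-in stemmer (<tt class="docutils literal">toy_stem()</tt>, which only strips a trailing
<tt class="docutils literal">s</tt>), a hypothetical <tt class="docutils literal">SYNONYMS</tt> table, and that the first form with an
entry supplies the synonyms; none of these names come from Xapian, and how
Xapian stores keys for stemmed forms is not modelled here.</p>
<pre class="literal-block">
```python
# Hypothetical entries: "news" has its own synonyms as an unstemmed form,
# while "new" and "truck" are entries for stemmed forms.
SYNONYMS = {
    "news": ["headlines"],
    "new": ["fresh"],
    "truck": ["lorry", "van"],
}


def toy_stem(word):
    """Stand-in stemmer that only strips a trailing 's'."""
    return word[:-1] if word.endswith("s") else word


def lookup(word):
    """Check the unstemmed form first, then fall back to the stemmed form."""
    for key in (word, toy_stem(word)):
        if key in SYNONYMS:
            return SYNONYMS[key]
    return []


print(lookup("news"))    # ['headlines'] (the unstemmed entry is found first)
print(lookup("trucks"))  # ['lorry', 'van'] (found via the stemmed form)
```
</pre>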
<p>If <tt class="docutils literal">FLAG_AUTO_SYNONYMS</tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt> then the
QueryParser will automatically expand any term which has synonyms, unless the
term is in a phrase or similar.</p>
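<p>A simplified picture of automatic expansion, again as a plain-Python toy
rather than Xapian code: every term with an entry in a hypothetical
<tt class="docutils literal">SYNONYMS</tt> table is expanded, except inside double-quoted phrases.
Treating only quoted phrases as the &quot;phrase or similar&quot; case is an assumption
made to keep the sketch short, and <tt class="docutils literal">auto_expand()</tt> is an illustrative name.</p>
<pre class="literal-block">
```python
# Hypothetical dictionary contents.
SYNONYMS = {"truck": ["lorry", "van"]}


def auto_expand(query):
    """Expand every term that has synonyms, leaving quoted phrases alone."""
    pieces = []
    # Splitting on '"' leaves text outside quotes at even indexes and
    # the contents of quoted phrases at odd indexes.
    for index, chunk in enumerate(query.split('"')):
        if index % 2 == 1:
            pieces.append('"' + chunk + '"')  # phrase: not expanded
            continue
        for term in chunk.split():
            alternatives = [term] + SYNONYMS.get(term, [])
            if len(alternatives) == 1:
                pieces.append(term)
            else:
                pieces.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(pieces)


print(auto_expand('red truck "truck stop"'))
# red (truck OR lorry OR van) "truck stop"
```
</pre>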
<p>If <tt class="docutils literal">FLAG_AUTO_MULTIWORD_SYNONYMS</tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt>
then the QueryParser will look at groups of terms separated only by whitespace
and try to expand them as term groups.  This is done in a &quot;greedy&quot; fashion, so
the first term which can start a group is expanded first, and the longest group
starting with that term is expanded.  After expansion, the QueryParser will
look for further possible expansions starting with the term after the last
term in the expanded group.</p>
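<p>The greedy scan described above can be sketched as follows.  This is a
plain-Python toy under stated assumptions, not the QueryParser's
implementation: the <tt class="docutils literal">SYNONYMS</tt> entries are hypothetical, only groups of two
or more terms are considered, and <tt class="docutils literal">expand_groups()</tt> returns
<tt class="docutils literal">(group, synonyms)</tt> pairs instead of building a query.</p>
<pre class="literal-block">
```python
# Hypothetical multi-word entries; keys are terms joined by single spaces.
SYNONYMS = {
    "stock market": ["bourse"],
    "stock market crash": ["meltdown"],
    "market crash": ["slump"],
}
# Longest group (counted in terms) that the toy dictionary knows about.
MAX_GROUP = max(len(key.split()) for key in SYNONYMS)


def expand_groups(terms):
    """Greedy left-to-right scan returning (group, synonyms) pairs."""
    result = []
    i = 0
    while i != len(terms):
        # Try the longest candidate group starting at position i first.
        for size in range(min(MAX_GROUP, len(terms) - i), 1, -1):
            group = " ".join(terms[i:i + size])
            if group in SYNONYMS:
                result.append((group, SYNONYMS[group]))
                i += size  # resume after the last term of the group
                break
        else:
            # No group starts here; pass the single term through.
            result.append((terms[i], []))
            i += 1
    return result


for item in expand_groups("the stock market crash hurt".split()):
    print(item)
# ('the', [])
# ('stock market crash', ['meltdown'])
# ('hurt', [])
```
</pre>
<p>In this toy run the longer group <tt class="docutils literal">stock market crash</tt> wins over
<tt class="docutils literal">stock market</tt>, and <tt class="docutils literal">market crash</tt> is never tried because its terms were
already consumed by the earlier group.</p>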
</div>
<div class="section" id="current-limitations">
<h1><a class="toc-backref" href="#id4">Current Limitations</a></h1>
<div class="section" id="explicit-multi-word-synonyms">
<h2><a class="toc-backref" href="#id5">Explicit multi-word synonyms</a></h2>
<p>There ought to be a way to explicitly request expansion of multi-term synonyms,
probably with the syntax <tt class="docutils literal">~&quot;stock market&quot;</tt>.  This hasn't been implemented
yet though.</p>
</div>
<div class="section" id="backend-support">
<h2><a class="toc-backref" href="#id6">Backend Support</a></h2>
<p>Currently synonyms are supported by glass and chert databases.  They work
with a single database or multiple databases (use Database::add_database() as
usual).  We've no plans to support them for the InMemory backend, but we do
intend to support them for the remote backend in the future.</p>
</div>
</div>
</div>
</body>
</html>