"Fossies" - the Fresh Open Source Software Archive

Member "cb2bib-2.0.1/src/c2b/bibSearcher.cpp" (12 Feb 2021, 24742 Bytes) of package /linux/privat/cb2bib-2.0.1.tar.gz:


/***************************************************************************
 *   Copyright (C) 2004-2021 by Pere Constans
 *   constans@molspaces.com
 *   cb2Bib version 2.0.1. Licensed under the GNU GPL version 3.
 *   See the LICENSE file that comes with this distribution.
 ***************************************************************************/
#include "bibSearcher.h"

#include "bibParser.h"
#include "settings.h"

#include <QCoreApplication>
#include <QDir>
#include <QFile>
#include <QTextStream>

#include <algorithm>


/** \page bibsearch Search BibTeX and PDF Document Files

\section descrip Description

  - <b>Search pattern</b> \n Patterns and composite patterns can be
  \htmlonly
  <a href="https://arxiv.org/abs/0705.0751" target="_blank">approximate strings</a>,
  \endhtmlonly
  strings, contexts, regular expressions, or wildcard filters. Patterns accept
  Unicode characters. The scope of each pattern can be the reference as a whole
  or a particular reference field. The fields <tt>year</tt>, <tt>file</tt>, and
  <tt>journal</tt> are treated specifically. The field <tt>year</tt> has the
  qualifiers <tt>Exact</tt>, <tt>Newer</tt>, and <tt>Older</tt>; <tt>Newer</tt>
  and <tt>Older</tt> include the given year. The field <tt>file</tt> can refer
  either to the filename or, optionally, to the contents of that file. Finally,
  for <tt>journal</tt>, the input pattern is also expanded to the journal full
  name, if one is available, and both are checked against the actual contents
  of the <tt>journal</tt> field and, if available, its expanded form. For
  example, typing 'ijqc' retrieves all references whose <tt>journal</tt> is
  'Int. J. Quantum Chem.', and typing 'chemistry' retrieves any of
  'J. Math. Chem.', 'J. Phys. Chem.', etc. This expansion is not performed when
  the pattern scope is set to <tt>all</tt>.

  - <b>Search scope</b> \n By default, searches are performed on the current
  BibTeX output file. If <b>Scan all BibTeX files</b> is checked, the search
  extends to all BibTeX files (extension .bib) present in the current
  directory. It might therefore be convenient to group all reference files in
  one common directory, or to link them into that directory. When <b>Scan
  linked documents</b> is checked and the scope of one or more patterns is
  <tt>all</tt> or <tt>file</tt>, the contents of the document referenced in
  <tt>file</tt> are converted to text and scanned for those patterns. See the
  \ref c2bconf_utilities section to configure the external to-text converter.

  - <b>Search modifier</b> \n
  \htmlonly
  cb2Bib converts TeX encoded characters to Unicode when parsing the
  references. This allows, for instance, the pattern 'M&#248;ller' to
  retrieve either 'M&#248;ller' or 'M{\o}ller', regardless of how the
  BibTeX reference is written. By checking <b>Simplify source</b>, the
  reference and the converted PDF files are simplified to plain ASCII. In this
  way, the pattern '\bMoller\b' will hit any of 'M&#248;ller', 'M{\o}ller', or
  'Moller'. Additionally, all non-word characters are removed, preserving only
  the ASCII word structure of the source. Note that source simplification is
  only performed for the patterns whose scope is <tt>all</tt> or <tt>file</tt>
  contents, and that, so far, cb2Bib implements only a subset of such
  conversions. The implemented TeX to Unicode conversions can easily be checked
  by entering a reference. The Unicode to ASCII letter-only conversion, on the
  other hand, is the same one cb2Bib uses to write reference IDs and, hence,
  to rename dropped files. cb2Bib also understands simple subscript and
  superscript formatting. For instance, the pattern 'H2O' will retrieve
  'H<sub>2</sub>O' from the BibTeX string <code>H$_{2}$O</code>.
  \endhtmlonly
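
  The field-specific rules described in the list above can be illustrated
  with a short standalone sketch. It is not cb2Bib code and rests on stated
  assumptions: yearMatches() mirrors the comparisons made in
  bibSearcher::searchReference(); fullJournal() is a hypothetical
  three-entry table standing in for cb2Bib's journal abbreviation data;
  patterns are treated as case-insensitive regular expressions; and
  flattenScripts() handles only the simple <code>$_{..}$</code> and
  <code>$^{..}$</code> forms.

```cpp
#include <map>
#include <regex>
#include <string>

// Year qualifiers: '=' Exact, '>' Newer, '<' Older. As in searchReference(),
// Newer and Older are inclusive.
bool yearMatches(int year, char qualifier, int wanted)
{
    switch (qualifier)
    {
    case '=':
        return year == wanted;
    case '>':
        return year >= wanted;
    case '<':
        return year <= wanted;
    default:
        return false;
    }
}

// Hypothetical stand-in for the journal database lookup: maps a journal code
// or abbreviation to its full name and returns unknown names unchanged.
std::string fullJournal(const std::string& journal)
{
    static const std::map<std::string, std::string> table = {
        {"ijqc", "International Journal of Quantum Chemistry"},
        {"Int. J. Quantum Chem.", "International Journal of Quantum Chemistry"},
        {"J. Math. Chem.", "Journal of Mathematical Chemistry"},
    };
    const auto it = table.find(journal);
    return it == table.end() ? journal : it->second;
}

// A hit if both names expand to the same full name, or if the pattern is
// found in either the original or the expanded journal name.
bool journalMatches(const std::string& pattern, const std::string& journal)
{
    const std::regex re(pattern, std::regex::icase);
    const std::string full = fullJournal(journal);
    return fullJournal(pattern) == full || std::regex_search(journal, re) || std::regex_search(full, re);
}

// Flattens simple TeX subscripts and superscripts, so "H$_{2}$O" becomes "H2O".
std::string flattenScripts(const std::string& tex)
{
    static const std::regex script(R"(\$[_^]\{([^}]*)\}\$)");
    return std::regex_replace(tex, script, "$1");
}
```

  With this table, journalMatches("ijqc", "Int. J. Quantum Chem.") is true
  because both names expand to the same full title, and
  journalMatches("chemistry", "J. Math. Chem.") is true because the pattern
  is found in the expanded name.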


\section contextsearch Contextual Search

  A convenient way to retrieve documents is by matching a set of keywords
  that appear in close proximity, while disregarding the order in which the
  words might have been written. cb2Bib considers two types of contextual
  searches. One relaxes phrase matching only at the level of the constituent
  words. It is accessed by selecting <tt>Fixed string: Context</tt> in the
  pattern type box. The other one, in addition, stems the supplied keywords.
  It is accessed by selecting <tt>Context</tt>. Through stemming, the keyword
  <i>analyze</i>, for example, will also match <i>analyse</i>, and
  <i>aluminum</i> will match <i>aluminium</i> too.


  The syntax for <tt>Context</tt> type patterns is summarized in the following
  table:

\verbatim


Operator   Example                          Expansion

space      contextual search                contextual AND search

|          contextual search|matching       contextual AND (search|match)

+          contextual search|+matching      contextual AND (search|\bmatching\b)

_          contextual_search                contextual.{0,25}search

-          non-parametric                   non.{0,1}parametr


Diacritics and Greek letters:

           naïve search                     (naïve|naive) AND search

           kendall tau                      kendall AND (tau|τ)


\endverbatim

  In the above examples, the space operator (<tt>AND</tt>) matches words in
  any order. Operator <tt>_</tt> preserves word order, and operator <tt>+</tt>
  prevents stemming and forces an exact word match. Operator <tt>-</tt> covers
  words that might have been written joined, hyphenated, or separated by a
  space. Diacritics are expanded only if the diacritic mark is specified; that
  is, <i>naive</i> will not match <i>naïve</i>. Greek letters, on the other
  hand, are expanded only when typed by name.
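
  The operators in the table above can be approximated by translating each
  query word into a regular expression. The sketch below is a hypothetical
  illustration, not the cb2Bib implementation: it performs no stemming, so
  <i>search|matching</i> expands to <tt>(search|matching)</tt> and
  <i>non-parametric</i> to <tt>non.{0,1}parametric</tt>, and it only checks
  that every term occurs somewhere in the text, without any proximity limit
  between words.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Expands one alternative: a leading '+' requests an exact word match,
// '_' keeps word order within 25 characters, and '-' stands for nothing,
// a hyphen, or a space. Other regex metacharacters are escaped.
std::string expandAlternative(std::string alt)
{
    const bool exact = !alt.empty() && alt[0] == '+';
    if (exact)
        alt.erase(0, 1);
    const std::string special = ".^$\\*+?()[]{}|";
    std::string out;
    for (const char c : alt)
    {
        if (c == '_')
            out += ".{0,25}";
        else if (c == '-')
            out += ".{0,1}";
        else
        {
            if (special.find(c) != std::string::npos)
                out += '\\';
            out += c;
        }
    }
    return exact ? "\\b" + out + "\\b" : out;
}

// Each space-separated word becomes one regular expression; '|' separates
// alternatives within a word. No stemming is applied in this sketch.
std::vector<std::string> expandContext(const std::string& query)
{
    std::vector<std::string> terms;
    std::istringstream words(query);
    std::string word;
    while (words >> word)
    {
        std::istringstream alternatives(word);
        std::string alt;
        std::string expr;
        int n = 0;
        while (std::getline(alternatives, alt, '|'))
        {
            if (n++ > 0)
                expr += '|';
            expr += expandAlternative(alt);
        }
        terms.push_back(n > 1 ? "(" + expr + ")" : expr);
    }
    return terms;
}

// AND semantics: every expanded term must occur somewhere in the text,
// in any order. Proximity limits between terms are not modeled here.
bool contextMatches(const std::string& query, const std::string& text)
{
    for (const std::string& term : expandContext(query))
        if (!std::regex_search(text, std::regex(term, std::regex::icase)))
            return false;
    return true;
}
```

  For instance, expandContext("contextual search|+matching") returns two
  terms: <tt>contextual</tt>, and an alternation of <i>search</i> with an
  exact-word match of <i>matching</i>. contextMatches("contextual_search",
  "contextual and approximate search") is true because the two words appear
  in the required order, 17 characters apart.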


\section notes Notes

  - cb2Bib uses an internal cache to speed up the search of linked files.
    By default, data is stored as <tt>current_file.bib.c2b</tt>. It might be
    more convenient, however, to set up a temporary directory outside the
    user data backup directories. See <b>Search In Files Cache Directory</b>
    in \ref c2bconf_files. When a linked file is processed for the first time,
    cb2Bib performs several string manipulations, such as removing
    end-of-line hyphenation. This process is time-consuming for very large
    files.

  - The <b>approximate string</b> search is described in reference
  \htmlonly
  <a href="https://arxiv.org/abs/0705.0751" target="_blank">https://arxiv.org/abs/0705.0751</a>.
  \endhtmlonly
  It reduces the chance of missing a hit due to transcription and decoding
  errors in the document files. Approximate string search is also a form of
  serendipitous information retrieval.
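
  Two of the text operations mentioned in these notes can be illustrated in
  standard C++. Both functions below are hypothetical sketches, not cb2Bib
  code. dehyphenate() shows one simple way of removing end-of-line
  hyphenation. approximateEnds() is not the method of the referenced paper:
  it is the classic Sellers dynamic-programming search, swapped in here as a
  familiar example of finding substrings within k edits of a pattern, which
  is the kind of tolerance to transcription errors that approximate string
  search aims for.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Joins a word split by a hyphen at the end of a line, e.g. "hyphen-\nation".
// Simplification: a genuine compound broken at a line end is joined as well.
std::string dehyphenate(const std::string& text)
{
    static const std::regex eol_hyphen(R"((\w)-[ \t]*\r?\n[ \t]*(\w))");
    return std::regex_replace(text, eol_hyphen, "$1$2");
}

// Sellers' search: returns every end position j such that some substring of
// text ending just before j is within k edits (insertions, deletions, or
// substitutions) of pattern.
std::vector<std::size_t> approximateEnds(const std::string& pattern, const std::string& text, int k)
{
    const std::size_t m = pattern.size();
    // col[i] is the edit distance between pattern[0, i) and the best
    // substring ending at the current text position; matches start anywhere.
    std::vector<int> col(m + 1);
    for (std::size_t i = 0; i <= m; ++i)
        col[i] = static_cast<int>(i);
    std::vector<std::size_t> ends;
    for (std::size_t j = 0; j < text.size(); ++j)
    {
        int diag = col[0]; // previous column, previous row
        col[0] = 0;
        for (std::size_t i = 1; i <= m; ++i)
        {
            const int left = col[i]; // previous column, same row
            const int cost = pattern[i - 1] == text[j] ? 0 : 1;
            col[i] = std::min({diag + cost, col[i - 1] + 1, left + 1});
            diag = left;
        }
        if (col[m] <= k)
            ends.push_back(j + 1);
    }
    return ends;
}
```

  For example, approximateEnds("Moller", "M. Mooller wrote", 1) includes
  position 10, the end of the misspelled 'Mooller', which an exact search
  would miss.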

*/

/**
    Top level driver for searching BibTeX files
*/
bibSearcher::bibSearcher(bibParser* bp, QObject* parento)
    : QObject(parento), _bpP(bp), _do_rank_results(true), _do_search_similar(false)
{
    clear();
}

/**
    Top level driver for searching BibTeX files. Specialized constructor for
    searching repeated or similar references.
*/
bibSearcher::bibSearcher(bibParser* bp, const QString& bib_dir, QObject* parento)
    : QObject(parento), _bpP(bp), _do_rank_results(false), _do_search_similar(true)
{
    // Retrieve (any_author(context) AND any_title(approximate)) OR citeidName(exact)
    clear();
    setSearchScope(bib_dir, bib_dir, true, false);
    setSimplifySource(false);
    setBoolean(true);
    addPattern(false, false, searchPattern::type(searchPattern::Context), QLatin1String("all"), QChar(),
               _bpP->currentReference().anyAuthor());
    addPattern(false, false, searchPattern::type(searchPattern::ApproximateString), QLatin1String("all"), QChar(),
               _bpP->currentReference().anyTitle());
    _do_search_similar_citeid = _bpP->currentReference().citeidName;
    if (_do_search_similar_citeid.isEmpty())
        _do_search_similar_citeid = QLatin1Char('@');
}

bibSearcher::bibSearcher() : _bpP(nullptr), _do_rank_results(false), _do_search_similar(false)
{
    clear();
}


void bibSearcher::addPattern(bool Not, bool caseSensitive, const QString& patternType, const QString& scope,
                             const QChar& yearScope, const QString& pattern)
{
    if (pattern.trimmed().isEmpty())
        return;
    _patterns.append(searchPattern(Not, caseSensitive, patternType, scope, yearScope, pattern));
    if (!_scopes.contains(scope))
        _scopes.append(scope);
}

void bibSearcher::exec()
{
    if (_patterns.count() == 0 && !_do_search_similar)
        return;
    std::sort(_patterns.begin(), _patterns.end());
    _include_documents =
        _include_documents && (_scopes.contains(QLatin1String("all")) || _scopes.contains(QLatin1String("file")));
    if (_include_documents && !_scopes.contains(QLatin1String("file")))
        _scopes.append(QLatin1String("file"));
    _scopes.removeAll(QLatin1String("all"));

    QString and_or;
    if (_boolean_and)
        and_or = QLatin1String(".AND.");
    else
        and_or = QLatin1String(".OR.");
    for (int i = 0; i < _patterns.count(); ++i)
    {
        const searchPattern& pattern = _patterns.at(i);
        _log_string += tr("% Pattern%1: %2%3\n").arg(i + 1).arg(and_or, pattern.toString());
    }

    // Search In Files
    const QStringList flist(_all_bibtex_files ? c2bUtils::filesInDir(_bibtex_dir, QStringList() << "*.bib")
                            : c2bUtils::filesInDir(_bibtex_file, QStringList() << "*.bib"));
    for (int i = 0; i < flist.count(); ++i)
    {
        search(flist.at(i));
        if (_aborted)
        {
            clear();
            _error_counter = -1;
            return;
        }
    }

    // Search Done
    _log_string += tr("% Scanned References: %1  BibTeX Files: %2  Linked Files: %3\n")
                   .arg(_reference_counter)
                   .arg(_bibtex_counter)
                   .arg(_document_counter);
    if (_include_documents)
        _documents.unload();
    if (_result_references.count() == 0)
        return;
    if (_do_rank_results)
        quadrupleSortDescending(&_result_scores, &_result_references, &_result_html_data, &_result_html_abstracts);
    _result_string = "\n\n" + _result_references.join("\n\n") + "\n\n";
    _log_string += tr("% Total Unique Hits: %1\n").arg(_result_references.count());
#ifdef C2B_DEBUG_SEARCHING
    if (_result_scores.count() > 0)
        qDebug() << "Scores: " << _result_scores;
#endif
}

QString bibSearcher::searchDocumentKeyword(const QString& bibtexfn, const QString& documentfn, const QString& keyword)
{
    bibSearcher bs;
    QString exc;
    bs._documents.load(bibtexfn, documentContents::Complete);
    if (bs._documents.setCurrent(documentfn))
    {
        QString p(keyword);
        // Loosen the keyword into a regular expression: each non-word
        // character may stand for up to five arbitrary characters, each 's'
        // becomes an optional single character, and the match may extend to
        // the end of the word
        p.replace(QRegExp("\\W"), ".{0,5}");
        p.replace("s", ".?");
        p = "\\b" + p + "\\w*\\b";
        bs.addPattern(false, false, searchPattern::type(searchPattern::RegularExpression), QLatin1String("all"),
                      QChar(), p);
        QString document(bs._documents.current().text());
        c2bUtils::stripDiacritics(document);
        if (bs._patterns.at(0).matches(document))
        {
            exc = c2bUtils::fileToString(":/htm/htm/excerpts.html");
            exc.replace("GET_EXCERPTS_TITLE", keyword);
            // Drop the 20-character "</p><p id=\"excerpt\">" prefix added by excerpts()
            exc.replace("GET_EXCERPTS", bs.excerpts(document).mid(20));
        }
    }
    bs._documents.unload();
    return exc;
}

void bibSearcher::abort()
{
    _aborted = true;
}

void bibSearcher::clear()
{
    _aborted = false;
    _all_bibtex_files = false;
    _bibtex_counter = 0;
    _bibtex_dir.clear();
    _bibtex_file.clear();
    _boolean_and = true;
    _do_search_similar_citeid.clear();
    _document_counter = 0;
    _error_counter = 0;
    _include_documents = false;
    _log_string = "% cb2Bib " + C2B_VERSION + " / BibTeX Search Log\n";
    _patterns.clear();
    _reference_counter = 0;
    _reference_match_counter = 0;
    _reference_score = double(0);
    _result_html_abstracts.clear();
    _result_html_data.clear();
    _result_references.clear();
    _result_scores.clear();
    _result_string.clear();
    _scopes.clear();
    _simplify_source = false;
}

void bibSearcher::search(const QString& bib_file)
{
    _bibtex_counter++;
    QString bib_file_contents;
    QFile file(bib_file);
    if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
    {
        _error_counter++;
        _log_string +=
            tr("% [cb2bib] Unable to open the file %1 for reading. Error: '%2'.\n").arg(bib_file, file.errorString());
        return;
    }
    QTextStream stream(&file);
    stream.setCodec("UTF-8");
    stream.setAutoDetectUnicode(true);
    bib_file_contents = stream.readAll();
    _log_string += tr("% Scanning file %1\n").arg(bib_file.trimmed());
    if (_include_documents)
    {
        if (_simplify_source)
            _documents.load(bib_file, documentContents::Simplified);
        else
            _documents.load(bib_file, documentContents::Complete);
    }

    const int hits(_result_references.count());
    bibReference ref;
    _bpP->initReferenceParsing(bib_file, _scopes, &ref);
    while (_bpP->referencesIn(bib_file_contents, &ref))
    {
        _reference_counter++;
        if (_do_search_similar)
            searchSimilarReferences(bib_file, ref);
        else
            searchReference(bib_file, ref);
        QCoreApplication::processEvents();
        if (_aborted)
            return;
    }
    _log_string += tr("% File %1. Hits: %2\n").arg(bib_file.trimmed()).arg(_result_references.count() - hits);
}

void bibSearcher::searchReference(const QString& bib_file, const bibReference& ref)
{
    const bool include_document(
        _include_documents &&
        _documents.setCurrent(ref.value(QLatin1String("file")), &_document_counter, &_log_string, &_error_counter));

    // Initialize composite search
    bool hit(_boolean_and);

    // Composite search
    for (int i = 0; i < _patterns.count(); ++i)
    {
        const searchPattern& pattern(_patterns.at(i));
        const searchPattern::modifiers& modifier(pattern.modifier());
        bool ihit(false);

        if (modifier.scope == QLatin1String("year"))
        {
            // '=' Exact, '>' Newer, '<' Older; Newer and Older include the given year
            int istr(ref.value(modifier.scope).toInt());
            int jstr(modifier.string.toInt());
            if (modifier.yearScope == QLatin1Char('='))
                ihit = istr == jstr;
            else if (modifier.yearScope == QLatin1Char('>'))
                ihit = istr >= jstr;
            else if (modifier.yearScope == QLatin1Char('<'))
                ihit = istr <= jstr;
        }
        else if (modifier.scope == QLatin1String("all"))
        {
            if (_simplify_source)
                ihit = pattern.matches(c2bUtils::toAscii(ref.unicodeReference, c2bUtils::FromBibTeX));
            else
                ihit = pattern.matches(ref.unicodeReference);
            if (!ihit)
                if (include_document)
                    ihit = pattern.matches(_documents.current());
        }
        else if (modifier.scope == QLatin1String("journal"))
        {
            // Hit if pattern and field expand to the same full journal name, or
            // if the pattern matches the original or the expanded field
            const QString pattern_full(_bpP->fullJournal(modifier.string));
            const QString j_orig(ref.value(modifier.scope));
            const QString j_full(_bpP->fullJournal(j_orig));
            ihit = j_full == pattern_full || pattern.matches(j_orig) || pattern.matches(j_full);
        }
        else if (modifier.scope == QLatin1String("file") && include_document)
            ihit = pattern.matches(_documents.current());
        else
            ihit = pattern.matches(ref.value(modifier.scope));

        if (modifier.NOT)
            ihit = !ihit;
        if (_boolean_and)
        {
            hit = hit && ihit;
            if (!hit)
                break;
        }
        else
        {
            hit = hit || ihit;
            if (hit)
                break;
        }
    }
    if (hit)
        if (!_result_references.contains(ref.rawReference))
        {
            _result_references.append(ref.rawReference);
            _reference_score = double(0);
            _reference_match_counter = 0;
            if (_scopes.contains(QLatin1String("title")))
                setTitleRank(ref.value(QLatin1String("title")));
            else if (_scopes.contains(QLatin1String("booktitle")))
                setTitleRank(ref.value(QLatin1String("booktitle")));
            if (_scopes.contains(QLatin1String("abstract")))
                _result_html_abstracts.append(highlight(ref.value(QLatin1String("abstract"))));
            else
                _result_html_abstracts.append(highlight(_bpP->singleReferenceField(QLatin1String("abstract"), ref)));
            if (include_document)
                _result_html_data.append(location(bib_file, ref) + excerpts(_documents.current().text()));
            else
                _result_html_data.append(location(bib_file, ref));
            _result_scores.append(_reference_score);
#if C2B_DEBUG_SCORER
            _debug_scorer_scores.append(_reference_score);
            _debug_scorer_occurrences.append(_reference_match_counter);
            _debug_scorer_documents.append(_bpP->singleReferenceField(QLatin1String("title"), ref));
#endif
        }
}

void bibSearcher::searchSimilarReferences(const QString& bib_file, const bibReference& ref)
{
    if (ref.citeidName == _do_search_similar_citeid)
    {
        if (!_result_references.contains(ref.rawReference))
        {
            _result_references.append(ref.rawReference);
            _result_html_data.append(location(bib_file, ref));
        }
        return;
    }
    if (_patterns.count() == 0)
        return;

    // Initialize composite search
    bool hit(_boolean_and);

    // Composite search
    for (int i = 0; i < _patterns.count(); ++i)
    {
        const searchPattern& pattern(_patterns.at(i));
        hit = hit && pattern.matches(ref.unicodeReference);
        if (!hit)
            break;
    }
    if (hit)
        if (!_result_references.contains(ref.rawReference))
        {
            _result_references.append(ref.rawReference);
            _result_html_data.append(location(bib_file, ref));
        }
}

void bibSearcher::setTitleRank(const QString& title)
{
    if (!_do_rank_results || title.isEmpty())
        return;
    for (int i = 0; i < _patterns.count(); ++i)
    {
        const searchPattern& pattern(_patterns.at(i));
        const searchPattern::modifiers& modifier(pattern.modifier());
        pattern.initializeScores();
        if (modifier.NOT)
            continue;
        if (modifier.scope != QLatin1String("title") && modifier.scope != QLatin1String("booktitle"))
            continue;
        int pos(0);
        while (pos >= 0)
        {
            pos = pattern.indexIn(title, pos);
            if (pos > -1)
            {
                pattern.updateScore();
                pos += pattern.matchedLength();
            }
        }
        _reference_score += 10 * pattern.matchedScore();
    }
}

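/**
    Builds HTML excerpts of <tt>contents</tt> around the matches of the
    non-negated patterns whose scope is <tt>all</tt> or <tt>file</tt>.

    Both excerpts() and highlight() store match intervals in a QMap from start
    to end position, which keeps them sorted by start, and merge overlapping or
    touching intervals before rendering. The standalone sketch below
    illustrates only that merge step. It is an illustration, not cb2Bib code:
    it uses std::map in place of QMap, and the name mergeIntervals() is
    hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <map>

// Merges intervals given as start -> end (end exclusive), sorted by start.
// An interval that overlaps or touches the last kept one is folded into it.
std::map<int, int> mergeIntervals(const std::map<int, int>& spans)
{
    std::map<int, int> merged;
    for (const auto& span : spans)
    {
        if (!merged.empty())
        {
            const auto last = std::prev(merged.end());
            if (span.first <= last->second)
            {
                // Keep the larger end: a contained interval must not shorten it
                last->second = std::max(last->second, span.second);
                continue;
            }
        }
        merged.emplace(span.first, span.second);
    }
    return merged;
}
```

    The sketch keeps the larger of the two end positions, which matters when
    one match lies entirely inside an earlier one, for instance a short
    pattern hit inside a longer phrase hit.
*/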
QString bibSearcher::excerpts(const QString& contents)
{
    const int max_excerpts(25);
    const int max_unmerged_excerpts(max_excerpts + 100);
    QMap<int, int> exc_endpos;

    for (int i = 0; i < _patterns.count(); ++i)
    {
        const searchPattern& pattern(_patterns.at(i));
        const searchPattern::modifiers& modifier(pattern.modifier());
        pattern.initializeScores();
        if (modifier.NOT)
            continue;
        if (modifier.scope != QLatin1String("all") && modifier.scope != QLatin1String("file"))
            continue;
        int n_excerpts(0);
        int pos(0);
        while (pos >= 0)
        {
            pos = pattern.indexIn(contents, pos);
            if (pos > -1)
            {
                if (++n_excerpts > max_unmerged_excerpts)
                    break;
                if (_do_rank_results)
                    pattern.updateScore();
                exc_endpos.insert(pos, std::max(pos + pattern.matchedLength(), exc_endpos.value(pos)));
                pos += pattern.matchedLength();
            }
        }
        if (_do_rank_results)
        {
            _reference_score += pattern.matchedScore();
            _reference_match_counter = pattern.matchedCounter();
        }
    }
    if (exc_endpos.isEmpty())
        return QString();

    // Merge overlapping or touching matches, keeping the larger end position so
    // that a match contained in a previous one does not shorten it
    QList<int> i_pos = exc_endpos.keys();
    int pos0(i_pos.at(0));
    for (int i = 1; i < i_pos.count(); ++i)
    {
        const int posi(i_pos.at(i));
        if (exc_endpos.value(pos0) < posi)
            pos0 = posi;
        else
        {
            const int endposi(exc_endpos.value(posi));
            exc_endpos.remove(posi);
            exc_endpos.insert(pos0, std::max(exc_endpos.value(pos0), endposi));
        }
    }
    i_pos = exc_endpos.keys();

    const int context_length(75);
    QString exc;
    QRegExp lead_truncated_words("^.*\\s(?=\\w)");
    lead_truncated_words.setMinimal(true);
    QRegExp tail_truncated_words("\\W+\\w+\\W*$");
    tail_truncated_words.setMinimal(true);

    bool item_begins(true);
    bool item_complete;
    const int items(std::min(i_pos.count(), max_excerpts));

    for (int i = 0; i < items; ++i)
    {
        const int pos(i_pos.at(i));
        const int length(exc_endpos.value(pos) - pos);
        const QString match(QLatin1String("<span>") + c2bUtils::toHtmlString(contents.mid(pos, length)) +
                            QLatin1String("</span>"));

        if (item_begins)
        {
            QString pre_match(contents.mid(pos - context_length, context_length));
            pre_match.remove(lead_truncated_words);
            pre_match = c2bUtils::toHtmlString(pre_match);
            exc += QLatin1String("&#8226; ...") + pre_match + match;
        }
        else
            exc += match;

        if (i + 1 == items)
            item_complete = true;
        else
            item_complete = exc_endpos.value(pos) + (2 * context_length) < i_pos.at(i + 1);
        if (item_complete)
        {
            QString post_match(contents.mid(pos + length, context_length));
            post_match.remove(tail_truncated_words);
            post_match = c2bUtils::toHtmlString(post_match);
            exc += post_match + "... ";
            item_begins = true;
        }
        else
        {
            // Text between two nearby matches; escape it like the rest of the excerpt
            exc += c2bUtils::toHtmlString(contents.mid(pos + length, i_pos.at(i + 1) - pos - length));
            item_begins = false;
        }
    }

    exc = QLatin1String("</p><p id=\"excerpt\">") + exc;
    if (i_pos.count() > max_excerpts)
        exc += tr("</p><p><b>Found more than %1 occurrences</b>.").arg(max_excerpts);
    return exc;
}

QString bibSearcher::highlight(const QString& abstract)
{
    if (abstract.isEmpty())
        return abstract;
    QMap<int, int> endpos;
    for (int i = 0; i < _patterns.count(); ++i)
    {
        const searchPattern& pattern(_patterns.at(i));
        const searchPattern::modifiers& modifier(pattern.modifier());
        pattern.initializeScores();
        if (modifier.NOT)
            continue;
        if (modifier.scope != QLatin1String("all") && modifier.scope != QLatin1String("abstract"))
            continue;
        int pos(0);
        while (pos >= 0)
        {
            pos = pattern.indexIn(abstract, pos);
            if (pos > -1)
            {
                if (_do_rank_results)
                    pattern.updateScore();
                endpos.insert(pos, std::max(pos + pattern.matchedLength(), endpos.value(pos)));
                pos += pattern.matchedLength();
            }
        }
        if (_do_rank_results)
        {
            _reference_score += pattern.matchedScore();
            _reference_match_counter = pattern.matchedCounter();
        }
    }
    if (endpos.isEmpty())
        return c2bUtils::toHtmlString(abstract);

    // Merge overlapping or touching matches, keeping the larger end position
    QList<int> i_pos = endpos.keys();
    int pos0(i_pos.at(0));
    for (int i = 1; i < i_pos.count(); ++i)
    {
        const int posi(i_pos.at(i));
        if (endpos.value(pos0) < posi)
            pos0 = posi;
        else
        {
            const int endposi(endpos.value(posi));
            endpos.remove(posi);
            endpos.insert(pos0, std::max(endpos.value(pos0), endposi));
        }
    }
    i_pos = endpos.keys();

    QString hla;
    int npos(0);
    for (int i = 0; i < i_pos.count(); ++i)
    {
        const int pos(i_pos.at(i));
        const int length(endpos.value(pos) - pos);
        hla += c2bUtils::toHtmlString(abstract.mid(npos, pos - npos)) + QLatin1String("<span>") +
               c2bUtils::toHtmlString(abstract.mid(pos, length)) + QLatin1String("</span>");
        npos = endpos.value(pos);
    }
    hla += c2bUtils::toHtmlString(abstract.mid(npos, abstract.length() - npos));
    return hla;
}

QString bibSearcher::location(const QString& fn, const bibReference& ref) const
{
    const QString at("<a href=\"%1:%2\" class=\"anchor\">"
                     "<img src=\":/icons/icons/edit16.png\" alt=\"action\" width=\"16\" height=\"16\" /></a>");
    return at.arg(QDir::cleanPath(fn)).arg(ref.positionValue);
}