Member "cb2bib-2.0.1/src/c2bSaveRegExp.cpp" (12 Feb 2021, 8579 Bytes) of package /linux/privat/cb2bib-2.0.1.tar.gz:

/***************************************************************************
 *   Copyright (C) 2004-2021 by Pere Constans
 *   constans@molspaces.com
 *   cb2Bib version 2.0.1. Licensed under the GNU GPL version 3.
 *   See the LICENSE file that comes with this distribution.
 ***************************************************************************/
#include "c2bSaveRegExp.h"

#include "c2b.h"
#include "c2bSaveREHighlighter.h"
#include "c2bSettings.h"
#include "c2bUtils.h"

#include <QPushButton>


/** \page regexpeditor Regular Expression Editor

  Once a manual extraction is done, the cb2Bib clipboard area contains the
  extraction tags plus, possibly, other cb2Bib tags introduced during
  preprocessing (see \ref clipboard). From this tagged text, the <b>RegExp
  Editor</b> generates a tentative regular expression, or matching pattern,
  usable for automated extractions.

  A cb2Bib matching pattern consists of four lines: a brief description, the
  reference type, an ordered list of captured fields, and the regular
  expression itself.

\htmlonly
<pre>
# cb2Bib GET_VERSION Pattern:
American Chemical Society Publications
article
journal volume pages year title author abstract
^(.+), (\d+) \(.+\), ([\d|\-|\s]+),(\d\d\d\d)\..+&#060;NewLine3&#062;(.+)&#060;NewLine4&#062;(.+)&#060;NewLine5&#062;.+Abstract:&#060;NewLine\d+&#062;(.+)$
</pre>
\endhtmlonly

  The Regular Expression Editor provides the basic skeleton and a set of
  predefined suggestions. The regular expressions follow a Perl-like syntax,
  with a few slight differences and minor limitations. The basics of editing
  and working with regular expressions, as used by cb2Bib, are described in
\htmlonly
<a href="https://doc.qt.io/archives/qt-5.6/QRegExp.html#introduction"
target="_blank">Qt Documentation's QRegExp Class</a>.
\endhtmlonly


  <b>Keep in mind when creating and editing regular expressions:</b>

  - Switch the clipboard mode to 'Tagged Clipboard Data' using the clipboard
  panel context menu.

  - Extract the bibliographic reference manually. cb2Bib tags will appear on
  the clipboard panel, indicating which fields are being extracted. Once done,
  press Alt+I to open the regular expression editor. The editor contains the
  four line edits that define a cb2Bib pattern, a copy of the clipboard
  panel, and an information panel. The information panel displays possible
  issues and, once everything is correct, the actual extracted fields. The
  clipboard panel copy highlights the captures of the current regular
  expression in the current input text.

  - Patterns can be modified at any time by pressing Alt+E to edit the
  regular expression file. Patterns are reloaded each time automatic pattern
  recognition starts, so edits take effect on the next recognition run. This
  allows patterns to be edited and tested iteratively.

  - cb2Bib processes the regular expressions sequentially, in the order they
  appear in the regular expression file, and stops at the first one that
  matches the current input. <b>Therefore, the order of the regular
  expressions is important</b>. To avoid clashes among similar patterns,
  consider sorting them from the most restrictive to the least restrictive.
  As a rule of thumb, the more captures a pattern has, the more restrictive
  it is.

  - <b>The patterns proposed by cb2Bib are general, and not necessarily the
  most appropriate for a particular capture</b>. E.g., tag <tt>pages</tt>
  becomes <tt>([\\d|\\-|\\s]+)</tt>, which accepts digits, hyphens, and
  whitespace (and also the character <tt>|</tt>, which is literal inside a
  character class). It must be adapted for reference sources that write
  <tt>pages</tt> as Roman numerals, for instance.

  - <b>Avoid general patterns such as <tt>(.+)</tt> whenever possible</b>.
  Such a capture risks including text intended for a subsequent capture.
  This is why the pattern proposed by cb2Bib sometimes does not match the
  very input that originated it. <b>Whenever possible, use specific cb2Bib
  anchors like <tt>\<NewLine1\></tt> instead of <tt>\<NewLine\\d+\></tt>.
  They prevent <tt>(.+)</tt> captures from overextending</b>.

  - To debug a long regular expression, it can be useful to reduce it to its
  first capturing group. For instance, the pattern above becomes

\verbatim
# cb2Bib GET_VERSION Pattern:
American Chemical Society Publications
article
journal
^(.+),
\endverbatim

  - Then check whether anything is captured and whether it corresponds to
  <tt>journal</tt>.

  - Add the remaining captures and their BibTeX fields in successive steps.
*/
c2bSaveRegExp::c2bSaveRegExp(const QStringList& pattern, const QString& input, QWidget* parentw) : QDialog(parentw)
{
    Q_ASSERT_X(pattern.count() == 3, "c2bSaveRegExp", "Expected exactly three strings for pattern");
    ui.setupUi(this);
    setWindowFlags(windowFlags() & ~Qt::WindowContextHelpButtonHint);
    connect(ui.buttonBox, SIGNAL(helpRequested()), this, SLOT(help()));
    c2bSettings* settings(c2bSettingsP);
    ui.Input->setFont(settings->c2bMonoFont);
    // pattern holds the reference type, the field list, and the regular expression
    ui.Type->setText(pattern.at(0));
    ui.Fields->setText(pattern.at(1));
    ui.RegExp->setText(pattern.at(2));
    ui.Name->setFocus();
    updateInput(input);
    // Minimal (non-greedy) matching
    _pattern_rx.setMinimal(true);
    _sreS = new c2bSaveREHighlighter(_pattern_rx, ui.Input->document());
    setInformation();
    connect(ui.Type, SIGNAL(textChanged(QString)), this, SLOT(setInformation()));
    connect(ui.Fields, SIGNAL(textChanged(QString)), this, SLOT(setInformation()));
    connect(ui.RegExp, SIGNAL(textChanged(QString)), this, SLOT(setInformation()));
    connect(ui.Input, SIGNAL(textChanged()), this, SLOT(inputMightHaveChanged()));
}

c2bSaveRegExp::~c2bSaveRegExp() {}


void c2bSaveRegExp::setInformation()
{
    QString info;
    bool can_save(false);
    const QStringList field_list(ui.Fields->text().split(' ', QString::SkipEmptyParts));
    const int fields(field_list.count());
    _pattern_rx.setPattern(ui.RegExp->text());
    const int captures(_pattern_rx.captureCount());
    // A pattern can be saved only if the expression is valid, the reference
    // type is set, and there is exactly one capture per declared field
    if (_pattern_rx.isValid() && fields > 0 && fields == captures && !ui.Type->text().isEmpty())
    {
        info += tr("Reference type: %1\n").arg(ui.Type->text());
        info += tr("Number of fields: %1\n").arg(fields);
        can_save = true;
    }
    else
    {
        if (ui.Type->text().isEmpty())
            info += tr("[Error] Invalid pattern: empty reference type\n");
        if (fields == 0)
            info += tr("[Error] Invalid pattern: no fields declared\n");
        if (_pattern_rx.isValid())
        {
            if (captures == 0)
            {
                if (_pattern_rx.pattern().isEmpty())
                    info += tr("[Error] Invalid pattern: empty regular expression\n");
                else
                    info += tr("[Error] Invalid pattern: no captures defined in the regular expression\n");
            }
            else if (fields != captures)
                info += tr("[Error] Invalid pattern: declared %1 fields while the regular expression has %2 captures\n")
                        .arg(fields)
                        .arg(captures);
        }
        else
            info += tr("[Error] Invalid regular expression: %1\n").arg(_pattern_rx.errorString());
    }
    // For a savable pattern, test it on the input and list each parsed field
    if (can_save)
    {
        if (_pattern_rx.indexIn(ui.Input->toPlainText()) == -1 || _pattern_rx.matchedLength() < 1)
            info += tr("[Info] Regular expression does not match input text\n");
        else
        {
            bibParser* bp = c2b::bibParser();
            for (int i = 0; i < fields; ++i)
            {
                const QString& f = field_list.at(i);
                const QString v(bp->parse(f, _pattern_rx.cap(i + 1)));
                info += QString("[%1]: '%2'\n").arg(f, v);
            }
        }
    }
    ui.Information->setPlainText(info);
    ui.buttonBox->button(QDialogButtonBox::Save)->setEnabled(can_save);
    _sreS->rehighlight();
}

void c2bSaveRegExp::updateInput(const QString& text)
{
    if (!ui.Input->textCursor().hasSelection())
        ui.Input->setPlainText(c2b::bibParser()->setTags(text));
}

void c2bSaveRegExp::inputMightHaveChanged()
{
    // Avoid recursively calling setInformation due to syntax highlighting
    if (_input_text == ui.Input->toPlainText())
        return;
    _input_text = ui.Input->toPlainText();
    setInformation();
}

void c2bSaveRegExp::accept()
{
    // The saved pattern joins type, fields, and regular expression with
    // newlines; the pattern name is emitted separately
    const QString rx(ui.Type->text() + '\n' + ui.Fields->text() + '\n' + ui.RegExp->text());
    emit savePatternInfo(rx, ui.Name->text());
    QDialog::accept();
}

void c2bSaveRegExp::help()
{
    c2bUtils::displayHelp("https://www.molspaces.com/cb2bib/doc/regexpeditor/");
}