Member "docs/phpcrawl/PHPCrawlerURLFilter.html" (20 Jan 2013, 18525 Bytes) of package /linux/www/SitemapCreator.tar.gz:


    1 <?xml version="1.0" encoding="iso-8859-1"?>
    2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    3   <html xmlns="http://www.w3.org/1999/xhtml">
    4         <head>
    5             <!-- template designed by Marco Von Ballmoos -->
    6             <title>Docs For Class PHPCrawlerURLFilter</title>
    7             <link rel="stylesheet" href="../media/stylesheet.css" />
    8             <meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1'/>
    9         </head>
   10         <body>
   11             <div class="page-body">         
   12 <h2 class="class-name">Class PHPCrawlerURLFilter</h2>
   13 
   14 <a name="sec-description"></a>
   15 <div class="info-box">
   16     <div class="info-box-title">Description</div>
   17     <div class="nav-bar">
   18                     <span class="disabled">Description</span> |
   19                                                     <a href="#sec-var-summary">Vars</a> (<a href="#sec-vars">details</a>)
   20                         |                                           <a href="#sec-method-summary">Methods</a> (<a href="#sec-methods">details</a>)
   21                         
   22                     </div>
   23     <div class="info-box-body">
   24                 <!-- ========== Info from phpDoc block ========= -->
   25 <p class="short-description">Class for filtering URLs by given filter-rules.</p>
   26         <p class="notes">
   27             Located in <a class="field" href="_libs---PHPCrawler---PHPCrawlerURLFilter.class.php.html">/libs/PHPCrawler/PHPCrawlerURLFilter.class.php</a> (line <span class="field">8</span>)
   28         </p>
   29         
   30                 
   31         <pre></pre>
   32     
   33             </div>
   34 </div>
   35 
   36 
   37 
   38     <a name="sec-var-summary"></a>
   39     <div class="info-box">
   40         <div class="info-box-title">Variable Summary</div>
   41         <div class="nav-bar">
   42             <a href="#sec-description">Description</a> |
   43                         <span class="disabled">Vars</span> (<a href="#sec-vars">details</a>)
   44                             | 
   45                                     <a href="#sec-method-summary">Methods</a> (<a href="#sec-methods">details</a>)
   46                             
   47                                 </div>
   48         <div class="info-box-body">
   49             <div class="var-summary">
   50                                                                                                                                                                                                                                                                                                 <div class="var-title">
   51                     <span class="var-type"><a href="../phpcrawl/PHPCrawlerDocumentInfo.html">PHPCrawlerDocumentInfo</a></span>
   52                     <a href="#$CurrentDocumentInfo" title="details" class="var-name">$CurrentDocumentInfo</a>
   53                 </div>
   54                                                                 <div class="var-title">
   55                     <span class="var-type">int</span>
   56                     <a href="#$general_follow_mode" title="details" class="var-name">$general_follow_mode</a>
   57                 </div>
   58                                                                 <div class="var-title">
   59                     <span class="var-type">bool</span>
   60                     <a href="#$obey_nofollow_tags" title="details" class="var-name">$obey_nofollow_tags</a>
   61                 </div>
   62                                                                 <div class="var-title">
   63                     <span class="var-type">string</span>
   64                     <a href="#$starting_url" title="details" class="var-name">$starting_url</a>
   65                 </div>
   66                                                                 <div class="var-title">
   67                     <span class="var-type">array</span>
   68                     <a href="#$starting_url_parts" title="details" class="var-name">$starting_url_parts</a>
   69                 </div>
   70                                                                 <div class="var-title">
   71                     <span class="var-type">array</span>
   72                     <a href="#$url_filter_rules" title="details" class="var-name">$url_filter_rules</a>
   73                 </div>
   74                                                                 <div class="var-title">
   75                     <span class="var-type">array</span>
   76                     <a href="#$url_follow_rules" title="details" class="var-name">$url_follow_rules</a>
   77                 </div>
   78                                             </div>
   79         </div>
   80     </div>
   81 
   82     <a name="sec-method-summary"></a>
   83     <div class="info-box">
   84         <div class="info-box-title">Method Summary</div>
   85         <div class="nav-bar">
   86             <a href="#sec-description">Description</a> |
   87                                                                         <a href="#sec-var-summary">Vars</a> (<a href="#sec-vars">details</a>)
   88                  
   89                 |
   90                         <span class="disabled">Methods</span> (<a href="#sec-methods">details</a>)
   91         </div>
   92         <div class="info-box-body">         
   93             <div class="method-summary">
   94                                                                                                                                                                                 <div class="method-definition">
   95                     static                      <span class="method-result">void</span>
   96                                         <a href="#keepRedirectUrls" title="details" class="method-name">keepRedirectUrls</a>
   97                                             (<span class="var-type"><a href="../phpcrawl/PHPCrawlerDocumentInfo.html">PHPCrawlerDocumentInfo</a></span>&nbsp;<span class="var-name">$DocumentInfo</span>)
   98                                     </div>
   99                                                                                                 
  100                                                 <div class="method-definition">
  101                                             <span class="method-result">void</span>
  102                                         <a href="#addURLFilterRule" title="details" class="method-name">addURLFilterRule</a>
  103                                             (<span class="var-type"></span>&nbsp;<span class="var-name">$regex</span>)
  104                                     </div>
  105                                                                 <div class="method-definition">
  106                                             <span class="method-result">void</span>
  107                                         <a href="#addURLFilterRules" title="details" class="method-name">addURLFilterRules</a>
  108                                             (<span class="var-type"></span>&nbsp;<span class="var-name">$regex_array</span>)
  109                                     </div>
  110                                                                 <div class="method-definition">
  111                                             <span class="method-result">void</span>
  112                                         <a href="#addURLFollowRule" title="details" class="method-name">addURLFollowRule</a>
  113                                             (<span class="var-type"></span>&nbsp;<span class="var-name">$regex</span>)
  114                                     </div>
  115                                                                 <div class="method-definition">
  116                                             <span class="method-result">void</span>
  117                                         <a href="#filterUrls" title="details" class="method-name">filterUrls</a>
  118                                             (<span class="var-type"><a href="../phpcrawl/PHPCrawlerDocumentInfo.html">PHPCrawlerDocumentInfo</a></span>&nbsp;<span class="var-name">$DocumentInfo</span>)
  119                                     </div>
  120                                                                                                 <div class="method-definition">
  121                                             <span class="method-result">void</span>
  122                                         <a href="#setBaseURL" title="details" class="method-name">setBaseURL</a>
  123                                             (<span class="var-type">string</span>&nbsp;<span class="var-name">$starting_url</span>)
  124                                     </div>
  125                                                                 <div class="method-definition">
  126                                             <span class="method-result">bool</span>
  127                                         <a href="#urlMatchesRules" title="details" class="method-name">urlMatchesRules</a>
  128                                             (<span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>&nbsp;<span class="var-name">$url</span>)
  129                                     </div>
  130                                 </div>
  131         </div>
  132     </div>      
  133 
  134     <a name="sec-vars"></a>
  135     <div class="info-box">
  136         <div class="info-box-title">Variables</div>
  137         <div class="nav-bar">
  138             <a href="#sec-description">Description</a> |
  139                                         <a href="#sec-var-summary">Vars</a> (<span class="disabled">details</span>)
  140                         
  141             
  142                                         | 
  143                                     <a href="#sec-method-summary">Methods</a> (<a href="#sec-methods">details</a>)
  144                             
  145                     </div>
  146         <div class="info-box-body">
  147             <a name="var$CurrentDocumentInfo" id="$CurrentDocumentInfo"><!-- --></a>
  148 <div class="oddrow">
  149 
  150     <div class="var-header">
  151         <span class="var-title">
  152             <span class="var-type"><a href="../phpcrawl/PHPCrawlerDocumentInfo.html">PHPCrawlerDocumentInfo</a></span>
  153             <span class="var-name">$CurrentDocumentInfo</span>
  154              = <span class="var-default"> null</span>           (line <span class="line-number">62</span>)
  155         </span>
  156     </div>
  157 
  158     <!-- ========== Info from phpDoc block ========= -->
  159 <p class="short-description">The PHPCrawlerDocumentInfo-object of the current document.</p>
  160     <ul class="tags">
  161                 <li><span class="field">access:</span> protected</li>
  162             </ul>
  163     
  164     
  165         
  166         
  167 
  168 </div>
  169 <a name="var$general_follow_mode" id="$general_follow_mode"><!-- --></a>
  170 <div class="evenrow">
  171 
  172     <div class="var-header">
  173         <span class="var-title">
  174             <span class="var-type">int</span>
  175             <span class="var-name">$general_follow_mode</span>
  176              = <span class="var-default"> 2</span>          (line <span class="line-number">55</span>)
  177         </span>
  178     </div>
  179 
  180     <!-- ========== Info from phpDoc block ========= -->
  181 <p class="short-description">The general follow-mode of the crawler</p>
  182     <ul class="tags">
  183                 <li><span class="field">var:</span> <p>The follow-mode</p><ol start="0"><li>-&gt; follow every link</li><li>-&gt; stay in domain</li><li>-&gt; stay in host</li><li>-&gt; stay in path</li></ol></li>
  184                 <li><span class="field">access:</span> public</li>
  185             </ul>
  186     
  187     
  188         
  189         
  190 
  191 </div>
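<p class="notes">The follow-modes of $general_follow_mode can be read as progressively tighter scope checks against the starting URL. The following is a minimal sketch of that idea, not PHPCrawl's actual implementation. It assumes the modes are numbered 0 to 3 (as in PHPCrawler::setFollowMode(), where the default 2 means "stay in host"), that "domain" means the last two host labels, and that "path" means the directory of the starting URL. The function names sketchUrlAllowed(), sketchDomain() and sketchDirectory() are hypothetical.</p>

```php
<?php
// Hypothetical helper, not part of PHPCrawl: decides whether $url may be
// followed under the given follow-mode, relative to $start_url.
function sketchUrlAllowed(string $start_url, string $url, int $mode): bool
{
    $start = parse_url($start_url);
    $cand  = parse_url($url);

    // Both URLs need a host part for any comparison to make sense.
    if (!isset($start['host'], $cand['host'])) {
        return false;
    }
    $start_host = strtolower($start['host']);
    $cand_host  = strtolower($cand['host']);

    switch ($mode) {
        case 0: // follow every link
            return true;
        case 1: // stay in domain (assumption: domain = last two host labels)
            return sketchDomain($start_host) === sketchDomain($cand_host);
        case 2: // stay in host
            return $start_host === $cand_host;
        case 3: // stay in path: same host and below the starting directory
            if ($start_host !== $cand_host) {
                return false;
            }
            $base = sketchDirectory($start['path'] ?? '/');
            return strpos($cand['path'] ?? '/', $base) === 0;
        default: // unknown mode: reject
            return false;
    }
}

// Simplified "domain": the last two labels of the host name.
function sketchDomain(string $host): string
{
    return implode('.', array_slice(explode('.', $host), -2));
}

// Directory part of a URL path, always ending in "/".
function sketchDirectory(string $path): string
{
    $pos = strrpos($path, '/');
    return $pos === false ? '/' : substr($path, 0, $pos + 1);
}
```

<p class="notes">With the starting URL http://www.example.com/docs/index.html, a link to http://blog.example.com/x would pass mode 1 but not mode 2 in this sketch. The two-label domain rule is a simplification that breaks for hosts such as example.co.uk.</p>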
  192 <a name="var$obey_nofollow_tags" id="$obey_nofollow_tags"><!-- --></a>
  193 <div class="oddrow">
  194 
  195     <div class="var-header">
  196         <span class="var-title">
  197             <span class="var-type">bool</span>
  198             <span class="var-name">$obey_nofollow_tags</span>
  199              = <span class="var-default"> false</span>          (line <span class="line-number">43</span>)
  200         </span>
  201     </div>
  202 
  203     <!-- ========== Info from phpDoc block ========= -->
  204 <p class="short-description">Defines whether nofollow-tags should be obeyed.</p>
  205     <ul class="tags">
  206                 <li><span class="field">access:</span> public</li>
  207             </ul>
  208     
  209     
  210         
  211         
  212 
  213 </div>
  214 <a name="var$starting_url" id="$starting_url"><!-- --></a>
  215 <div class="evenrow">
  216 
  217     <div class="var-header">
  218         <span class="var-title">
  219             <span class="var-type">string</span>
  220             <span class="var-name">$starting_url</span>
  221              = <span class="var-default"> &quot;&quot;</span>           (line <span class="line-number">15</span>)
  222         </span>
  223     </div>
  224 
  225     <!-- ========== Info from phpDoc block ========= -->
  226 <p class="short-description">The fully qualified and normalized URL the crawling-process was started with.</p>
  227     <ul class="tags">
  228                 <li><span class="field">access:</span> protected</li>
  229             </ul>
  230     
  231     
  232         
  233         
  234 
  235 </div>
  236 <a name="var$starting_url_parts" id="$starting_url_parts"><!-- --></a>
  237 <div class="oddrow">
  238 
  239     <div class="var-header">
  240         <span class="var-title">
  241             <span class="var-type">array</span>
  242             <span class="var-name">$starting_url_parts</span>
  243              = <span class="var-default">array()</span>         (line <span class="line-number">22</span>)
  244         </span>
  245     </div>
  246 
  247     <!-- ========== Info from phpDoc block ========= -->
  248 <p class="short-description">The URL-parts of the starting-url.</p>
  249     <ul class="tags">
  250                 <li><span class="field">var:</span> The URL-parts as returned by PHPCrawlerUtils::splitURL()</li>
  251                 <li><span class="field">access:</span> protected</li>
  252             </ul>
  253     
  254     
  255         
  256         
  257 
  258 </div>
  259 <a name="var$url_filter_rules" id="$url_filter_rules"><!-- --></a>
  260 <div class="evenrow">
  261 
  262     <div class="var-header">
  263         <span class="var-title">
  264             <span class="var-type">array</span>
  265             <span class="var-name">$url_filter_rules</span>
  266              = <span class="var-default">array()</span>         (line <span class="line-number">36</span>)
  267         </span>
  268     </div>
  269 
  270     <!-- ========== Info from phpDoc block ========= -->
  271 <p class="short-description">Array containing regex-rules for URLs that should NOT be followed.</p>
  272     <ul class="tags">
  273                 <li><span class="field">access:</span> protected</li>
  274             </ul>
  275     
  276     
  277         
  278         
  279 
  280 </div>
  281 <a name="var$url_follow_rules" id="$url_follow_rules"><!-- --></a>
  282 <div class="oddrow">
  283 
  284     <div class="var-header">
  285         <span class="var-title">
  286             <span class="var-type">array</span>
  287             <span class="var-name">$url_follow_rules</span>
  288              = <span class="var-default">array()</span>         (line <span class="line-number">29</span>)
  289         </span>
  290     </div>
  291 
  292     <!-- ========== Info from phpDoc block ========= -->
  293 <p class="short-description">Array containing regex-rules for URLs that should be followed.</p>
  294     <ul class="tags">
  295                 <li><span class="field">access:</span> protected</li>
  296             </ul>
  297     
  298     
  299         
  300         
  301 
  302 </div>
  303                         
  304         </div>
  305     </div>
  306     
  307     <a name="sec-methods"></a>
  308     <div class="info-box">
  309         <div class="info-box-title">Methods</div>
  310         <div class="nav-bar">
  311             <a href="#sec-description">Description</a> |
  312                                                             <a href="#sec-var-summary">Vars</a> (<a href="#sec-vars">details</a>)
  313                                                                     | <a href="#sec-method-summary">Methods</a> (<span class="disabled">details</span>)
  314                         
  315         </div>
  316         <div class="info-box-body">
  317             <a name="method_detail"></a>
  318 <a name="methodkeepRedirectUrls" id="keepRedirectUrls"><!-- --></a>
  319 <div class="evenrow">
  320     
  321     <div class="method-header">
  322         <span class="method-title">static method keepRedirectUrls</span> (line <span class="line-number">107</span>)
  323     </div> 
  324     
  325     <!-- ========== Info from phpDoc block ========= -->
  326 <p class="short-description">Filters out all non-redirect-URLs from the URLs given in the PHPCrawlerDocumentInfo-object</p>
  327     <ul class="tags">
  328                 <li><span class="field">access:</span> public</li>
  329             </ul>
  330     
  331     <div class="method-signature">
  332         static
  333         <span class="method-result">void</span>
  334         <span class="method-name">
  335             keepRedirectUrls
  336         </span>
  337                     (<span class="var-type"><a href="../phpcrawl/PHPCrawlerDocumentInfo.html">PHPCrawlerDocumentInfo</a></span>&nbsp;<span class="var-name">$DocumentInfo</span>)
  338             </div>
  339     
  340             <ul class="parameters">
  341                     <li>
  342                 <span class="var-type"><a href="../phpcrawl/PHPCrawlerDocumentInfo.html">PHPCrawlerDocumentInfo</a></span>
  343                 <span class="var-name">$DocumentInfo</span><span class="var-description">: PHPCrawlerDocumentInfo-object containing all found links of the current document.</span>         </li>
  344                 </ul>
  345         
  346             
  347     </div>
  348 
  349 <a name="methodaddURLFilterRule" id="addURLFilterRule"><!-- --></a>
  350 <div class="oddrow">
  351     
  352     <div class="method-header">
  353         <span class="method-title">addURLFilterRule</span> (line <span class="line-number">217</span>)
  354     </div> 
  355     
  356     <!-- ========== Info from phpDoc block ========= -->
  357 <p class="short-description">Adds a rule to the list of rules that decide which URLs found on a page should be ignored by the crawler.</p>
  358     <ul class="tags">
  359                 <li><span class="field">access:</span> public</li>
  360             </ul>
  361     
  362     <div class="method-signature">
  363         <span class="method-result">void</span>
  364         <span class="method-name">
  365             addURLFilterRule
  366         </span>
  367                     (<span class="var-type"></span>&nbsp;<span class="var-name">$regex</span>)
  368             </div>
  369     
  370             <ul class="parameters">
  371                     <li>
  372                 <span class="var-type"></span>
  373                 <span class="var-name">$regex</span>            </li>
  374                 </ul>
  375         
  376             
  377     </div>
  378 <a name="methodaddURLFilterRules" id="addURLFilterRules"><!-- --></a>
  379 <div class="evenrow">
  380     
  381     <div class="method-header">
  382         <span class="method-title">addURLFilterRules</span> (line <span class="line-number">231</span>)
  383     </div> 
  384     
  385     <!-- ========== Info from phpDoc block ========= -->
  386 <p class="short-description">Adds several rules at once to the list of rules that decide which URLs found on a page should be ignored by the crawler.</p>
  387     <ul class="tags">
  388                 <li><span class="field">access:</span> public</li>
  389             </ul>
  390     
  391     <div class="method-signature">
  392         <span class="method-result">void</span>
  393         <span class="method-name">
  394             addURLFilterRules
  395         </span>
  396                     (<span class="var-type"></span>&nbsp;<span class="var-name">$regex_array</span>)
  397             </div>
  398     
  399             <ul class="parameters">
  400                     <li>
  401                 <span class="var-type"></span>
  402                 <span class="var-name">$regex_array</span>          </li>
  403                 </ul>
  404         
  405             
  406     </div>
  407 <a name="methodaddURLFollowRule" id="addURLFollowRule"><!-- --></a>
  408 <div class="oddrow">
  409     
  410     <div class="method-header">
  411         <span class="method-title">addURLFollowRule</span> (line <span class="line-number">203</span>)
  412     </div> 
  413     
  414     <!-- ========== Info from phpDoc block ========= -->
<p class="short-description">Adds a rule to the list of rules that decide which URLs found on a page should be followed by the crawler.</p>
  415     <ul class="tags">
  416                 <li><span class="field">access:</span> public</li>
  417             </ul>
  418     
  419     <div class="method-signature">
  420         <span class="method-result">void</span>
  421         <span class="method-name">
  422             addURLFollowRule
  423         </span>
  424                     (<span class="var-type"></span>&nbsp;<span class="var-name">$regex</span>)
  425             </div>
  426     
  427             <ul class="parameters">
  428                     <li>
  429                 <span class="var-type"></span>
  430                 <span class="var-name">$regex</span>            </li>
  431                 </ul>
  432         
  433             
  434     </div>
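<p class="notes">The filter-rules and follow-rules added by the methods above are regular expressions that are tested against each URL found on a page. The following is a minimal sketch of how such rules might be combined, using PHP's preg_match(). It is not PHPCrawl's actual code: the function name sketchMatchesRules() is hypothetical, and the sketch assumes that a URL must match at least one follow-rule when any follow-rules are defined, and that a single matching filter-rule is enough to reject it.</p>

```php
<?php
// Hypothetical stand-in for a rule check; the name and the exact
// precedence of the two rule lists are assumptions of this sketch.
function sketchMatchesRules(string $url, array $follow_rules, array $filter_rules): bool
{
    // Assumption: when follow-rules exist, the URL must match at least one.
    if (count($follow_rules) > 0) {
        $followed = false;
        foreach ($follow_rules as $regex) {
            if (preg_match($regex, $url) === 1) {
                $followed = true;
                break;
            }
        }
        if (!$followed) {
            return false;
        }
    }

    // Any matching filter-rule rejects the URL.
    foreach ($filter_rules as $regex) {
        if (preg_match($regex, $url) === 1) {
            return false;
        }
    }

    return true;
}

// Example rules: only follow URLs below /docs/, and ignore common image files.
$follow_rules = array('#/docs/#');
$filter_rules = array('#\.(jpg|gif|png)$#i');

var_dump(sketchMatchesRules('http://example.com/docs/a.html', $follow_rules, $filter_rules));   // bool(true)
var_dump(sketchMatchesRules('http://example.com/docs/logo.PNG', $follow_rules, $filter_rules)); // bool(false)
```

<p class="notes">Using a delimiter such as # instead of / keeps URL patterns readable, because slashes in the pattern then need no escaping.</p>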
  435 <a name="methodfilterUrls" id="filterUrls"><!-- --></a>
  436 <div class="evenrow">
  437     
  438     <div class="method-header">
  439         <span class="method-title">filterUrls</span> (line <span class="line-number">82</span>)
  440     </div> 
  441     
  442     <!-- ========== Info from phpDoc block ========= -->
  443 <p class="short-description">Filters the given URLs (contained in the given PHPCrawlerDocumentInfo-object) by the given rules.</p>
  444     <ul class="tags">
  445                 <li><span class="field">access:</span> public</li>
  446             </ul>
  447     
  448     <div class="method-signature">
  449         <span class="method-result">void</span>
  450         <span class="method-name">
  451             filterUrls
  452         </span>
  453                     (<span class="var-type"><a href="../phpcrawl/PHPCrawlerDocumentInfo.html">PHPCrawlerDocumentInfo</a></span>&nbsp;<span class="var-name">$DocumentInfo</span>)
  454             </div>
  455     
  456             <ul class="parameters">
  457                     <li>
  458                 <span class="var-type"><a href="../phpcrawl/PHPCrawlerDocumentInfo.html">PHPCrawlerDocumentInfo</a></span>
  459                 <span class="var-name">$DocumentInfo</span><span class="var-description">: PHPCrawlerDocumentInfo-object containing all found links of the current document.</span>         </li>
  460                 </ul>
  461         
  462             
  463     </div>
  464 <a name="methodsetBaseURL" id="setBaseURL"><!-- --></a>
  465 <div class="oddrow">
  466     
  467     <div class="method-header">
  468         <span class="method-title">setBaseURL</span> (line <span class="line-number">69</span>)
  469     </div> 
  470     
  471     <!-- ========== Info from phpDoc block ========= -->
  472 <p class="short-description">Sets the base-URL of the crawling process that some rules relate to.</p>
  473     <ul class="tags">
  474                 <li><span class="field">access:</span> public</li>
  475             </ul>
  476     
  477     <div class="method-signature">
  478         <span class="method-result">void</span>
  479         <span class="method-name">
  480             setBaseURL
  481         </span>
  482                     (<span class="var-type">string</span>&nbsp;<span class="var-name">$starting_url</span>)
  483             </div>
  484     
  485             <ul class="parameters">
  486                     <li>
  487                 <span class="var-type">string</span>
  488                 <span class="var-name">$starting_url</span><span class="var-description">: The URL the crawling-process was started with.</span>            </li>
  489                 </ul>
  490         
  491             
  492     </div>
  493 <a name="methodurlMatchesRules" id="urlMatchesRules"><!-- --></a>
  494 <div class="evenrow">
  495     
  496     <div class="method-header">
  497         <span class="method-title">urlMatchesRules</span> (line <span class="line-number">125</span>)
  498     </div> 
  499     
  500     <!-- ========== Info from phpDoc block ========= -->
  501 <p class="short-description">Checks whether a given URL matches the rules.</p>
  502     <ul class="tags">
  503                 <li><span class="field">return:</span> TRUE if the URL matches the defined rules.</li>
  504                 <li><span class="field">access:</span> protected</li>
  505             </ul>
  506     
  507     <div class="method-signature">
  508         <span class="method-result">bool</span>
  509         <span class="method-name">
  510             urlMatchesRules
  511         </span>
  512                     (<span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>&nbsp;<span class="var-name">$url</span>)
  513             </div>
  514     
  515             <ul class="parameters">
  516                     <li>
  517                 <span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>
  518                 <span class="var-name">$url</span><span class="var-description">: The URL as a PHPCrawlerURLDescriptor-object</span>            </li>
  519                 </ul>
  520         
  521             
  522     </div>
  523                         
  524         </div>
  525     </div>
  526 
  527 
  528     <p class="notes" id="credit">
  529         Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by <a href="http://www.phpdoc.org" target="_blank">phpDocumentor 1.4.4</a>
  530     </p>
  531     </div></body>
  532 </html>