"Fossies" - the Fresh Open Source Software Archive

Member "docs/phpcrawl/PHPCrawlerRobotsTxtParser.html" (20 Jan 2013, 13846 Bytes) of package /linux/www/SitemapCreator.tar.gz:



    1 <?xml version="1.0" encoding="iso-8859-1"?>
    2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    3   <html xmlns="http://www.w3.org/1999/xhtml">
    4         <head>
    5             <!-- template designed by Marco Von Ballmoos -->
    6             <title>Docs For Class PHPCrawlerRobotsTxtParser</title>
    7             <link rel="stylesheet" href="../media/stylesheet.css" />
    8             <meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1'/>
    9         </head>
   10         <body>
   11             <div class="page-body">         
   12 <h2 class="class-name">Class PHPCrawlerRobotsTxtParser</h2>
   13 
   14 <a name="sec-description"></a>
   15 <div class="info-box">
   16     <div class="info-box-title">Description</div>
   17     <div class="nav-bar">
   18                     <span class="disabled">Description</span> |
   19                                                     <a href="#sec-var-summary">Vars</a> (<a href="#sec-vars">details</a>)
   20                         |                                           <a href="#sec-method-summary">Methods</a> (<a href="#sec-methods">details</a>)
   21                         
   22                     </div>
   23     <div class="info-box-body">
   24                 <!-- ========== Info from phpDoc block ========= -->
   25 <p class="short-description">Class for parsing robots.txt-files.</p>
   26         <p class="notes">
   27             Located in <a class="field" href="_libs---PHPCrawler---PHPCrawlerRobotsTxtParser.class.php.html">/libs/PHPCrawler/PHPCrawlerRobotsTxtParser.class.php</a> (line <span class="field">8</span>)
   28         </p>
   29         
   30                 
   31         <pre></pre>
   32     
   33             </div>
   34 </div>
   35 
   36 
   37 
   38     <a name="sec-var-summary"></a>
   39     <div class="info-box">
    40         <div class="info-box-title">Variable Summary</div>
   41         <div class="nav-bar">
   42             <a href="#sec-description">Description</a> |
   43                         <span class="disabled">Vars</span> (<a href="#sec-vars">details</a>)
   44                             | 
   45                                     <a href="#sec-method-summary">Methods</a> (<a href="#sec-methods">details</a>)
   46                             
   47                                 </div>
   48         <div class="info-box-body">
   49             <div class="var-summary">
   50                                                                                                 <div class="var-title">
   51                     <span class="var-type"><a href="../phpcrawl/PHPCrawlerHTTPRequest.html">PHPCrawlerHTTPRequest</a></span>
   52                     <a href="#$PageRequest" title="details" class="var-name">$PageRequest</a>
   53                 </div>
   54                                             </div>
   55         </div>
   56     </div>
   57 
   58     <a name="sec-method-summary"></a>
   59     <div class="info-box">
    60         <div class="info-box-title">Method Summary</div>
   61         <div class="nav-bar">
   62             <a href="#sec-description">Description</a> |
   63                                                                         <a href="#sec-var-summary">Vars</a> (<a href="#sec-vars">details</a>)
   64                  
   65                 |
   66                         <span class="disabled">Methods</span> (<a href="#sec-methods">details</a>)
   67         </div>
   68         <div class="info-box-body">         
   69             <div class="method-summary">
   70                                                                                                                                                                                 <div class="method-definition">
   71                     static                      <span class="method-result"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>
   72                                         <a href="#getRobotsTxtURL" title="details" class="method-name">getRobotsTxtURL</a>
   73                                             (<span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>&nbsp;<span class="var-name">$Url</span>)
   74                                     </div>
   75                                                                 
   76                                                 <div class="method-definition">
   77                                             <span class="method-result">PHPCrawlerRobotsTxtParser</span>
   78                                         <a href="#__construct" title="details" class="method-name">__construct</a>
   79                                         ()
   80                                     </div>
   81                                                                 <div class="method-definition">
   82                                             <span class="method-result">array</span>
   83                                         <a href="#buildRegExpressions" title="details" class="method-name">buildRegExpressions</a>
   84                                             (<span class="var-type">array</span>&nbsp;<span class="var-name">&$applying_lines</span>, <span class="var-type">string</span>&nbsp;<span class="var-name">$base_url</span>)
   85                                     </div>
   86                                                                 <div class="method-definition">
   87                                             <span class="method-result">array</span>
   88                                         <a href="#getApplyingLines" title="details" class="method-name">getApplyingLines</a>
   89                                             (<span class="var-type"></span>&nbsp;<span class="var-name">&$robots_txt_content</span>, <span class="var-type"></span>&nbsp;<span class="var-name">$user_agent_string</span>)
   90                                     </div>
   91                                                                 <div class="method-definition">
   92                                             <span class="method-result">string</span>
   93                                         <a href="#getRobotsTxtContent" title="details" class="method-name">getRobotsTxtContent</a>
   94                                             (<span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>&nbsp;<span class="var-name">$Url</span>)
   95                                     </div>
   96                                                                                                 <div class="method-definition">
   97                                             <span class="method-result">array</span>
   98                                         <a href="#parseRobotsTxt" title="details" class="method-name">parseRobotsTxt</a>
   99                                             (<span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>&nbsp;<span class="var-name">$Url</span>, <span class="var-type">string</span>&nbsp;<span class="var-name">$user_agent_string</span>)
  100                                     </div>
  101                                 </div>
  102         </div>
  103     </div>      
  104 
  105     <a name="sec-vars"></a>
  106     <div class="info-box">
  107         <div class="info-box-title">Variables</div>
  108         <div class="nav-bar">
  109             <a href="#sec-description">Description</a> |
  110                                         <a href="#sec-var-summary">Vars</a> (<span class="disabled">details</span>)
  111                         
  112             
  113                                         | 
  114                                     <a href="#sec-method-summary">Methods</a> (<a href="#sec-methods">details</a>)
  115                             
  116                     </div>
  117         <div class="info-box-body">
   118             <a name="var$PageRequest" id="$PageRequest"><!-- --></a>
  119 <div class="evenrow">
  120 
  121     <div class="var-header">
  122         <span class="var-title">
  123             <span class="var-type"><a href="../phpcrawl/PHPCrawlerHTTPRequest.html">PHPCrawlerHTTPRequest</a></span>
  124             <span class="var-name">$PageRequest</span>
  125                         (line <span class="line-number">15</span>)
  126         </span>
  127     </div>
  128 
  129     <!-- ========== Info from phpDoc block ========= -->
  130 <p class="short-description">A PHPCrawlerHTTPRequest-object for requesting robots.txt-files.</p>
  131     <ul class="tags">
  132                 <li><span class="field">access:</span> protected</li>
  133             </ul>
  134     
  135     
  136         
  137         
  138 
  139 </div>
  140                         
  141         </div>
  142     </div>
  143     
  144     <a name="sec-methods"></a>
  145     <div class="info-box">
  146         <div class="info-box-title">Methods</div>
  147         <div class="nav-bar">
  148             <a href="#sec-description">Description</a> |
  149                                                             <a href="#sec-var-summary">Vars</a> (<a href="#sec-vars">details</a>)
   150                                                                     | <a href="#sec-method-summary">Methods</a> (<span class="disabled">details</span>)
  151                         
  152         </div>
  153         <div class="info-box-body">
   154             <a name='method_detail'></a>
  155 <a name="methodgetRobotsTxtURL" id="getRobotsTxtURL"><!-- --></a>
  156 <div class="oddrow">
  157     
  158     <div class="method-header">
  159         <span class="method-title">static method getRobotsTxtURL</span> (line <span class="line-number">218</span>)
  160     </div> 
  161     
  162     <!-- ========== Info from phpDoc block ========= -->
   163 <p class="short-description">Returns the robots.txt-URL related to the given URL.</p>
  164     <ul class="tags">
   165                 <li><span class="field">return:</span> URL of the robots.txt-file related to the passed URL.</li>
  166                 <li><span class="field">access:</span> public</li>
  167             </ul>
  168     
  169     <div class="method-signature">
  170         static
  171         <span class="method-result"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>
  172         <span class="method-name">
  173             getRobotsTxtURL
  174         </span>
  175                     (<span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>&nbsp;<span class="var-name">$Url</span>)
  176             </div>
  177     
  178             <ul class="parameters">
  179                     <li>
  180                 <span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>
  181                 <span class="var-name">$Url</span><span class="var-description">: The URL as PHPCrawlerURLDescriptor-object</span>          </li>
  182                 </ul>
  183         
  184             
  185     </div>
  186 
  187 <a name="method__construct" id="__construct"><!-- --></a>
  188 <div class="evenrow">
  189     
  190     <div class="method-header">
  191         <span class="method-title">Constructor __construct</span> (line <span class="line-number">17</span>)
  192     </div> 
  193     
  194     <!-- ========== Info from phpDoc block ========= -->
  195     <ul class="tags">
  196                 <li><span class="field">access:</span> public</li>
  197             </ul>
  198     
  199     <div class="method-signature">
  200         <span class="method-result">PHPCrawlerRobotsTxtParser</span>
  201         <span class="method-name">
  202             __construct
  203         </span>
  204                 ()
  205             </div>
  206     
  207         
  208             
  209     </div>
  210 <a name="methodbuildRegExpressions" id="buildRegExpressions"><!-- --></a>
  211 <div class="oddrow">
  212     
  213     <div class="method-header">
  214         <span class="method-title">buildRegExpressions</span> (line <span class="line-number">151</span>)
  215     </div> 
  216     
  217     <!-- ========== Info from phpDoc block ========= -->
   218 <p class="short-description">Returns an array containing regular-expressions corresponding to the given robots.txt-style &quot;Disallow&quot;-lines.</p>
   219     <ul class="tags">
   220                 <li><span class="field">return:</span> Numeric array containing regular-expressions created for each &quot;disallow&quot;-line.</li>
  221                 <li><span class="field">access:</span> protected</li>
  222             </ul>
  223     
  224     <div class="method-signature">
  225         <span class="method-result">array</span>
  226         <span class="method-name">
  227             buildRegExpressions
  228         </span>
  229                     (<span class="var-type">array</span>&nbsp;<span class="var-name">&$applying_lines</span>, <span class="var-type">string</span>&nbsp;<span class="var-name">$base_url</span>)
  230             </div>
  231     
  232             <ul class="parameters">
  233                     <li>
  234                 <span class="var-type">array</span>
  235                 <span class="var-name">&$applying_lines</span><span class="var-description">: Numeric array containing &quot;disallow&quot;-lines.</span>           </li>
  236                     <li>
  237                 <span class="var-type">string</span>
  238                 <span class="var-name">$base_url</span><span class="var-description">: Base-URL the robots.txt-file was found in.</span>            </li>
  239                 </ul>
  240         
  241             
  242     </div>
  243 <a name="methodgetApplyingLines" id="getApplyingLines"><!-- --></a>
  244 <div class="evenrow">
  245     
  246     <div class="method-header">
  247         <span class="method-title">getApplyingLines</span> (line <span class="line-number">67</span>)
  248     </div> 
  249     
  250     <!-- ========== Info from phpDoc block ========= -->
   251 <p class="short-description">Returns all raw lines in the given robots.txt-content that apply to the given user-agent string.</p>
  252     <ul class="tags">
  253                 <li><span class="field">return:</span> Numeric array with found lines</li>
  254                 <li><span class="field">access:</span> protected</li>
  255             </ul>
  256     
  257     <div class="method-signature">
  258         <span class="method-result">array</span>
  259         <span class="method-name">
  260             getApplyingLines
  261         </span>
  262                     (<span class="var-type"></span>&nbsp;<span class="var-name">&$robots_txt_content</span>, <span class="var-type"></span>&nbsp;<span class="var-name">$user_agent_string</span>)
  263             </div>
  264     
  265             <ul class="parameters">
  266                     <li>
  267                 <span class="var-type"></span>
  268                 <span class="var-name">&$robots_txt_content</span>          </li>
  269                     <li>
  270                 <span class="var-type"></span>
  271                 <span class="var-name">$user_agent_string</span>            </li>
  272                 </ul>
  273         
  274             
  275     </div>
  276 <a name="methodgetRobotsTxtContent" id="getRobotsTxtContent"><!-- --></a>
  277 <div class="oddrow">
  278     
  279     <div class="method-header">
  280         <span class="method-title">getRobotsTxtContent</span> (line <span class="line-number">194</span>)
  281     </div> 
  282     
  283     <!-- ========== Info from phpDoc block ========= -->
   284 <p class="short-description">Retrieves the content of a robots.txt-file.</p>
  285     <ul class="tags">
  286                 <li><span class="field">return:</span> The content of the robots.txt or NULL if no robots.txt was found.</li>
  287                 <li><span class="field">access:</span> protected</li>
  288             </ul>
  289     
  290     <div class="method-signature">
  291         <span class="method-result">string</span>
  292         <span class="method-name">
  293             getRobotsTxtContent
  294         </span>
  295                     (<span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>&nbsp;<span class="var-name">$Url</span>)
  296             </div>
  297     
  298             <ul class="parameters">
  299                     <li>
  300                 <span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>
  301                 <span class="var-name">$Url</span><span class="var-description">: The URL of the robots.txt-file</span>         </li>
  302                 </ul>
  303         
  304             
  305     </div>
  306 <a name="methodparseRobotsTxt" id="parseRobotsTxt"><!-- --></a>
  307 <div class="evenrow">
  308     
  309     <div class="method-header">
  310         <span class="method-title">parseRobotsTxt</span> (line <span class="line-number">34</span>)
  311     </div> 
  312     
  313     <!-- ========== Info from phpDoc block ========= -->
   314 <p class="short-description">Parses the robots.txt-file related to the given URL and returns regular-expression-rules corresponding to the contained &quot;disallow&quot;-rules that are addressed to the given user-agent.</p>
   315     <ul class="tags">
   316                 <li><span class="field">return:</span> Numeric array containing regular-expressions for each &quot;disallow&quot;-rule defined in the robots.txt-file that's addressed to the given user-agent.</li>
  317                 <li><span class="field">access:</span> public</li>
  318             </ul>
  319     
  320     <div class="method-signature">
  321         <span class="method-result">array</span>
  322         <span class="method-name">
  323             parseRobotsTxt
  324         </span>
  325                     (<span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>&nbsp;<span class="var-name">$Url</span>, <span class="var-type">string</span>&nbsp;<span class="var-name">$user_agent_string</span>)
  326             </div>
  327     
  328             <ul class="parameters">
  329                     <li>
  330                 <span class="var-type"><a href="../phpcrawl/PHPCrawlerURLDescriptor.html">PHPCrawlerURLDescriptor</a></span>
  331                 <span class="var-name">$Url</span><span class="var-description">: The URL</span>            </li>
  332                     <li>
  333                 <span class="var-type">string</span>
  334                 <span class="var-name">$user_agent_string</span><span class="var-description">: User-agent.</span>          </li>
  335                 </ul>
  336         
  337             
  338     </div>
  339                         
  340         </div>
  341     </div>
  342 
  343 
  344     <p class="notes" id="credit">
  345         Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by <a href="http://www.phpdoc.org" target="_blank">phpDocumentor 1.4.4</a>
  346     </p>
  347     </div></body>
  348 </html>
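
The method descriptions above outline a three-step flow: derive the robots.txt URL from a page URL, collect the "disallow"-lines that apply to a user-agent, and turn those lines into regular expressions. The following is a minimal Python sketch of that flow, not the PHPCrawl implementation. The function names, the substring matching of user-agents, and the handling of the `*` and `$` wildcards are assumptions made for illustration; the documentation above does not say how the library treats any of these.

```python
import re
from urllib.parse import urlsplit


def robots_txt_url(url):
    """Return the robots.txt URL for the host that serves `url`."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def applying_lines(robots_txt, user_agent):
    """Collect the raw Disallow values of every group that matches `user_agent`.

    Assumption: a group matches if its User-agent value is "*" or is a
    case-insensitive substring of the given user-agent string.
    """
    ua = user_agent.lower()
    found = []
    group_agents = []
    in_rules = False  # True once the current group has listed a rule
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents = []
                in_rules = False
            group_agents.append(value.lower())
            continue
        in_rules = True
        if field == "disallow" and value:  # an empty Disallow blocks nothing
            if any(a and (a == "*" or a in ua) for a in group_agents):
                found.append(value)
    return found


def build_regexes(disallow_lines, base_url):
    """Compile one anchored regular expression per Disallow value."""
    base = re.escape(base_url.rstrip("/"))
    patterns = []
    for path in disallow_lines:
        escaped = re.escape(path).replace(r"\*", ".*")  # assumed "*" wildcard
        if escaped.endswith(r"\$"):  # assumed "$" end-of-URL anchor
            escaped = escaped[:-2] + "$"
        patterns.append(re.compile("^" + base + escaped))
    return patterns


# Hypothetical robots.txt content used only for this example.
EXAMPLE = """
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /tmp/*.pdf$
Disallow:
"""

rules = build_regexes(applying_lines(EXAMPLE, "ExampleBot/1.0"), "http://example.com")
blocked = any(r.match("http://example.com/private/page.html") for r in rules)
```

A crawler built this way would fetch the text at `robots_txt_url(...)`, build the rules once per host, and skip any URL for which one of the compiled patterns matches.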