"Fossies" - the Fresh Open Source Software Archive

Contents of apache-nutch-1.4-src.tar.gz (26 Nov 19:49, 2066616 Bytes)

About: Apache Nutch is web-search software. It builds on Lucene and Solr and adds web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats. Source code package.



Fossies path: /unix/www/apache-nutch-1.4-src.tar.gz

Basic information files (README, FAQ, INSTALL, ChangeLog, ...):
-rw-rw-r--  60391 2011-11-05 00:18 apache-nutch-1.4/CHANGES.txt
-rw-rw-r-- 329066 2011-05-07 09:04 apache-nutch-1.4/LICENSE.txt
-rw-rw-r--   2579 2009-10-09 19:02 apache-nutch-1.4/NOTICE.txt
-rw-rw-r--   1593 2010-07-25 08:28 apache-nutch-1.4/README.txt
-rw-rw-r--    113 2009-10-09 19:02 apache-nutch-1.4/src/java/overview.html
-rw-rw-r--   1659 2010-07-08 16:13 apache-nutch-1.4/lib/native/README.txt
-rw-rw-r--    299 2009-10-09 19:02 apache-nutch-1.4/src/plugin/subcollection/README.txt
-rw-rw-r--     71 2009-10-09 19:02 apache-nutch-1.4/src/plugin/creativecommons/README.txt
-rw-rw-r--    469 2009-10-09 19:02 apache-nutch-1.4/src/plugin/urlfilter-automaton/lib/automaton.LICENCE.txt
-rw-rw-r--   1474 2006-02-03 19:49 apache-nutch-1.4/src/plugin/parse-swf/lib/javaswf-LICENSE.txt

Basic docs (manual pages, PDF-,HTML-,/doc/-files, ...):
-rw-rw-r--   3431 2010-02-12 07:59 apache-nutch-1.4/src/plugin/parse-tika/sample/encrypted.pdf
-rw-rw-r--   3151 2010-02-12 07:59 apache-nutch-1.4/src/plugin/parse-tika/sample/pdftest.pdf
-rw-rw-r--   8192 2010-02-12 07:59 apache-nutch-1.4/src/plugin/parse-tika/sample/word97.doc

First 50 (of 643) other files:
-rw-rw-r--     19 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.f0
-rw-rw-r--     19 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.f1
-rw-rw-r--     19 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.f2
-rw-rw-r--     19 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.f3
-rw-rw-r--     19 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.f4
-rw-rw-r--     19 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.f5
-rw-rw-r--   2450 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.fdt
-rw-rw-r--    152 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.fdx
-rw-rw-r--     66 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.fnm
-rw-rw-r--   8675 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.frq
-rw-rw-r--  17355 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.prx
-rw-rw-r--    504 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.tii
-rw-rw-r--  34814 2006-09-19 21:36 apache-nutch-1.4/src/testresources/testcrawl/index/_0.tis
-rwxrwxr-x   8074 2011-09-22 17:10 apache-nutch-1.4/src/java/org/apache/nutch/crawl/AbstractFetchSchedule.java
-rw-rw-r--   1985 2010-07-30 21:50 apache-nutch-1.4/src/java/org/apache/nutch/tools/proxy/AbstractTestbedHandler.java
-rwxrwxr-x   6788 2010-07-14 22:11 apache-nutch-1.4/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java
-rwxrwxr-x    469 2009-10-09 19:02 apache-nutch-1.4/src/plugin/creativecommons/data/anchor.html
-rw-rw-r--   2648 2011-09-22 17:10 apache-nutch-1.4/src/plugin/index-anchor/src/java/org/apache/nutch/indexer/anchor/AnchorIndexingFilter.java
-rw-rw-r--   1812 2009-10-09 19:02 apache-nutch-1.4/src/java/org/apache/nutch/tools/arc/ArcInputFormat.java
-rw-rw-r--   9342 2011-09-22 17:10 apache-nutch-1.4/src/java/org/apache/nutch/tools/arc/ArcRecordReader.java
-rw-rw-r--  14019 2011-09-22 17:10 apache-nutch-1.4/src/java/org/apache/nutch/tools/arc/ArcSegmentCreator.java
-rw-rw-r--   1397 2009-10-09 19:02 apache-nutch-1.4/src/plugin/urlfilter-automaton/lib/automaton.COPYING.txt
-rw-rw-r--  21522 2006-03-21 23:29 apache-nutch-1.4/src/plugin/urlfilter-automaton/lib/automaton.jar
-rw-rw-r--   3373 2011-03-09 12:48 apache-nutch-1.4/src/plugin/urlfilter-automaton/src/java/org/apache/nutch/urlfilter/automaton/AutomatonURLFilter.java
-rw-rw-r--   1524 2011-07-18 11:23 apache-nutch-1.4/conf/automaton-urlfilter.txt.template
-rw-rw-r--   3675 2011-09-22 17:10 apache-nutch-1.4/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
-rw-rw-r--   2585 2009-10-09 19:02 apache-nutch-1.4/src/plugin/protocol-httpclient/jsp/basic.jsp
-rw-rw-r--   7753 2011-09-22 17:10 apache-nutch-1.4/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
-rwxrwxr-x  10132 2011-09-14 14:13 apache-nutch-1.4/src/java/org/apache/nutch/tools/Benchmark.java
-rw-rw-r--    748 2006-03-21 23:26 apache-nutch-1.4/src/plugin/urlfilter-regex/sample/Benchmarks.rules
-rw-rw-r--    758 2006-03-21 23:29 apache-nutch-1.4/src/plugin/urlfilter-automaton/sample/Benchmarks.rules
-rw-rw-r--  12992 2006-03-21 23:26 apache-nutch-1.4/src/plugin/urlfilter-regex/sample/Benchmarks.urls
-rw-rw-r--  12992 2006-03-21 23:29 apache-nutch-1.4/src/plugin/urlfilter-automaton/sample/Benchmarks.urls
-rw-rw-r--    969 2009-10-09 19:02 apache-nutch-1.4/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/BlockedException.java
-rw-rw-r--   2477 2010-02-12 07:50 apache-nutch-1.4/src/plugin/parse-tika/build-ivy.xml
-rwxrwxr-x   9559 2010-07-07 10:48 apache-nutch-1.4/src/plugin/build-plugin.xml
-rw-rw-r--   1039 2009-10-09 19:02 apache-nutch-1.4/src/plugin/urlfilter-domain/build.xml
-rw-rw-r--   1058 2009-10-09 19:02 apache-nutch-1.4/src/plugin/scoring-link/build.xml
-rw-rw-r--   1058 2009-10-09 19:02 apache-nutch-1.4/src/plugin/scoring-opic/build.xml
-rw-rw-r--   1065 2009-10-09 19:02 apache-nutch-1.4/src/plugin/microformats-reltag/build.xml
-rw-rw-r--   1081 2011-01-07 18:18 apache-nutch-1.4/src/plugin/protocol-file/build.xml
-rw-rw-r--   1100 2010-07-07 11:22 apache-nutch-1.4/src/plugin/lib-nekohtml/build.xml
-rw-rw-r--   1114 2010-07-07 10:48 apache-nutch-1.4/src/plugin/nutch-extensionpoints/build.xml
-rw-rw-r--   1216 2010-07-07 11:22 apache-nutch-1.4/src/plugin/lib-xml/build.xml
-rw-rw-r--   1234 2009-10-09 19:02 apache-nutch-1.4/src/plugin/urlnormalizer-regex/build.xml
-rw-rw-r--   1252 2009-10-09 19:02 apache-nutch-1.4/src/plugin/parse-ext/build.xml
-rw-rw-r--   1381 2010-07-02 12:53 apache-nutch-1.4/src/plugin/parse-zip/build.xml
-rw-rw-r--   1417 2011-09-24 18:23 apache-nutch-1.4/src/plugin/language-identifier/build.xml
-rw-rw-r--   1434 2011-05-04 17:20 apache-nutch-1.4/src/plugin/parse-tika/build.xml
-rw-rw-r--   1475 2010-07-07 11:22 apache-nutch-1.4/src/plugin/protocol-httpclient/build.xml
...

Note: To limit the size of this page, 593 archive member files, probably not information- or documentation-related, are omitted here. All of them are listed in the complete docs-related index file and in the index files sorted by original order, date, pathname, or filename (each roughly 0.3 MB).
 MD5 (apache-nutch-1.4-src.tar.gz): 9714e139e315f5900d2803992e049124
SHA1 (apache-nutch-1.4-src.tar.gz): 7228f181e16814cd58c429398ecff97f1c15dd21
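
The two digests above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch in Python, not part of the archive or of Fossies itself; the function names `file_digests` and `matches_listing` and the chunk size are illustrative assumptions, and the expected values are simply the ones printed in this listing.

```python
import hashlib

# Digests copied from the listing above (apache-nutch-1.4-src.tar.gz).
EXPECTED_MD5 = "9714e139e315f5900d2803992e049124"
EXPECTED_SHA1 = "7228f181e16814cd58c429398ecff97f1c15dd21"


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path`.

    The file is read in chunks so that large archives do not have to
    fit in memory. `chunk_size` is an arbitrary choice (1 MiB).
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_listing(path, expected_md5=EXPECTED_MD5, expected_sha1=EXPECTED_SHA1):
    """Return True only if both digests of `path` equal the expected values."""
    md5_hex, sha1_hex = file_digests(path)
    # Compare case-insensitively, since some tools print upper-case hex.
    return md5_hex == expected_md5.lower() and sha1_hex == expected_sha1.lower()
```

Assuming the archive was saved under its original name in the current directory, `matches_listing("apache-nutch-1.4-src.tar.gz")` should return `True` for an unmodified download. MD5 and SHA-1 are adequate for detecting accidental corruption but are not collision-resistant, so a match here is not proof of authenticity.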