Nutch is web-search software. It builds on Lucene and Solr, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats. This is the source code package of the 1.x series.

apache-nutch-1.19-src.tar.gz  ( / nutch / 1.19 / apache-nutch-1.19-src.tar.gz)

3712358 bytes,  2022-08-22 17:15 (2022-09-06 19:27),  dbf336f62dc3850626532f9718f7ac26
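A downloaded copy can be checked against the size and digest listed above. The following is a minimal sketch, assuming the 32-character hex string is an MD5 digest (the listing does not name the algorithm); the function names `file_md5` and `matches_listing` are hypothetical helpers, not part of Nutch.

```python
import hashlib
import os

# Values copied from the listing line above.
EXPECTED_SIZE = 3712358
# Assumption: a 32-hex-digit value is treated as an MD5 digest.
EXPECTED_DIGEST = "dbf336f62dc3850626532f9718f7ac26"


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_size=EXPECTED_SIZE, expected_digest=EXPECTED_DIGEST):
    """Return True only if both the byte size and the digest match."""
    if os.path.getsize(path) != expected_size:
        return False
    return file_md5(path) == expected_digest.lower()
```

Calling `matches_listing("apache-nutch-1.19-src.tar.gz")` on the downloaded file would then report whether it agrees with the listing. MD5 is only suitable for detecting accidental corruption; a signature or stronger hash published by the project is preferable for authenticity checks.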


Members: 3656 (2424 regular files in 1232 directories)

Overall:  crc css doc docx dtd gif html java js md odt pdf png rss rtf sh sxw test txt urls xlsx xml xsd xsl zip  (+ remaining files)
Top 10:  html (1305)  java (609)  xml (278)  txt (75)  crc (22)  js (16)  test (13)  png (13)  md (13)  urls (9)
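
A per-extension tally like the one above can be approximated directly from the tarball. This is a rough sketch only: the helper name `extension_counts` is hypothetical, and the rule used here (text after the last dot of the file name, lowercased) is an assumption, so the numbers it produces may not match the listing exactly.

```python
import os
import tarfile
from collections import Counter


def extension_counts(archive_path):
    """Count regular files in a tar archive by lowercased file extension."""
    counts = Counter()
    # "r:*" lets tarfile detect the compression (gzip for a .tar.gz) itself.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-regular members
            name = os.path.basename(member.name)
            _, ext = os.path.splitext(name)
            # Assumption: files without an extension are grouped under "(none)".
            counts[ext[1:].lower() if ext else "(none)"] += 1
    return counts
```

For example, `extension_counts("apache-nutch-1.19-src.tar.gz").most_common(10)` would yield a top-ten list in the same spirit as the line above.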

