Introduction
PHPCrawl is a set of classes written in PHP for crawling/spidering websites; in short, it is
a web crawler library for PHP.
The crawler "spiders" websites and delivers information about all found pages, links, files and so on to
users of the library.
By overriding a special method of the main class, users decide what should happen to the pages and
their content, files and other information the crawler finds.
PHPCrawl provides many options for specifying the behaviour of the crawler, such as URL and Content-Type filters,
cookie handling, limiter options and much more.
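The "override a method of the main class" idea can be sketched as follows. This is only an illustration of the pattern: SketchCrawler, handlePage() and run() are hypothetical stand-ins invented for this example and are not PHPCrawl's actual class or method names, and the page list is hard-coded instead of being fetched over HTTP.

```php
<?php
// Hypothetical stand-in for a crawler main class (NOT PHPCrawl's real API).
class SketchCrawler {
    // Hook called once for every page found; subclasses override it.
    function handlePage($page) {
        // Default: do nothing with the page.
    }

    // Walks a fixed list of pages instead of performing real requests.
    function run($pages) {
        foreach ($pages as $page) {
            $this->handlePage($page);
        }
    }
}

// The user's class decides what happens to each page by overriding the hook.
class UrlCollector extends SketchCrawler {
    var $seen = array();

    function handlePage($page) {
        $this->seen[] = $page['url'];
    }
}

$crawler = new UrlCollector();
$crawler->run(array(
    array('url' => 'http://example.com/',      'content' => '<html>home</html>'),
    array('url' => 'http://example.com/about', 'content' => '<html>about</html>'),
));

// Print every URL the overridden hook received, one per line.
echo implode("\n", $crawler->seen), "\n";
```

The base class owns the crawl loop, so user code only has to supply the per-page behaviour. See the library's own documentation for the real class and method names.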
Requirements
- PHP 4.0.4 or later with sockets enabled
- PCRE library package (Perl-Compatible Regular Expressions, already bundled with PHP >= 4.2.0, see "requirements" and "installation" in the PHP manual)
- PHP with OpenSSL support for SSL connections (https). Not necessary for plain http connections.
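The requirements above can be checked from a script before using the library. The following is a minimal sketch, not part of PHPCrawl: the function name check_requirements() is made up for this example, it assumes that "sockets enabled" means fsockopen() is available, and it assumes a PHP version that already provides version_compare().

```php
<?php
// Sketch: report whether this PHP installation meets the listed requirements.
// check_requirements() is a hypothetical helper, not a PHPCrawl function.
function check_requirements() {
    return array(
        // PHP 4.0.4 or later
        'php_version' => version_compare(PHP_VERSION, '4.0.4', '>='),
        // Assumption: "sockets enabled" is taken to mean fsockopen() exists
        'sockets'     => function_exists('fsockopen'),
        // PCRE functions such as preg_match() must be available
        'pcre'        => function_exists('preg_match'),
        // Only needed for https connections
        'openssl'     => extension_loaded('openssl'),
    );
}

$results = check_requirements();
foreach ($results as $name => $ok) {
    echo $name, ': ', ($ok ? 'ok' : 'missing'), "\n";
}
```

A missing openssl entry only matters when https sites are to be crawled; the other three entries are required in every case.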
License: GPL (GNU General Public License)
Author: Uwe Hunfeld, phpcrawl | at | cuab.de