Introduction

PHPCrawl is a set of PHP classes for crawling (spidering) websites; in short, a web-crawler library for PHP.

The crawler "spiders" websites and passes information about every page, link, file and other item it finds to the user of the library. By overriding a special method of the main class, users decide what happens to the pages and their content, the files and any other information the crawler finds.
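
The override mechanism described above can be illustrated with a minimal, self-contained sketch. It does not use PHPCrawl itself: the class `SketchCrawler`, its hook `handleDocument()` and the hard-coded document list are hypothetical stand-ins, chosen only to show how a subclass takes over the handling of each found document. The real class and method names should be taken from the library's own documentation.

```php
<?php
// Hypothetical stand-in for a crawler main class; this is NOT PHPCrawl's real API.
abstract class SketchCrawler
{
    // Pre-supplied documents that stand in for pages a real crawler would fetch.
    private $documents;

    public function __construct(array $documents)
    {
        $this->documents = $documents;
    }

    // The hook that users override to decide what happens with each document.
    abstract protected function handleDocument(array $doc);

    // Calls the hook once for every document.
    public function run()
    {
        foreach ($this->documents as $doc) {
            $this->handleDocument($doc);
        }
    }
}

// User code: collect the <title> of every document that has one.
class TitleCollector extends SketchCrawler
{
    public $titles = [];

    protected function handleDocument(array $doc)
    {
        if (preg_match('#<title>(.*?)</title>#is', $doc['content'], $m)) {
            $this->titles[$doc['url']] = trim($m[1]);
        }
    }
}

$crawler = new TitleCollector([
    ['url' => 'http://example.com/', 'content' => '<html><title>Home</title></html>'],
    ['url' => 'http://example.com/logo.png', 'content' => ''],
]);
$crawler->run();
print_r($crawler->titles);
```

The base class owns the traversal and the subclass owns the reaction to each document, so user code never has to touch the crawling loop.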

PHPCrawl provides many options for specifying the crawler's behaviour, such as URL and Content-Type filters, cookie handling, limiter options and more.
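
To make the idea of URL and Content-Type filters concrete, here is a small sketch of how such rules could be evaluated with regular expressions. The function `shouldProcess()` and both rule lists are assumptions made for illustration; they are not PHPCrawl functions and do not describe how the library implements its filters.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: decides whether a found document is processed.
function shouldProcess($url, $contentType, array $urlExcludeRules, array $contentTypeAllowRules)
{
    // Skip the document if any URL exclude rule matches.
    foreach ($urlExcludeRules as $rule) {
        if (preg_match($rule, $url) === 1) {
            return false;
        }
    }
    // Accept the document only if its content type matches at least one allow rule.
    foreach ($contentTypeAllowRules as $rule) {
        if (preg_match($rule, $contentType) === 1) {
            return true;
        }
    }
    return false;
}

// Example rules (assumed): ignore common image URLs, accept only HTML documents.
$urlExcludeRules = ['#\.(jpg|jpeg|gif|png)$#i'];
$contentTypeAllowRules = ['#^text/html#i'];

var_dump(shouldProcess('http://example.com/index.html', 'text/html; charset=utf-8',
    $urlExcludeRules, $contentTypeAllowRules));
var_dump(shouldProcess('http://example.com/logo.png', 'image/png',
    $urlExcludeRules, $contentTypeAllowRules));
```

With these rules the HTML page is accepted and the PNG image is rejected, because its URL matches the exclude rule.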

Requirements

License: GPL (GNU General Public License)
Author: Uwe Hunfeld, phpcrawl | at | cuab.de