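Operating Modes
PHPCreeper has two operating modes:
1. Single-worker mode: you write exactly one specific downloader instance, and it handles every crawling requirement.
2. Multi-worker mode: you may write any number of worker instances. This is PHPCreeper's original default mode.
Development Conventions
01. Write the global startup script:
The global startup script is a standalone script that loads multiple business worker instances in one go. It can be stored anywhere and is generated by the PHPCreeper application assistant by default. If you write it by hand, you only need to make sure it includes the following code: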
<?php
require_once "/path/to/PHPCreeper-Application/Application/Core/Launcher.php";
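The global startup script should preferably have the same name as the configured spider. If you name it freely and the engine then cannot find the spider, or reports the spider name as invalid, you can set the spider name explicitly by passing it to start(), as in this startup snippet: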
<?php
require_once "/path/to/PHPCreeper-Application/Application/Core/Launcher.php";
\PHPCreeperApp\Core\Launcher::start(); // start() accepts one optional argument: the spider name
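The naming rule above can be pictured with a small sketch. It assumes that the default spider name is taken from the startup script's file name, which the text implies but does not state. `resolveSpiderName()` and the example paths are hypothetical and are not part of PHPCreeper's API.

```php
<?php
/**
 * Hypothetical helper: choose the spider name for a startup script.
 * An explicitly passed name wins; otherwise fall back to the script's
 * file name without its extension (an assumed default).
 */
function resolveSpiderName(string $scriptPath, ?string $explicitName = null): string
{
    if ($explicitName !== null && $explicitName !== '') {
        return $explicitName;
    }

    return pathinfo($scriptPath, PATHINFO_FILENAME);
}

// Script named after the spider: no explicit name is needed.
$fromFile = resolveSpiderName('/srv/crawlers/News.php');

// Freely named script: the spider name is passed explicitly.
$explicit = resolveSpiderName('/srv/crawlers/run-all.php', 'News');
```

In this sketch both calls resolve to the same spider name, which is the situation the explicit argument to start() is meant to cover.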
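02. Write the single startup scripts:
A single startup script is the startup script of one individual business worker. These scripts are also generated by the PHPCreeper application assistant by default. Unless you write them by hand, they cannot be placed arbitrarily and must live in the following directory:
/path/to/PHPCreeper-Application/Application/Spider/<project-name>/Start/<single-startup-script-1>.php
/path/to/PHPCreeper-Application/Application/Spider/<project-name>/Start/<single-startup-script-2>.php
/path/to/PHPCreeper-Application/Application/Spider/<project-name>/Start/<single-startup-script-3>.php
/path/to/PHPCreeper-Application/Application/Spider/<project-name>/Start/<single-startup-script-N>.php
Single startup script snippet, AppProducer.php: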
<?php
namespace PHPCreeperApp\Spider\News\Start;

require_once dirname(__FILE__, 4) . '/Core/Launcher.php';

use PHPCreeperApp\Core\Launcher;
use PHPCreeper\PHPCreeper;
use PHPCreeper\Producer;

class AppProducer
{
    /**
     * single instance
     *
     * @var object
     */
    protected static $_instance;

    /**
     * producer instance
     *
     * @var object
     */
    protected $_producer;

    /**
     * @brief    get single instance
     *
     * @return   object
     */
    public static function getInstance()
    {
        if (!self::$_instance instanceof self) {
            self::$_instance = new self();
        }

        return self::$_instance;
    }

    /**
     * @brief    start entry
     *
     * @param    mixed  $config  configuration returned by Launcher::getSpiderConfig()
     *
     * @return   mixed
     */
    public function start($config)
    {
        // create the producer instance
        $this->_producer = new Producer($config);

        // set process name
        $this->_producer->setName('producer1');

        // set process number
        $this->_producer->setCount(1);

        // set user callbacks
        $this->_producer->onProducerStart  = array($this, 'onProducerStart');
        $this->_producer->onProducerStop   = array($this, 'onProducerStop');
        $this->_producer->onProducerReload = array($this, 'onProducerReload');
    }

    /**
     * @brief    onProducerStart
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerStart($producer)
    {
    }

    /**
     * @brief    onProducerStop
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerStop($producer)
    {
    }

    /**
     * @brief    onProducerReload
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerReload($producer)
    {
    }
}

//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
if (!defined('GLOBAL_START')) {
    $classname  = pathinfo(__FILE__, PATHINFO_FILENAME);
    $config     = Launcher::getSpiderConfig($spider ?? getSpiderName(), $classname);
    $_classname = __NAMESPACE__ . "\\" . $classname;
    $_classname::getInstance()->start($config);
    PHPCreeper::start();
}
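The guarded block at the end of the snippet above relies on three plain PHP mechanisms: `dirname()` with a level count to locate Launcher.php, `pathinfo()` to turn the file name into a class name, and a string-based static call to reach the singleton. The sketch below reproduces only those mechanisms without PHPCreeper itself. The `Demo\...` namespace, the stub class, and the `$file` path are made up for illustration.

```php
<?php
namespace Demo\Spider\News\Start;

// Hypothetical stand-in for a worker wrapper such as AppProducer.
class AppProducer
{
    protected static $_instance;

    public static function getInstance()
    {
        if (!self::$_instance instanceof self) {
            self::$_instance = new self();
        }

        return self::$_instance;
    }

    // The real start() builds a Producer; this stub just echoes its input.
    public function start($config)
    {
        return $config;
    }
}

// A made-up path standing in for __FILE__.
$file = '/path/to/PHPCreeper-Application/Application/Spider/News/Start/AppProducer.php';

// File name without extension, used as the class name: "AppProducer".
$classname = pathinfo($file, PATHINFO_FILENAME);

// Fully qualified class name inside the current namespace.
$fqcn = __NAMESPACE__ . '\\' . $classname;

// Four directory levels above the script is the Application directory.
$launcher = dirname($file, 4) . '/Core/Launcher.php';

// A class name held in a string can be used for a static call.
$result = $fqcn::getInstance()->start(['name' => 'demo']);
```

Because the class name comes from the file name, the class inside a single startup script has to be named exactly like the file that contains it, otherwise the string-based call cannot find it.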
03. Each business startup script must have exactly the same file name as its corresponding configuration file:
/path/to/Application/Spider/News/Start/AppProducer.php
/path/to/Application/Spider/News/Config/AppProducer.php
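A quick way to catch a violation of this rule is to compare the two directories before starting the workers. The following is a minimal sketch, assuming the `Start/` and `Config/` layout shown above. `findUnmatchedStartScripts()` is a hypothetical helper and is not shipped with PHPCreeper.

```php
<?php
/**
 * Hypothetical helper: list the startup scripts in <projectDir>/Start
 * that have no same-named file in <projectDir>/Config.
 *
 * @param  string $projectDir  e.g. /path/to/Application/Spider/News
 * @return string[]            file names without a matching config file
 */
function findUnmatchedStartScripts(string $projectDir): array
{
    $missing = [];

    // glob() may return false on error, so fall back to an empty list.
    foreach (glob($projectDir . '/Start/*.php') ?: [] as $script) {
        $name = basename($script);

        if (!is_file($projectDir . '/Config/' . $name)) {
            $missing[] = $name;
        }
    }

    sort($missing);

    return $missing;
}
```

Calling `findUnmatchedStartScripts('/path/to/Application/Spider/News')` returns an empty array when every script in `Start/` has a counterpart in `Config/`.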