Operating Modes


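PHPCreeper has two operating modes:
1. Single-worker mode: you write exactly one specific downloader instance, which handles all crawling needs.
2. Multi-worker mode: you may write any number of worker instances. This is PHPCreeper's original default mode.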

Development Conventions


01. Writing the global start script:

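The global start script is a standalone script that loads multiple business worker instances in one go. It can be stored anywhere. By default it is generated by the PHPCreeper application assistant; if you write it by hand, it only needs to include the following code successfully: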

<?php
require_once "/path/to/PHPCreeper-Application/Application/Core/Launcher.php";

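Ideally the global start script has the same name as the configured spider. If you name it differently and the engine then cannot find the spider or reports an invalid spider name, you can set the spider name explicitly by passing it to start(), as in this start snippet: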

<?php
require_once "/path/to/PHPCreeper-Application/Application/Core/Launcher.php";
\PHPCreeperApp\Core\Launcher::start(); // start() accepts one optional argument: the spider name

02. Writing single start scripts:

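A single start script is the start script of one independent business worker. These scripts are also generated by the PHPCreeper application assistant by default. Unless you write them by hand, they cannot be placed arbitrarily and must live in the following directory: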

/path/to/PHPCreeper-Application/Application/Spider/<ProjectName>/Start/<SingleStartScript1>.php
/path/to/PHPCreeper-Application/Application/Spider/<ProjectName>/Start/<SingleStartScript2>.php
/path/to/PHPCreeper-Application/Application/Spider/<ProjectName>/Start/<SingleStartScript3>.php
/path/to/PHPCreeper-Application/Application/Spider/<ProjectName>/Start/<SingleStartScriptN>.php
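
The directory layout above can be expressed as a small helper. This is only a sketch: startScriptPath() is a hypothetical function, not part of PHPCreeper, and the application root, project name and script name are supplied by the caller.

```php
<?php
// Hypothetical helper (not part of PHPCreeper): build the path where a
// single start script is expected to live for a given project.
function startScriptPath(string $appRoot, string $project, string $script): string
{
    // Drop a trailing slash on the root so the result has no "//".
    return rtrim($appRoot, '/') . "/Application/Spider/{$project}/Start/{$script}.php";
}

echo startScriptPath('/path/to/PHPCreeper-Application', 'News', 'AppProducer'), PHP_EOL;
// prints: /path/to/PHPCreeper-Application/Application/Spider/News/Start/AppProducer.php
```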

Single start script snippet, AppProducer.php:

<?php
namespace PHPCreeperApp\Spider\News\Start;

require_once dirname(__FILE__, 4) . '/Core/Launcher.php';

use PHPCreeperApp\Core\Launcher;
use PHPCreeper\PHPCreeper;
use PHPCreeper\Producer;

class AppProducer
{
    /**
     *  single instance
     *
     *  @var object 
     */
    static protected $_instance;

    /**
     *  producer instance
     *
     *  @var object
     */
    protected $_producer;

    /**
     * @brief   get single instance 
     *
     * @return  object
     */
    static public function getInstance()
    {
        if(!self::$_instance instanceof self)
        {
            self::$_instance = new self();
        }

        return self::$_instance;
    }

    /**
     * @brief    start entry
     *
     * @return   mixed
     */
    public function start($config)
    {
        //create the producer instance
        $this->_producer = new Producer($config);

        //set process name
        $this->_producer->setName('producer1');

        //set process number
        $this->_producer->setCount(1);

        //set user callback
        $this->_producer->onProducerStart   = array($this, 'onProducerStart');
        $this->_producer->onProducerStop    = array($this, 'onProducerStop');
        $this->_producer->onProducerReload  = array($this, 'onProducerReload');
    }


    /**
     * @brief    onProducerStart  
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerStart($producer)
    {
    }

    /**
     * @brief    onProducerStop
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerStop($producer)
    {
    }

    /**
     * @brief    onProducerReload     
     *
     * @param    object $producer
     *
     * @return   mixed
     */
    public function onProducerReload($producer)
    {
    }
}


//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
//!!! WARN: DON'T CHANGE THE CODES BELOW ALL !!!
if(!defined('GLOBAL_START'))  
{
    $classname = pathinfo(__FILE__, PATHINFO_FILENAME);
    $config = Launcher::getSpiderConfig($spider ?? getSpiderName(), $classname);
    $_classname = __NAMESPACE__ . "\\" . $classname;
    $_classname::getInstance()->start($config);
    PHPCreeper::start();
}
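
The guarded block at the end of the script derives the class to start from the script's own file name, which is why the file name and the class name have to match. It runs only when GLOBAL_START is not defined, presumably so that it is skipped when the script is loaded by the global start script. A minimal standalone sketch of the name lookup, using a hypothetical resolveStartClass() helper and an example path:

```php
<?php
// Hypothetical helper mirroring the footer logic: the file name without
// its extension is used as the class name inside the given namespace.
function resolveStartClass(string $file, string $namespace): string
{
    $classname = pathinfo($file, PATHINFO_FILENAME);
    return $namespace . '\\' . $classname;
}

echo resolveStartClass(
    '/path/to/Application/Spider/News/Start/AppProducer.php',
    'PHPCreeperApp\\Spider\\News\\Start'
), PHP_EOL;
// prints: PHPCreeperApp\Spider\News\Start\AppProducer
```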

03. Each business start script must have exactly the same file name as its configuration file:

/path/to/Application/Spider/News/Start/AppProducer.php
/path/to/Application/Spider/News/Config/AppProducer.php
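
The naming rule above can be checked mechanically. The following sketch assumes the Start/ and Config/ directories sit side by side under the project directory, as in the two paths shown; findStartScriptsWithoutConfig() is a hypothetical helper, not part of PHPCreeper.

```php
<?php
// Hypothetical helper (not part of PHPCreeper): list start scripts in
// <projectDir>/Start that have no same-named file in <projectDir>/Config.
function findStartScriptsWithoutConfig(string $projectDir): array
{
    $missing = [];
    // glob() may return false on error, so fall back to an empty list.
    foreach (glob($projectDir . '/Start/*.php') ?: [] as $startScript) {
        $name = basename($startScript);
        if (!is_file($projectDir . '/Config/' . $name)) {
            $missing[] = $name;
        }
    }
    sort($missing);
    return $missing;
}
```

Calling findStartScriptsWithoutConfig('/path/to/Application/Spider/News') returns an empty array when every start script has a matching configuration file.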