Has powerful content filtering: you can use jQuery selectors to filter content

Has a highly modular design and strong extensibility

Has an expressive API

Has a wealth of plug-ins

Through plug-ins you can easily implement things like:

Multithreaded crawling

Crawling pages rendered dynamically by JavaScript (PhantomJS/headless WebKit)

Downloading images to local storage

Simulating browser behavior, such as submitting forms

Web crawlers

.....

Requirements

PHP >= 7.0

Installation

Install with Composer:

composer require jaeger/querylist

Usage

DOM Traversal and Manipulation

Crawl all image links on GitHub

QueryList::get('https://github.com')->find('img')->attrs('src');

Crawl Google search results

$ql = QueryList::get('https://www.google.co.jp/search?q=QueryList');
$ql->find('title')->text(); // The page title
$ql->find('meta[name=keywords]')->content; // The page keywords
$ql->find('h3>a')->texts(); // Get a list of search result titles
$ql->find('h3>a')->attrs('href'); // Get a list of search result links
$ql->find('img')->src; // Get the link address of the first image
$ql->find('img:eq(1)')->src; // Get the link address of the second image
$ql->find('img')->eq(2)->src; // Get the link address of the third image
// Loop over all the images
$ql->find('img')->map(function($img){
    echo $img->alt; // Print the alt attribute of the image
});

Bind function extension

Extend QueryList with a custom myHttp method:

$ql = QueryList::getInstance();
// Bind a `myHttp` method to the QueryList object
$ql->bind('myHttp',function ($url){
    // $this is the current QueryList object
    $html = file_get_contents($url);
    $this->setHtml($html);
    return $this;
});
// And then you can call it by the bound name
$data = $ql->myHttp('https://toutiao.io')->find('h3 a')->texts();
print_r($data->all());

Or package the logic in a class, and then bind it:

$ql->bind('myHttp',function ($url){
    return new MyHttp($this,$url);
});
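
The MyHttp class itself is not shown here. As a rough sketch only, one possible shape is given below. Everything in the class body is an assumption made for illustration: the constructor arguments mirror the `new MyHttp($this,$url)` call above, the page is fetched with `file_get_contents` as in the earlier closure, and unknown method calls are forwarded to the wrapped QueryList object so that chaining with `find()` keeps working. The first parameter is left untyped so the sketch has no hard dependency on the library.

```php
<?php
// Hypothetical helper class: the name MyHttp comes from the snippet above,
// but this body is an illustrative assumption, not the library's own code.
class MyHttp
{
    // The wrapped object, expected to be the current QueryList instance.
    private $ql;

    public function __construct($ql, string $url)
    {
        $this->ql = $ql;

        // Fetch the page; file_get_contents returns false on failure.
        $html = @file_get_contents($url);
        if ($html === false) {
            throw new RuntimeException("Failed to fetch {$url}");
        }

        // Hand the HTML to the wrapped object, as the closure version does.
        $this->ql->setHtml($html);
    }

    // Forward any other method call (find, getHtml, ...) to the wrapped object.
    public function __call($name, $args)
    {
        return $this->ql->$name(...$args);
    }
}

// Usage (requires QueryList to be installed and autoloaded):
// $ql->bind('myHttp', function ($url) { return new MyHttp($this, $url); });
// $data = $ql->myHttp('https://toutiao.io')->find('h3 a')->texts();
```

Forwarding through `__call` is only one design choice; the class could equally expose a method that returns the QueryList object directly.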

Using plugins

Use the PhantomJS plugin to crawl JavaScript dynamically rendered pages: