Home page

Use

Scrapy is a fast, high-level screen scraping and web crawling framework,
used to crawl websites and extract structured data from their pages.
It can be used for a wide range of purposes, from data mining to monitoring
and automated testing.

You can use Scrapy to extract any kind of data from a web page, and to handle
content in HTML, XML, CSV and other formats. I recently used it to automate the
extraction of domains and emails from the ISPA Spam Hall of Shame list, for use
in a DNSBL.

Installation

pip install scrapy

Usage

Scrapy is a very extensive package, and it is not possible to describe its full
usage in a single blog post. There is a tutorial on the Scrapy website, as well
as extensive documentation.

For this post I will describe how I used it to extract listed domains from the
ISPA Hall of Shame website.