urlwatch - a tool for monitoring webpages for updates

This script is intended to help you watch URLs and get notified (via
email or in your terminal) of any changes. The change notification will
include the URL that has changed and a unified diff of what has changed.
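
To illustrate what such a notification could look like, here is a minimal sketch built on the standard-library difflib module. The function name make_change_report and the "CHANGED:" prefix are assumptions made for this example, not urlwatch's actual code or output format.

```python
import difflib


def make_change_report(url, old_text, new_text):
    """Return a plain-text report: the URL followed by a unified diff.

    Hypothetical helper for illustration; returns "" when nothing changed.
    """
    diff_lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="old",
        tofile="new",
        lineterm="",
    )
    body = "\n".join(diff_lines)
    if not body:
        return ""  # identical content, nothing to report
    return "CHANGED: %s\n%s" % (url, body)


report = make_change_report(
    "http://example.org/",
    "Welcome\nVersion 1.17\n",
    "Welcome\nVersion 1.18\n",
)
print(report)
```

Lines removed from the page are prefixed with "-" and new lines with "+", so the report stays readable as plain text in a mail or a terminal.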

The script supports the use of a filtering hook function to strip
trivially-varying elements of a webpage.
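
A hook of this kind might look roughly like the sketch below. It assumes the hook is a function that receives the URL and the downloaded text and returns the filtered text; the name filter, its signature, the example URL and the "Page generated at" footer are all assumptions for illustration, so check the example hooks shipped with urlwatch for the real interface.

```python
import re

# Hypothetical noise: a footer line such as "Page generated at 10:00:01".
TIMESTAMP_LINE = re.compile(
    r"^Page generated at \d{2}:\d{2}:\d{2}\s*$", re.MULTILINE
)


def filter(url, data):
    """Strip trivially-varying content before two versions are compared.

    Name and signature are assumptions for this sketch. Note that the
    name shadows Python's built-in filter() inside this module.
    """
    if url.startswith("http://example.org/"):
        return TIMESTAMP_LINE.sub("", data)
    return data  # leave all other pages untouched


first = filter("http://example.org/news", "Headline\nPage generated at 10:00:01\n")
second = filter("http://example.org/news", "Headline\nPage generated at 10:05:42\n")
print(first == second)
```

Because the timestamp line is removed from both versions, two downloads that differ only in that line compare as equal and produce no notification.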

Basic features

Simple configuration (text file, one URL per line)

Easily hackable (clean Python implementation)

Can run as a cronjob and mail changes to you

Always outputs only plaintext - no HTML mails :)

Supports removing noise (always-changing website parts)

Example hooks to filter content in Python

Uses If-Modified-Since header to save bandwidth (new in 1.9)

Convert non-UTF8 web pages to UTF-8 for mail (new in 1.10)

Handle non-zero shell exit codes as errors (new in 1.11)

Support for concurrent/parallel downloads (new in 1.13)

Support for handling UTF-8 in Lynx and html2text (new in 1.15)
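
As an illustration of the If-Modified-Since feature listed above, the following sketch builds a conditional request with Python 3's standard library. The helper name build_conditional_request and the idea of passing the time of the last fetch as a Unix timestamp are assumptions for this example, not a description of how urlwatch stores or sends this information.

```python
import email.utils
import urllib.request


def build_conditional_request(url, last_fetch_timestamp):
    """Build a GET request that lets the server skip unchanged pages.

    Hypothetical helper; last_fetch_timestamp is seconds since the epoch.
    """
    request = urllib.request.Request(url)
    # HTTP dates are expressed in GMT.
    http_date = email.utils.formatdate(last_fetch_timestamp, usegmt=True)
    request.add_header("If-Modified-Since", http_date)
    return request


request = build_conditional_request("http://example.org/", 0)
# urllib normalizes header names to "If-modified-since" internally.
print(request.get_header("If-modified-since"))
```

A server that honors the header answers with status 304 and an empty body when the page is unchanged. urllib.request.urlopen reports that status as an HTTPError with code 304, so a caller would catch it and treat it as "no change".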

Current version: 1.18

2015-02-27: urlwatch 1.18 adds lots of new features
and fixes some bugs.

2014-01-29: urlwatch 1.16 fixes a bug parsing
content-encoding headers, includes a new and improved setup.py
script for easy installation and adds basic support for e-mail
delivery (contributed by Xavier Izard).

2012-08-30: urlwatch 1.15 adds support for optional UTF-8
handling in the html2text function of the "html2txt" helper
module. Patch contributed by Slavko.

2011-12-08: If you are experiencing problems with
the concurrent page updates, try setting the number of threads
to 1. This might make the updates go slower, but according to
at least one user, it is more stable this way. YMMV.

urlwatch 1.13 adds support for watching websites that only
work with HTTP POST requests. You can add the POST data in
URL-encoded form after the website URL in the urls.txt file,
separated by a single space. This release also adds support
for Python 3.x by providing an appropriate converter script.
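
The line format described above (URL, a single space, then URL-encoded POST data) can be sketched as follows. The function name parse_url_line and the example URL are hypothetical, and the sketch ignores anything else a urls.txt file may contain.

```python
from urllib.parse import urlencode


def parse_url_line(line):
    """Split a urls.txt-style line into (url, post_data).

    post_data is None when the line holds only a URL.
    """
    url, separator, post_data = line.strip().partition(" ")
    return url, (post_data if separator else None)


# Build a line with URL-encoded form data after the URL.
line = "http://example.org/search " + urlencode({"q": "url watch", "page": 1})
print(parse_url_line(line))
print(parse_url_line("http://example.org/"))
```

Splitting on the first space only is safe here because URL-encoded data contains no literal spaces (urlencode writes them as "+").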

For Python versions earlier than 3.2, this release now depends
on the "futures" package from PyPI (from Python 3.2 on, the module
is part of the standard library as concurrent.futures). The usage
of futures should reduce the total time needed to watch several
URLs, because network requests are sent in parallel, which usually
leads to better bandwidth usage.
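
The general pattern of parallel checks with futures can be sketched like this. The names check_all and fake_fetch are made up for the example, and fake_fetch stands in for a real download so the sketch runs without network access; urlwatch's own job handling may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL in parallel and collect results by URL.

    With max_workers=1 the URLs are processed one at a time.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {url: executor.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as error:
                # Keep the error so one failing URL does not stop the rest.
                results[url] = error
    return results


def fake_fetch(url):
    """Stand-in for a real download (assumption for this sketch)."""
    if "broken" in url:
        raise IOError("could not fetch " + url)
    return "content of " + url


urls = ["http://example.org/a", "http://example.org/b", "http://example.org/broken"]
results = check_all(urls, fake_fetch)
serial_results = check_all(urls, fake_fetch, max_workers=1)
```

Passing max_workers=1 corresponds to the "number of threads set to 1" workaround mentioned above: the results are the same, only the requests no longer overlap.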

Python compatibility

urlwatch is compatible with Python 2.x (2.5 and newer) and with
Python 3.x. For Python 3, you have to use the included converter
script, which will convert the source code to be compatible with
Python 3 (by using the 2to3 tool included in Python).

Download

urlwatch is available as a package in various Linux distributions
for easy installation via the package manager. It is also available
as a source tarball and via PyPI (pip install urlwatch).