A fairly flexible app that will analyze and report on links in any model that
you register with it.

Links can be bare (URLs or image and file fields) or
embedded in HTML (linkcheck handles the parsing). It’s fairly easy to override
methods of the Linklist class should you need to do anything more
complicated (like generating URLs from slug fields).
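
As a rough illustration of what pulling links out of an HTML field involves, here is a minimal sketch using only Python's standard library. It is not linkcheck's own parser; the names LinkCollector and extract_links and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html):
    """Return the link targets found in an HTML fragment, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = '<p><a href="/about/">About</a> <img src="/media/logo.png"></p>'
print(extract_links(sample))
```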

You should run the checklinks management command via cron or similar so that
external links are rechecked regularly and changes in their status are noticed.
All links are also checked automatically when objects are saved; this is
handled by signals.

Basic usage

Add a file named linklists.py to every app (see an example in examples/linklists.py) that either:

has models that contain content (e.g. URL/image fields, chunks of markup
or anything that gets transformed into an IMG or HREF when displayed)

has models that can be the target of a link, i.e. are addressed by a URL. In
this case make sure each such model has an instance method named ‘get_absolute_url’

Run ./manage.py migrate.

Add to your root url config:

url(r'^admin/linkcheck/', include('linkcheck.urls'))

View /admin/linkcheck/ from your browser.

The file notifications.py is completely optional. It works with
django-admin-blocks to display a notification about broken links as
shown in the screenshot above.

We are aware that this documentation is on the brief side of things so any
suggestions for elaboration or clarification would be gratefully accepted.

Linklist classes

The following class attributes can be added to your Linklist subclasses to
customize the extracted links:

object_filter: a dictionary of lookups that is passed to the filter applied
to the default queryset of the target class. This lets you restrict the
objects from which links will be extracted.
(example: {'active': True})

object_exclude: a dictionary of lookups that is passed to the exclude applied
to the default queryset of the target class. As with object_filter, this
lets you leave out objects from which links would otherwise be extracted.

html_fields: a list of field names which will be searched for links.

url_fields: a list of URLField field names whose content will be
considered as links. If the field content is empty and the field name is
in ignore_empty, the content is ignored.

ignore_empty: a list of field names from url_fields whose empty content
should be ignored (see url_fields above). (new in django-linkcheck 1.1)

image_fields: a list of ImageField field names whose content will be
considered as links. Empty ImageField content is always ignored.
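
The interaction between url_fields, ignore_empty and image_fields can be modelled in plain Python. The sketch below is not linkcheck's code; it only mirrors the rules described above, using a dictionary in place of a model instance. The function name collect_field_links and the field names are hypothetical, and treating an empty URL field that is not in ignore_empty as still recorded is an inference from the description.

```python
def collect_field_links(obj, url_fields=(), ignore_empty=(), image_fields=()):
    """Return (field_name, value) pairs that would be treated as links."""
    found = []
    for name in url_fields:
        value = obj.get(name) or ""
        # An empty URL field is skipped only when it is listed in ignore_empty.
        if not value and name in ignore_empty:
            continue
        found.append((name, value))
    for name in image_fields:
        value = obj.get(name) or ""
        # Empty image fields are always skipped.
        if value:
            found.append((name, value))
    return found


# A stand-in for a model instance with two URL fields and two image fields.
page = {"homepage": "", "source_url": "", "photo": "", "banner": "img/banner.png"}
links = collect_field_links(
    page,
    url_fields=["homepage", "source_url"],
    ignore_empty=["source_url"],
    image_fields=["photo", "banner"],
)
print(links)
```

Here the empty homepage value is kept (and would presumably be reported), while the empty source_url and photo values are dropped.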

Management commands

findlinks

This command goes through all registered fields and records the URLs it finds.
It does not validate anything. It is typically run just after installing
and configuring django-linkcheck.

checklinks

Checks each recorded URL and reports its validity. All internal
links are checked, but external links are checked only if they have not been
checked during the last LINKCHECK_EXTERNAL_RECHECK_INTERVAL minutes. This
interval can be overridden per invocation by using the --externalinterval
(-e) command option (in minutes).

You can also limit the maximum number of links to be checked by passing a number
to the --limit (-l) command option.
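
The selection rules described for checklinks can be sketched in plain Python. This is not the command's implementation: the function name select_links_to_check and the dictionary layout are invented for the example, and applying the limit to the combined list in input order is an assumption.

```python
from datetime import datetime, timedelta


def select_links_to_check(links, now, external_interval=10080, limit=None):
    """Pick the URLs one run would check.

    Each link is a dict with 'url', 'external' (bool) and 'last_checked'
    (datetime or None). external_interval is given in minutes.
    """
    cutoff = now - timedelta(minutes=external_interval)
    selected = []
    for link in links:
        if link["external"]:
            last = link["last_checked"]
            # Skip external links that were checked within the interval.
            if last is not None and last > cutoff:
                continue
        # Internal links are always selected.
        selected.append(link["url"])
    # Assumption: a positive limit caps the total number of links checked.
    if limit is not None and limit > 0:
        selected = selected[:limit]
    return selected


now = datetime(2024, 1, 8, 12, 0)
links = [
    {"url": "/about/", "external": False, "last_checked": now - timedelta(minutes=5)},
    {"url": "https://example.com/", "external": True, "last_checked": now - timedelta(days=1)},
    {"url": "https://example.org/", "external": True, "last_checked": now - timedelta(days=8)},
    {"url": "https://example.net/", "external": True, "last_checked": None},
]
print(select_links_to_check(links, now))                        # default one-week interval
print(select_links_to_check(links, now, external_interval=60))  # like -e 60
print(select_links_to_check(links, now, limit=2))               # like -l 2
```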

Settings

LINKCHECK_EXTERNAL_RECHECK_INTERVAL

Default: 10080 (1 week in minutes)

External links that have been checked within this many minutes are not rechecked.

LINKCHECK_EXTERNAL_REGEX_STRING

Default: r'^https?://'

A string applied as a regex to a URL to determine whether it’s internal or external.
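
To make the effect of the default pattern concrete, here is a small sketch using Python's re module. It assumes that a URL matching the pattern is treated as external and anything else as internal; the helper name is_external is made up for this example and is not part of linkcheck.

```python
import re

# Default value quoted in the setting above.
EXTERNAL_REGEX_STRING = r"^https?://"
EXTERNAL_REGEX = re.compile(EXTERNAL_REGEX_STRING)


def is_external(url):
    """Assumption: a URL that matches the pattern counts as external."""
    return EXTERNAL_REGEX.match(url) is not None


for url in ["https://example.com/page", "http://example.com/", "/about/", "images/logo.png"]:
    print(url, "->", "external" if is_external(url) else "internal")
```

A stricter or looser pattern changes which links fall under the external recheck interval, so it is worth trying a candidate regex against a handful of real URLs before using it.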

LINKCHECK_MEDIA_PREFIX

Default: '/media/'

Currently linkcheck tests whether links to internal static media are correct by wrangling the URL to be a local filesystem path.

It strips LINKCHECK_MEDIA_PREFIX off the internal link, concatenates the result onto settings.MEDIA_ROOT and tests the resulting path using os.path.exists.

This ‘works for me’ but it is probably going to break for other people’s setups. Patches welcome.
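
The media check described above amounts to something like the following sketch. It is a standalone approximation, not linkcheck's code: the function name media_file_exists is hypothetical, and a temporary directory stands in for settings.MEDIA_ROOT.

```python
import os
import tempfile


def media_file_exists(url, media_root, media_prefix="/media/"):
    """Return True if an internal media URL maps to an existing file."""
    if not url.startswith(media_prefix):
        return False
    # Strip the prefix; also strip leading slashes so that os.path.join
    # does not discard media_root.
    relative_path = url[len(media_prefix):].lstrip("/")
    return os.path.exists(os.path.join(media_root, relative_path))


# Build a throwaway media root containing img/logo.png.
with tempfile.TemporaryDirectory() as media_root:
    os.makedirs(os.path.join(media_root, "img"))
    with open(os.path.join(media_root, "img", "logo.png"), "wb") as handle:
        handle.write(b"")
    print(media_file_exists("/media/img/logo.png", media_root))     # True
    print(media_file_exists("/media/img/missing.png", media_root))  # False
```

This also shows why the approach is fragile: it only works when media files live on the local filesystem under MEDIA_ROOT and are served under a single URL prefix.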

LINKCHECK_RESULTS_PER_PAGE

Controls pagination: the number of results shown per page.

Pagination is slightly peculiar at the moment due to the way links are grouped by object.