Use this Perl script to fetch and maintain a directory of podcast subscriptions. Information about which podcasts to fetch is stored in a configuration file. You can control how many podcast files to keep, where to keep them, whether to rationalize their ID3 tags, and whether to delete older podcast files. The script can also create playlists listing recently fetched podcasts.

To use this script, create a configuration file (see "Configuration File Format") listing global options and the podcasts you wish to subscribe to; a sample configuration file is located in the distribution directory under ./conf/fetchpods.conf. Then run fetch_pods.pl from the command line. You can explicitly give the path to the configuration file on the command line as shown here:

fetch_pods.pl ~/pods/news_pods.conf

If called with no arguments, fetch_pods.pl will look for a default configuration file in the following locations:

The script stops at the first configuration file it finds and processes it. Note that installing this script creates a default configuration file at /etc/fetchpods.conf. You may want to delete this file.

This config file designates the directory ~/podcasts (subdirectory "podcasts" in your home directory) to receive the podcast files. We turn on verbose progress reporting and activate the rewrite_filename and upgrade_tag features. As described in more detail later, the first option replaces the cryptic default names of the podcast files with longer, more informative names, while the second normalizes the ID3 tag information in the podcast files (e.g. setting the genre to "Podcast").

We define two feeds, each in its own section. The [NYTimes] feed subscribes to the New York Times front page podcast, located at the URL given by the "url" option. We specify a limit of 2 on the number of podcasts to have on hand. The [NPR] feed does the same thing for the National Public Radio morning news summary, except that there is no limit.
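The configuration described above could be written roughly as follows. This is a minimal sketch, not the file shipped with the distribution: the two feed URLs are placeholders, and the values "yes" and "auto" are assumed settings chosen from the values documented later.

```ini
# Sketch only: the URLs below are placeholders, not the real feed addresses.
[Globals]
base = ~/podcasts
verbose = yes
rewrite_filename = yes
upgrade_tag = auto

[NYTimes]
url = http://example.com/nytimes/frontpage.xml
limit = 2

[NPR]
url = http://example.com/npr/news_summary.xml
```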

After running the script, the ~/podcasts directory will contain one directory each for the NYTimes and NPR feeds, and will look something like this:

The name of the [Globals] section is significant (including the capital "G") and cannot be changed. The "base" option is required: if no base is specified, the script will not run. The names of the feed sections are not meaningful except that they must be unique. In other words, we could just as easily have named them [Feed 1] and [Feed 2].

Most of the [Globals] options can be overridden in individual feed sections. This allows you to create global settings, such as "limit", which are then overridden on a case-by-case basis in each feed section.

For boolean (true/false) options, you may use Perl-style truth values (0 for false, 1 for true) or the strings "no" or "yes".

A line that begins with the # symbol is a comment and has no effect on processing the config file.
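As a short illustration of both points, the fragment below sets one boolean with a number and another with a word, and includes a comment line. It reuses option names from this document; the particular values are only examples.

```ini
[Globals]
# Lines starting with # are ignored by the script.
base = ~/podcasts
# Boolean given as a Perl-style truth value:
verbose = 1
# Boolean given as a string:
keep_old = no
```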

If generate_playlist is set to a true value, this option allows you to control the name of the generated playlist file. This can contain strftime() style time interpolation codes so that the playlist name contains a timestamp. The default is "%Y-%m-%d_podcasts.m3u", which will produce playlists like "2006-12-28_podcasts.m3u". You will want to add the hour (%H) and minute (%M) if you plan on freshening podcasts more than once a day.

The playlist name may also contain path elements - the subdirectories will be created as necessary. Relative paths will be resolved relative to the base.

If generate_playlist is set to a true value, and if you are mirroring the podcasts to a removable medium such as an SD card for later use with a portable music player, you will need to change this option. It gives the directory path to the podcast files as the music player will see it. For example, if you mount the medium at /mnt/sdcard and keep podcasts in /mnt/sdcard/podcasts, then the base and playlist_base options might look like this:

base = /mnt/sdcard/podcasts
playlist_base = /podcasts

For Windows-based devices, you might have to specify a playlist_base using Windows filesystem conventions, e.g.:

playlist_base = \podcasts

or even

playlist_base = C:\podcasts

The default is to use the same base path as specified by the "base" option.

Ordinarily each podcast will be placed in a directory named after its channel, directly underneath the directory specified by "base." If this boolean is set to true ("yes"), then each feed section can specify additional levels of directories in which to place the podcast files using the "subdir" option.

For example, the example config file shown earlier will store the NY Times podcasts under ~/podcasts/New_York_Times_Front_Page, and the NPR news summaries under ~/podcasts/7AM_ET_News_Summary. You can organize your podcasts a little better in this way:
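A minimal sketch of such a layout, assuming the two feeds from the earlier example. The subdirectory name "News" and the URLs are placeholders, the boolean option described above must also be enabled in [Globals], and the path in the comment is an assumption about how "subdir" combines with "base".

```ini
# Sketch only: placeholder URLs and a hypothetical subdirectory name.
[NYTimes]
url = http://example.com/nytimes/frontpage.xml
subdir = News
# Assumed resulting location: ~/podcasts/News/New_York_Times_Front_Page

[NPR]
url = http://example.com/npr/news_summary.xml
subdir = News
```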

This script writes out a file containing its process ID (PID) in order to prevent the script from being run twice at the same time. This option lets you change the path to the file (default /tmp/fetch_pods.pid).

The following options appear in individual feed sections. They can also be placed in the [Globals] section to provide defaults for all feeds. For example, you can place "limit = 5" in [Globals] in order to limit all feeds by default to a maximum number of five podcast files per feed, and then override the limit on a feed-by-feed basis with additional "limit" options in individual feed sections.
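The override pattern just described might look like this. It is a sketch: the section names follow the earlier example and the URLs are placeholders.

```ini
[Globals]
base = ~/podcasts
# Default for every feed: keep at most five podcast files.
limit = 5

[NYTimes]
url = http://example.com/nytimes/frontpage.xml
# This feed overrides the global default.
limit = 2

[NPR]
url = http://example.com/npr/news_summary.xml
# No limit given here, so the global limit of 5 applies.
```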

This option controls how mirroring of podcast files is performed and can be one of "modified-since" or "exists". The default, "modified-since," will fetch the podcast file if either:

1. A local copy of the podcast is absent.
2. A local copy exists, but the version on the remote server is more recent.

In contrast, "exists" will cause the file to be fetched only if (1) applies. This reduces network traffic, but opens the slight possibility that the remote podcast might be updated (e.g. for a correction) and that you won't mirror the change. This option is primarily intended to work around broken podcasts which have their modification dates set in the future and thus are unnecessarily refreshed each time the script runs.

Some podcast files have informative ID3 tags, but many don't. Particularly annoying is the genre, which may be given as "Speech", "Podcast", or anything else. The upgrade_tag option, if set to a non-false value, will attempt to normalize the ID3 tags using information provided by the RSS feed. Specifically, the title will be set to the title of the podcast, the album will be set to the title of the channel (e.g. "New York Times Front Page"), the artist will be set to the channel author (e.g. "The New York Times"), the year will be set to the publication date, the genre will be set to "Podcast", and the comment will be set to the channel description. You can change some of these values using the options "force_genre," "force_album," and "force_artist."

The value of upgrade_tag is one of:

no         Don't mess with the ID3 tags
id3v1      Upgrade the ID3 version 1 tag
id3v2.3    Upgrade the ID3 version 2.3 tag
id3v2.4    Upgrade the ID3 version 2.4 tag
auto       Choose the best tag available

Depending on what optional Perl ID3 manipulation modules you have installed, you may be limited in what level of ID3 tag you can update:

Audio::TagLib    all versions through 2.4
MP3::Tag         all versions through 2.3
MP3::Info        only version 1.0

Choosing "auto" is your best bet. It will dynamically find what Perl modules you have installed, and choose the one that provides the most recent tag version.
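For instance, tag upgrading might be enabled for all feeds and switched off for a single one. This is a sketch: the feed section name follows the earlier example and the URL is a placeholder.

```ini
[Globals]
base = ~/podcasts
# Let the script pick the best tag version the installed modules support.
upgrade_tag = auto

[NPR]
url = http://example.com/npr/news_summary.xml
# Leave this feed's ID3 tags untouched.
upgrade_tag = no
```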

By default, fetch_pods.pl will fetch all the podcasts that are currently mentioned in the RSS feed XML document. MP3 files that are no longer listed in the document will be removed. You can override this behavior using the "limit" option. This sets an upper bound on the number of MP3 files that can be stored, either on a global basis, or a per-feed basis. For example, if you specify limit=4 for the NY Times Front Page, then only the most recent four NYT podcasts will be stored, even if the RSS feed lists more.

This is a boolean option which, if true, will cause expired podcasts to be kept even after they are no longer listed in the RSS feed file. This also changes the behavior of the limit option. When keep_old is false, limit will delete older podcasts in order to make room for newer ones. When keep_old is true, then newer podcasts will not be fetched if the total number of stored podcasts for the current feed exceeds the limit. You will have to manually delete some podcast files in order to make room for more.
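The two behaviors can be contrasted in a pair of feed sections. This is a sketch with hypothetical section names, placeholder URLs, and arbitrary limits; the comments restate the behavior described above.

```ini
# Hypothetical feed: a rolling window of recent episodes.
[Rolling]
url = http://example.com/rolling/feed.xml
limit = 4
# Older files are deleted to make room for newer ones.
keep_old = no

# Hypothetical feed: an archive that is never pruned automatically.
[Archive]
url = http://example.com/archive/feed.xml
limit = 50
# Nothing is deleted; once the limit is hit, new episodes are skipped
# until some files are removed by hand.
keep_old = yes
```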

If you have "upgrade_tag" set to a true value (and at least one tag-writing module installed) then each podcast's ID3 tag will be modified to create a consistent set of fields using information provided by the RSS feed. The title will be set to the title of the podcast, the album will be set to the title of the channel (e.g. "New York Times Front Page"), the artist will be set to the channel author (e.g. "The New York Times"), the year will be set to the publication date, the genre will be set to "Podcast" and the comment will be set to the channel description.

You can change some of these values using these three options:

force_genre     Change the genre to whatever you specify.
force_artist    Change the artist.
force_album     Change the album.

For example, the NY Times front page RSS feed specifies "The New York Times" as the artist. You can force this to a shorter abbreviation with the following modified feed section:
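A sketch of such a section, assuming the [NYTimes] feed from the earlier example; the URL is a placeholder and "NYT" is just one possible abbreviation.

```ini
[NYTimes]
url = http://example.com/nytimes/frontpage.xml
limit = 2
# Replace the artist taken from the RSS feed with a shorter name.
force_artist = NYT
```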

Note that if you are forced to use ID3v1 tagging (e.g. because only MP3::Info is installed), then you must choose one of the predefined genres; in particular, there is no genre named "Podcast." You must force something else instead, such as "Speech."