I'm not affiliated with the netfilter team. I'm writing this simply because there's not much instructive documentation (e.g. tutorials, how-tos) available on the use of ipsets, so I'm sharing what I've learned by playing with it. Hopefully it will save people some time.

ipset is an extremely useful plugin to iptables, particularly if you want to have a firewall rule that matches against a large set of addresses and/or ports, or if you want to dynamically change the addresses and/or ports that a rule matches against.

It lets you create huge lists of IP addresses and/or ports (with tens of thousands of entries or more), which are stored very compactly in RAM. In your iptables rules, you can then simply refer to the lists by name, and the entire list is checked with remarkable speed and in a single netfilter rule. Also, you can change the contents of the list while the firewall is running.

iptables is actually just the user interface to netfilter. When you write an iptables rule that lists multiple addresses or ports to be blocked (other than "from-to" ranges or netmasks), each item in the list is actually translated into its own netfilter rule, each of which must be processed. Unnecessary processing of this type can really slow down your traffic, since every new connection (or in some cases, every packet) has to be processed through these rules.

Although it is possible to add and remove iptables rules while the firewall is running, it is not very efficient to do so. This is how most firewalls (i.e. iptables "front-ends") add and remove entries to their "dynamic blacklist". Blacklist a new address? That's a new rule that must be processed for every connection. You can't "edit" an iptables rule in a running firewall to change the addresses or ports that it matches. Ipset is, in my opinion, by far the best way to run blacklists, whether static, periodically updated from the Internet, or dynamically managed by your firewall or intrusion detection rules.

Also, while iptables provides for address ranges and port ranges (in a from-to, netmask, or CIDR format), it cannot efficiently handle long lists of non-contiguous addresses. Try writing iptables rules to block 3,000 specific addresses that have nothing in common. With ipset, this is trivial. It's also efficient, because you can check a packet against the ipset in a single netfilter action (as opposed to 3,000), which is much, much faster.
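
To make that concrete, here is a minimal sketch (not taken from the scripts in this thread) of loading a handful of unrelated addresses into one set. The set name "blacklist", the helper names run and block_addresses, and the DRY_RUN switch are all made up for illustration, and it assumes the ipset 6 command syntax.

```shell
#!/bin/bash
# Sketch only: run, block_addresses and DRY_RUN are illustrative names.

# Run a command, or just print it when DRY_RUN is set (handy for review).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# block_addresses SETNAME ADDR...
# Create a hash:ip set (if it does not exist yet) and add each address to it.
block_addresses() {
    local set_name="$1"
    local addr
    shift
    run ipset create "$set_name" hash:ip -exist
    for addr in "$@"; do
        run ipset add "$set_name" "$addr" -exist
    done
}
```

Called as root, e.g. 'block_addresses blacklist 192.0.2.10 198.51.100.7 203.0.113.99', it fills the set. After that, a single rule such as 'iptables -I INPUT -m set --match-set blacklist src -j DROP' covers every entry, however many there are.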

Ipset is made by the same netfilter team that makes iptables. It is actually a collection of additional netfilter-related kernel modules (additional "matches" and "targets"). It is part of the Xtables add-on package. In Gentoo, you can get it simply by emerging ipset (after your "make modules_install") and enabling the appropriate kernel modules.

There are different types of ipsets for storing different types of information (e.g., for random ip addresses, for addresses that are from the same netblock, for random blocks of addresses, for same-sized blocks of addresses, for random ports, etc.).

You can group multiple ipsets of different types into a single "setlist" (an ipset of type list:set) which is still treated as a single ipset by iptables (therefore, you only need one rule to match packets against multiple ipsets). You can bind ipsets together (e.g. a list of addresses and a list of ports). The man page explains it all.

An example of updating an ipset in an automated fashion, from an internet source:

There are many ways to use ipsets. This script is just an example. This script is intended to periodically (hourly) update an ipset used as a "blacklist" in a firewall (while the firewall is running and actively processing). The list in this case is a list of "Class C"-sized networks (i.e. CIDR /24 blocks of addresses), published hourly by DShield.org, listing the top blocks of networks from which port-scanning activity has been coming in the last 3 days. It's a tiny list as ipsets go, but it serves the purpose of this example (ipsets can contain many thousands of entries).

That's all incidental. What's important is that it's a list of /24 netblocks, and we want to blacklist them. There is a type of ipset called an "iphash" (hash:ip) that is very efficient at handling lists of same-sized networks (i.e., the ipset efficiently contains a list of networks that have the same netmask, in this case /24), so we'll use that hash:ip type of ipset. We'll use wget to retrieve the block list only if it's been updated. Then, if we've got a new list, we'll load it up to the firewall. Since we can instantaneously swap the contents of one ipset for another, that's how we'll update the live firewall -- we'll parse each address out of the downloaded block list and add it to a temporary ipset, then swap the contents of the temporary ipset into the live ipset in the running firewall, instantly updating the firewall's blacklist en masse.
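
The fill-a-temporary-set-then-swap idea can be sketched roughly as follows. This is not the script from this post: the function name make_swap_script and the "_tmp" suffix are invented for the example, and it assumes the feed puts the network's start address in the first whitespace-separated field, with '#' comment lines and a column header line to be skipped. It only prints commands in the format that 'ipset restore' reads, so you can inspect them before loading anything.

```shell
#!/bin/bash
# Sketch only: make_swap_script and the *_tmp set name are illustrative.

# make_swap_script LIVE_SET < block-list
# Reads a block list on stdin and prints 'ipset restore' input that fills a
# temporary set and then swaps it into LIVE_SET.
make_swap_script() {
    local live="$1"
    local tmp="${live}_tmp"
    local opts="hash:ip netmask 24 hashsize 64"

    # Both sets must exist, with the same type, before they can be swapped.
    echo "create $live $opts"
    echo "create $tmp $opts"
    echo "flush $tmp"

    # Keep only lines whose first field looks like a dotted-quad address;
    # that skips comment lines and the column header.
    awk -v set="$tmp" \
        '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print "add " set " " $1 }'

    # The live set gets the new contents in one step; the old contents end up
    # in the temporary set, which is then thrown away.
    echo "swap $tmp $live"
    echo "destroy $tmp"
}
```

With a downloaded copy of the list in, say, /var/tmp/block.txt (a hypothetical path), it would be used as 'make_swap_script dshield < /var/tmp/block.txt | ipset -exist restore'. The -exist option keeps the "create" lines from failing when the sets are already there.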

In later posts, there are other examples that are variations on the theme, demonstrating the use of different types of ipsets and other related concepts. This, however, is a good place to start.

This script is run hourly by cron on my system, about five minutes after DShield publishes its hourly update (presently, between HH:15 and HH:20). For logging purposes it uses "logger", which you can install; alternatively, substitute whatever you prefer, or modify the logging lines to simply echo to your log file.

# Purpose: Load DShield.org Recommended Block List into an ipset in a running
# firewall. That list contains the networks from which the most malicious
# traffic is being reported by DShield participants.

# Notes: Call this from crontab. Feed updated every 15 minutes.
# netmask=24: dshield's list is all class C networks
# hashsize=64: default is 1024 but 64 is more than needed here

Here's another. This one is used to accurately match (block or allow) traffic from or to entire countries.

This script takes a single command-line argument: the 2-letter IANA country code (the top-level domain, such as "us" or "cn"). Based on that, it downloads a list of all the networks of various sizes that are registered to that country code with the regional registrar. Last I checked, these lists are published twice daily by the ipdeny.com project. These are larger lists than the previous example.

Like the list in the previous example, this is a list of networks. However, unlike the previous example, they are networks which vary in size (i.e., have various netmasks). For that reason, we will use the nethash type of ipset (formally now called hash:net, although referring to it the old way still works).

Selecting the appropriate hashsize: In the example above, where the list is always a specific size and very small, I specifically set the hashsize to something smaller than the default. We will not do that here, because the default hashsize (1024) is appropriate for all but the smallest of the country codes, and the hashsize is just a starting point anyway (as an ipset gets larger, the hashsize is adjusted upward dynamically). Setting the hashsize manually only makes sense when you know the approximate size of the ipset in advance, and the point of doing so is either (a) to save a few KiB of RAM in the rare case where you are using a list you know is smaller than the default, or (b) to save a bit of rehashing (which occurs as the ipset grows) in the cases where you know the list is larger than the default. It is also possible to optimize this dynamic resizing process in various ways using the "--probes" and "--resize" options, but the defaults are fine, and we don't need to go into that now.

This script also demonstrates the use of the setlist type of ipset. All of the country-specific ipsets are added to a single, combined setlist, so we can refer to the whole group as a single ipset (and matching against it is a single efficient netfilter action). In the script you'll see where the required setlist is created (if the setlist does not already exist) and where the ipsets are added to the setlist (if they are not already members of the setlist).

Other than taking a command-line parameter, using a different type of ipset of a more typical size, and employing a hierarchy of ipsets (i.e., using a setlist), this script is pretty much the same as the one above. I actually run this script twice a day from cron, about an hour after ipdeny.com updates the lists. I run it multiple times, fetching the lists for several countries and creating a corresponding ipset for each (these are just example countries here; no offense intended to anyone):

Those four ipsets could be used directly in firewall rules (in Shorewall, for example, by writing "+cn" anywhere an IP address would normally go), but as noted above, I have combined them into a single "setlist". That way, I can simply refer to all the networks in all the countries as "+ipdeny" (there are over 7,000 networks in the example I give here). To check if a packet matches any of those, netfilter needs only to execute a single action: one call to check the setlist, which responds with a match or no-match almost instantaneously.
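
As a rough illustration of the setlist part only (not the actual script), the following prints 'ipset restore' input that creates a list:set and adds member sets to it. The function name setlist_lines is invented here; "ipdeny", "cn" and "us" are just the example names used in this post, and the member sets are assumed to exist already.

```shell
#!/bin/bash
# Sketch only: setlist_lines is an illustrative name.

# setlist_lines LIST MEMBER...
# Print 'ipset restore' input that creates a list:set named LIST and adds
# each MEMBER (the name of an already existing set) to it.
setlist_lines() {
    local list="$1"
    local member
    shift
    echo "create $list list:set"
    for member in "$@"; do
        echo "add $list $member"
    done
}
```

For example, 'setlist_lines ipdeny cn us | ipset -exist restore' builds the group, and with plain iptables a single rule such as 'iptables -A INPUT -m set --match-set ipdeny src -j DROP' then matches against every member set. As far as I know, a list:set holds 8 members by default and the "size" option raises that.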

In fact, using ipset in general is trivial, with the exception of initially gaining an understanding of the different set types and what they are for (that takes a thorough reading of the man page).

# Purpose: Load ip networks registered in a country into an ipset and load that
# ipset into a setlist containing several such ipsets, while this setlist is
# being used in a running firewall.
#
# Notes: Call this from crontab. Feed updated about 05:07 and 15:07 daily.
#
# Usage: 'ipdeny <TLD>' (where TLD is top-level national domain, such as "us")

Thanks. I was aware of filtering based on country. Doesn't appeal to me personally as much as a dynamic block list, it is just too general, but people should find it useful. Well done with the scripts.

You are right. It serves to demonstrate the basics, though.

Beyond that, it's also easy to dynamically blacklist attackers, etc., by writing iptables rules that use the "SET" target (as opposed to the "set" match). The basic iptables options are:
--match-set (compare a packet to an ipset)
--add-set (add address to an ipset)
--del-set (remove address from ipset)
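
A minimal sketch of how those options fit together, assuming ipset 6 syntax and a set type that supports the "timeout" option. The names run, setup_dynamic_blacklist and DRY_RUN are invented for the example, and the choice of chain and port is only illustrative.

```shell
#!/bin/bash
# Sketch only: run, setup_dynamic_blacklist and DRY_RUN are illustrative names.

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# setup_dynamic_blacklist SETNAME PORT
setup_dynamic_blacklist() {
    local set_name="$1"
    local port="$2"
    # Entries expire by themselves after an hour (timeout is in seconds).
    run ipset create "$set_name" hash:ip timeout 3600 -exist
    # "set" match: drop anything from an address already in the set.
    run iptables -A INPUT -m set --match-set "$set_name" src -j DROP
    # "SET" target: record the source address of anyone probing the port.
    run iptables -A INPUT -p tcp --dport "$port" -j SET --add-set "$set_name" src
    # SET does not end rule traversal, so drop that first packet explicitly.
    run iptables -A INPUT -p tcp --dport "$port" -j DROP
}
```

Running it with DRY_RUN=1 set, e.g. 'setup_dynamic_blacklist scanners 23', just prints the four commands so the rule order can be checked before anything is applied.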

Gentoo provides a rudimentary initscript that saves ipsets upon shutdown and restores them on startup. Some firewall tools, such as Shorewall, have built-in facilities for loading and saving ipsets when the firewall is started or stopped. You can use those, or manage them yourself (just make sure they are loaded up before iptables).

This initscript is specific to the init system used by Gentoo Linux, and other Linux distributions provide their own startup scripts. The latest version (as of this edit) accommodates setlist type ipsets (which must be destroyed before the sets they contain can be destroyed). If you are installing ipset manually, you can use this as a model.
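
The ordering requirement can be sketched like this. It is not the Gentoo initscript, just an illustration: the function name destroy_order is invented, and it assumes the usual 'ipset save' output, in which each set is introduced by a line of the form "create NAME TYPE ...".

```shell
#!/bin/bash
# Sketch only: destroy_order is an illustrative name.

# destroy_order < output-of-'ipset save'
# Print 'destroy' commands for every set, with list:set sets first so that
# the sets they contain are free to be destroyed afterwards.
destroy_order() {
    awk '
        $1 == "create" && $3 == "list:set" { print "destroy " $2; next }
        $1 == "create"                     { rest[++n] = $2 }
        END { for (i = 1; i <= n; i++) print "destroy " rest[i] }
    '
}
```

Something like 'ipset save | destroy_order | ipset restore' would then tear everything down in a safe order. Sets that are still referenced by iptables rules cannot be destroyed, so the rules have to go first.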

Here is another that's more practically useful. This script creates an ipset that can be used to block all bogons (not just RFC 1918 private IP addresses, but every network block that is unassignable or has not yet been assigned by the regional authorities). These addresses are often used by botnets, spammers, and so on. It creates a fairly large ipset of a complex type, so it takes tens of seconds to initially load up the temporary ipset (the swap operation is still virtually instantaneous, as are queries).

# Purpose: Periodically update an ipset used in a running firewall to block
# bogons. Bogons are addresses that nobody should be using on the public
# Internet because they are either private, not to be assigned, or have
# not yet been assigned.
#
# Notes: Call this from crontab. Feed updated every 4 hours.

If anyone is wondering how I came up with the ${ipset_params}, the answer is trial-and-error. In the meantime, I'll share the limited understanding I have gained. I should also point out that I've been working exclusively with hash-type ipsets and the iptreemap, so my optimization insights do not extend to other types.

I would appreciate any tips from a more knowledgeable person on how to identify optimal values for these and similar ipset parameters.

When you build an ipset that is one of the "hash" types, do an 'ipset -L' and look at its size. The default hashsize is 1024; that is the initial size of the hash table (a count of buckets, not bytes), while the memory the set actually occupies is reported separately (as "Size in memory" in ipset 6 listings). If your list is small and you guess that a smaller hash could store it, or if the set has been dynamically "grown" to be substantially larger (e.g. well into the multiple megabyte range), then you may want to try to optimize it (i.e., cause the process of loading the ipset to create a more efficient hash).
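
For comparing one attempt against another, a small helper like this can pull the memory figure out of a listing. It is only a sketch: the name set_memory_bytes is invented, and it assumes the listing contains a "Size in memory: N" header line, as ipset 6 prints.

```shell
#!/bin/bash
# Sketch only: set_memory_bytes is an illustrative name.

# set_memory_bytes < output-of-'ipset list -t SETNAME'
# Print the number from the "Size in memory" header line of a set listing.
set_memory_bytes() {
    awk -F': ' '$1 == "Size in memory" { print $2 }'
}
```

For instance, 'ipset list -t dshield | set_memory_bytes' (with "dshield" standing in for whatever your set is called) gives one number per load, which is easy to note down next to the parameters that produced it.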

This is not necessary, since ipset will dynamically grow and rehash an ipset as entries are added (I haven't tested to see if it will dynamically shrink them as entries are removed, but I doubt it). When all is said and done, pursuing this optimization process might cut the size of an ipset in half, and since they typically stay resident in RAM, this can free up some tens of megabytes of RAM. Whether that's worth your time is up to you.

There may be a better way to do this, but I basically run the process of loading the ipset multiple times, with varying parameters, trying to get a smaller hash size. Living by the rule "default is good", and noting that there is typically a point of diminishing returns in deviating from the defaults, one can bracket one's way into a reasonable value without too much effort. Basically the process would be:

I find these scripts really useful. I am learning some perl and I have made a perl version of them. I started with perl on Christmas, as a hobby, so do not expect too much of this version. In the exercise, I have learned some perl, some bash, and really enjoyed deciphering the regular expressions, still a mystery to me!

Glad you found them useful, and thanks for the credit on your page and in your scripts. I'd be interested to know how they compare in terms of performance.
_________________
Deja Moo: the feeling that you've heard this bull before

Moose is the guilty one!
_________________
Please add [solved] to the initial post's subject line if you feel your problem is resolved.
Thank the community answering other people's post, specially those unanswered.

Last edited by mimosinnet on Wed Apr 25, 2012 10:20 am; edited 1 time in total

Truc, it took me a long time to get back to this, but thank you for the excellent corrections and suggestions. I am incorporating all of them (with one exception, and one in modified form, as below).

Exception: I got rid of the BASH array as you suggested (don't know what I was thinking there), but I left the $networks variable in place because I think it's more apparent what's going on that way.

Modified: The stderr redirection in the get_timestamp function was there to allow processing to continue if the file is not present (which is a normal condition on first run and when the user might purge the data file from /var/tmp). I took it out per your suggestion, so I made the assignment of "old_timestamp" (which calls the function) conditional on the presence of the data file.

Thank you.

It wasn't the script that was messed up; it was his kernel configuration.

Also, maybe he's using a different version of cron, or a different method of employing it, but I don't believe most people should need to change the scripts in order to use them with cron. As I see it, those variables belong in the crontab, not in the scripts.

Sure. That looks fine to me. That's what I intended, that these examples would help people create and maintain their own, and in general make use of ipsets.

Sorry to post in this old thread. I noticed this morning that target="http://feeds.dshield.org/block.txt" is no longer working for me. It's redirected to something. I am now using target="http://www.dshield.org/block.txt". Hope this helps some people.

Hi Bones, inspired by your script I made one that is a little more flexible and easier to upgrade.
I hope it is of interest to you and helps others using Shorewall + ipset with a live blacklist feed.

Best regards, Bernardo.

Code:

#! /bin/bash

# Bernardo
# Rev. 1 at March 2015
# Tested with ipset 6.12.1
# Purpose: dynamically load many IP/network block lists into an ipset
# Notes: call this from crontab. Feed updated every 10/15 minutes
# Tip: to add a new source list, copy this after "# List Sources:"
#
# x=$(($x + 1))
# target[${x}${tname}]="Dshield"
# target[${x}${turl}]="http://feeds.dshield.org/block.txt"
# target[${x}${tawktype}]="0"
#
# and replace the turl value with the URL of the new source. Verify that the tawktype corresponds to the type of awk needed to convert the lines of the list.