Thursday, April 19. 2018

Using nft from nftables, I created some IP filter rules inside a partially
virtualized (Linux-VServer, www.linux-vserver.org) machine. Almost all rules
work as desired, but rules that need connection-tracking helpers, like ftp
and tftp, do not: some IP packets are blocked even though they should be
allowed. Since the same tftp rules - I am sure I made no mistake - work on
a real host, there is probably some requirement for these helpers to work
correctly that is not fulfilled inside a VServer.

Monday, December 25. 2017

Each rule is evaluated left-to-right. Every element has a "truth" value,
a boolean pass/fail. So things like "counter" and "log" are "always
true" (they perform an action but have no test-like component).

Other elements like "tcp dport http" are multiple tests in one: first the
packet has to be "tcp", and, now that the tcp-ness is established, the
packet is known to have a destination port, so the port number can be
compared to 80 (the http well-known port) for equality.

The first time you hit something that is "false" the rest of the line is
skipped.

So mentally put the words "and then" between each directive.

So: counter, and then limit rate 10 kbytes/minute, and then counter, and
then tcp sport http, and then counter, and then log prefix "whatever"
flags all, and then counter.

A very few things break this rule. "accept" and "drop" are terminal to
the rule, and the chain: once you "accept", the rest of the rule and the
rest of the chain just don't matter.

So limit comes before action if you want that action to be limited by
that limit.

Same for literally everything else.

Now what this means is that order is important, but unlike iptables, you
can stack the non-terminal actions several deep in a rule, even adding
extra tests after one action and before the next.

So you can make long and complex rules that get progressively more picky
about the packets that get to that point.

The carriage return is like a "start-from-scratch" where the previous
rule is done and the condition is reset to "true".

That is, any rule could be mentally rewritten as "true and then (rest of
rule)".

This is why you can just have a log statement as the whole rule if you
want to log every packet that hasn't yet hit a terminal like accept or
drop. Indeed a bare log rule at the end of a chain is a great way to log
all the packets that are about to hit the "policy" of the chain, or
which have otherwise gotten to the end of a sub-chain when you were
expecting to handle every possibility.
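As a concrete sketch of that fall-through logging (the table, chain, and accept rules here are made up for illustration), a bare log rule sits last in the chain:

```
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        tcp dport ssh accept
        # bare rule: every packet that reaches this point is logged,
        # then falls through to the chain's drop policy
        log prefix "input-fallthrough: "
    }
}
```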

So just relax and understand that the whole thing is "as you read",
while iptables was "test-and-do".

So no, the limit isn't limiting "all packets that enter the rule", it
limits all packets that "get as far along the rule as the limit statement".

So ...

limit rate 10 kbytes/minute tcp dport http log prefix "whatever" flags all

and

tcp dport http limit rate 10 kbytes/minute log prefix "whatever" flags all

produce completely different results.

In the first one you are only considering 10 kilobytes a minute of
packets, then of that limited set you are logging the http requests.

In the second one you are considering the first 10 kilobytes of http
requests per minute.

So in the first rule the ssh and ftp and sip and dhcp and all the other
traffic are being considered in the limit, and then only the http is
being logged. This is almost certainly not what you want.

In the second the http-ness is considered first, and then you limit the
logging rate.

And of course it is not "10 kbytes of log", it's "10 kbytes of data", so
if you are sending large requests, and the MTU is the nominal 1500 bytes
per packet, your ten kbytes is probably seven or eight packets a minute
total.

I use group 1 for ingress ports, 2 for local bridges, 3 for bridge
members, and so on. Now I have a greater-than-one domain for interior
interfaces and equal-to-1 for configured exterior interfaces.

When an interface is being built, its group is "default" (as in zero),
so it starts life blocked by the drop policies until it's assigned its
correct group.
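In nft terms, groups are matched with the "iifgroup"/"oifgroup" meta keys. A hypothetical forward chain using the numbering above might look like this (assuming relational matches on the group value):

```
chain forward {
    type filter hook forward priority 0; policy drop;
    iifgroup 0 drop                     # unassigned interfaces stay blocked
    iifgroup > 1 oifgroup > 1 accept    # interior-to-interior traffic
    iifgroup 1 ct state established,related accept
}
```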

More from Robert White:

The whole set of all group numbers is available even when no interfaces
are assigned to the numbers.

So:

"iif" and "oif" match the interface by its unique index number, which
cannot be predicted, as it is driver-initialization-order dependent. It
also varies with the addition and removal of temporary endpoints like
VPNs and tunnels. The "nft" command simply looks up the number of the
interface when you use a name after the opcode. That means "iif ppp4"
can only be resolved if ppp4 exists when the nft command is run.

"iifname" and "oifname" simply preserve the string you provide and do a
string compare at runtime. You can add and remove the named interface
and the rule set just doesn't care. But it's slow because you are doing
string compares.
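The difference between the two, sketched (interface names are illustrative):

```
# "iif" is resolved to eth0's interface index when the rule is loaded;
# loading fails if eth0 does not exist at that moment
iif "eth0" accept

# "iifname" stores the literal string and compares it per packet;
# ppp4 may come and go freely after the rule is loaded
iifname "ppp4" accept
```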

Both also suffer from scaling issues: if you have a hundred interfaces
(not likely, but not impossible), then you need rules for all 100.

Sets let you cut that down by a bunch.
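For example, an anonymous set collapses many per-interface rules into one lookup (names are illustrative):

```
# one rule instead of three separate string-compare rules
iifname { "eth0", "eth1", "br0" } accept
```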

But what _I_ do is use interface _group_ numbers.

Interfaces are instantiated in the "default" group, which is zero.

You assign a group number to an interface with the ip command, but the
syntax is poorly documented. The manual pages define "group DEVNUM" as a
_selector_, just like "dev DEVNAME". It's a selector in that if you do
something like "ip link set group 5 down", all the interfaces in group 5
will be shut down. (This is a feature.) But when you use "ip link set
dev DEVNAME group DEVNUM", the "group" stanza is an assignment.
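So, as a sketch of both forms (device and group numbers are illustrative):

```
# assignment form: put eth1 into group 2
ip link set dev eth1 group 2

# selector form: shut down every interface currently in group 5
ip link set group 5 down
```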

So I run a fairly simple site, where I picked one (1) as the group for
all my ingress/egress ports, and all the numbers greater than one
represent internal purposes. Group 2 is all my raw internal ports. Group
3 is my bridges, and so on (in my other post I swapped 2 and 3 by accident).

The important point is there is a fixed numeric break between the low
"untrusted" port group numbers, and the higher "trusted" port group numbers.

In a complex site, like if you offer PPP service, you might want the
break number to be higher, with 1 for PPP (the least trustworthy) and 2
for wired public interfaces, and your trusted domain starting at 3. That
way you can allow things like DHCP to be legal on 2 and above, while
preventing your PPP clients from trying to inject DHCP packets (or
whatever).
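A sketch of that kind of DHCP filtering, using the hypothetical numbering above (and assuming relational matches on the group value):

```
# DHCP server traffic is legal from the wired-public domain and up...
iifgroup >= 2 udp dport 67 accept
# ...but PPP clients (group 1) may not inject it
iifgroup 1 udp dport 67 drop
```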

Anyway, you can now write a concise set of rules using just the group
numbers and, most importantly, load those rules before any interfaces
are active at all.

You can now load a fixed and static set of rules that are much simpler.

You can also design the rules to explicitly or implicitly just
block/drop/reject any interface in group 0.

Then in your various interface up and down scripts you use the ip
command to put the interfaces into their groups at the point you
consider them "ready".

You can even migrate an interface between groups at will. Like maybe
pppX before and after some validation event. (It's also a good way to do
bad things like redirect a soft PPP link to a login page and then bump
it into full service after the login is validated.)
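The migration itself is just another group assignment, e.g. (device and group numbers are hypothetical):

```
# link comes up in a restricted group during validation...
ip link set dev ppp4 group 1
# ...then is promoted to the trusted domain afterwards
ip link set dev ppp4 group 3
```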

In general, if you pick the numbers wisely you can get a lot of very
good results with extremely high performance and none of the mess.

ASIDE: Group numbers work equally well in iptables, and I've been using
them there for years. I'm still migrating to nft.

It's also a great help in shutting down or stopping an intrusion, since
"ip link set group 1 down" closes many doors all at once.

So proper use of interface groups can make things way more simple and
quite a lot faster for your task.

Friday, July 21. 2017

Here are some brain triggers for modifications to nftables when working with Strongswan IPSEC policies. Included are rules for NAT.

fraggod offered up the secret sauce to work around the lack of a direct replacement of iptables xt_policy module in nftables.

The nftables wiki says that both a nat prerouting and a nat postrouting chain need to be in place to ensure connection tracking and nat processing work properly, even if only outbound nat rules or only inbound nat rules are configured.
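A minimal sketch of that base layout (the table name is conventional, not required; the priorities are the standard dstnat/srcnat values):

```
table ip nat {
    chain prerouting {
        type nat hook prerouting priority -100;
    }
    chain postrouting {
        type nat hook postrouting priority 100;
    }
}
```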

Given that 172.16.6.0/24 is a subnet on the left side to be encrypted, and 172.16.5.0/24 is a subnet on the right side to be encrypted, they need to bypass the nat rules on the edge interface. The two rules here are for the left hand side. The accept statement is used to bypass nat. The masquerade statement will nat everything else from that subnet. The order of the two rules is important.
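A sketch of those two left-hand-side rules; the subnets are from the text, but "eth0" as the edge interface is an assumption:

```
chain postrouting {
    type nat hook postrouting priority 100;
    # bypass nat for traffic that will be encrypted by IPSEC (must come first)
    ip saddr 172.16.6.0/24 ip daddr 172.16.5.0/24 accept
    # everything else from the left subnet gets masqueraded
    ip saddr 172.16.6.0/24 oifname "eth0" masquerade
}
```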

IPSEC processing on the receive side is a two stage affair: decrypt, then normal connection tracking processing. The idea here is to mark esp packets, the content of which is unknown. That mark is then used at a later stage to accept the traffic through the connection tracker once decrypted.
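One way to sketch that mark-and-accept pattern (chain names and the mark value are arbitrary; this relies on the skb mark surviving ESP decapsulation, so the decrypted inner packet still carries it when it re-traverses the hooks):

```
table ip filter {
    chain premark {
        type filter hook prerouting priority -300;
        # mark the encrypted packets, whose content we cannot inspect
        ip protocol esp meta mark set 0x1
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
        # the decrypted packets still carry the mark set on the esp packets
        meta mark 0x1 accept
    }
}
```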

NFQUEUE and libnetfilter_queue: In userspace, the software must use libnetfilter_queue to connect to queue 0 (the default one) and get the messages from the kernel. It then must issue a verdict on each packet.
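In rule form, handing packets to that queue looks like this (the port number is illustrative):

```
chain input {
    type filter hook input priority 0;
    # matching packets are passed to the userspace program on queue 0,
    # which must return a verdict before they proceed
    tcp dport 8080 queue num 0
}
```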
