If the "broken session" occurs at exactly the same point in the transfer (and does not appear to be related to some other effect such as congestion), it would definitely point to an issue with the firewall or router (that is, some state problem triggered by the sequence of traffic). Can you confirm that?

Basically the "inspect ftp" functionality fixes up the IP address and port mapping that occurs when crossing a firewall, particularly when NATting is occurring. What is curious from your traces, though, is that there is obviously no NATting or port translation going on.

Also, from what I understand, because both connections are initiated by the client in "passive" mode, "ftp inspect" may not be required.
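
To make the "fix up" concrete: in passive mode the server announces the data endpoint inside the text of its 227 reply, as six decimal numbers (four address octets and two port bytes). An FTP-aware firewall has to read, and under NAT rewrite, exactly this field. A minimal sketch of the decoding follows; the reply string is a made-up example (chosen so that it decodes to the data port 47818 seen in the server trace), and parse_pasv_reply is a hypothetical helper name.

```python
import re

def parse_pasv_reply(line):
    """Return (ip, port) from an FTP 227 reply, or None if the line is not one."""
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", line)
    if not line.startswith("227") or m is None:
        return None
    h1, h2, h3, h4, p1, p2 = (int(x) for x in m.groups())
    # The port is sent as two bytes, high byte first.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

# Hypothetical reply text, not taken from the captures.
reply = "227 Entering Passive Mode (10,0,0,4,186,202)"
print(parse_pasv_reply(reply))  # ('10.0.0.4', 47818)
```

If no address or port translation is taking place, the values in this reply are already correct as sent, which is why the inspection may be unnecessary here.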

(FTP is an ugly protocol - that's why most of us avoid it like the plague ;-) )

Thank you for your time, effort and expertise! After
receiving your messages and doing some more detailed investigation I have been
able to determine that my initial topology diagram was missing a component and
is more correctly drawn as…

Both the router and firewall are Cisco devices. The firewall
is an ASA that is actively performing firewall functions between the two
subnets. As such it has "FTP inspect" turned on. Both are
provided as a managed service to our company and the level of support and
expertise that comes along with them isn't what I would consider to be robust.
However, I've discovered an internal resource that has access to them so we
should be able to make packet traces on the firewall as needed.

Both the iptables firewall and selinux functionality on the
server are turned off.

At this point I am starting to lean toward it being a
firewall issue. I've been able to recreate the "broken session"
using these same two systems by repeatedly sandblasting the same data set from
client to server (using "mput *" over and over and over) but have not
been able to recreate it from another system on the same subnet as the server.
I will expand the packet capture data size to "-s 256" and make
another set.
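
For anyone who wants to script the repeated "mput *" rather than type it, here is a rough sketch using Python's ftplib interface in passive mode. The hammer function and the DryRunFTP stand-in are hypothetical names of mine; the stand-in only records commands so the loop can be tried without touching a server.

```python
import io

class DryRunFTP:
    """Hypothetical stand-in for ftplib.FTP that only records commands."""
    def __init__(self):
        self.passive = None
        self.commands = []

    def set_pasv(self, val):
        self.passive = val

    def storbinary(self, cmd, fp):
        fp.read()  # consume the payload as a real upload would
        self.commands.append(cmd)

def hammer(ftp, files, rounds):
    """Upload every (name, payload) pair `rounds` times; return the STOR count."""
    ftp.set_pasv(True)  # each STOR then opens a new client-initiated data connection
    done = 0
    for _ in range(rounds):
        for name, payload in files:
            ftp.storbinary(f"STOR {name}", io.BytesIO(payload))
            done += 1
    return done

dry = DryRunFTP()
print(hammer(dry, [("a.pgp", b"x"), ("b.pgp", b"y")], rounds=3))  # 6
```

Against a real server the same function should accept an ftplib.FTP object after login(); the host name and credentials are not part of this thread and would have to be supplied.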

As to your questions regarding the specific behaviors of the
"router/firewall" I'm afraid those are beyond my area of expertise.

To: Community support list for Wireshark
Subject: Re: [Wireshark-users] Analyzing a "broken" FTP session

Martin,
John,

I have used editcap and mergecap to combine the client and server files into
one (with a calculated timeshift so that they are now in order). I also used
bittwiste to alter the IP addresses to make life easier. I used 10.0.0.1 for
the client, 10.0.0.2 for the client-side interface of the router, 10.0.0.3 for
the server-side interface of the router and 10.0.0.4 for the server. The result
is attached.
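
One way such a timeshift could be calculated (a sketch only, not necessarily how it was done for the attached file) is to match packets that appear in both captures, for example by ip.id, and take the median difference of their timestamps. The function name and the sample values below are hypothetical.

```python
from statistics import median

def estimate_offset(client_pkts, server_pkts):
    """Median of (server_ts - client_ts) over packets present in both captures.

    Each argument is a list of (key, timestamp) pairs, where key identifies
    the same packet in both files (for example its ip.id).
    """
    server_ts = dict(server_pkts)
    deltas = [server_ts[key] - ts for key, ts in client_pkts if key in server_ts]
    if not deltas:
        raise ValueError("no packets in common")
    return median(deltas)

# Hypothetical (ip_id, seconds) samples, not taken from the real traces.
client = [(1001, 10.000), (1002, 10.050), (1003, 10.120)]
server = [(1001, 12.501), (1002, 12.550), (1003, 12.622)]
print(round(estimate_offset(client, server), 3))  # 2.501
```

The result still contains the one-way network delay, so it is only good enough to put frames in a plausible order. A value like this is what one would feed to editcap's -t time adjustment before running mergecap.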

First of all, the "router" seems to be application aware or at least tcp
aware. It looks like it does some proxying rather than only routing the packets.
This can be seen in the attached file in frames 115-121. Somehow it does not
forward the client ACK in frame 115 to the server, but when the server
retransmits the un-acked data in frame 116, the "router" does not
forward this data, but now forwards the client ACK to ack the data (frame 117,
same ip.id as in frame 115). Frame 118 is not immediately forwarded to the
client, but after 200ms the "router" acks the previous segment
(why the previous segment and not the last segment from the server???),
but now it does forward the packet to the client in frame 120.
This is not router-like behavior, this looks more like a loadbalancer or proxy.

Then the client ACKs the packet and sends a PASV for another transfer. However,
these packets are rejected by the TCP stack on the server, even though the SEQ
and ACK are OK. They both are sent with a SEQ of 127, but the server says:
"Hey, don't send me that, send me the data starting from SEQ=127". Of
course the client thinks the server did not get the data and sends the data
again. Now the client and the server have a little loop.

So why would the server reject the data, even though it is the data it is waiting
for? Assuming the traces were made on the server itself, the packets are
captured with libpcap, which sits on top of the driver, before the packets hit
the IP and TCP layer. In between libpcap and the IP layer there could be
iptables, netfilter or any other filtering/natting/etc module. My guess is that
this module between libpcap and the TCP layer alters the sequence numbers in
such a way that the TCP layer thinks they are out of state and so it asks for
the "proper" segments.
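
To illustrate why a rejected SEQ=127 is so odd, here is a small sketch of the segment acceptability test a TCP receiver applies (as described in RFC 793). The window size of 5840 and the shifted RCV.NXT of 300 are assumptions for illustration, and the sequence numbers are relative ones as Wireshark displays them.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits

def in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd

def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """Receiver-side acceptability check for an incoming segment."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False
    last = (seg_seq + seg_len - 1) % MOD
    # Either the first or the last byte must fall inside the window.
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)

# "PASV\r\n" is 6 bytes. If the receiver expects 127, SEQ=127 is acceptable.
print(segment_acceptable(127, 6, rcv_nxt=127, rcv_wnd=5840))  # True
# If something between libpcap and TCP shifted the numbers, it no longer is.
print(segment_acceptable(127, 6, rcv_nxt=300, rcv_wnd=5840))  # False
```

A stack that really expects 127 has no reason to refuse a segment starting at 127, which is what makes a rewrite somewhere between the capture point and the TCP layer a plausible suspect.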

So:

1) What kind of device is the "router"?

2) Why does it not forward frame 115 immediately?

3) Why does it not ack frame 118 to the server, but instead acks frame 116 for
the second time? Is this a bug?

4) Why does the server not accept frames 123 and 124 with seq=127 and asks for
data starting at seq=127??? Is this a bug in a filtering/natting module?

Indeed it would be interesting to see whether this behavior is seen on all
events where a file-transfer is broken.

As you possibly know, for FTP, there are always two sessions between client and
server. The control session is opened by the client on TCP port 21. Then for
every subsequent data transfer another data session is required. The control
session is always long-lived.

In your case a new data session is opened for each transfer (you are using
Passive, PASV, mode). We see 4 STOR requests, and hence 4 new TCP sessions, one
for each data transfer, are established.

The first 3 all go through without incident. However on the 4th some trouble
appears. What seems to happen is that the data transfer is finished, but the
command channel still had not completed the transaction. It seems that the
server has got confused.

Looking at made_on_server.pcap:-

51: Client issues STOR request for CAMBRID06_QT.CFLT.pgp
52: Server responds, asking the client to open a BINARY data channel
53-57 The transfer happens on server TCP port 47818
58: The server sends another copy of the open BINARY data channel response
that it sent in 52. Now the client did ACK the server response
sent in frame 52 in good time (you can see this on the made_on_client.pcap),
but the server hasn't seen this yet and got impatient (a 200ms timer went off).
(Unfortunately your packet captures are truncated, so we assume it is the same
data in the response)
59: The late ACK from the client arrives
60: The server tells the client that data connection is closed (presumably for
CAMBRID06_QT.CFLT.pgp)
61: Client ACKs frames 52 and 58 again
62: Client ACKs frame 60
63: Client requests a new PASV connection (for next STOR presumably)
64-65: The server ACKs packet 59 again
66: The server repeats the data connection closed response of 60 (even though
it was already ACKed by the client)
67: The client ACKs again
68: The client retransmits the PASV request of 63
69: The server only ACKs to 62
70: The client retransmits the PASV request of 63
71: The server only ACKs to 62
and-so-on
Retransmission timers get longer and longer.
88: Keep-alives start kicking in
89: Client responds to keep-alive
...
109: Server kills FTP session (with RST)
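
The pattern in frames 60-71 (the same data sent again and again) can also be picked out mechanically. Below is a rough sketch that flags a data segment as a repeat when the same sender has already sent the same sequence number and length. The sample tuples are hypothetical values loosely modelled on the list above, since the real sequence numbers are not quoted in this message.

```python
def find_retransmissions(segments):
    """Return [(frame, first_frame)] for data segments that repeat an earlier one.

    segments: iterable of (frame, sender, seq, length) in capture order.
    """
    first_seen = {}
    repeats = []
    for frame, sender, seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data, so they cannot be retransmissions
        key = (sender, seq, length)
        if key in first_seen:
            repeats.append((frame, first_seen[key]))
        else:
            first_seen[key] = frame
    return repeats

# Hypothetical sample data: (frame, sender, seq, length).
sample = [
    (60, "server", 500, 24),  # "data connection closed" response
    (62, "client", 127, 0),   # pure ACK
    (63, "client", 127, 6),   # PASV
    (66, "server", 500, 24),  # repeat of 60
    (68, "client", 127, 6),   # repeat of 63
    (70, "client", 127, 6),   # repeat of 63
]
print(find_retransmissions(sample))  # [(66, 60), (68, 63), (70, 63)]
```

Wireshark's own TCP analysis flags are more thorough than this (they also take timing and ACKs into account), so treat it as a quick cross-check only.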

From what I can see, it seems that the server has got into a state where it
won't accept a valid PASV request. I have a feeling it is because the data
transfer in 53-57 is finished before it even thinks it
should have started. In the earlier transfers the ACK from the client does come
after the transfer is finished but within the retransmission timer.

The FTP protocol (and the implementations) should be able to handle this sort
of scenario, so I am not quite sure what is going on there.

(Though one thing is slightly curious: the packet sent by the server in frame 58
(the duplicate open BINARY channel command) never reaches the client. I'm not
sure whether this is significant in things getting into the wrong state.)

It would be interesting to see whether this sort of sequence (with
retransmissions) is occurring every time your FTP session locks up.

Can I suggest for future captures you use a longer tcpdump snap length. The
default is 96 bytes which means we don't see the full FTP command. "-s
0" would be nice, but at least "-s 256" would be OK.
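
To show why the snap length matters, here is a bit of arithmetic for how much FTP text survives truncation. The header sizes are assumptions: Ethernet II without VLAN tag (14 bytes), IPv4 without options (20), TCP (20) plus 12 bytes of TCP options such as timestamps.

```python
def visible_payload(snaplen, eth=14, ip=20, tcp=20, tcp_options=12):
    """Bytes of application payload left after the headers use up the snap length."""
    return max(0, snaplen - (eth + ip + tcp + tcp_options))

for snaplen in (96, 256):
    print(snaplen, visible_payload(snaplen))
# 96 30
# 256 190
```

Under these assumptions a 96-byte snap length leaves about 30 bytes of each FTP command or response, which is too short for many response lines, while 256 leaves about 190.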

Network information:
The systems are both operating at 100 Mb/s.
They are both in the same physical location.
Client <-> Switch <-> Router <-> Switch <-> Server
The problem is generally seen with FTP sessions involving hundreds of small
files.

I understand that the issue may be network as opposed to server related, and I
understand that the packet captures may not contain enough information to make
a definitive judgment.

Posting snippets of packet captures with reasonable problem reports is
generally welcome. A lot of us here enjoy the challenge of trying to interpret a
stream of bytes with the hope of actually diagnosing the higher level issue
that may be causing the problem.

That said, some of us do this work for a living and get paid quite a bit of
money to do it, so doing it for free is only viable if it doesn't take a lot of
time. Any time I respond to such requests is out of a spirit of mentoring the
community and also hopefully giving myself a little confidence that I actually
sometimes know what I am talking about :-)

I'm not sure if this is the correct forum for this but I am hoping to
get some help identifying a problem that sometimes occurs between an FTP client
and server. (If this isn't the right forum can someone point me in the
right direction?)

I have PCAP files made on both systems using tcpdump that have captured
a recent failure, but I do not have enough expertise in packet analysis or the
guts of the FTP protocol to read them and draw a definitive conclusion regarding
why the connection "broke".

If someone can help I am happy to provide more information regarding the
systems themselves, the network topology between them, and the trimmed PCAP
files for analysis.
