Hello together,
I recently came across a segmentation fault problem in smbclient (or more specific the functions interpret_long_filename and cli_list_new of clilist.c).
(Checked only version 3.2.5)
I get the following message when connect to my NAS system and performing an "ls".
cli_list_new: Error: unable to parse name from info level 1
Segmentation fault
Tracking down the problem revealed that the function interpret_long_filename (called from function cli_list_new) can return out of "switch (level) case (1)"
without setting finfo->name which is then accessed within function cli_list_new.
Here is the call scenario:
1. Function "cli_list_new" line 396 calls function "interpret_long_filename"
2. Function "interpret_long_filename" returns (line 91) finfo->name is not set
3. Function "cli_list_new" line 421 access with Null-Pointer, causing the seg. fault.
This problem is not present for me in 3.0.33 (interpret_long_filename did not leave finfo->name unset).
Regards,
Marcus Schmitz

It is a PyroGate MM-4110. I think it is also sold under different names by different "manufacturers".
Maybe it wasn't the best buy ever :(
In case it is "just" a hardware/firmware related problem I will stick to samba-3.0.33 for the moment, as I did not experience troubles here.
Marcus

Two things. 1). I'm pretty sure the +1 was required by an OS/2 server. I'm hoping kukks can comment on this (he was the person with the capture trace that showed this). We may end up having to store a "server arch" flag in the cli struct and use two different codepaths here.
2). NAS-BASIC49 is a custom (Tiawanese I think) implementation of a CIFS server which only recently started supporting user-level security, so it may be a little behind the times. I'm guessing that the way our client write code stresses the server is causing the server to die.
We need to know if the server is dying when we write to it. If so there's not much pressure to fix the directory listing bug as the server is not long for this world anyway.
Jeremy.

(In reply to comment #12)
> Two things. 1). I'm pretty sure the +1 was required by an OS/2 server. I'm
> hoping kukks can comment on this (he was the person with the capture trace that
> showed this). We may end up having to store a "server arch" flag in the cli
> struct and use two different codepaths here.
Sure, I'll wait for Günter's feedback before pushing anything here.
> 2). NAS-BASIC49 is a custom (Tiawanese I think) implementation of a CIFS server
> which only recently started supporting user-level security, so it may be a
> little behind the times. I'm guessing that the way our client write code
> stresses the server is causing the server to die.
> We need to know if the server is dying when we write to it. If so there's not
> much pressure to fix the directory listing bug as the server is not long for
> this world anyway.
Hm, is there a good way to detect this server if the server reply doesn't send it's OS and version? I've seen multiple reports with different NAS devices mentioned.

I've been thinking about this, the easiest way to fix this is to change the 'bool is_samba' in cli_struct to a enum arch_type (can't remember the exact name of the type :-) and then set it to RA_OS2 if the string OS/2 is detected in the server type (like we do with the "Samba" name in the cli_connect functions). Then we'll only use the +1 in the case where it's RA_OS2, and just len in every other case.
We don't need to specifically test for the NAS-BASIC server type as it acts the same as Windows. It's OS/2 that's the odd one out here.
Jeremy.

In February 2008, the main (OS/2 related) segfaulting issues were related to
a) cli_list_new() not checking finfo->name == NULL after returning
from interpret_long_filename()
b) to avoid case a) fix interpret_long_filename() to be "OS/2 compatible"
That days, interpret_long_filename() was using
if (p + len + 2 > pdata_end) {
and had "to be fixed to be OS/2 friendly" to
if (p + len + 1 > pdata_end) {
-------------------------
Kai's new patch
if (p + len > pdata_end) {
does also work with os/2 servers .... (as a side note)
BUT - Marcus already said:
"The patch fixed the visibility problem :) Writing problem remains unchanged."
--------------------------------------------
So a _write_ (smbclient "put") problem is leftover here!
What semantics/async write additions have been added lately to 3.2.x smbclient (libs) which are _not_ in 3.0.x ?
Cheers, Günter

Jeremy, Volker,
I'm a bit confused when looking at the sniff "trace.cap" from Marcus
using wireshark:
1.) There's a failing SMB WriteAndX response, but i don't see _any_ SMB
WriteAndX request!
2.) Why is the write data send via NBSS continuation messages and not via tcp?
Or is that just a different view, cause the sniff was done with "tcpdump"?
I don't remember having seen this before ...
You surely can explain that a bit further. :-)
Cheers, Günter

Created attachment 3814[details]
trace: successful put via 3.0.33
Hello together,
here is some more food for the wireshark ;)
The attachment contains a trace taken while putting a file via 3.0.33.
Hope this is of any help.
Marcus

Created attachment 3816[details]
Manually revert padding added in 3d3d1b
Hm, looking at traces I got in the Launchpad bug report, the only difference I can see is the null padding Jeremy added in 3d3d1b806aef3617abaac46daf230ed32076e2ce
Here's a patch reverting the parts of the change that seem responsible for padding and bcc calculation. Can you check if that helps and get me a trace if it doesn't?

This null padding was added to make us match Windows. It also has the effect of allowing the server to use receivefile on smbclient connections (as it makes our write calls identical to those from a Windows client) so I don't think it's a good idea to lose this padding.
Jeremy.

(In reply to comment #21)
> This null padding was added to make us match Windows. It also has the effect of
> allowing the server to use receivefile on smbclient connections (as it makes
> our write calls identical to those from a Windows client) so I don't think it's
> a good idea to lose this padding.
I'm just trying to figure out if the padding causes the problem, as it's the only difference I found in the traces. I'm not planning to remove it, just trying to put a finger on the problem.

Yeah that doesn't surprise me, the changes make us closer to a Windows client.
So lets take a step back, the problem seems to be that when we do a put the server dies, correct ? My guess is that this server can't do max-mux packets, as Windows rarely sends more than one outstanding write. I'd try adding an sleep(1) between each packet send to see if it's a speed issue.
Jeremy.

Created attachment 3821[details]
trace ~5kB via 3.2.6
This is the trace of 3.2.6 (unsuccessful put). For both traces the same ascii file has been used.
Jeremy,
I have tried adding the sleep has suggested to slow down the packet sends, nevertheless, the problem persists.
Marcus

Created attachment 3822[details]
wild guess...
Can you try the attached patch, just to minimize the differences in behaviour. This should just avoid the header being sent in a separate tcp frame? Can you get us a new 3.2.6 sniff with that?
Thanks,
Volker

Created attachment 3824[details]
patch
Attached find a patch that might be a real fix, not just a bad workaround. If you feel like it, you might try it as well.
Jeremy, if that looks ok, I've got it as a 4-step patchset.
Thanks,
Volker

Oh, that looks like nice work ! Let's see if it fixes the issue with this server. I'm 100% convinced this is a server error, not a client one, but this is a really elegant workaround that maintains the speed of direct writes - thanks !
Jeremy.