Gilou: what's the current status on this?
I don't know if my comments (via codemastr) came through, but.. the last thing I saw coming by was some mpatrol output with quite some unresolved symbols [shown as '???']; my reply to that was to use an mpatrol patch which would provide meaningful output [http://www.cbmamiga.demon.co.uk/mpatrol/, patch2, apply & recompile/reinstall mpatrol and make clean; make @ unreal]
[note that with this patch the ircd shouldn't be /rehash'ed since that often causes a crash]
Or perhaps this bug was already solved? Or are you working on it codemastr?
I've no idea, it seems most communication went outside bugs* so I couldn't track what's going on ;)
As you know, I've been "ill" for over a month, but I'm able to debug now.

Could whoever is experiencing this bug (Gilou.. m339) keep us up to date etc?
I mean, if nobody is cooperating/responding, there's little we can do...
Furthermore, that kinda makes us feel like you don't care much / it isn't important.
Thanks.

The url was mentioned, along with the instructions on what to do... go to http://www.cbmamiga.demon.co.uk/mpatrol/ and download patch 2. Also, please re-read the comment(s) again, since you obviously didn't, else we'll probably have trouble later (like: if you forget to recompile).

Ok, that was like 10 days ago..
I wonder what you guys are doing over there... All I asked was to install 1 library with a full tutorial/howto on how to do that (ok, except for applying a patch.. but that's like 1 command).
I didn't ask anyone to actually trace down the bug or spend hours/days/weeks reading the source, that's what we (unreal coders) are for ;).
So, let us know any results.

Really, the longer you wait, the more releases you will have with your bug in it (assuming that it is indeed a real bug). You can't really blame us that we aren't willing to help you out, I've been ready for interpreting any results for over 2 weeks now!

Hey ... I've been waiting a long time before being able to have those kind of answers, now I'm a bit busy, and things have changed, so I'll work with you when I'll have time to ... Anyway, we're getting used to those crashes you know, that'll be ... 3 months that we have the problem, so we're not that in a hurry ...
I'll contact you when I'm ok, or someone of my team is ...

Odd, is that with the patched mpatrol + all instructions followed? ;)
Anyway, could you somewhere upload or mail (zipped) the following files:
- src/ircd
- src/modules/commands.so
- the core file
- the mpatrol log file
Also, you said this is Unreal3.2?
- Unmodified, no 3rd party mods I presume? ;p
- With what options compiled? ssl? zip? ipv6?
Thanks.

This happens using a recent mpatrol, with patch n°2 as you asked.
It's a vanilla version, just tar'ed out of your files, and it's compiled with no special features (no ssl, no zip, no ipv6, no remote includes).

Hm, I see what you mean now. The core file seems corrupt indeed.
Could you check:
- if you haven't reached your quota limit on the machine? ('quota'.. although you probably would have noticed it ;p)
- could you paste the output of 'ulimit -a', 'ulimit -a -H' and 'ulimit -c'?
Thanks.
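
For reference, the same core-size limits can also be read from inside a process. This is only a minimal sketch, not code from Unreal: report_core_limit() and format_limit() are hypothetical names, and it assumes a POSIX system with getrlimit(). Note that getrlimit() reports bytes, while 'ulimit -c' reports blocks, so the numbers will differ unless both say "unlimited".

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Format one limit value, printing "unlimited" for RLIM_INFINITY. */
static void format_limit(rlim_t value, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (value == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu", (unsigned long long)value);
}

/*
 * Hypothetical helper: print the soft and hard core file size limits
 * of the current process. Returns 0 on success, -1 if getrlimit() fails.
 */
int report_core_limit(FILE *out)
{
    struct rlimit rl;
    char soft[32], hard[32];

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    format_limit(rl.rlim_cur, soft, sizeof soft);
    format_limit(rl.rlim_max, hard, sizeof hard);
    fprintf(out, "core size limit (bytes): soft=%s hard=%s\n", soft, hard);
    return 0;
}
```

Calling report_core_limit(stderr) early in startup would show the limits the process actually runs with, which may differ from what an interactive shell reports.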

Why do you have to be so rude all the time?
If someone from an opensource project is trying to help you out, then you usually shouldn't go annoy them and be arrogant and such things.
I try to put these things aside, but I must admit I'm not good at that. If it keeps going on you shouldn't be surprised if all help suddenly disappears.

Anyway, the core (and mpatrol stuff) is pretty much useless... Nearly all data in the core is 0x00 which cannot be true (whole &me is 0x00, same for IRCstats, clientTable, local, etc)... so it's clearly corrupt. My thought was that you had hit your hard core size limit, but apparently that's not the case.. Well, whatever it is.. we cannot debug now.
The only alternative I can think of is attaching a gdb right from the start.
--
gdb src/ircd
handle SIGPIPE nostop noprint
c
--
Probably best to put that into a 'screen' or something since it will have to sit like that till it crashes.
Then, when the ircd crashes (note: it will hang instead of disconnecting your users), you can do things like:
--
bt
bt full
p *sptr
p *cptr
frame 1
p *sptr
p *cptr
[whatever]
--
But obviously this is a very imperfect solution since you would have to print any valuable data immediately and while doing that you cannot start a new ircd and all your users will hang, etc... (unless of course, this is ok, then just leave the gdb around to wait for further instructions or so I can logon, whatever..).
Still, if you somehow manage to trace why that box isn't producing proper core files and are able to fix that, then that's even better ;).

You said this was a test server. Were there any clients connected? Was it linked to your network? Basically... what kind of traffic did the server get? (local/global)

And another one.. do all servers of your network have this problem? Or some don't? [since you reported like xx bugs and changed names it's hard to find your whole bug-history back]

All our servers randomly crash, for a reason we haven't found, that's a fact.
So we're using a test server to run mpatrol, to work with Unreal team to find out what's going wrong.
This test server is linked to the network, and crashes at about link time, with a 132 MB core file which looks corrupted as you said. I can't find out why, ulimit -c is set to "unlimited", so maybe a system limit is applied, but I have no idea how to set that ... If someone has a clue about it (I'll ask our gurus huh).
I don't have another server able to run mpatrol correctly (system with hard core (aha) limits, too poor CPU / RAM ...), but I think it should be enough.
Now, I can run the ircd in gdb if it's needed, but I agree that a nice mpatrol output would be better.
Another point someone of my team pointed at is the fact that on BSD it seems to crash less often, so we thought about the core limits not being the same ... but this way is obscure :)

If you think you need an access to the box, we can think about it also...

About being rude, sorry for that, you maybe didn't deserve it, but ... you should understand we've been waiting a long time and that we're thinking about changing ircd software if we can't get Unreal more reliable, but it'll be a lot of work migrating, getting features we need (without spamfilters, our net is down) on it blabla ... And also, we're experiencing lower user counts, and although I'm not sure reliability is the reason for it, it's a bit annoying to think it could be :) A few reasons which make us not feel that comfortable

About 'rudeness' and stuff.. I don't think you can blame *us* for like 70% of the time delay, it's pretty much the refusal of following instructions on your side that kept delaying this and the lack of proper communication.
You first reported an issue in mid-June, we are now mid-October, that's 4 months. I'm probably responsible for like 1-1.5 month delay due to my physical problems (which I obviously could not do anything about), but those other ~2.5 months you cannot blame on me.
In fact between 2004-06-23 when I asked you to use a clean version w/mpatrol (0001883, last bugnote) and 2004-08-19 when you actually announced you were going to report the stuff, that's a 2 months delay!

Back on-topic..
So it crashed when linking already? Now that's interesting/fast ;).
[somehow I interpreted it like 'mpatrol at link time' previous time]

Yeah, could you mail me with login info of that box / where to find the ircd that I/we can use. Then I'll try to start it in gdb etc and see if I can find out the cause.
That's pretty much my last hope really, I hope we will be able to find it then :).

SUMMARY: If a remote client changed his/her nick into a qlined nick (eg: an oper using a qlined nick, or: a user using a qlined nick that is not qlined on the other server) then it would slowly corrupt the heap.

Since this was quite a complex bug (although easy to find with mpatrol), perhaps it's worth explaining in detail to other coders what happened :) :

When getting a nickchange...
if (!IsULine(sptr) && (tklban = find_qline(sptr, nick, &ishold)))
.. and find_qline() got called for a remote client, and it matched..

return (BadPtr(str)) ? star : str;
}
.. which could cause corruption since if it encountered a "space character" before the "end of string" (= until a 0x00 was encountered) it would replace it with 0x00. So, it would slowly corrupt the heap.
Ex: a pointer to <whatever> containing 0x8020bf12 would become 0x8000bf12 since the 0x20 [space] would become a 0x00.
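
To make the pointer example concrete, here is a small self-contained sketch of the effect. It is not the Unreal code: cut_at_first_space() and corrupt_word() are hypothetical names, and the scan is artificially bounded so the demo stays inside its own buffer (the real problem was precisely that the write landed in memory the routine did not own).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the faulty behaviour: walk the bytes until a
 * 0x00 is seen and overwrite the first space (0x20) with 0x00.
 * `limit` exists only to keep this demo within its buffer.
 * Returns the offset that was overwritten, or -1 if nothing changed.
 */
static int cut_at_first_space(unsigned char *p, size_t limit)
{
    assert(p != NULL);
    for (size_t i = 0; i < limit && p[i] != 0x00; i++) {
        if (p[i] == 0x20) {
            p[i] = 0x00;   /* the in-place write that does the damage */
            return (int)i;
        }
    }
    return -1;
}

/*
 * Treat a 32-bit value as if it were unrelated heap data that the scan
 * wandered into, and return what is left of it afterwards.
 */
uint32_t corrupt_word(uint32_t word)
{
    unsigned char bytes[sizeof word];

    memcpy(bytes, &word, sizeof word);
    cut_at_first_space(bytes, sizeof bytes);
    memcpy(&word, bytes, sizeof word);
    return word;
}
```

With this sketch, corrupt_word(0x8020bf12) gives 0x8000bf12 on both little- and big-endian machines, since the only 0x20 byte is reached before any 0x00 byte in either byte order. A value without a 0x20 byte comes back unchanged, which fits the corruption being slow and sporadic.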