Description

I believe that this issue exists through all Asterisk versions as it appears to be an omission in the SIP RFCs (I am happy to be corrected if that is not true).

In a "normal" INVITE, we have a global timeout that is specified by the Dial command's ring duration, such that if all else fails, app_dial or equivalent will stop the channel from existing beyond its natural lifespan. With a Re-INVITE, this is not the case, and I have a scenario where Asterisk leaks RTP ports:

Inbound call (any channel tech) to SIP endpoint A.

Original call is put on hold by SIP A.

SIP A calls SIP B, and direct-media call is set up.

SIP A 'REFER's the call to bridge the inbound call to SIP B.

In response to the REFER, Asterisk Re-INVITEs the SIP A and SIP B RTP audio back to Asterisk

SIP A has a bug that means it responds "100 Trying" to the Re-INVITE, but then nothing more.

The 100 Trying clears SIP Timer B, and no other SIP events will occur to progress the Re-INVITE. At this point, Asterisk has a channel that will never time out, while SIP A considers the channel finished. As a result, any RTP ports opened for that channel are leaked because they are never cleaned up.

This leaves a potential DoS where the original SIP A RTP ports are leaked, and over a period of time can prevent Asterisk and/or the host from working.

Activity

Steve Davies
added a comment - 13/Jun/12 10:55 AM
Thoughts:
If the SIP A channel sends a 1xx response (including 18x) to a Re-INVITE packet (a media update, COLP update INVITE, or any other non-initial INVITE), then perhaps we should reset the PVT destroy timer rather than clearing it in handle_response_invite()? A non-initial INVITE ought to be completed in under 1 second, so I think allowing the default Timer B value of 32 seconds would be plenty?
Could this change be applied to any SIP channel that is in AST_STATE_UP? Or is that not a safe way to distinguish a Re-INVITE from an initial INVITE?
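
A minimal sketch of the idea above, assuming a simplified stand-in for the dialog. The struct, the function name and the timer field are hypothetical and are not the real chan_sip code; only the 32 second value (64 x T1, the Timer B default) and the AST_STATE_UP test come from the comment. It shows the timer being re-armed rather than cleared when a 1xx arrives for a Re-INVITE.

```c
#include <assert.h>
#include <stdbool.h>

/* SIP T1 default is 500 ms; Timer B is 64 * T1 = 32 s. */
enum { SIP_T1_MS = 500, REINVITE_TIMEOUT_MS = 64 * SIP_T1_MS };

/* Hypothetical model of a dialog; NOT the real struct sip_pvt. */
struct dialog_model {
    bool channel_up; /* stands in for AST_STATE_UP, assumed to imply a Re-INVITE */
    int  timer_ms;   /* time left on the destroy timer; -1 means not scheduled */
};

/* Handle a 1xx response to an outstanding INVITE. */
static void on_provisional_response(struct dialog_model *d, int status)
{
    assert(status >= 100 && status < 200);
    if (d->channel_up) {
        /* Re-INVITE: re-arm instead of clearing, so a UA that goes
         * silent after "100 Trying" cannot stall the dialog forever. */
        d->timer_ms = REINVITE_TIMEOUT_MS;
    } else {
        /* Initial INVITE: the Dial ring timeout is the backstop,
         * so clearing the timer is safe here. */
        d->timer_ms = -1;
    }
}
```

Whether AST_STATE_UP is a reliable test for "this is a Re-INVITE" is exactly the open question in the comment, so the channel_up flag here is only a placeholder for whatever check turns out to be safe.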

Steve Davies
added a comment - 05/Jul/12 10:09 AM
The currently checked-in patch had 2 issues in my testing. I hope the following makes sense. Basically, I added about 20 extra warning messages to the timer code to see what was happening.
1) For the Okay UA case where no 1xx packet is sent (INVITE/OK/ACK):
This works 100% fine. My trace shows:
> reINVITE
< 200 OK (invite)
> ACK
> BYE
< 200 OK (bye)
2) For the Okay UA case, 1xx packet sent (INVITE/Trying/OK/ACK):
If a "100 Trying" is sent in response to the reINVITE, then check_pendings() is called when it arrives, and because we are an "ongoing_reinvite" we allow the BYE to be sent mid-transaction. This BYE is sent even though the UA is about to send a "200 OK". This "early BYE" is one of the symptoms we are trying to avoid, or I was anyway. The early BYE does upset our UA, which does send a "100 Trying".
The reINVITE 200 OK and ACK do happen, so the pvt is cleared down fine.
The trace I got was:
> reINVITE
< 100 Trying (invite)
> BYE <--- too soon, UA gets confused.
< 200 OK (invite)
> ACK
< 487 Cancel (invite) <--- Perhaps UA sees "BYE" instead of "ACK" ?
< 487 Cancel (invite)
< 487 Cancel (invite)
< 487 Cancel (invite)
etc.
3) For the Broken UA case (INVITE/Trying... Dead air):
It starts as per 2) above: Asterisk receives the "100 Trying" and immediately sends "BYE". Because the 200 OK and ACK never happen, pvt->packets is still non-empty when __sip_autodestroy() is eventually called for final cleanup, and the last method is stuck on INVITE, so __sip_autodestroy() just retries forever.
At least that is what happened here.
> reINVITE
< 100 Trying (invite)
> BYE <---- THIS is the difference!
< 200 OK (bye)
"sip show channels" showed a stuck channel still. Debug showed __sip_autodestroy being called every 10 seconds on this pvt.
NOTE: I just added the "THIS is the difference!" comment. If the UA still responds to the BYE, then the pvt lastmsg is changed from BYE back to the outstanding INVITE, and __sip_autodestroy gets stuck.
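
The stuck state in case 3 can be modelled roughly as below. This is a sketch under stated assumptions, not the real __sip_autodestroy(): the struct, the function name and the return convention (0 to stop, N to run again in N ms) are hypothetical, and the 10 second interval is simply the one seen in the debug output. It shows why a dialog whose re-INVITE is never answered is rescheduled indefinitely.

```c
#include <assert.h>
#include <stdbool.h>

enum { RETRY_MS = 10000 }; /* the 10 s interval seen in the debug output */

/* Hypothetical model; NOT the real struct sip_pvt. */
struct pvt_model {
    int  unacked_packets; /* stands in for a non-empty pvt->packets list */
    bool destroyed;
};

/* Assumed scheduler convention: return 0 to stop, N to run again in N ms. */
static int autodestroy_model(struct pvt_model *p)
{
    assert(p->unacked_packets >= 0);
    if (p->unacked_packets > 0) {
        /* The outstanding re-INVITE is still queued, so put off the
         * cleanup. If the UA never answers, this branch runs forever. */
        return RETRY_MS;
    }
    p->destroyed = true;
    return 0;
}
```

In this model the only way out of the loop is for something else to empty the packet list, which is the job a separate reinvite timer would do.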
With my PROPOSED patch I get:
1) Same result.
> reINVITE
< 200 OK (invite)
> ACK
> BYE
< 200 OK (bye)
2) BYE waits until after reINVITE
> reINVITE
< 100 Trying (invite)
< 200 OK (invite) <--- Clears reinvite timer
> ACK
> BYE <--- Happens here naturally.
< 200 OK (bye)
3) reINVITE timer expires, BYE is sent, all cleans up.
> reINVITE
< 100 Trying (invite)
(pause for new reinvite timer here)
> BYE
< 200 OK (bye)
(pause for __sip_autodestroy here)
"sip show channels" shows clean.
As always, I fully expect you to be able to find a flaw in all this.

Terry Wilson
added a comment - 05/Jul/12 10:23 AM
I'll try to find time sometime to look at this, but mmichelson may have to take it. I'm not really working today or tomorrow and am about to hop on a motorcycle and ride about 250 miles to get home. So you are still seeing these issues with a re-INVITE in the context of an attended transfer?

Steve Davies
added a comment - 05/Jul/12 10:39 AM
Yes, the context is the same as originally, so there are 2 legs. The 1st leg handles the "REFER/Accept/NOTIFY/BYE/OK" normally, then the second leg gets the reINVITE sequence above, and proceeds well or badly depending on the nastiness of the UA.
FYI, for the purposes of reproducing the issue, I am testing this with:
Bad UA = Aastra 55i, 2.6.0 firmware
Good UA = Aastra 55i, 3.2.2 firmware
So I am not just making this stuff up.

Mark Michelson
added a comment - 05/Jul/12 11:11 AM
I think the reason why Terry wasn't seeing the bad behavior is that the SIPp scenarios he is using test the general reinvite case rather than transfers. In Steve's case, the SIP_PENDINGBYE flag is being set and in Terry's, it's not. This means that when the 100 Trying is received, Steve sees Asterisk behave badly by sending a BYE out immediately, whereas Terry does not. I'm going to have a look at Steve's patch in more detail to see if it's the right way to move forward. I suspect it is, though.

Mark Michelson
added a comment - 05/Jul/12 11:38 AM
I think Steve's latest patch has it right.
If the reinvite times out with no pending BYE, things just move on. If the reinvite times out with a BYE pending, though, then Asterisk will send a BYE.
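
The behavior summarized here, together with the three traces from the proposed patch, can be sketched as a small event model. Everything in it is hypothetical (the struct, the event names, the string log of sent requests); pending_bye merely stands in for the SIP_PENDINGBYE flag. It is an illustration of the intended ordering, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum event { EV_1XX, EV_200_OK, EV_REINVITE_TIMEOUT };

/* Hypothetical model of one dialog with an outstanding re-INVITE. */
struct reinvite_model {
    bool reinvite_outstanding;
    bool pending_bye; /* stands in for the SIP_PENDINGBYE flag */
    char sent[32];    /* log of requests sent, e.g. "ACK BYE " */
};

static void send_req(struct reinvite_model *m, const char *method)
{
    /* Room for the method, a trailing space and the terminating NUL. */
    assert(strlen(m->sent) + strlen(method) + 2 <= sizeof(m->sent));
    strcat(m->sent, method);
    strcat(m->sent, " ");
}

static void flush_pending_bye(struct reinvite_model *m)
{
    if (m->pending_bye) {
        send_req(m, "BYE");
        m->pending_bye = false;
    }
}

static void handle_event(struct reinvite_model *m, enum event ev)
{
    switch (ev) {
    case EV_1XX:
        /* Provisional response: the transaction is still open,
         * so hold any pending BYE back. */
        break;
    case EV_200_OK:
        send_req(m, "ACK");
        m->reinvite_outstanding = false;
        flush_pending_bye(m); /* BYE now happens after the ACK */
        break;
    case EV_REINVITE_TIMEOUT:
        /* The UA went silent: give up on the re-INVITE, then
         * send the BYE only if one was pending. */
        m->reinvite_outstanding = false;
        flush_pending_bye(m);
        break;
    }
}
```

Feeding EV_1XX then EV_200_OK into a model with pending_bye set sends ACK followed by BYE, as in trace 2. Feeding EV_1XX then EV_REINVITE_TIMEOUT sends only the BYE, as in trace 3. With no BYE pending, the timeout sends nothing and things just move on.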