Well heck, good job Apple! I just tested FaceTime and did a quick check on its protocol. No hacking needed - just an on-the-wire black-box inspection - it's just plain SIP, with STUN for firewall discovery. Apple plans to make this protocol public, and they seem to have done an excellent job. And thanks for showing the world that you don't need complicated encryption and proprietary tunneling tricks for an excellent experience. For the most part, you need a good codec set, a good media stack that can adaptively switch codecs and manage buffers, and a good 'point-of-presence' network.

I am just going to restrict this post to an overview of the flow.

Enjoy:


This is a FaceTime call flow - good, plain SIP (they use MESSAGE for some proprietary data exchange during the call).

@jason: The complete req-URI in the INVITE is user@myip:port - basically, FaceTime, in this version, is sending it to the IP address and port of my iPhone. On SIP/IMS: well, this is really plain SIP. It does not use any of the mandatory IMS extensions like PANI or others.

@donnib - that would really depend on how apple will authenticate/admit users. As I mentioned, while most of this is vanilla SIP, there is proprietary stuff going on, along with new headers in the message exchange (primarily MESSAGE). Anyhow, I'd prefer for Apple to first publish their protocol formally before I post a blog on these details.

It seems to me that the most impressive step occurs right before the SIP INVITE.

They are doing a smooth transition from a 3G call into the VoIP session. Somehow, they are mapping a phone number to a visible IP address. Impressive enough in the simple cases, but downright amazing in the face of multiple operators, hidden caller-id, etc.

We...ll, only the call part is SIP. There are a lot of cipher/TLS/SSL exchanges going on to authenticate a FaceTime user - so don't expect to make a call to a FaceTime SIP client using X-Lite anytime soon ;-)

@David, well, it seems pretty straightforward. Remember, FaceTime does not do both hand-out (CS to WiFi) and hand-in (WiFi to CS). It only does hand-out. Once in WiFi, you can't get back to CS - that call is dropped. Doing a handover from CS to WiFi is pretty straightforward. Basically, a WiFi call can be set up in the background while the CS call is active. If the WiFi call fails for any reason, the CS call continues. I can't speak for the iPhone, but in many other phones (like Android), CS call media is handled at the baseband level, while for a VoIP call it would be at the media framework level. Trying to establish a VoIP call does not interfere with the CS call at all. In fact, the part FaceTime does *not* do is more complicated - handover back from WiFi to CS - that's the more challenging part with respect to smooth media transition.

It's not hard for Apple at all to map the PSTN # to a VoIP #. My strong guess is that Apple has already authenticated the binding with their FaceTime servers via their TLS/SSL exchanges (try it out: disable/enable FaceTime in Settings, and each time you do it you will see these new security associations being set up).

With the identity authenticated, all Apple really needs to do is send you an INVITE to the IP:port that is discovered by STUN (maybe they use other ICE procedures if STUN fails). As for the # that is displayed on your screen during FaceTime, that is just the From header text in the SIP INVITE (which is fine, because Apple has already authenticated the identity outside of SIP). Similarly, Apple can now use the same PSTN # (which is unique to every phone) to differentiate VoIP users too - this is typical VoIP stuff - see the Contact header, for example, in the INVITE that is received.
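To make the From/Contact point concrete, here is a minimal sketch of pulling the user part (the phone number) out of those headers. The sample INVITE is entirely made up for illustration: the numbers, addresses, port and tags are assumptions, not values from the capture, and `parse_headers` / `user_part` are hypothetical helper names.

```python
import re

# Hypothetical INVITE for illustration only. The numbers, addresses and
# port are invented and do not come from a real FaceTime capture.
SAMPLE_INVITE = (
    "INVITE sip:user@192.0.2.10:16402 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 198.51.100.7:16402;branch=z9hG4bK-example\r\n"
    "From: \"+15555550100\" <sip:+15555550100@198.51.100.7:16402>;tag=abc\r\n"
    "To: <sip:+15555550199@192.0.2.10:16402>\r\n"
    "Contact: <sip:+15555550100@198.51.100.7:16402>\r\n"
    "Call-ID: example-call-id\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)


def parse_headers(message):
    """Return (start_line, headers) with header names lower-cased."""
    head = message.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def user_part(header_value):
    """Extract the user part of the first sip: or sips: URI, if any."""
    match = re.search(r"sips?:([^@>;]+)@", header_value)
    return match.group(1) if match else None


start_line, headers = parse_headers(SAMPLE_INVITE)
print(start_line)
print("From user:   ", user_part(headers["from"]))
print("Contact user:", user_part(headers["contact"]))
```

This ignores repeated headers and compact header forms, which a real SIP parser would have to handle.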

[...] its promise to publish the FaceTime video calling protocol, some details are starting to emerge. Arjun Roychowdhury did a little packet sniffing and reports that the calls seem to go over vanilla SIP and STUN. The [...]

I don't think David's question is answered yet: how does Apple get the two phones' IP addresses from the phone call, unless every iPhone 4 pings Apple every time it makes a call to report its current IP?

@Marcus, well, if you look at it, there are many ways Apple may know your phone's IP. The entire framework of push notifications in iphone is based on a foundation of the apple push servers maintaining a persistent TCP connection as much as possible with your phone. There is HTTP traffic that also flows between your iphone and apple - when connected through WiFi, that would be your WiFi IP address. I don't know in the case of Facetime, which channel it uses to get your IP, but my point is there are several channels as described above - to get the initial INVITE to your phone (apple uses a different port for SIP). Then STUN comes in before the media starts flowing (All of this is a guess - but I think it is reasonable)

The SIP session is pretty standard, and then SDP will be sent in the INVITE to negotiate media endpoints and codecs to use to setup the call. This is where STUN comes in, as it allows media traversal through NAT.

The FaceTime servers must have a media relay capability as well, because there will be many situations where two iPhones can't connect directly to one another and must use something in the cloud to pass the media between the two.

Thanks for the interesting information, that you have provided. In the following I have some remarks:

The initial INVITE is sent after you have answered the call on the called side. Have a look at the time stamps of the SIP messages, especially 180 Ringing and 200 OK! The SIP message flow does not correspond to the real call states.

There is only one port (16402) for SIP signalling, RTP streams and RTCP!

Apple does not use a SIP registrar / proxy. The session is established directly between the user agents.

STUN is not addressed to a STUN server in the cloud, but is end-to-end too. It seems that it is used only to create the bindings in the NAT tables of the routers, simultaneously from both sides.
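For readers who want to spot these packets in their own capture, here is a rough sketch of what a bare STUN Binding Request looks like on the wire. It assumes the RFC 5389 framing (20-byte header with the magic cookie 0x2112A442); nothing in this thread establishes which STUN revision FaceTime uses, and the function names are hypothetical.

```python
import os
import struct

STUN_BINDING_REQUEST = 0x0001
STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389


def build_binding_request(transaction_id=None):
    """Build a 20-byte Binding Request header with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Message type, message length (0: no attributes), magic cookie.
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


def looks_like_stun(datagram):
    """Cheap heuristic for RFC 5389 STUN; not a full validator."""
    if len(datagram) < 20:
        return False
    msg_type, length, cookie = struct.unpack("!HHI", datagram[:8])
    top_bits_zero = (msg_type & 0xC000) == 0
    return top_bits_zero and cookie == STUN_MAGIC_COOKIE and length % 4 == 0


packet = build_binding_request()
print(len(packet), looks_like_stun(packet))
```

Sending such a request from each phone toward the other's public address is one way both NATs could end up with matching bindings, which is the behaviour the comment above describes.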

Arjun, do you find any phone number that is involved in the call in the From, To, Via or Contact header? Or are only IP addresses used? Is the FQDN of the XMPP server part of the SIP URIs? Is it possible to post the XMPP server name / IP address?

@Matthias: 1) Well, this session was when I received a FaceTime call (the INVITE came to my iPhone 4) - the INVITE is the invitation I got to answer the caller's FaceTime call. I looked at the call flow again - it's absolutely in line with a standard SIP call - first I got an INVITE, then I sent 100, then I sent 180, then I sent 200, then I received ACK.

2) Yes, the call is P2P as far as SIP goes, no proxy cuteness as far as I could see (looking at Via) - I don't remember this fully, I'll check again, but I think that's correct. (As far as I could tell, Apple is using encrypted HTTP and potentially SMS to assert the identity and routing path to the user.)

3) I'll take a look at the RTP packets again tomorrow, but no, I don't believe I saw SIP and RT/C/P on the same ports

4) Yes, I found phone numbers. From, To and Contact.

5) No, I am not comfortable posting the server IPs - I really don't want to give out apple server IPs at this stage (I fully understand anyone with FaceTime can easily see a wireshark dump for themselves, just that I don't think it is kosher for me to post it)
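The INVITE, 100, 180, 200, ACK sequence from point 1 can be checked mechanically against the start lines of a capture. The sketch below is only an illustration: the start lines are invented (addresses and port are placeholders), the helper names are hypothetical, and it only makes sense for the call-setup portion of a trace, since later MESSAGE transactions would add further 200 responses.

```python
# Order of messages in a basic SIP INVITE transaction, as described above.
EXPECTED_FLOW = ["INVITE", "100", "180", "200", "ACK"]

# Made-up start lines for illustration; not taken from the real capture.
OBSERVED = [
    "INVITE sip:user@192.0.2.10:16402 SIP/2.0",
    "SIP/2.0 100 Trying",
    "SIP/2.0 180 Ringing",
    "SIP/2.0 200 OK",
    "ACK sip:user@192.0.2.10:16402 SIP/2.0",
]


def start_line_token(start_line):
    """Return the method for a request or the status code for a response."""
    parts = start_line.split()
    if parts[0].startswith("SIP/"):
        return parts[1]  # response: "SIP/2.0 <code> <reason>"
    return parts[0]      # request:  "<METHOD> <uri> SIP/2.0"


def matches_invite_flow(start_lines):
    """True if the setup-related messages appear exactly in the expected order."""
    tokens = [start_line_token(line) for line in start_lines]
    relevant = [token for token in tokens if token in EXPECTED_FLOW]
    return relevant == EXPECTED_FLOW


print(matches_invite_flow(OBSERVED))
```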

1) The time between initial INVITE and ACK is 146 msec, between 180 and 200 OK only 19 msec. It is not possible to answer to the call so quickly. That's because the real call establishment happens in XMPP (which is encrypted and you have no chance to decode it). The SIP session is used only to establish the media streams.

2) Sure, P2P is a very basic SIP scenario, but the 'normal' way is to use Registrar and Proxy.

3) You can also look at the port information in the From header and the m-lines for audio and video in SDP of the initial INVITE, that you published.
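Anyone wanting to settle the port question from their own capture can read the ports straight out of the SDP m-lines. This is a minimal sketch: the sample SDP is invented, the codecs in it are placeholders, and it reuses port 16402 in both m-lines only because that is the claim under discussion, not because it was verified. `media_lines` is a hypothetical helper.

```python
# Invented SDP body for illustration; addresses and codecs are placeholders.
SAMPLE_SDP = (
    "v=0\r\n"
    "o=- 0 0 IN IP4 198.51.100.7\r\n"
    "s=-\r\n"
    "c=IN IP4 198.51.100.7\r\n"
    "t=0 0\r\n"
    "m=audio 16402 RTP/AVP 0\r\n"
    "m=video 16402 RTP/AVP 96\r\n"
    "a=rtpmap:96 H264/90000\r\n"
)


def media_lines(sdp):
    """Parse each 'm=<media> <port> <proto> <fmt> ...' line into a dict."""
    result = []
    for line in sdp.splitlines():
        if not line.startswith("m="):
            continue
        media, port, proto, *formats = line[2:].split()
        result.append({
            "media": media,
            # A port may be written as "<port>/<count>"; keep the base port.
            "port": int(port.split("/")[0]),
            "proto": proto,
            "formats": formats,
        })
    return result


for entry in media_lines(SAMPLE_SDP):
    print(entry["media"], entry["port"], entry["proto"], entry["formats"])
```

Comparing these ports with the port in the Via or Contact header of the same INVITE shows whether signalling and media share one port.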

To answer the question of how Apple maps an IP to a phone number: when the phone is set up, there is communication with registration.ess.apple.com; afterwards, in all calls, only with invitation.ess.apple.com. Further, the phone sends an SMS to Apple (in Europe via a UK number, you can check your bill) to link the phone to the number. If you change the SIM card, this happens again. Then, before call setup, the calling phone asks Apple for the IP of the target phone.


After such a long time, has anyone been able to grab the video stream from the packets and verify that it is a valid H.264 stream? It seems to me the video has been encrypted or manipulated, so it does not look like a valid video stream.

OK, but apart from all the tech details, if I make a Facetime call to my wife in the UK, while I am away in the USA, it won't use mobile roaming data services, but will be billed as a normal phone call?

Yes, the folks at packetstan did a much deeper analysis and reported that the video feed was not encrypted. Note that this was an early version of FaceTime (they did the analysis a few weeks after I did mine) - so I don't know if recent updates to FaceTime encrypted it. Read http://www.packetstan.com/2010/07/special-look-fa... for details.

SRTP packets *can* look just like RTP packets if the optional fields are not provided. So the fact that the conversation appears to be RTP is not conclusive evidence that the media payloads are not encrypted.
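A short sketch of why the header alone cannot settle this: the 12-byte fixed RTP header is sent in the clear by both RTP and SRTP, so it parses identically either way. The packet below is synthetic (made-up sequence number, timestamp and SSRC, with zero bytes standing in for a payload), and `parse_rtp_header` is a hypothetical helper.

```python
import struct


def parse_rtp_header(packet):
    """Parse the 12-byte fixed RTP header (the same layout is used by SRTP)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Synthetic packet: version 2, payload type 96, then 20 placeholder bytes.
sample = struct.pack("!BBHII", 0x80, 96, 1, 160, 0x1234) + b"\x00" * 20
print(parse_rtp_header(sample))
```

Telling the two apart means inspecting the payload itself, for example checking whether it decodes as the codec advertised in the SDP.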

