Now that Google have bought Gizmo, it’s my guess that there will be a whole lot of work commencing at Goozimo (Google+Gizmo) around peer-to-peer SIP. The reason is that Goozimo will want to compete head-on with Skype to become the communications solution of the masses, and the only way to compete effectively is to switch from a centralised SIP model to a hybrid peer-to-peer SIP model.
Why? A centralised SIP model starts to get very expensive and very cumbersome when you have to proxy media. VoIP providers can just about get away with it now, when voice is the media they carry, but what happens once that shifts to video and then high definition video and then multi-party, high definition video and … well, you get the idea: the media payload sizes are just going to keep on growing. Even a company the size of Google, which probably has per Mbps bandwidth costs in the sub USD1 range, will struggle to absorb that level of traffic. And apart from the cost, it’s just not a good architectural solution to take something which is inherently peer-to-peer, two end users talking to each other, and turn it into peer-to-server-to-peer. VoIP providers at the moment don’t worry too much about it because, unlike Skype, the majority of their traffic is from end users to their PSTN gateways and the percentage of end user to end user calls is negligible. Like the media payload sizes, that too will change in the coming years.
The problem for Goozimo is that SIP is poorly designed for peer-to-peer communications. I’ve harped on about this before, but it’s always worth reiterating that the SIP designers must have been out on the booze the night before they were due to draw up the methods for dealing with NAT, because they just left it out completely, a cardinal sin for any internet protocol. The only reason SIP has prospered with such a massive deficiency is that the only real competing protocol, H.323, was designed by PSTN engineers and is even worse.
What can Goozimo do about it? That’s the question. Ideally they’d like to throw SIP away, do things properly and use a proper internet protocol such as XMPP. However, that’s not really an option because of the size of the deployed SIP user base coupled with the manufacturing momentum behind it. It’s the same reason Google needed Gizmo rather than just expanding their GTalk service: the Google engineers know XMPP, not SIP. My bet is that Goozimo will add to the plethora of SIP “enhancement” standards, which must already be nearing triple figures, with a set of features that will make it easier to deal with NAT.
Will peer-to-peer SIP work? Yes, it has to. It’s either that or start from scratch with an alternative protocol and we’re already too far gone for that to happen: IPv4 and IPv6 being a case in point.
How will it work? The saving grace in the whole mess is that the solution probably isn’t that difficult. SIP developers have now had a lot of experience of mangling private IP addresses in the SDP payloads and can let each end of the SIP call know the socket the media should be sent to. The missing piece is getting the NAT in front of each SIP user agent to allow the media through. Most NATs will only permit an incoming packet through if there has already been an outgoing packet sent to the originating socket. That can be a problem when both ends of a SIP call are behind NAT and the port on one or both of the user agents has been re-mapped by NAT. Essentially that’s the problem that needs to be solved for peer-to-peer SIP to start working. There are different ways it could be done, but what matters is not so much the technical solution used as a company the size of Google getting behind it and encouraging manufacturers to roll it out.
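To make the SDP mangling concrete, here is a minimal sketch in Python. It is not the sipsorcery code; the function name `mangle_sdp` and the idea of passing in the public address as a parameter are my own assumptions for illustration (in practice a server would typically take it from the source address of the SIP request). It only rewrites IPv4 `c=` lines that carry an RFC 1918 private address and leaves everything else alone.

```python
import ipaddress

# RFC 1918 private IPv4 ranges, the addresses a user agent behind NAT
# typically places in its SDP.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr is a literal IPv4 address in an RFC 1918 range."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # a hostname or garbage, so not something to mangle
    return any(ip in net for net in RFC1918_NETWORKS)


def mangle_sdp(sdp: str, public_ip: str) -> str:
    """Replace private addresses in SDP 'c=IN IP4' lines with public_ip.

    public_ip is an assumption of this sketch: the caller decides where it
    comes from.
    """
    prefix = "c=IN IP4 "
    lines = []
    for line in sdp.splitlines():
        if line.startswith(prefix):
            # Drop any '/ttl' suffix before checking the address.
            addr = line[len(prefix):].split("/")[0].strip()
            if is_rfc1918(addr):
                line = prefix + public_ip
        lines.append(line)
    # SDP lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Note that this only fixes the address. The port in the `m=` line is left as the user agent wrote it, which is exactly why a NAT that re-maps ports still causes trouble.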
Interestingly a lot of NATs do not re-map ports by default, at least not until there is a conflict and they are forced to, and therefore SIP P2P calls are already quite feasible. The reason they are not more widespread comes back to the motivation of the commercial SIP Providers which is to get billable calls into their PSTN gateways. Supporting P2P SIP calls is not going to generate any revenue for them.
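The NAT behaviour described above can be illustrated with a toy model. This is a sketch only: `ToyNat`, `call_succeeds`, the addresses and the port numbers are all made up for the example, and real NATs vary a lot in how they map and filter. The model assumes each user agent's SDP has been mangled to carry its NAT's public IP but still carries the port the user agent chose itself.

```python
class ToyNat:
    """Toy model of a NAT that only admits packets from sockets it has sent to."""

    def __init__(self, public_ip, preserve_ports=True):
        self.public_ip = public_ip
        self.preserve_ports = preserve_ports
        self.next_port = 40000  # hypothetical pool used when ports are re-mapped
        self.mappings = {}      # internal (ip, port) -> public port
        self.allowed = set()    # (public port, remote socket) pairs opened so far

    def outbound(self, internal, remote):
        """Record an outgoing packet; return the public socket the remote end sees."""
        if internal not in self.mappings:
            if self.preserve_ports:
                self.mappings[internal] = internal[1]
            else:
                self.mappings[internal] = self.next_port
                self.next_port += 1
        public_port = self.mappings[internal]
        self.allowed.add((public_port, remote))
        return (self.public_ip, public_port)

    def inbound(self, remote, public_port):
        """Return True if a packet from remote to public_port would get through."""
        return (public_port, remote) in self.allowed


def call_succeeds(preserve_a, preserve_b):
    """Return True if media flows both ways between two user agents behind NAT."""
    nat_a = ToyNat("203.0.113.1", preserve_a)
    nat_b = ToyNat("198.51.100.2", preserve_b)
    ua_a = ("192.168.1.10", 49170)
    ua_b = ("10.0.0.5", 50000)
    # Sockets advertised in the mangled SDP: public IP plus the port from the SDP.
    adv_a = (nat_a.public_ip, ua_a[1])
    adv_b = (nat_b.public_ip, ua_b[1])
    # Each end sends its first media packet to the other's advertised socket.
    seen_a = nat_a.outbound(ua_a, adv_b)
    seen_b = nat_b.outbound(ua_b, adv_a)
    # Those packets arrive from the sender's real public socket.
    a_to_b = nat_b.inbound(seen_a, adv_b[1])
    b_to_a = nat_a.inbound(seen_b, adv_a[1])
    return a_to_b and b_to_a
```

In this model `call_succeeds(True, True)` is true because each NAT has already seen an outgoing packet to exactly the socket the incoming media arrives from, while `call_succeeds(False, True)` is false because the re-mapped port no longer matches the one advertised in the SDP.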
The sipsorcery service, on the other hand, is very interested in P2P SIP calls since, being a free service, it cannot afford the extra cost of proxying media. In fact every single call that has ever been placed through the mysipswitch/sipsorcery services has been a P2P one. Generally the calls are between an end user and a SIP Provider, but as far as the sipsorcery server is concerned the SIP Provider is just another end point and it treats the call exactly the same as if it was between two end users. A sipsorcery call between two end users will still have an issue if the SDP ports have been re-mapped by the NAT at either end, but in practice that seems to be a small percentage of calls.
What does a P2P SIP call mean for an end user? The answer is not much for voice, where it’s still better to route calls through a 3rd party provider to take advantage of the NAT handling in their gateway, but for video it’s a different story. Most people I know who use video calls use Skype, and all report that the video often drops or is choppy. The reason is that Skype’s P2P overlay network still relies on supernodes, which are just other Skype users with good bandwidth in close network vicinity, to proxy the media. When the Skype network gets busy there will be increased contention on the overlay network and the media will suffer. The ideal situation is for the media to travel directly between the two end user agents and not to be proxied by anyone. In the SIP network that provides the added advantage that the end user agents can find the best matching media capability between themselves rather than, as is currently the case, the best matching capability between the two end user agents and the SIP Provider’s server.
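As a rough illustration of what "best matching media capability" means, the sketch below picks the first codec in one user agent's SDP that the other end also lists. The function names `media_codecs` and `best_common_codec` are hypothetical, and the approach is deliberately simplified: real SDP offer/answer negotiation also has to consider `a=fmtp` parameters, static payload types without an `a=rtpmap` line, and more, none of which is handled here.

```python
def media_codecs(sdp, media):
    """Return encoding names (e.g. 'H264/90000') for one media section.

    Names are returned in the order of the formats on the m= line, which is
    the sender's order of preference. Payload types without an a=rtpmap
    line are skipped in this sketch.
    """
    formats, names, in_section = [], {}, False
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            in_section = bool(fields) and fields[0] == media
            if in_section:
                formats = fields[3:]
        elif in_section and line.startswith("a=rtpmap:"):
            payload_type, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            names[payload_type] = encoding.strip().upper()
    return [names[pt] for pt in formats if pt in names]


def best_common_codec(sdp_a, sdp_b, media="video"):
    """Return the first codec in A's preference order that B also lists, or None."""
    theirs = set(media_codecs(sdp_b, media))
    for codec in media_codecs(sdp_a, media):
        if codec in theirs:
            return codec
    return None
```

With two softphones that both list H.264 first, this returns `H264/90000`. Put a server in the middle that only lists H.263 and the best either end can get through it is H.263, even though both end points could do better talking directly.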
As a practical example, I have tested video calls with Counterpath’s Bria softphone through the sipsorcery.com service and it works very well. The video capability in the Bria is better than Skype’s, and while there can still be chop and break on the video, at least now it’s down only to the internet connections at either end of the call rather than also those of the Skype supernodes.
There is one trick to getting the Bria to work with sipsorcery, and that is to ensure the call is made as a video call initially and not as a voice call followed by an attempt to start a video one. In the latter case the re-INVITEs can end up with the wrong IP addresses, as the sipsorcery server does not mangle in-dialogue requests. If the call is placed as a video one straight away, the sipsorcery server will mangle the initial SDP and the RTP carrying the video has the best chance of getting established. The diagram below shows the “Video Call” button that appears when the Bria is switched to video mode, and it is the one that should be used to place calls through sipsorcery.