December 2009


Continuing on from my rant on P2PSIP I have been doing a bit more reading of the various NAT related standards and solutions.

The most popular NAT traversal mechanism I have come across in my time working with SIP is STUN. I thought I knew what STUN was, Simple Traversal of User Datagram Protocol (47 pages), and have even implemented a rudimentary server for that protocol as part of the SIPSorcery project. However, during my reading and much to my surprise, I came across a different STUN protocol, Session Traversal Utilities for NAT (51 pages), which obsoleted the previous STUN “proposed standard” (most IETF proposed standards never get to the official standard stage). To my mind it’s crazy to introduce a new standard with exactly the same acronym as an earlier standard. Even adding a 2 to give STUN2 would at least give implementors and users a way to differentiate between the two; as it is there is going to be a lot of confusion. I had a good understanding of the purpose of the original STUN standard: to allow an application to determine its public IP address and get an indication of the behaviour of the NAT it was operating behind. The purpose of the new STUN standard, which I will refer to as STUN2, is a lot less clear; from a quick reading it has added NAT keep-alives and connection checking, and it now includes TCP NAT handling mechanisms. My initial thought when I saw the new STUN2 standard was: great, maybe a more robust solution has been found to cope with situations where STUN fails. Unfortunately that’s not the case. Instead it seems to me that there are no real enhancements in STUN2 and it just handles a few more esoteric edge cases that are more of a solution looking for a problem.
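To make the "determine its public IP address" part concrete, here is a minimal Python sketch of the client side of a STUN binding exchange. It uses the newer message layout (the one with the fixed magic cookie and the XOR-MAPPED-ADDRESS attribute) and only handles IPv4. The function names are my own and purely illustrative; this is not how the SIPSorcery STUN server is written.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442           # fixed value in the newer STUN header
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
ATTR_MAPPED_ADDRESS = 0x0001
ATTR_XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    # Header: message type, message length (0, no attributes), magic cookie.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(packet, transaction_id):
    """Return the (ip, port) the server saw the request come from, or None."""
    if len(packet) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if packet[8:20] != transaction_id:
        return None
    offset, end = 20, min(len(packet), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        is_address = attr_type in (ATTR_MAPPED_ADDRESS, ATTR_XOR_MAPPED_ADDRESS)
        if is_address and len(value) >= 8 and value[1] == 0x01:  # 0x01 = IPv4
            port, addr = struct.unpack("!HI", value[2:8])
            if attr_type == ATTR_XOR_MAPPED_ADDRESS:
                # The XOR form obscures the address with the magic cookie.
                port ^= MAGIC_COOKIE >> 16
                addr ^= MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        # Attribute values are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return None
```

To try it against a real server, send the packet from the same UDP socket the media will use and pass whatever comes back to `parse_binding_response`. Comparing the returned port with the socket’s local port gives a rough indication of whether the NAT preserves ports.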

The standard that is touted as the silver bullet for NAT and SIP, and that should therefore solve all one-way audio issues (aside from codec incompatibility ones), is the “expired proposed standard” Interactive Connectivity Establishment (ICE) (119 pages). ICE makes use of STUN2 and yet another proposed standard, Traversal Using Relays around NAT (TURN) (81 pages). In simple terms ICE states that an attempt will be made to establish a media connection using STUN2 and if that fails the media will be proxied via a TURN server. As I’ve blogged previously, proxying media is a BAD BAD solution: it limits the features that can be used on a session to the lowest common denominator between the user-agents and the proxy server rather than just the user-agents; it introduces latency and quality degradation into the media path; it introduces security concerns; and the list goes on. It’s worth noting again that as video begins to replace voice those factors will be exacerbated. The classic example of this is Skype. They have arguably the best widely deployed VoIP protocol on the internet and they also use an ICE equivalent mechanism to deal with NAT. When a direct connection cannot be established between two Skype callers the media will be proxied through a super-node. That works well for voice but with video things are not so rosy. Anecdotal evidence (admittedly from a very small sample set of the people I know using Skype) suggests that Skype video calls almost always break or chop up after 5, 10 or 15 minutes.

The crazy thing about the burgeoning explosion of standards to deal with NAT for SIP is that they are all the result of one very, very big design failure: the lack of IPv4 addresses. Of course VoIP and SIP are not the only protocols that have to deal with NAT. FTP is another protocol that has real problems, and in fact there are very few application protocols that are not impacted by NAT in some way. The sequence of events has been: IPv4, with its huge design flaw, was adopted as the standard network protocol on the internet; to overcome the shortage of IPv4 addresses NAT was adopted so that ISPs could continue to sell internet connections; new application protocols (such as SIP) failed to fully accommodate NAT and were not able to work robustly with NATs; more application protocols (such as STUN) were introduced to help other application protocols overcome their NAT handling deficiencies; yet more application protocols (STUN2) were introduced to fix the earlier application protocols that failed to help the other application protocols with their NAT handling deficiencies. It’s like an inverted pyramid with the IPv4 design flaw at the bottom and the size and effort of the solutions to NAT and application protocol flaws growing wider and wider. It could just be me, but every time I think about it or see people on VoIP forums getting frustrated with one-way audio issues I struggle to comprehend how the situation has been allowed to reach this point. Sure, IPv6 is a massive expense in time and effort to implement, but it could actually be the silver bullet in this case.

Back to reality. After shaking my head at the thought of implementing more proposed standards in sipsorcery to solve a few more edge cases but not really help that much, I went looking for some empirical data on how bad the one-way audio problem is for SIP. I found a NAT Tester Site that has managed to collate a survey of over 1360 NAT devices. If a SIP user-agent is used in conjunction with a STUN server and the NAT it is behind preserves the port (see the preserves column in the results table) then in theory one-way audio problems will not occur. Out of the 1360 devices there are 173 that are listed as not preserving the port; that’s 12.7% of devices. That means ICE, STUN2, TURN and the further standards that are sure to follow are being written and implemented for roughly 1 in 8 devices! Being the lazy (or rather pragmatic) programmer that I am, I’m very disinclined to implement new features that are only beneficial to such a small number of users.
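A quick check of the arithmetic, using only the two survey figures quoted above (the function name is just for illustration): 173 of 1360 works out to about 12.7%, or roughly one device in eight.

```python
def share_not_preserving(total_devices, not_preserving):
    """Fraction of surveyed NAT devices that do not preserve the source port."""
    return not_preserving / total_devices


share = share_not_preserving(1360, 173)
print(f"{share:.1%} of devices, about 1 in {round(1 / share)}")
```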

My recommended solution to anyone experiencing persistent one-way audio issues is to forget about all the solutions except STUN (that’s STUNv1 – Simple Traversal of User Datagram Protocol). If STUN doesn’t fix the problem for you then throw away your router and get a new one. Routers are cheap enough these days that the time spent stuffing around with trying to work around a crap one is just not worth it. The ideal type of router for VoIP/SIP is a full cone, port preserving one. At all costs avoid symmetric and/or non port preserving ones. Check the survey results for a matching one AND then do a web search for the router model and “one way audio” just to make sure.

If you’re a SIPSorcery user you don’t even need to use STUN with your SIP device. The SIPSorcery application server will automatically replace private IP addresses in SDP payloads with the IP address that the INVITE request or OK response is received from. STUN achieves exactly the same outcome, but instead of relying on a SIP server the SIP client utilises a STUN server to replace the private IP address BEFORE the INVITE request or response is sent. The SIPSorcery application server also has an additional feature that lets the mangling (a commonly used term for doing a string replacement) be controlled from the dialplan. The ma dial string parameter allows the SIPSorcery application server mangling to be turned off, which is very useful when the call is between two SIP user-agents on the same private network. That’s something you can’t do with STUN; instead you have to hope the router is clever enough to substitute the private IP addresses back in for the public IP addresses, and in my experience consumer routers are not very clever.
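As a rough illustration of what that mangling amounts to, here is a small Python sketch. It is not the SIPSorcery code: the function names are made up, the `enabled` flag is only a stand-in for switching mangling off from the dialplan, and it assumes that only IPv4 `c=` (connection) lines carrying an RFC 1918 private address need rewriting.

```python
import ipaddress
import re

# The three RFC 1918 private IPv4 ranges.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Matches the address on an SDP connection line, e.g. "c=IN IP4 192.168.1.10".
CONNECTION_LINE = re.compile(r"^(c=IN IP4 )(\d{1,3}(?:\.\d{1,3}){3})", re.MULTILINE)


def is_private(address):
    """True if address is a valid IPv4 address inside an RFC 1918 range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return any(ip in net for net in PRIVATE_NETWORKS)


def mangle_sdp(sdp, received_from_ip, enabled=True):
    """Swap private addresses on c= lines for the IP the SIP message came from."""
    if not enabled:
        return sdp

    def swap(match):
        if is_private(match.group(2)):
            return match.group(1) + received_from_ip
        return match.group(0)

    return CONNECTION_LINE.sub(swap, sdp)
```

Passing `enabled=False` leaves the SDP untouched, which is the behaviour you want when both user-agents sit on the same private network and can reach each other directly.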


A “what the heck are you doing it like that for?!!” post popped up on the Forums today and I thought that given I end up answering the question every few months I’d copy it here.

You are correct that your observations have been made by others quite a few times in the past 6 months. I’d also agree that the user experience offered by sipsorcery is pretty poor; the help documentation is next to nothing; the user interface is far from universal and I’m sure contravenes most good usability principles; the list could go on…

Most people have come to associate open source software with being free and generally of a reasonable quality. What is normally overlooked is that successful open source applications – for the sake of argument I’ll classify successful as a project you have had the inclination to use – generally gain a bit of momentum as they grow and pick up a few extra hands to help with the programming, documentation, web site etc. To date that hasn’t happened with sipsorcery/mysipswitch and that’s in part why there are a large number of shortcomings.

At the moment the sipsorcery project consists of:

    1. Me writing the software in my spare time,
    2. The sipsorcery service and its associated development servers hosted on Amazon’s EC2 and Microsoft’s Azure platforms for a cost of around USD500/month,
    3. Me administering the service in my spare time. A large portion of that time goes into shutting down fraudulent users attempting to exploit SIP Providers,
    4. Packaging up what is really architected as a centralised server application into a local install for those people with a high enough pain threshold to attempt to run the software on their own machines.

(Another developer, Guillaume, was previously able to help in the mysipswitch days but he is now too busy with work.)

Luckily I highly enjoy all those activities and get a kick out of keeping the whole thing working.

My priorities are:

    1. To make the service highly reliable. A short sentence with a very scary amount of work involved (it’s now been over two years since mysipswitch went from a pretty solid single process application to a multi-process, multi-server application with some very difficult stability and scalability problems to solve),
    2. Provide a REST API to make it easier for anyone so inclined to come up with an alternative user interface,
    3. Expand the Silverlight interface to become a real-time call switchboard.

In answer to your main question “…what the heck are you folks doing…”, which is actually “…what the heck are you, Aaron, doing…”, the answer is whatever most interests me. It interests me to be able to write and run a highly reliable SIP platform. Unfortunately, and I’m not being facetious, it doesn’t interest me to write JavaScript/HTML/AJAX based interfaces. I spent a decade doing that and just got completely sick of it.

You or anyone else are more than welcome to contribute to the project in any way you can or try and encourage another programmer to develop the features you want. I’m not going to develop every feature ever requested but if someone else develops it and it’s useful I’ll happily host it on

I’ve spent most of this week incorporating rrdtool into the software used to monitor the sipsorcery server. I have managed to get it working successfully and the new Status page currently displays the latest graphs of the SIP OPTIONS responses from the sipsorcery server and the two sipwizard servers. The latter are the two test servers being used to test out redundant operation with an SQL Azure database service.

The green line on each of the graphs represents the response time in milliseconds to the OPTIONS request sent from a monitoring server in Dublin, Ireland to the SIP Proxy on each of the monitored servers. The red line shows any periods where a server is detected as being down, which occurs if 3 successive OPTIONS requests fail to get a response. The requests are sent at 10 second intervals.
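The down detection can be sketched in a few lines of Python. This is an illustration rather than the monitoring code itself: the class and method names are invented, and it assumes that “down” means three probes in a row went unanswered and that a single reply clears the condition.

```python
class DownDetector:
    """Flags a server as down after `threshold` consecutive unanswered probes."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, response_ms):
        """Record one probe result; response_ms is None when no reply arrived.

        Returns True while the server is considered down.
        """
        if response_ms is None:
            self.consecutive_failures += 1
        else:
            # Any reply resets the count, so isolated lost packets are ignored.
            self.consecutive_failures = 0
        return self.consecutive_failures >= self.threshold
```

With probes sent every 10 seconds, a threshold of 3 means a dead server is flagged roughly 30 seconds after it stops answering, while an isolated lost UDP packet never draws a red line.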

The plan is to expand what is being monitored to include the SIP Registrars, SIP App Servers (they are what process dialplan scripts), Databases and also statistics like number of dialplans processed, number of registrar bindings etc. Monitoring of the Proxy was done first as it is the most important, since it is the gateway to all the other services and if it fails no other services are accessible. It’s also a good indication as to the health of the server.

As the VoIP (which these days means something like 50% SIP, 40% Skype, 5% IAX and 5% Other) provider service industry has matured over the last 5 years, the providers that have managed to survive have come to the realisation that a business based on transiting voice, which is the foundation of the telecoms industry, is actually a tough business to be in. Without the advantage of owning the single wire that runs into the customer premises, VoIP providers are competing not just on a global stage but also with a product that is rapidly converging towards a cost base whereby big players can offer it for free. Google Voice is the classic example, with a service that currently offers free calls to the US and Canada and undoubtedly more destinations to follow as the service picks up steam. It makes it pretty tough for other North American based VoIP providers to compete with…

What the surviving organic VoIP Providers have realised is that the most attractive segment of the market is business customers, not because they spend more on calls but because they are more likely to be interested in extra features like a hosted PBX or an IVR. Residential customers are more interested in cheap/free calls with no bells and whistles and that results in razor thin margins. At the moment smaller VoIP Providers that have their own number ranges have the advantage that in most countries porting numbers is still onerous. However, as regulators and technology improve, number porting inertia will quickly dissolve as a customer retention mechanism, which is of course a good thing.

Ultimately voice services will come to resemble email services. Traditional telcos and ISPs will bundle a basic service into their broadband products; web portal companies such as Yahoo, Google, Microsoft et al. will also offer a basic voice service that will integrate with their other offerings, paid for by eyeball ownership when people check their voicemails etc. Dedicated VoIP Providers will continue to exist but will be thinned out to those offering specialist services to power users and business customers. They will compete with the less nimble traditional telcos, who will always be a couple of steps behind, snapping at their heels.

The other thing that will happen is that a voice service won’t actually be a product at all. Instead it will evolve into a personalised media service, starting with video, which is already available on Skype, the eyeball portals via their IM networks and the more advanced SIP providers. Eventually it will reach a point where each person has multiple streams under their control and where at least one will be permanently connected. Personal streams will replace broadband connections. 99% of the population aren’t interested in IP addresses and routers; what they are interested in is being able to control the media on their TV, IP Phone, computer display etc., whether that media happens to be an interaction with another person, watching a movie, playing a game or attending a business meeting. Where the successor to the SIP protocol comes in will be to handle the signalling that makes switching the content of people’s streams seamless. The mechanism to place a call to talk to someone will be the same as a call to watch the latest movie, rather than all the different controls and applications that currently exist.

That’s the future but how will it all come about? In the near future writing streaming media applications will become the same as writing a web application. Once that happens there will be an explosion of new voice/media applications, beyond click-to-call and video blogs, and VoIP Providers will be assimilated into software consultancies or vice-versa since they will be the same thing. Instead of “web apps” we will have “web streams”, undoubtedly coined as “Web S.0” or something equally geeky. In order for streaming media applications to reach the same level of ubiquity as static web applications, new application servers are needed. The likes of FreeSWITCH, Asterisk, Wowza and Voxeo are leading the way (the likes of Sun, IBM and Microsoft also use all the SIP buzzwords in their niche products) but at the moment they and similar products require a higher level of expertise than the average web developer possesses, and more importantly they are not suitable for web hosting providers to deploy in their farms. Once the latter problem is solved the former will closely follow, and when it does internet applications will break out of their browsers and expand to include IP Phones, fax machines, the PSTN, mobiles and any other digital device, or analogue device that is worthwhile enough for someone to have produced an Analog-to-Digital converter for.

While these streaming web application servers are gestating a bunch of specialised but limited services have sprung up to attempt to fill the void.

And there are undoubtedly more similar services around and I’m always interested to hear about them if anyone knows of any.

All of the services listed are limited in the types of web streaming applications that can be developed due to the tight integration between the development environment, signalling platform and media gateway. In addition the business model employed in a number of cases is too restrictive, for example forcing applications to adopt a per minute charge for users severely restricts the appeal to developers and in turn the users they are developing for who are all used to the much more flexible web application models: freemium, content, subscription etc.

The experience from mysipswitch/sipsorcery, which due to the flexibility of Ruby dialplans is a type of streaming media application server, has demonstrated that the key to such applications is to separate the signalling from the media. Surprisingly that is something none of the above services do, although it’s not so surprising given the business models employed: if you’re charging applications by the minute it’s the media you’re billing for, not intelligent signalling. There are two huge advantages to a streaming application platform that has separated the signalling and media.

  • Media capabilities are limited by end-user devices rather than the application server. Softphones, IP Phones and smartphones such as the iPhone advance at a rapid rate and will invariably introduce media related features that are not supported by an application server. The signalling layer tends to be more stable; there are only so many ways to initiate, transfer and hang up a call.
  • Advanced media service providers can be cherry picked. Different service providers offer specialist services: text-to-speech, face recognition, speech transcription etc; and an application developer would benefit enormously from being able to use different services in their application rather than being constrained to the offerings or lack thereof from a single service provider.

All in all it’s an exciting time to watch the evolution of the streaming web.