sipsorcery's blog

Occasional posts about VoIP, SIP, WebRTC and Bitcoin.

SIP and Audio

I’ve created a short guide on how SIP manages audio streams and the sorts of things that go wrong when those streams traverse NATs. The full guide can be read at SIP and Audio Guide.

To complement the guide I’ve whipped together a diagnostics tool.

SIPSorcery RTP Diagnostics Tool

In an attempt to help people diagnose RTP audio issues I have created a new tool that provides some simple diagnostic messages about receiving and transmitting RTP packets from a SIP device. The purpose of the tool is twofold:

  1. On a SIP call indicate the socket the RTP packets were expected from and the actual socket they came from,
  2. On a SIP call indicate whether it was possible to transmit RTP packets to the same socket the SIP caller was sending from.

To use the tool take the following steps:

  1. Open http://diags.sipsorcery.com in a browser and click the Go button. Note that the web page uses web sockets, which are only supported in the latest web browsers; I’ve tested it in Chrome 16, Firefox 9.0.1 and Internet Explorer 9,
  2. A message will be displayed that contains a SIP address to call. Type that into your softphone or set up a SIPSorcery dialplan rule to call it,
  3. If the tool receives a call on the SIP address it will display information about how it received and sent RTP packets.

The tool is very rudimentary at this point but if it proves useful I will be likely to expend more effort to polish and enhance it. If you do have any feedback or feature requests please do send me an email at aaron@sipsorcery.com.


SIP Password Security – How much is yours worth?

SIP uses a cryptographic algorithm called MD5 for authentication. However, MD5 was invented in 1991 and since that time a number of flaws have been exposed in it. The US Computer Emergency Readiness Team (US-CERT) issued a vulnerability notice in 2008 that included the quote below.

Do not use the MD5 algorithm
Software developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity. As previous research has demonstrated, it should be considered cryptographically broken and unsuitable for further use.

Does that mean SIP’s authentication mechanism is vulnerable? Not necessarily, at least in relation to the MD5 flaws. The real answer is that it depends on how much your password is worth to an attacker. For example, if your SIP password only uses alphabetic characters and is 7 characters or less in length it can be brute forced for less than $1!

Read the full article here.
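
To make the brute force point concrete, here is a minimal Ruby sketch. It assumes the plain RFC 2617 digest calculation without the qop option, and the account details are made-up values for illustration only. Everything except the password is visible to anyone who captures the SIP request, so an attacker only has to loop over candidate passwords until the hashes match.

```ruby
require "digest"

# Digest response as defined in RFC 2617 (no qop), the scheme SIP borrows from HTTP.
def sip_digest_response(username:, realm:, password:, nonce:, method:, uri:)
  ha1 = Digest::MD5.hexdigest("#{username}:#{realm}:#{password}")
  ha2 = Digest::MD5.hexdigest("#{method}:#{uri}")
  Digest::MD5.hexdigest("#{ha1}:#{nonce}:#{ha2}")
end

# Number of candidate passwords of length 1..max_length for a given alphabet size.
def keyspace(alphabet_size, max_length)
  (1..max_length).sum { |n| alphabet_size**n }
end

# Try every lower case password up to max_length against a captured response.
# Returns the matching password, or nil if none was found.
def brute_force(captured_response, max_length:, **fields)
  ("a".."z" * max_length).find do |candidate|
    sip_digest_response(password: candidate, **fields) == captured_response
  end
end

# Hypothetical captured values, for illustration only.
fields = { username: "alice", realm: "example.invalid", nonce: "abc123",
           method: "REGISTER", uri: "sip:example.invalid" }
captured = sip_digest_response(password: "dog", **fields)
puts brute_force(captured, max_length: 3, **fields) # prints "dog"
puts keyspace(26, 7)                                 # prints 8353082582
```

Roughly 8.4 billion candidates is the whole search space for lower case passwords of up to 7 characters, and each guess costs only two MD5 operations here because the second hash does not depend on the password.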


Redirect Processing

Recently I had a request from a user on how to get redirect processing working with their SIPSorcery dial plan. Redirects are when the destination of a call responds with a specific type of response, appropriately called a redirect response, that indicates that the called destination is unavailable and that the caller should instead try to call an alternative destination. The alternative destination is specified in one of the SIP headers on the redirect response. The most common scenario for redirects is the do not disturb (DND) or call forward buttons on IP phones. IP phones typically allow a user to press the DND button, enter a call forwarding destination and then have the phone redirect all calls to that destination.

Redirect responses are potentially very dangerous if appropriate precautions are not taken. The reason being that the alternative destination specified in the response could be anything including a premium rate number in some far away country. For that reason redirects are disabled by default within SIPSorcery dial plans and it is only if they are explicitly enabled using a dial string option that they will be acted on.

After a redirect response is accepted the next question is how to process it. As well as blocking undesirable numbers, the dial plan has to cope with the alternative destination being a safe PSTN number, in which case the caller will want to make a decision about which of their providers to use for the call. This presented a bit of a quandary for a while as a dial plan would potentially need to contain the same call processing logic in multiple locations: once in the main dial plan and then everywhere a redirect response was being accepted. The solution to that problem was to allow a new second instance of a dial plan to be executed for a redirect response. However, while that approach provided the most flexibility it was also a bit complicated, so a simpler approach that allows the redirect response to be processed within the same dial plan instance was also implemented. The two different approaches are outlined below.

Approach 1 – Inline redirect processing

This is the simplest of the two approaches and allows redirect responses to be processed inline within the currently executing dial plan. It also does not require a redirect option to be set in a dial string since specific dial plan logic is required to employ it. An example of this approach is shown below.

if sys.Dial("myaccount@sipsorcery").to_s == "Redirect" then
  sys.Log("Redirect was requested to #{sys.RedirectURI.ToString()}.")
  sys.Dial("#{sys.RedirectURI.User}@someprovider")
end

In the example the key point is that the sys.Dial method will return a result of “Redirect” if one of the call legs within it receives a redirect response. At the same time the alternative destination will be set in sys.RedirectURI (which is a SIP URI object the same as req.URI).

Approach 2 – New dial plan instance redirect processing

The second approach causes a new instance of the current dial plan to be executed for the redirect destination. Some additional variables are set in the new dial plan execution which are sys.Redirect, sys.RedirectURI and sys.RedirectResponse. The sys.Redirect property is a boolean that gets set to true for a dial plan instance initiated by a redirect, sys.RedirectURI property holds the alternative destination set in the redirect response and sys.RedirectResponse holds the full response.

if sys.Out
  sys.Log("Out call")
  # rm=n: process any redirect response with a new dial plan instance.
  sys.Dial("someuser@local[rm=n]")
elsif sys.Redirect
  sys.Log("Redirect call")
  # Only act on redirects to destinations that are explicitly permitted.
  case sys.RedirectURI.User
    when /^300$/ then sys.Dial("#{sys.RedirectURI.User}@someprovider")
    else sys.Log("Sorry, redirect destination not permitted")
  end
else
  sys.Log("In call")
end

In the example above the dial plan has separate logic for In, Out and Redirect calls. The rm=n dial string option sets the redirect mode to new, meaning a redirect response should be processed with a new dial plan instance.
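
Since a redirect destination can be anything, it can help to keep the decision about what is permitted in one small function. The sketch below is plain Ruby rather than anything built into SIPSorcery; redirect_permitted?, the patterns and the blocked prefixes are all hypothetical and would need adjusting to your own numbering plan.

```ruby
# Hypothetical allow-list; the patterns are illustrative only.
ALLOWED_REDIRECTS = [
  /\A300\z/,           # a single internal extension
  /\A0[1-9]\d{7,9}\z/  # national numbers (assumed format)
].freeze

# Assumed prefixes for international and premium rate calls.
BLOCKED_PREFIXES = %w[00 011 1900].freeze

# True only when the redirect user part is explicitly allowed and not blocked.
def redirect_permitted?(user)
  return false if user.nil? || user.empty?
  return false if BLOCKED_PREFIXES.any? { |prefix| user.start_with?(prefix) }
  ALLOWED_REDIRECTS.any? { |pattern| pattern.match?(user) }
end

puts redirect_permitted?("300")        # prints true
puts redirect_permitted?("0011234567") # prints false
```

Inside a dial plan the check would sit in front of the sys.Dial call for the redirect destination, so anything not on the list is logged and dropped instead of dialled.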


Demo SIP to XMPP application

I’ve been half-heartedly seeing if any softphone makers are interested in supporting the pseudo ICE mechanism that GTalk uses when setting up the media on an XMPP call. It doesn’t look promising at this stage. Google did make a change around the start of this month to their XMPP call set up mechanism, which broke the ability for Asterisk 1.8 to place outgoing calls through GTalk, so maybe they are working on the service and will have full Jingle support soon. In theory that would allow ICE compatible phones, such as Counterpath’s softphone range and others, to place SIP calls through sipsorcery and have them terminated via GTalk/Google Voice’s XMPP/Jingle service. That would be neat as it would be a validation of SIP and XMPP signalling working together and interconnecting two technologies which both support a large number of users.

However Jingle for GTalk isn’t here yet and in the interests of encouraging any developers that are involved with writing softphones to look at supporting the Google STUN requests on the RTP sockets I’ve created a prototype application that shows how to do it. The application is written in C# and hosted on codeplex here. What the application does is listen on a socket for a SIP INVITE request and when it gets one translates it to an XMPP request which it sends off to GTalk. As well as handling the SIP and XMPP signalling the prototype application also fires up two media sockets, one that talks to the Google XMPP end and one that talks to the SIP phone end. The media sockets are needed so that the STUN requests and responses required by the Google XMPP end can be handled correctly and that’s the bit that’s missing from the softphones.


Google Voice Call Diagram

There have been a few people asking around the place for a diagram that shows how the SIP Sorcery GoogleVoiceCall dialplan application works. Below is one I whipped up using the excellent Tech-Invite examples as inspiration.

The full sized diagram is available on the SIP Sorcery Codeplex site.

GoogleVoiceCall Diagram


Free hosted Asterisk service

This morning I stumbled across a hosted Asterisk service that’s being offered for free by a company called Aretta. To sign up for the free service, called NetPBX Free, you need to visit this link https://www.aretta.com/free/. The free service doesn’t seem to be advertised on the main Aretta site so it looks like they are using word of mouth to let people know about their offering.

As far as using the service goes the account set up was painless and I was up and running with a dedicated Asterisk instance in about 5 minutes. However after that things weren’t quite so easy. The NetPBX Free admin interface is fairly comprehensive and also includes a FreePBX install, which is a web management portal that sits on top of Asterisk. Anyone not very familiar with Asterisk is going to really struggle as the configuration options are bamboozling and the FreePBX interface means there are two ways to get most things done. That’s not a deficiency of the NetPBX product that’s just how Asterisk is. Anyone who thinks sipsorcery is difficult to get a grasp on would be in for a rude awakening with Asterisk. Even the basics of setting up a SIP account and configuring the dialplan to allow inbound and outbound calls can be a tricky exercise.

The main problems I had with the Aretta NetPBX product were the lack of visibility of the Asterisk console, which is absolutely essential with such a complex piece of software, and likewise the lack of visibility of the configuration files. There is a Logging tab available that dumps the Asterisk log file to a web page but it seems to be using a cron job or something as it only refreshes every 10 or 15 minutes. As far as the system configuration files go, I wanted to set up an extension to allow my default SIP account to dial into music on hold. After setting up the dialplan I dialled into the extension but the call failed with a declined error response. The likely cause of the failure was a problem with the music on hold set up but it’s very hard to diagnose that sort of thing with no console and no access to the musiconhold.conf file. The FreePBX interface does have a music on hold section and after fiddling around in there for a while and doing a few restarts I was able to get it kind of working but it took a lot longer than it would have normally.

To be fair to Aretta it’s a free service and SSH access is available on the paid versions. The NetPBX product may be good for someone already familiar with Asterisk and with lots of time and patience on their hands to play around with but I wouldn’t recommend it for someone wanting to learn Asterisk. A better option for that would be to use one of Voxilla’s EC2 AMIs and fire up an instance. I’ve used the Voxilla images a few times to check something on Asterisk or when I wanted to do some tests with a media server and sipsorcery.


The bullet proof solution to one way audio: buy a new router

Continuing on from my rant on P2PSIP I have been doing a bit more reading of the various NAT related standards and solutions.

The most popular NAT traversal mechanism I have come across in my time working with SIP is STUN. I thought I knew what STUN was, Simple Traversal of User Datagram Protocol (47 pages), and have even implemented a rudimentary server for that protocol as part of the SIPSorcery project. However during my reading, and much to my surprise, I came across a different STUN protocol, Session Traversal Utilities for NAT (51 pages), which obsoleted the previous STUN “proposed standard” (most IETF proposed standards never get to the official standard stage). To my mind it’s crazy to introduce a new standard with exactly the same acronym as an earlier standard; even adding a 2 to give STUN2 would at least give implementors and users a way to differentiate between the two. As it is there is going to be a lot of confusion. I had a good understanding of the purpose of the original STUN standard: to allow an application to determine its public IP address and get an indication of the behaviour of the NAT it was operating behind. The purpose of the new STUN standard, which I will refer to as STUN2, is a lot less clear; from a quick reading it has added NAT keep-alives and connection checking and now also includes TCP NAT handling mechanisms. My initial thought when I saw the new STUN2 standard was: great, maybe a more robust solution has been found to cope with situations where STUN fails. Unfortunately that’s not the case; it seems to me that there are no real enhancements in STUN2 and it just handles a few more esoteric edge cases that are more of a solution looking for a problem.

The standard that is touted as the silver bullet for NAT and SIP – and that should therefore solve all one way audio issues (aside from codec incompatibility ones) – is the “expired proposed standard” Interactive Connectivity Establishment (ICE) (119 pages). ICE makes use of STUN2 and yet another proposed standard, Traversal Using Relays around NAT (TURN) (81 pages). In simple terms ICE states that an attempt will be made to establish a media connection using STUN2 and if that fails the media will be proxied via a TURN server. As I’ve blogged previously, proxying media is a BAD BAD solution: it limits the features that can be used on a session to the lowest common denominator between the user-agents and proxy server rather than just the user-agents; it introduces latency and quality degradation into the media path; it introduces security concerns; and the list goes on. It’s worth noting again that as video begins to replace voice those factors will be exacerbated. The classic example of this is Skype. They have arguably the best widely deployed VoIP protocol on the internet and they also use an ICE equivalent mechanism to deal with NAT. When a direct connection cannot be established between two Skype callers the media will be proxied through a super-node; that works well for voice but with video things are not so rosy. Anecdotal evidence (admittedly from a very small sample set of the people I know using Skype) suggests that Skype video calls almost always break or chop up after 5, 10 or 15 minutes.

The crazy thing about the whole situation is that the burgeoning explosion of standards to deal with NAT for SIP is the result of one very, very big design failure: the lack of IPv4 addresses. Of course VoIP and SIP are not the only protocols that have to deal with NAT. FTP is another protocol that has real problems and in fact there are very few application protocols that are not impacted by NAT in some way. The sequence of events has been: IPv4, with its huge design flaw, was adopted as the standard network protocol on the internet; to overcome the shortage of IPv4 addresses NAT was adopted so that ISPs could continue to sell internet connections; new application protocols (such as SIP) failed to fully accommodate NAT and were not able to work robustly with NATs; more application protocols (such as STUN) were introduced to help other application protocols overcome their NAT handling deficiencies; yet more application protocols (STUN2) were introduced to fix the earlier application protocols that failed to help the other application protocols handle their NAT handling deficiencies. It’s like an inverted pyramid with the IPv4 design flaw at the bottom and the size and effort for solutions to NAT and application protocol flaws growing wider and wider. It could just be me but every time I think about it or see people on VoIP forums getting frustrated with one-way audio issues I struggle to comprehend how the situation has been allowed to reach this point. Sure, IPv6 is a massive expense in time and effort to implement but it could actually be the silver bullet in this case.

Back to reality. After shaking my head at the thought of implementing more proposed standards in sipsorcery to solve a few more edge cases but not really help that much, I went looking for some empirical data on how bad the one-way audio problem is for SIP. I found a NAT Tester Site that has managed to collate a survey of over 1360 NAT devices. If a SIP user-agent is used in conjunction with a STUN server and the NAT it is behind preserves the port (see the preserves column in the results table) then in theory one-way audio problems will not occur. Out of the 1360 devices there are 173 that are listed as not preserving the port; that’s 12.7% of devices. That means ICE, STUN2, TURN and the further standards that are sure to follow are being written and implemented for roughly 1 in 8 devices! Being the lazy, or rather pragmatic, programmer that I am I’m very disinclined to implement new features that are only beneficial to such a small number of users.

My recommended solution to anyone experiencing persistent one-way audio issues is to forget about all the solutions except STUN (that’s STUNv1 – Simple Traversal of User Datagram Protocol). If STUN doesn’t fix the problem for you then throw away your router and get a new one. Routers are cheap enough these days that the time spent stuffing around with trying to work around a crap one is just not worth it. The ideal type of router for VoIP/SIP is a full cone, port preserving one. At all costs avoid symmetric and/or non port preserving ones. Check the survey results for a matching one AND then do a web search for the router model and “one way audio” just to make sure.
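
For anyone curious what the original STUN exchange looks like on the wire, the sketch below builds an RFC 3489 binding request and pulls the MAPPED-ADDRESS attribute out of a binding response. It is a minimal illustration only: it does no networking, ignores every other attribute and leaves out retransmission. Comparing the mapped port with the local port tells you whether that particular mapping was port preserving, which is not the same as a full NAT classification.

```ruby
require "securerandom"

BINDING_REQUEST  = 0x0001
BINDING_RESPONSE = 0x0101
MAPPED_ADDRESS   = 0x0001

# 20 byte header: message type, attribute length (none here), 16 byte transaction ID.
def stun_binding_request(transaction_id = SecureRandom.random_bytes(16))
  [BINDING_REQUEST, 0].pack("nn") + transaction_id
end

# Returns [ip, port] from the MAPPED-ADDRESS attribute, or nil if it is absent.
def parse_mapped_address(response)
  return nil if response.bytesize < 20
  type, length = response.unpack("nn")
  return nil unless type == BINDING_RESPONSE
  attributes = response.byteslice(20, length)
  offset = 0
  while offset + 4 <= attributes.bytesize
    attr_type, attr_len = attributes.byteslice(offset, 4).unpack("nn")
    value = attributes.byteslice(offset + 4, attr_len)
    if attr_type == MAPPED_ADDRESS && value && value.bytesize >= 8
      # Value layout: one unused byte, address family, port, four address bytes.
      _unused, family, port, a, b, c, d = value.unpack("CCnC4")
      return ["#{a}.#{b}.#{c}.#{d}", port] if family == 0x01
    end
    offset += 4 + attr_len
  end
  nil
end

# A mapping is port preserving when the NAT kept the local port number.
def port_preserved?(local_port, mapped_port)
  local_port == mapped_port
end
```

In practice the request would be sent from the same UDP socket the phone uses for RTP or SIP, and the address in the response is what goes into the SDP or Contact header.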

If you’re a SIPSorcery user you don’t even need to use STUN with your SIP device. The SIPSorcery application server will automatically replace private IP addresses in SDP payloads with the IP address that the INVITE request or Ok response is received from. STUN achieves exactly the same outcome but instead of relying on a SIP server the SIP client utilises a STUN server to replace the private IP address BEFORE the INVITE request or response is sent. The SIPSorcery application server also has an additional feature that lets the mangling – mangling is a commonly used term for doing a string replacement – be controlled from the dialplan. The ma dial string parameter allows the SIPSorcery application server mangling to be turned off, which is very useful when the call is between two SIP user-agents on the same private network. That’s something you can’t do with STUN and instead you have to hope the router is clever enough to substitute the private IP addresses back in for the public IP addresses; in my experience consumer routers are not very clever.

sys.Dial("123@onmy.net[ma=false]&456@outside.net")
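
The mangling described above boils down to a string replacement on the SDP. The following is a rough Ruby sketch of the idea and not the actual SIPSorcery implementation; mangle_sdp and its mangle flag are hypothetical names, and only the connection (c=) line is handled.

```ruby
require "ipaddr"

# RFC 1918 private address ranges.
PRIVATE_RANGES = %w[10.0.0.0/8 172.16.0.0/12 192.168.0.0/16]
                 .map { |cidr| IPAddr.new(cidr) }.freeze

def private_ipv4?(address)
  ip = IPAddr.new(address)
  PRIVATE_RANGES.any? { |range| range.include?(ip) }
rescue IPAddr::InvalidAddressError
  false
end

# Replace a private address on SDP connection (c=) lines with the address the
# SIP message was actually received from. mangle: false mirrors ma=false.
def mangle_sdp(sdp, received_from, mangle: true)
  return sdp unless mangle
  sdp.gsub(/^c=IN IP4 (\S+)/) do |line|
    private_ipv4?(Regexp.last_match(1)) ? "c=IN IP4 #{received_from}" : line
  end
end

sdp = "v=0\r\nc=IN IP4 192.168.1.10\r\nm=audio 10000 RTP/AVP 0\r\n"
puts mangle_sdp(sdp, "203.0.113.7")
```

Public addresses are left alone, and passing mangle: false returns the SDP untouched, which is the behaviour you want for two phones behind the same NAT.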


P2PSIP: or how to write a thousand RFC’s and still not solve the problem

As a consequence of being stuck inside on a miserable Dublin December day and being away from my development machine I ended up spending the last few hours looking over some of the proposals that are around for Peer-to-Peer SIP (P2PSIP). My interest is mainly academic, as I don’t have any work planned for sipsorcery in the area, but it does partly derive from contemplating how Goozimo will go about it in order to compete with Skype. One thing’s for sure: they won’t be wasting too much time with the proposed P2PSIP enhancements that are floating around in the IETF space, which are taking a bad situation, the existing bloated SIP standard, and making it much, much worse!

What’s my problem with the P2PSIP efforts? It’s building a house on foundations of sand. The efforts are relying on a bloated set of SIP standards coupled with relying on hacks, and even hacks to hacks to overcome shortcomings in the original SIP standard in dealing with NAT.

The core SIP standard document is 269 pages long and deals with six request types: ACK, BYE, CANCEL, INVITE, OPTIONS and REGISTER. There are additional standards to deal with what is considered core SIP functionality: REFER (aka transfers) (23 pages), INFO (9 pages), NOTIFY (38 pages) and more. Then there are the enhancement standards to fix the things the original SIP standard stuffed up: rport (13 pages) and PRACK (14 pages). And only then do the extensions and options, including P2PSIP, come in. The size and complexity of the core SIP standard and the excess of add-on SIP standards translate into big problems for implementors. A classic example is the SIP stack in the most popular VoIP server around, Asterisk: it’s taken a massive monolithic 27,000+ line C file to implement and has had some serious difficulties with even the core features; the best example is the 3+ years it took to write the SIP TCP implementation.

The P2PSIP efforts are taking a bad situation, the existing SIP standards and implementation difficulties, and building on top of it to make things worse. It’s not only server implementations like sipsorcery and Asterisk that are implementing SIP stacks but also the hundreds of SIP phones, ATAs and softphone manufacturers. In order to make SIP features work a majority of implementors must put the effort in to implement and test them. As the effort required continues to snowball two things are likely to happen: alternative standards will be developed (Skype’s proprietary protocol is an example); SIP device manufacturers will decide it’s all too hard and restrict themselves to the core standard and/or cherry pick SIP add-on standards thus creating big interoperability problems.

The other problem is that even with the reams of SIP and associated standard documents the NAT problem for even the simplest call scenario has not been solved; a very good and succinct explanation of the problem can be found here. As a consequence a further set of standards has sprung up to help SIP (or more correctly the media streams being initiated by SIP) to cope with NAT: STUN, TURN, and ICE being the most popular ones. The paradox is that in the most prevalent type of SIP call on the internet today, the call between an end-user and a SIP Provider, the NAT mechanisms are often completely ignored and instead a SIP Provider will reflect the media stream back to the socket the client sends from. That’s not a particularly secure mechanism but it is a pretty robust one and certainly far superior to those offered by STUN, TURN and ICE. So all these NAT standards really do is add to the implementation effort for SIP device manufacturers, and while they help in some cases they don’t definitively solve the problem and therefore end up creating more confusion for poor users trying to ascertain why their calls have one way or no audio. A side effect I have observed of the failure of the NAT coping mechanisms is that public forums dealing with SIP services have people suggesting STUN settings for completely unrelated problems such as caller ID.
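
The “reflect the media back to the socket the client sends from” approach is simple enough to sketch in a few lines. The RtpLatch class below is a hypothetical illustration and not code from any particular provider: it starts with the socket advertised in the SDP and switches to the source of the first RTP packet that actually arrives. Latching on the first packet is exactly why the mechanism is not particularly secure, since nothing here checks who sent it.

```ruby
# Tracks where to send RTP for one call leg.
class RtpLatch
  attr_reader :expected, :actual

  # expected_ip and expected_port come from the remote party's SDP.
  def initialize(expected_ip, expected_port)
    @expected = [expected_ip, expected_port]
    @actual = nil
  end

  # Call for every received RTP packet; only the first source is remembered.
  def packet_received(source_ip, source_port)
    @actual ||= [source_ip, source_port]
  end

  # Prefer the socket packets really came from over the one in the SDP.
  def send_target
    @actual || @expected
  end

  # True once packets have arrived from somewhere other than the SDP socket.
  def nat_suspected?
    !@actual.nil? && @actual != @expected
  end
end

latch = RtpLatch.new("192.168.1.10", 10000)
latch.packet_received("203.0.113.7", 49170)
p latch.send_target # prints ["203.0.113.7", 49170]
```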

The foundations of sand are constituted firstly by an overbloated set of SIP standards that are difficult and error prone to implement and secondly by a set of standards to deal with NAT that are not required in the majority of cases and fail to work in a large number of the remaining cases.

Those are the foundations the P2PSIP efforts are proposing to build on. P2P networks are difficult to design; the biggest problem is how to bootstrap peers into the network. To overcome that problem most P2P designs are actually hybrid P2P networks that rely on a central server for a number of critical functions. Napster is the original example of the hybrid P2P network and of its failure. Napster facilitated mp3 file sharing, largely illegal sharing as it turned out, between peers in the network and because it relied on a central server to allow peers to join the network the authorities were able to easily shut it down. The networks that followed on from Napster, the likes of Gnutella, operated without a central server. Without a central server the peer-to-peer networks are very difficult to shut down, which is why these types of networks are still around today and largely immune to authorities.

The P2PSIP documents propose a new type of hybrid P2P network that relies on a central server for bootstrapping. The P2P network in P2PSIP is primarily a storage layer utilising a Distributed Hash Table (DHT) approach. The DHT replaces the function of a SIP Registrar and SIP Proxy in a traditional “client-server” SIP network (the inverted commas are because SIP is not a client-server protocol but user agents assume client or server roles for certain operations) and the DHT is used to store the contact details of peers that have joined the network. In theory it’s not a bad idea: SIP registrations are a big burden on traditional SIP networks and offloading them to a peer-to-peer network would seem to have merit. It comes down to a trade-off between the server load for SIP registrations and the complexity of implementing yet another new SIP standard in hundreds of SIP devices. If the P2PSIP proposals were restricted to SIP softphones, which have the advantage of operating on flexible general purpose hardware, then it would be a debate with merit, but the strength of SIP is its universality and unless a new proposal is practical for IP phones, ATAs, mobile device softphones etc. then it should not even be considered. Another point: is a DHT even the best way to scale the storage of SIP location information? A standard called ENUM already exists that utilises DNS, a proven scalable storage service, for location information. SIP user agents already need DNS stacks that are more sophisticated than normal in order to process SRV records, which yet another supplementary SIP standard relies on (A DNS RR for specifying the location of services (DNS SRV)), although in this case it is one that is already well supported by existing implementations.
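
As an illustration of how little machinery ENUM needs on the client side, the sketch below turns an E.164 phone number into the domain name that would be queried: strip everything but the digits, reverse them, separate them with dots and append e164.arpa. The enum_domain helper is a made-up name, and the NAPTR lookup that would follow is left out.

```ruby
# Convert an E.164 number into its ENUM domain name.
def enum_domain(e164_number)
  digits = e164_number.gsub(/\D/, "")
  raise ArgumentError, "no digits in #{e164_number.inspect}" if digits.empty?
  "#{digits.reverse.chars.join('.')}.e164.arpa"
end

puts enum_domain("+353 1 234 5678") # prints 8.7.6.5.4.3.2.1.3.5.3.e164.arpa
```

A NAPTR query against that name returns the SIP URI (or other contact URIs) for the number, so the location storage and its scaling are left entirely to DNS.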

Going back to the original question of how Goozimo will implement a peer-to-peer SIP mechanism in order to compete with Skype, my guess is that they won’t go near the current P2PSIP efforts with a barge pole. Scaling server side is something Google are experts at so they’ll handle the SIP registrations using the existing mechanism. How they will deal with media and NAT is the big question. In fact it’s always the question when it comes to SIP and NAT. The paradox in this case is that the solution won’t be found by only considering SIP and NAT: NATs are already deployed everywhere and have to be considered a set fixture and, as discussed above, SIP is already complicated enough. Instead the media layer, which in SIP’s case is usually RTP, has to get smarter. The solution is not TURN, which involves proxying the media, or even a Skype like mechanism that uses a more scalable approach with super nodes doing the proxying. The media streams are only getting larger with video and conferences and it’s not scalable to proxy them. It’s also not desirable, as every extra hop the media goes through adds latency and potential degradation. The solution is to introduce a mechanism into the media carrying protocols that makes them able to cope with NAT instead of ignoring it. It may mean the media protocol has to become aware of the signalling protocol, or at least some services offered by it, something which is undesirable from a design point of view that wants a clean separation of layers between protocols. However if it’s a means to fix the problem where all previous attempts have failed then violating a design principle is worth it.

Apart from the time it has taken to write this paragraph I have not put any thought into what such a solution would even look like. Perhaps it could be some kind of broker service offered by the signalling layer, where the media protocol sends a single rendezvous packet to the signalling server so that the public media socket is known and can then be used in the call request. Perhaps the signalling and media protocols could be multiplexed over a single socket, although that would be a big change and I suspect there would be a portion of NATs that would fail to cope properly with a single private socket mapping to multiple public sockets. Fingers crossed the engineers at Goozimo will come up with not just a solution but a good solution and then use Google’s muscle power to prevail on the industry to adopt it and so remedy the abysmal vision of the SIP designers.


NAT, RTP and Audio Problems

The very first thing to note is that SIP was NOT designed to work with NAT. There are subsequent standards, hacks, workarounds, kludges etc. to try and make it work but the original SIP designers somehow deemed it beneath them or put it in the too hard basket to bother coming up with a proper solution (there is not one instance of the string “NAT” in the whole SIP RFC).

“So what” you may be saying. Well if you’re bothering to read this it’s probably because you are either having or have had audio problems with your SIP VoIP phone. That’s purely down to this massive oversight by the SIP designers. If you look at Skype on the other hand their proprietary protocol has a much smaller incidence of audio problems. The Skype protocol designers went to great lengths to come up with a pragmatic design (of course it was in their interests since they were aiming to make a profit). They even went so far as to enable their traffic to be tunneled through HTTPS proxies so that calls would have a good chance of working behind corporate firewalls; no chance of that with SIP even now. The SIP designers in their wisdom didn’t even bother to cope with the average home broadband connection.

If someone was to add up the cost in engineering man hours, user frustration and faulty VoIP calls as a consequence of the SIP standard it would be astronomical. As someone who has run a VoIP company in the past I’d estimate that somewhere between 30 and 50% of all support issues are due to one way audio or other NAT related problems.

Right, that’s the rant out of the way. Now on to explaining the technicalities of the problem, particularly in relation to the sipsorcery service.

The first thing to look at is how a VoIP SIP call is supposed to work in an ideal scenario (which is the only one the SIP standard bothers to accommodate).

SIP Call - Ideal Scenario

In the above diagram the end user SIP device and the SIP server are both on public IP addresses and everything is fine and dandy. To understand the diagram and subsequent ones the legend is:

  • The grey boxes on either side represent the Session Description Protocol (SDP) payloads that are carried in the SIP INVITE requests and responses,
  • The red circles over the grey boxes highlight the critical information within the SDP which is the IP address and port number that the sending device is going to be using for sending and receiving its RTP,
  • The blue lines represent a SIP transmission,
  • The green line represents an RTP stream,
  • A red line represents an RTP stream that could not be established,
  • Public or Private indicate the type of IP address the server or user agent are using.
  • Also as some of the diagrams used in this post get fairly wide and I haven’t spent the time to work out how to widen the columns in the blog software a larger version of the images is available here.

    In the ideal scenario both ends of the SIP call place a publicly accessible IP socket in their SDP and the device at each end of the call has no issues sending and receiving to and from the other’s socket and all is good.

    About the only time you come across the ideal scenario shown above is for SIP trunks between two VoIP Providers. The average residential and business internet connection uses a NAT and that changes the landscape for a SIP call subtly in appearance but dramatically in effect.

    NAT Scenario Basic

    The key point now is that the SIP Phone on the left is operating on a private IP address and that’s what it has placed in its SDP. The call proceeds the same as in the ideal scenario but when the SIP device at the other end, in this case a SIP Softswitch, attempts to send RTP to the phone it can’t because the SDP contains a private address which is not routable on the public internet.

    This diagram represents the classic one-way audio situation. The person on the IP Phone can’t hear the person on the other end of the call. The person on the server side can hear the person on the IP Phone since the phone is happily sending RTP to the server’s public SDP socket.

    You may want to take a break or grab a coffee at this point. If you thought it was hard understanding things so far it only gets worse!

    For SIP to get used in the real world it obviously had to overcome the NAT problem shown in the previous diagram (I probably shouldn’t say obviously, as it doesn’t appear to have been that obvious when SIP was being devised). There are actually a number of different ways that NAT can be overcome with SIP, but they fall into two categories:

  • The first category is where SIP devices on private IP addresses attempt to determine their public IP address and then use that in their SDP instead of their private address. STUN is one protocol designed for this purpose. Some devices let the user manually specify the public IP address and there are other mechanisms. It doesn’t matter so much how the SIP device gets its public IP address just that it places it into the SDP when making or answering a call,
  • The second category is where SIP Servers attempt to cope with clients sending them SDP payloads with private IP addresses, normally by replacing the private address with the public address the packet came from.

    The important thing to realise is that neither of these mechanisms is foolproof. And that’s worth repeating: there is no 100% foolproof mechanism that can guarantee a SIP call can cope with NAT, although most of the time it can. The reason there isn’t a guaranteed mechanism is the nature of NATs, and more specifically NATs using Port Address Translation (PAT). It’s explained a bit more further on, but if a NAT translates the port on the outgoing RTP stream of a SIP device then the port that was set in the SDP is now wrong, and sending RTP to the requested socket will fail and result in one-way audio.

    Let’s look at one of the mechanisms from the second category that a SIP Server can use to cope with a call from a private device.

    NAT Handling - Server Mangling

    In this diagram the small unreadable text on the right is explaining how the SIP Server is configured to recognise private IP addresses in the SDP and replace them with the IP address the request was received from. The sipsorcery server does exactly that. The problem is that it’s not a particularly robust mechanism. If, for example, a SIP Proxy sits between the end device and the SIP Server doing the mangling, then the public IP address of the Proxy will be placed into the SDP and the RTP will never reach the end device. Or, as occasionally happens, a faulty NAT will actually leave the source IP address of the packets it transmits as a private IP address, giving the SIP Server the choice between a private SDP address and a private origination address (which are probably the same, so no choice really).
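
    The mangling step can be sketched in a few lines of Python. This is my own illustration of the idea rather than the sipsorcery implementation: the function names are hypothetical, only IPv4 connection lines are handled, and “private” is assumed to mean the three RFC 1918 ranges.

```python
import ipaddress

# The RFC 1918 private ranges that are not routable on the public internet.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(address: str) -> bool:
    """True if the address is an IPv4 address in one of the RFC 1918 ranges."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # a host name or garbage, leave it alone
    return any(ip in network for network in RFC1918_NETWORKS)


def mangle_sdp(sdp: str, request_source_ip: str) -> str:
    """Replace a private connection address with the IP the request came from."""
    mangled = []
    for line in sdp.splitlines():
        parts = line.split()
        if line.startswith("c=IN IP4") and len(parts) == 3 and is_private(parts[2]):
            line = f"c=IN IP4 {request_source_ip}"
        mangled.append(line)
    return "\r\n".join(mangled) + "\r\n"


# Hypothetical SDP from a phone behind a NAT whose public address is 203.0.113.7.
PRIVATE_SDP = (
    "v=0\r\n"
    "o=- 1 1 IN IP4 192.168.1.20\r\n"
    "s=-\r\n"
    "c=IN IP4 192.168.1.20\r\n"
    "t=0 0\r\n"
    "m=audio 10000 RTP/AVP 0\r\n"
)

print(mangle_sdp(PRIVATE_SDP, "203.0.113.7"))
```

    Note that only the address is rewritten. The port on the m=audio line is left exactly as the phone wrote it, because the SIP request tells the server nothing about which port the NAT will use for the RTP stream.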

    The other more common thing that breaks a SIP Server’s attempt at using the request origination address for RTP is the one alluded to previously, PAT.

    NAT Mangling - Broken by PAT

    In the diagram above the SIP Server has correctly detected the phone’s public IP address and has attempted to send its RTP packets there. However because the NAT in front of the phone has performed a port translation on the phone’s RTP stream the NAT has no mapping for the socket the Server is attempting to send to and simply drops the packets. The result is again one-way audio.
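
    A toy model of a port-translating NAT shows why the packets get dropped. The class below is a deliberately simplified sketch for illustration: the name, the starting port and the addresses are invented, and a real NAT also tracks protocols, timeouts and (often) the remote address.

```python
class PortTranslatingNat:
    """Toy NAT that maps each private socket to a fresh public port."""

    def __init__(self, public_ip: str, first_public_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = first_public_port
        self._mappings = {}  # public port -> (private ip, private port)

    def outbound(self, private_socket):
        """Create (or reuse) a mapping and return the public socket seen outside."""
        for public_port, mapped in self._mappings.items():
            if mapped == private_socket:
                return (self.public_ip, public_port)
        public_port = self._next_port
        self._next_port += 1
        self._mappings[public_port] = private_socket
        return (self.public_ip, public_port)

    def inbound(self, public_port: int):
        """Return the private socket to forward to, or None if the packet is dropped."""
        return self._mappings.get(public_port)


phone_rtp = ("192.168.1.20", 10000)      # what the phone put in its SDP
nat = PortTranslatingNat("203.0.113.7")

actual = nat.outbound(phone_rtp)         # the phone sends its first RTP packet
mangled = (nat.public_ip, phone_rtp[1])  # what the server built by mangling

print(actual)                   # ('203.0.113.7', 40000)
print(nat.inbound(mangled[1]))  # None, no mapping for port 10000 so it is dropped
print(nat.inbound(actual[1]))   # ('192.168.1.20', 10000)
```

    The server got the IP address right but is aiming at port 10000, while the only hole the NAT has opened is on port 40000.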

    So while mangling does help and is better than nothing, it doesn’t always work, depending on what type of NAT device is in front of your phone. Because the sipsorcery service only deals with SIP, and not RTP, mangling is the ONLY thing it can do. There is no other magic it can do to try and get the RTP streams connected up. The best advice for one-way audio when using sipsorcery is to try and set up your router to NOT do port translations on the range your phone uses for RTP.

    The next mechanism a SIP Server can use to cope with NAT is to forget about packet mangling and reflect back RTP to whichever socket it receives on.

    NAT - RTP Reflection

    In the above diagram the SIP Server will start off sending RTP to whatever socket is specified in the call request’s SDP irrespective of whether it’s a private address or not. Then as soon as it receives an RTP packet from the other end it will assume that is the socket it should be sending its own RTP to and switch to that. This is the mechanism Asterisk uses when you set nat=yes on a SIP account. It does pose a potential security hole in that an attacker could monitor the SIP traffic and then try and get an RTP packet to the SIP Server before the genuine device and thus hijack the RTP stream. In practice there are easier ways to break into SIP systems so it’s unlikely an attacker would bother with that approach under normal circumstances.
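
    The reflection logic itself is tiny. The sketch below is my own minimal Python illustration of the behaviour described above, not Asterisk’s code; the class name and addresses are hypothetical, and it assumes the server locks onto the first RTP source it sees.

```python
class ReflectingRtpSession:
    """Tracks where a server should send RTP for one call leg."""

    def __init__(self, sdp_socket):
        # Start with whatever the SDP says, private address or not.
        self.send_to = sdp_socket
        self.latched = False

    def on_rtp_received(self, source_socket):
        """Switch to the source of the first RTP packet and ignore later changes."""
        if not self.latched:
            self.send_to = source_socket
            self.latched = True


session = ReflectingRtpSession(("192.168.1.20", 10000))
print(session.send_to)  # ('192.168.1.20', 10000), unreachable from the server

session.on_rtp_received(("203.0.113.7", 40000))  # first packet via the phone's NAT
print(session.send_to)  # ('203.0.113.7', 40000)

session.on_rtp_received(("198.51.100.9", 5004))  # a later packet from elsewhere
print(session.send_to)  # ('203.0.113.7', 40000)
```

    The sketch also makes the hijack risk easy to see: whichever packet arrives first wins, so if the packet from 198.51.100.9 had arrived before the genuine one, the server would have sent the audio there instead.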

    This reflection mechanism is better than mangling because it gets around any port translation the NAT in front of the end user’s phone may have done. As mentioned above, the sipsorcery server cannot use this mechanism since it never sees any RTP. However, when the SIP Server at the destination end of the call is using this mechanism, the sipsorcery server doesn’t need to do any NAT handling anyway. If you’re having one-way audio problems it’s not a bad idea to try and find a SIP Provider that has their servers configured to use the RTP reflection mechanism. It saves you having to fiddle with your router and in practical terms is going to cope with most NATs.

    Generally speaking, where one end of the call is a SIP Server on a public IP address, one-way audio problems should be resolvable. The cases where they are not usually involve either a faulty NAT (there are more around than you would think) or a phone that is behind multiple NATs. The latter can occur when ISPs run transparent NATs on their network because they are short of IP addresses or for some other reason. In theory multiple NATs should also be coped with by the RTP Reflection handling, but in practice as the number of NATs on the audio path increases above one the risk of audio problems seems to rise exponentially!

    The other common situation with sipsorcery users is where there is no SIP Server in the call and instead the call is between two end user devices. Most people think that this will be a simpler situation and that there should be less chance of audio problems, but that’s not the case; in fact it’s the opposite. Now, instead of having one device on a private IP address, there will generally be two.

    NAT - User Agent to User Agent

    The above diagram illustrates a call between two sipsorcery users where each user’s phone is on a private network. In this case both users have NATs that are doing port translation, so neither of the RTP streams gets through and neither user hears anything. More common is that only one of the users’ NATs does port translation, so one of them will get audio and the other won’t. In this situation the success of the call depends both on the sipsorcery server being able to mangle the SDP so it contains the public IP address and on the NATs involved NOT doing port translation. If either of those conditions is not met then one or both of the RTP streams, and therefore audio streams, will fail.
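
    Those two conditions can be written down as a small truth table. The Python below is a rough model of the reasoning in this post only (the function name is made up, and it ignores faulty NATs, multiple NATs and any RTP reflection by the far end): a user receives audio when their SDP was mangled to a public address and their own NAT kept the RTP port unchanged.

```python
def hears_audio(sdp_mangled: bool, own_nat_translates_ports: bool) -> bool:
    """Can RTP aimed at this user's advertised socket get through their NAT?"""
    return sdp_mangled and not own_nat_translates_ports


# Both SDPs are mangled by the server; vary the port translation on each NAT.
for a_translates in (False, True):
    for b_translates in (False, True):
        a_hears = hears_audio(True, a_translates)
        b_hears = hears_audio(True, b_translates)
        print(
            f"A translates ports: {a_translates}, B translates ports: {b_translates} "
            f"-> A hears: {a_hears}, B hears: {b_hears}"
        )
```

    Only the first of the four rows gives two-way audio, which is why a call between two NATed devices has more ways to fail than a call to a public server.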

    I do have more diagrams and explanations for NAT scenarios around locally installed versions of a sipsorcery server which are even more complicated since now not even the server is on a public IP address. However I’ll leave them for the next post.

    The best advice I can give to anyone having consistent audio issues on VoIP calls is to google their router model and see if there are any other people having the same issue and if they were able to fix it. If that doesn’t yield anything try and borrow a different router from a friend and see if the audio is better with that. If it is I’d personally replace the router as the long term frustration of audio problems on calls far outweighs $100 or less on a new router.

    Aaron