August 2009


At times it seems like the mysipswitch/sipsorcery effort becomes overwhelming. Certainly the challenges are far above what I encounter in my 9-to-5 programmer’s job. I guess that’s part of the allure as well; all programmers like a challenge. The biggest of them at the moment are stability and load. The memory leak problem has been overcome but the “long running dialplan” problem is still around, and a new problem with the Amazon EC2 instance deciding to drop offline cropped up twice last week. Another one is that the incorporation of the DbLinq library to enable database agnosticism (the users of local versions particularly wanted this feature) has resulted in higher than acceptable loads, particularly on database updates. I’ve got ideas on how to solve all these problems and hope to do so over the coming weeks. It’s tough at times to get motivated when the project becomes just an engineering effort week after week (and yo-yo comments on the forums site don’t help either, even if they do have an inkling of merit).

To get a fresh injection of enthusiasm I’ve grabbed one of the applications that I’ve been wanting to re-implement for a long time. In this case it’s one that poked its head up right at the start of mysipswitch but then disappeared shortly afterwards due to the AJAX interface proving too unwieldy. The feature was the virtual switchboard that allowed calls to be managed in realtime through a browser, kind of like being able to make the dialplan up on the spot rather than having to program it in advance.

One of the big things with a Silverlight interface is that the difficulties encountered with providing a switchboard in AJAX can be overcome. Not only that but because it’s possible to run a SIP stack and IronRuby engine within Silverlight it’s possible to go way beyond what’s possible with AJAX and javascript.

For the last few days I took a break from engineering and commenced work on the sipsorcery switchboard. Most of the work has been to get the SIP stack working properly in Silverlight; it required a bit of massaging as Silverlight does not run the full .Net runtime. That’s done now and today I was able to use a Silverlight based switchboard to receive a call from my Cisco 7960 and then click a button to forward it on to my Polycom IP500. On the surface that’s nothing spectacular and I could have dialled between the two phones directly, but it is interesting if you consider that the call was manipulated by using SIP requests to and from a web browser.


The above interface does a poor job of showing what’s going on but it’s better than nothing. The sequence of events was:

  • Log on to the sipsorcery server through the Silverlight GUI,
  • Click the switchboard link which causes the Silverlight client to register with the sipsorcery server by sending a SIP REGISTER request over a TCP connection,
  • Dial an extension from my Cisco phone which is connected to the same sipsorcery server and which gets forwarded to the Silverlight GUI,
  • The Silverlight GUI receives the call and extracts the From header and writes it to a text box so I can see who is calling. The other information presented such as the name, company and image could be pulled down from a CRM or web service based on the information in the From header,
  • Now that I know who the caller is I can click a button to direct them to the phone I want to take the call on and in this case I clicked the aaronpolycom button to get the call to my Polycom IP Phone,
  • The call arrives at my Polycom and when I answer it I have audio between the Cisco and it.

It’s a little bit to wrap your head around and that’s without even considering that instead of a button click just forwarding a call it could run a Ruby script locally within the Silverlight GUI. The first time I was able to run the SIP stack within the Silverlight GUI was a big moment and today was the first time I was able to do something useful with it. It does open the possibility for some very powerful applications.

    The Silverlight UI that replaces the AJAX UI has caused some serious gnashing of teeth. The two reasons I have been able to distill for the frustration seem to be, firstly, no Silverlight plugin for browser xyz or OS xyz, which is a fair point, and secondly a dislike of anything Microsoft and the hassle of downloading another plugin. Both those arguments and lots more about the pros and cons of different browser technologies are plentiful all over the web so I won’t bore you with my own.

    The purpose of this short post is instead to explain why the AJAX interface was replaced by Silverlight. There are two reasons:

  • I really really dislike javascript/DHTML programming. It’s incredibly frustrating to switch from a sophisticated IDE and compiled code language such as C# (or Java if you’re that way inclined) back to fiddly little HTML tags and a hodge podge of javascript libraries and browser hacks which is otherwise known as AJAX (Asynchronous Javascript and XML). Some programmers thrive on AJAX, I’m not one of them.
  • Silverlight has this massive thing under the hood called the Common Language Runtime (CLR). The CLR is what runs the latest version of software developed on Microsoft’s .Net platform. The Silverlight CLR that runs in a browser is a cut down version of the one that runs on a desktop but it’s still surprisingly comparable. In contrast to AJAX development I find C# development to be the bee’s knees; it makes programming fun rather than like putting hot pokers in my eyeballs. Because of the CLR a Silverlight application can also share code with non-Silverlight applications. In the case of sipsorcery the really big thing is that the SIP stack which drives all the servers can actually run in the browser. What that means is some very cool SIP applications can be developed.

    In answer to a question about whether the sipsorcery UI could be targeted to the Silverlight 1.0 runtime so that it would run with Moonlight (the Linux port of Silverlight), the answer is unfortunately no. Version 2 of Silverlight is the first one that included the CLR and that’s the whole point of sipsorcery using Silverlight.

    It’s now been 3 weeks since the Isolated Process dial plan processing mechanism was put in place on the sipsorcery service. The news on it is good and while there were a few tweaks required in the first couple of weeks, which were more down to preventing some users initiating 20+ simultaneous executions of their dialplans, in the last week there have been no software updates or restarts required. During that time the sipsorcery application server, which processes the dial plan executions and has been the trouble spot, operated smoothly with no issues.

    As discussed ad nauseam in the past the root cause of the reliability issue on the services is a memory leak either in the Dynamic Language Runtime (DLR) or in the integration between sipsorcery and the DLR. The solution has been to isolate the processing of the dialplans in separate processes and periodically recycle those processes.

    I now feel pretty comfortable about the reliability of the sipsorcery application server and am reasonably confident that a solution to the instability issue that has plagued mysipswitch and sipsorcery has been found, at least for sipsorcery. As also mentioned previously the mysipswitch service cannot be easily updated anymore since the code has diverged significantly since its last upgrade in November of last year. I would now recommend that people migrate from mysipswitch to sipsorcery for greater reliability. There were two cases where the mysipswitch service needed to be restarted in the last week due to the “Long Running Dialplan” issue and a failed automated restart. On average the mysipswitch service does need one restart a week. If the need for a restart happens to coincide with times when I or Guillaume are able to access the server, which is when we are not asleep and in my case not at work, it’s fine. If it’s outside those times the outage can last up to 8 hours.

    Update: Of course no sooner had I posted about stability than there was a problem. Approximately 5 hours after posting the above the dial plan processing on the Primary App Server Worker failed with calls receiving the “Long Running Dialplan” log message. The memory utilisation of the App Server was low, around 120MB, and the process was responding normally; had it not been, the Call Dispatcher process would have killed and recycled it. The thing that was failing was script executions by the DLR. This provides some new information and it now looks like there are two separate issues with dialplan processing. One is a memory leak when a process continuously executes DLR scripts. The second is a bug in the DLR that causes it to stop processing scripts altogether, possibly as the result of an exception/stack overflow in a script. The memory leak issue has been resolved by recycling the App Server Workers when they reach 150MB. An additional mechanism is now needed to recycle the process if script executions fail.
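The recycling triggers described in the update could be combined roughly as follows. This is only a sketch in Ruby, not the sipsorcery Call Dispatcher code: the `WorkerHealth` struct, the `recycle_reason` method and the failure threshold of 3 are hypothetical names and values chosen for illustration. Only the 150MB limit comes from the text above.

```ruby
# Hypothetical snapshot of an App Server Worker's state.
WorkerHealth = Struct.new(:memory_mb, :responding, :consecutive_script_failures)

MEMORY_LIMIT_MB = 150      # from the post: workers are recycled when they reach 150MB
MAX_SCRIPT_FAILURES = 3    # assumed threshold, not a sipsorcery setting

# Returns a symbol naming the reason to recycle the worker, or nil if it looks healthy.
def recycle_reason(health)
  return :unresponsive unless health.responding
  return :memory if health.memory_mb >= MEMORY_LIMIT_MB
  return :script_failures if health.consecutive_script_failures >= MAX_SCRIPT_FAILURES
  nil
end

# The failure described above: low memory, process responding, scripts failing.
puts recycle_reason(WorkerHealth.new(120, true, 5)).inspect
```

The third check is the "additional mechanism": a worker that is responsive and well under the memory limit would still be recycled once its script executions keep failing.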

    The version 1.1 release of sipsorcery has been made and can be downloaded from codeplex. While there have been a month’s worth of minor fixes since the v1.0 release the main reason for the v1.1 release is so anyone interested can use the GoogleVoiceCall application in their dialplans.

    One thing that has always been at the top of the list on the mysipswitch and now sipsorcery feature request lists is the ability to initiate a callback from the dial plan by dialling in from a phone. The sequence of events would be:

    1. Set up an incoming number that terminates in a sipsorcery dialplan,
    2. Dial the number from any old phone, for example a work phone,
    3. Once the call is received by the sipsorcery dialplan somehow allow a number to be entered,
    4. Have the sipsorcery dialplan forward to the number entered in step 3.

    Basically it’s a standard calling card gateway system but with the advantage for users of being able to forward the call through their sipsorcery dialplan to take advantage of whatever rules and providers they have set up.

    The challenge for the sipsorcery service is that the feature will always require a media server for step 3. Previously a Blueface Asterisk server was configured for the task and it worked ok but it wasn’t ideal. Blueface need to look after their own customers and not get distracted with sipsorcery.

    Lately I’ve been checking out VXML (Voice XML) providers to see if there is some way to hook up a VXML application to get the number. The advantage of VXML over an Asterisk server is speech recognition (Asterisk can do speech recognition too but it requires some serious hoop jumping). A VXML application will give users the choice of entering the number using speech or using DTMF. I found two companies that offer free or eval VXML sites and was able to progress with both. The two companies were TellMe and Voxeo. I was able to get to a state where I could create a VXML application to get the number from the user. The tricky part has been how to get that number back to sipsorcery. The best mechanism would be a SIP blind transfer which would use a REFER request to indicate the number the user had entered. Unfortunately I couldn’t get any kind of transfer to work with TellMe, and Voxeo will only allow Attended Transfers on their evaluation platform. I contacted Voxeo support and explained that a SIP blind transfer means it will be even less work for their server but they have strict rules in place for PSTN blind transfers, which in fairness could cost them a bundle if not policed properly, and they don’t appear able to allow blind transfers only for SIP.

    Voxeo do have an eval version of their Prophecy server which I could set up but I’m keen to avoid hosting a media server so I’m still looking around for a VXML provider that supports SIP blind transfers. If anyone knows of one I’d be very interested to hear about them.


    Update: Spurred on by pagemen’s comment I looked into the character encoding of the POST requests and I WAS making a mistake by not escaping the data fields. The reason I didn’t twig to that previously was that some accounts would work without the need to escape the data. It will come down to whether the unescaped data contains an illegal character sequence such as t. Since correcting that bug I have not seen any 500 errors from sipsorcery.
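To show why unescaped fields only break some accounts, here is a small Ruby sketch using the standard library's `URI.encode_www_form`. The email address and password are made-up placeholders; the password deliberately contains characters that have a special meaning in a form-encoded POST body.

```ruby
require 'uri'

email = "user@example.com"   # placeholder account
password = "p&ss=w+rd"       # placeholder containing &, = and +

# Naive concatenation: the & in the password starts a new field on the server side.
unescaped = "Email=#{email}&Passwd=#{password}"

# Escaping each field keeps the special characters inside the value.
escaped = URI.encode_www_form("Email" => email, "Passwd" => password)

puts unescaped
puts escaped
```

A password made up only of letters and digits produces the same body either way, which would explain why some accounts worked without escaping.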

    Some people are getting a (500) Internal Server Error when attempting to place a Google Voice Call using the sipsorcery dialplan application. One person has passed along their account details so I could take a look and I’m sorry to say that 4 hours later I’m still none the wiser. As far as I can see the failing account is identical to my working account. Mine can place calls and works every time. The identical account can login and retrieve the key but fails every time with the Internal Server Error when placing the call request.

    There are people around with more perseverance than me for this kind of thing (I’m generally happier creating my own bugs than reverse engineering or fixing other people’s, no surprise there) so below is the relevant C# source code that the sipsorcery dialplan application uses to place the call. Running the console application with my own account details works every time.

    using System;
    using System.IO;
    using System.Net;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Web;

    namespace GoogleVoiceCall {
        class Program {
            // Set these to the Google login, Google Voice home page and call request URLs.
            private const string LOGIN_URL = "";
            private const string GOOGLE_VOICE_HOME_URL = "";
            private const string CALL_URL = "";

            private static string m_emailAddress = "your email address";
            private static string m_password = "your password";
            private static string m_gizmoNumber = "your gizmo number";
            private static string m_destinationNumber = "your destination number";

            static void Main(string[] args) {
                try {
                    Console.WriteLine("Attempting Google Voice Call");
                    CookieContainer cookies = new CookieContainer();

                    // First send a login request to get the necessary cookies.
                    string loginData = "Email=" + Uri.EscapeDataString(m_emailAddress)
                          + "&Passwd=" + Uri.EscapeDataString(m_password);
                    HttpWebRequest loginRequest = (HttpWebRequest)WebRequest.Create(LOGIN_URL);
                    loginRequest.CookieContainer = cookies;
                    loginRequest.AllowAutoRedirect = true;
                    loginRequest.Method = "POST";
                    loginRequest.ContentType = "application/x-www-form-urlencoded;charset=utf-8";
                    // The content length is the number of bytes sent, not the number of characters.
                    byte[] loginBytes = Encoding.UTF8.GetBytes(loginData);
                    loginRequest.ContentLength = loginBytes.Length;
                    using (Stream loginStream = loginRequest.GetRequestStream()) {
                        loginStream.Write(loginBytes, 0, loginBytes.Length);
                    }
                    HttpWebResponse loginResponse = (HttpWebResponse)loginRequest.GetResponse();
                    if (loginResponse.StatusCode != HttpStatusCode.OK) {
                        throw new ApplicationException("Login failed.");
                    }
                    else {
                        Console.WriteLine("Login request was successful.");
                    }

                    // Second send a request to the Google Voice home page to get a string key needed when placing a callback.
                    HttpWebRequest keyRequest = (HttpWebRequest)WebRequest.Create(GOOGLE_VOICE_HOME_URL);
                    keyRequest.CookieContainer = cookies;
                    HttpWebResponse keyResponse = (HttpWebResponse)keyRequest.GetResponse();
                    if (keyResponse.StatusCode != HttpStatusCode.OK) {
                        throw new ApplicationException("_rnr_se key request failed.");
                    }
                    else {
                        Console.WriteLine("Key request was successful.");
                    }
                    StreamReader reader = new StreamReader(keyResponse.GetResponseStream());
                    string keyResponseHTML = reader.ReadToEnd();
                    Match rnrMatch = Regex.Match(keyResponseHTML, @"name=""_rnr_se"".*?value=""(?<rnrvalue>.*?)""");
                    if (!rnrMatch.Success) {
                        throw new ApplicationException("_rnr_se key was not found on your Google Voice home page.");
                    }
                    string rnr = rnrMatch.Result("${rnrvalue}");
                    Console.WriteLine("_rnr_se key=" + rnr);

                    // Thirdly (and lastly) submit the request to initiate the callback.
                    string callData = "outgoingNumber=" + Uri.EscapeDataString(m_destinationNumber) +
                        "&forwardingNumber=" + Uri.EscapeDataString(m_gizmoNumber) +
                        "&subscriberNumber=undefined&remember=0&_rnr_se=" + Uri.EscapeDataString(rnr);
                    HttpWebRequest callRequest = (HttpWebRequest)WebRequest.Create(CALL_URL);
                    callRequest.CookieContainer = cookies;
                    callRequest.Method = "POST";
                    callRequest.ContentType = "application/x-www-form-urlencoded;charset=utf-8";
                    byte[] callBytes = Encoding.UTF8.GetBytes(callData);
                    callRequest.ContentLength = callBytes.Length;
                    using (Stream callStream = callRequest.GetRequestStream()) {
                        callStream.Write(callBytes, 0, callBytes.Length);
                    }
                    HttpWebResponse callResponse = (HttpWebResponse)callRequest.GetResponse();
                    if (callResponse.StatusCode != HttpStatusCode.OK) {
                        Console.WriteLine("Call request failed.");
                    }
                    else {
                        Console.WriteLine("Call request was successful.");
                    }
                }
                catch (Exception excp) {
                    Console.WriteLine("Exception Main. " + excp.Message);
                }
                finally {
                    Console.WriteLine("finished, press any key to exit...");
                }
            }
        }
    }

    Updated 25 Jan 2010: Adjusted for the new parameters required when using the sys.GoogleVoiceCall method in the sipsorcery dialplan.

    The “hacked” up Google Voice App on sipsorcery appears to have attracted some new people to the site. To help them I’m writing a quick tutorial about how they can quickly get up and running to place a call with the Google Voice app.

    Warning: People new to sipsorcery will probably be fine until they get to the dialplan configuration and will then start cursing in frustration: “why is this thing so *#!@ hard all I want to do is make a telephone call!”. The reason is that sipsorcery and mysipswitch before it were designed for people to be able to experiment with SIP stuff and try weird and wonderful things. The price for that power and flexibility has so far been ease of use. One day the plan is to re-create something like the dialplan wizard to make it simpler but so far it just hasn’t made it to the top of the list.

    Back to the promised tutorial.

    Following are the minimum steps you need to take to be able to place a call from sipsorcery to terminate with Google Voice.

    • 1. Log in to the Google Voice site, click Settings (in the top left) and then Phones on the main menu. On the Phones screen you must have a Gizmo number configured as shown below (note it doesn’t matter whether it’s ticked or not).

      Goog Voice - Phone

    • 2. The Gizmo number you have used in Google Voice MUST then be configured to forward calls to your sipsorcery account. You can do this by setting a forward at the Gizmo SIP Provider end or by registering the account from the sipsorcery end. The result in both cases is the same and it doesn’t matter which one you use. To register your Gizmo SIP Provider account from sipsorcery you need to create a new SIP Provider entry as shown below (use appropriate values for your account where I’ve blurred my own settings out).

      Important: As part of some updates made to sipsorcery for enhanced redundancy the contact registered or used with the callback SIP Provider MUST be where username is the same as the one used to login to the sipsorcery web site. If it’s not the same then there is around a 50% chance the Google Voice callback WILL NOT be matched up to the waiting SIP call that initiated it on the sipsorcery end.

      SIP Provider Details - Gizmo

    • 3. Now we’re ready to place a call. To do that you need to click on the Dial Plans menu in your sipsorcery account. I will assume that you are creating a new dialplan and will be overwriting the default one created for you.
      Google Voice - Minimal Dial Plan

      And so you can actually read it:

      sys.Log("starting dialplan...")
      sys.GoogleVoiceCall("your email address", "password", "1747612xxxx", "1132701859", ".*", 7)
      sys.Log("Sorry, Google Voice Call failed.")

      Here’s what each of the parameters mean:

      • The first parameter MUST be the email address you use to log in to your Google Voice account.
      • password MUST be the password for the email address.
      • 1747612xxxx MUST be a number, such as your Gizmo number, that has been registered on your Google Voice account and be the same as the one shown on your Google Voice Phones page in step 1. This is the number that Google Voice will place the callback on, so calls to it must somehow be configured to arrive back at the sipsorcery servers; typically this would be by registering a sipsorcery provider binding with the SIP provider that supplies the number.
      • 1132701859 is the destination number you wish to call and can be ANY US landline number (I don’t know whether mobiles or any others will work).
      • .* is a regular expression pattern that will be applied to any incoming calls that arrive on your sipsorcery account for 30 seconds after a sys.GoogleVoiceCall method has been used in your dialplan. The pattern is used to decide whether the incoming call is the callback from Google Voice and to bridge it with the waiting SIP call. A pattern of .* means the very next incoming call will be matched; it’s the safest option for anyone unsure about regular expressions or confused about what this parameter means.
      • 7 is the type of phone being used for the callback from Google Voice. The Google Voice web request requires that it be specified. The range of this option seems to be 1 to 7. To date it doesn’t seem to make any difference what number is used EXCEPT that if Gizmo is the callback provider 7 must be used. If it’s not Gizmo the safest bet is to use a value of 1.
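The way the pattern parameter and the 30 second window interact can be sketched in a few lines of Ruby. This is an illustration of the matching rule only, not the sipsorcery implementation: `PendingCallback`, `callback_match?` and the idea of matching against a single caller string are assumptions made for the example.

```ruby
# Hypothetical record of a dialplan that is waiting for its Google Voice callback.
PendingCallback = Struct.new(:pattern, :started_at)

CALLBACK_WINDOW_SECONDS = 30  # from the tutorial: the pattern applies for 30 seconds

# Returns true if the incoming call should be bridged with the waiting SIP call.
# incoming_caller is an assumed string describing the call, for example the caller's number.
def callback_match?(pending, incoming_caller, now)
  return false if now - pending.started_at > CALLBACK_WINDOW_SECONDS
  Regexp.new(pending.pattern).match?(incoming_caller)
end

start = Time.at(0)
catch_all = PendingCallback.new(".*", start)
puts callback_match?(catch_all, "anything", start + 5)    # true
puts callback_match?(catch_all, "anything", start + 31)   # false, the window has expired
```

A narrower pattern such as `^1747` would only match calls whose caller string starts with those digits, which reduces the chance of an unrelated incoming call being bridged by mistake.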

    Once you have taken those steps you need to configure your ATA, IP Phone or soft phone to use the sipsorcery SIP account and then place a call. If all goes according to plan you will get the following:

    • A ring tone on your phone almost straight away. This is generated by the sipsorcery server to let you know it’s started working.
    • Somewhere between 1 and 30 seconds later the phone will get answered and you will have a brief pause of silence followed by another ring tone. This time the ring tone is being generated by the Google Voice server and indicates the destination number you specified is ringing.

    That’s it, Easy 🙂.

    Caveats. Google obviously didn’t intend for people to be able to hook up their SIP devices to make free calls. I assume free calls via the Callback mechanism on the Google Voice web site pay off by driving web traffic to the site. With a SIP call there is no such pay off. As such the sipsorcery solution and other solutions around the web are hacks. That means they are susceptible to breaking or being blocked if Google get annoyed with the SIP calls. On the mysipswitch Forums there are already reports of the occasional call failing with a HTTP 500 Server Error. Watching the sipsorcery logs I have seen a few calls getting the same. That error may be because there was something about the call request Google didn’t like, which you can test by using the same values in a Callback from the Google Voice page, or it may be that the HTTP requests that are sent from sipsorcery to Google are occasionally going to get rejected for some reason (the 500 Server Error was caused by a bug in the sipsorcery code). The point is if you want to use this sort of solution on sipsorcery or elsewhere you will probably need to accept that not every call is going to work (that being said none of my own test calls have failed yet).

    Finally if you would like to get a little bit more adventurous with your dialplan and have it send the number you called in on as the destination for your Google Voice call you can use the one shown below.

    Google Voice Call - Advanced Dial Plan

    sys.Log("starting dialplan...")
    sys.GoogleVoiceCall("your email address", "password", "1747612xxxx", "#{req.URI.user}", ".*", 7)
    sys.Log("Sorry, Google Voice Call failed.")

    The first attempt at a new dialplan application that can place free US calls through Google Voice has been released on the sipsorcery application servers. See this Forum Post for more information.

    The first question that will come up is whether it’s possible to establish the call without having to hangup the originating one. The answer is yes and it should be easy enough to bridge the incoming call from Google Voice with the originating one. I’ll have a look at that over the next few days as time permits.

    Update: The Google Voice app has now been updated to avoid the need to disconnect the call, see this post.



    Following on from yesterday’s post about NAT I have today applied some tweaks to the way the sipsorcery app servers do their SDP mangling (to understand what SDP mangling is all about and how it affects audio you’ll need to read the previous post NAT, RTP and Audio Problems).

    Luckily the rules for sipsorcery packet mangling are a lot simpler than the various NAT scenarios and will take a lot less writing to explain. The rules are:

  • If an INVITE request or response arrives with a private address in its SDP the sipsorcery app server will attempt to replace it with the public IP address the request or response was received on,
  • The default mangling behaviour can be turned off in two ways:

    – Adding a dial string option of ma=false to the call leg, for example sys.Dial("[ma=false]"),

    – Placing a call between two sipsorcery SIP accounts that have the same networkid. For example if a SIP account with a networkid of “home” calls a second SIP account also with a networkid of “home” then the sipsorcery app server will recognise that no SDP mangling is to take place on the call between them.

    In general the only time SDP mangling should be turned off is when a call is being placed between two SIP accounts that are on the same private network. For example on my home network I have a Cisco IP phone and a Polycom IP Phone. If I place a call between them through the sipsorcery server and the default packet mangling is left on then the call will have no audio in either direction. That’s because the SDP sent to each phone contains my public IP address and the NAT implementation on my router won’t have a mapping between the public socket and private socket. To get around that mangling can be turned off on the call and the SDP then contains the private address of each phone and they are able to send RTP between each other.
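The mangling rules above can be condensed into a short Ruby sketch. It is an approximation written for this post rather than the sipsorcery code: `mangle_sdp`, its keyword arguments and the restriction to the RFC 1918 private ranges and IPv4 `c=` lines are all assumptions made for illustration.

```ruby
require 'ipaddr'

# RFC 1918 private IPv4 ranges.
PRIVATE_RANGES = %w[10.0.0.0/8 172.16.0.0/12 192.168.0.0/16].map { |r| IPAddr.new(r) }

def private_address?(ip)
  addr = IPAddr.new(ip)
  PRIVATE_RANGES.any? { |range| range.include?(addr) }
end

# Hypothetical helper. Returns the SDP with a private connection address replaced by the
# public address the SIP message was received from, unless mangling has been turned off
# (the ma=false option) or both accounts share the same networkid.
def mangle_sdp(sdp, received_from_ip, mangle: true, caller_network_id: nil, callee_network_id: nil)
  return sdp unless mangle
  return sdp if caller_network_id && caller_network_id == callee_network_id

  sdp.gsub(/^c=IN IP4 (\S+)/) do |line|
    private_address?(Regexp.last_match(1)) ? "c=IN IP4 #{received_from_ip}" : line
  end
end

sdp = "v=0\r\nc=IN IP4 192.168.1.10\r\nm=audio 10000 RTP/AVP 0\r\n"
puts mangle_sdp(sdp, "203.0.113.5")
puts mangle_sdp(sdp, "203.0.113.5", caller_network_id: "home", callee_network_id: "home")
```

The first call rewrites the connection line to the public address, while the second leaves the private address in place so two phones on the same network can send RTP directly to each other.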

    The ability to place a call between two phones on a private network and have the RTP get set up directly between them is pretty neat. Most SIP Servers or softswitches I have dealt with don’t have the ability to do that except by using a re-INVITE which is error prone. So what generally happens instead is that if two phones that are right next to one another call each other the RTP streams go out across the internet, are bridged on the softswitch and sent back to the phones. That takes up a nice little chunk of bandwidth, introduces latency and all that other good stuff.

    Hopefully people experiencing audio issues will be able to get a better understanding of where the problem might be and resolve it. There are undoubtedly situations that the sipsorcery servers won’t be able to cope with but in some cases there may be an additional rule that can be applied and I’d be happy to implement it if anyone comes up with such a case.


    The very first thing to note is that SIP was NOT designed to work with NAT. There are subsequent standards, hacks, workarounds, kludges etc. to try and make it work but the original SIP designers somehow deemed it beneath them or put it in the too hard basket to bother coming up with a proper solution (there is not one instance of the string “NAT” in the whole SIP RFC).

    “So what” you may be saying. Well if you’re bothering to read this it’s probably because you are either having or have had audio problems with your SIP VoIP phone. That’s purely down to this massive oversight by the SIP designers. If you look at Skype on the other hand their proprietary protocol has a much smaller incidence of audio problems. The Skype protocol designers went to great lengths to come up with a pragmatic design (of course it was in their interests since they were aiming to make a profit). They even went so far as to enable their traffic to be tunneled through HTTPS proxies so that calls would have a good chance of working behind a corporate firewall; no chance of that with SIP even now. The SIP designers in their wisdom didn’t even bother to cope with the average home broadband connection.

    If someone was to add up the cost in engineering man hours, user frustration and faulty VoIP calls as a consequence of the SIP standard it would be astronomical. As someone who has run a VoIP company in the past I’d estimate that somewhere between 30 to 50% of all support issues are due to one way audio or other NAT related problems.

    Right that’s the rant out of the way now on to explain the technicalities of the problem particularly in relation to the sipsorcery service.

    The first thing to look at is how a VoIP SIP call is supposed to work in an ideal scenario (which is the only one the SIP standard bothers to accommodate).

    SIP Call - Ideal Scenario

    In the above diagram the end user SIP device and the SIP server are both on public IP addresses and everything is fine and dandy. To understand the diagram and subsequent ones the legend is:

  • The grey boxes on either side represent the Session Description Protocol (SDP) payloads that are carried in the SIP INVITE requests and responses,
  • The red circles over the grey boxes highlight the critical information within the SDP which is the IP address and port number that the sending device is going to be using for sending and receiving its RTP,
  • The blue lines represent a SIP transmission,
  • The green line represents an RTP stream,
  • A red line represents an RTP stream that could not be established,
  • Public or Private indicate the type of IP address the server or user agent are using.

    Also as some of the diagrams used in this post get fairly wide and I haven’t spent the time to work out how to widen the columns in the blog software a larger version of the images is available here.

    In the ideal scenario both ends of the SIP call place a publicly accessible IP socket in their SDP and the device at each end of the call has no issues sending and receiving to and from the other’s socket and all is good.

    About the only time you come across the ideal scenario shown above is for SIP trunks between two VoIP Providers. The average residential and business internet connection uses a NAT and that changes the landscape for a SIP call subtly in appearance but dramatically in effect.

    NAT Scenario Basic

    The key point now is that the SIP Phone on the left is operating on a private IP address and that’s what it has placed in its SDP. The call proceeds the same as in the ideal scenario but when the SIP device at the other end, in this case a SIP Softswitch, attempts to send RTP to the phone it can’t because the SDP contains a private address which is not routable on the public internet.

    This diagram represents the classic one-way audio situation. The person on the IP Phone can’t hear the person on the other end of the call. The person on the server side can hear the person on the IP Phone since the phone is happily sending RTP to the server’s public SDP socket.

    You may want to take a break or grab a coffee at this point. If you thought it was hard understanding things so far it only gets worse!

    For SIP to get used in the real world it obviously had to overcome the NAT problem shown in the previous diagram (I probably shouldn’t say obviously as it doesn’t appear to have been that obvious when SIP was being devised). There are actually a number of different ways that NAT can be overcome with SIP but they fall into two categories:

  • The first category is where SIP devices on private IP addresses attempt to determine their public IP address and then use that in their SDP instead of their private address. STUN is one protocol designed for this purpose. Some devices let the user manually specify the public IP address and there are other mechanisms. It doesn’t matter so much how the SIP device gets its public IP address just that it places it into the SDP when making or answering a call,
  • The second category is where SIP Servers will attempt to cope with clients sending them SDP packets with private IP addresses, normally by replacing the private address with the public address the packet came from.

    The important thing to realise is that neither of these mechanisms is foolproof. And that’s worth repeating: there is no 100% foolproof mechanism that can guarantee a SIP call can cope with NAT, although most of the time it can. The reason there isn’t a guaranteed mechanism is the nature of NATs, and more specifically NATs using Port Address Translation (PAT). It’s explained a bit more further on, but if a NAT translates the port on the outgoing RTP stream of a SIP device then the port that was set in the SDP is now wrong, and sending RTP to the requested socket will fail and result in one-way audio.

    Let’s look at one of the mechanisms from the second category that a SIP Server can use to cope with a call from a private device.

    NAT Handling - Server Mangling

    In this diagram the small unreadable text on the right is explaining how the SIP Server is configured to recognise private IP addresses in the SDP and replace them with the IP address the request was received on. The sipsorcery server does exactly that. The problem is that it’s not a particularly robust mechanism. If for example a SIP Proxy is in between the end device and the SIP Server doing the mangling then the public IP address of the Proxy will be placed into the SDP and the RTP will never reach the end device. Or as occasionally happens a faulty NAT will actually leave the source IP address of the packets it transmits as a private IP address giving the SIP Server the choice between a private SDP address and a private origination address (which are probably the same so no choice really).
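
    The mangling rule described above can be sketched in a few lines of Python. This is only an illustration of the idea and not the sipsorcery code: the function names are made up, "private" is assumed to mean the three RFC 1918 ranges, and only the IPv4 connection line of the SDP is touched.

```python
import ipaddress

# Assumption: "private" means the RFC 1918 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(address: str) -> bool:
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not an IP literal (e.g. a hostname), so leave it alone
    return any(ip in net for net in RFC1918_NETWORKS)


def mangle_sdp(sdp: str, received_from_ip: str) -> str:
    """Replace a private SDP connection address with the address the request came from."""
    mangled = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            sdp_ip = line.split()[2]
            # Only mangle when it helps: a private SDP address and a public source.
            if is_private(sdp_ip) and not is_private(received_from_ip):
                line = f"c=IN IP4 {received_from_ip}"
        mangled.append(line)
    return "\r\n".join(mangled) + "\r\n"


phone_sdp = "v=0\r\nc=IN IP4 192.168.1.20\r\nm=audio 10000 RTP/AVP 0\r\n"
print(mangle_sdp(phone_sdp, "198.51.100.7"))
```

    Note what the sketch cannot know: whether 198.51.100.7 is really the phone's NAT or an intermediate SIP Proxy. It also leaves the port (10000) untouched, and when the source address is itself private (the faulty NAT case) it has nothing better to substitute and leaves the SDP as it was.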

    The other more common thing that breaks a SIP Server’s attempt at using the request origination address for RTP is the one alluded to previously, PAT.

    NAT Mangling - Broken by PAT

    In the diagram above the SIP Server has correctly detected the phone’s public IP address and has attempted to send its RTP packets there. However because the NAT in front of the phone has performed a port translation on the phone’s RTP stream the NAT has no mapping for the socket the Server is attempting to send to and simply drops the packets. The result is again one-way audio.
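
    A toy model helps show why port translation defeats mangling. The Python sketch below is a deliberately simplified NAT invented for this post: it only tracks outbound UDP port mappings and ignores timeouts and the per-destination filtering real NATs do. All names, addresses and port numbers are illustrative.

```python
class ToyNat:
    """A deliberately simplified NAT that tracks only outbound UDP port mappings."""

    def __init__(self, public_ip, translate_ports, first_public_port=40000):
        self.public_ip = public_ip
        self.translate_ports = translate_ports
        self._next_public_port = first_public_port
        self._public_port_for = {}     # (private_ip, private_port) -> public port
        self._private_socket_for = {}  # public port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """The phone sends a packet; return the public socket the far end sees."""
        key = (private_ip, private_port)
        if key not in self._public_port_for:
            if self.translate_ports:
                public_port = self._next_public_port  # PAT: pick a new port
                self._next_public_port += 1
            else:
                public_port = private_port            # port is preserved
            self._public_port_for[key] = public_port
            self._private_socket_for[public_port] = key
        return (self.public_ip, self._public_port_for[key])

    def inbound(self, public_port):
        """A packet arrives from outside; return the private socket, or None if dropped."""
        return self._private_socket_for.get(public_port)


def server_rtp_reaches_phone(nat, phone_ip, sdp_port):
    nat.outbound(phone_ip, sdp_port)  # the phone starts sending RTP from its SDP port
    # A mangling server sends to <NAT public IP>:<port taken from the SDP>.
    return nat.inbound(sdp_port) is not None


print(server_rtp_reaches_phone(ToyNat("198.51.100.7", False), "192.168.1.20", 10000))  # True
print(server_rtp_reaches_phone(ToyNat("198.51.100.7", True), "192.168.1.20", 10000))   # False
```

    In the second case the NAT's only mapping is on port 40000, but the server, knowing nothing except the SDP, keeps sending to port 10000 and the packets are dropped.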

    So while mangling does help and is better than nothing, it doesn’t always work, depending on what type of NAT device is in front of your phone. Because the sipsorcery service only deals with SIP, and not RTP, mangling is the ONLY thing it can do. There is no other magic it can do to try and get the RTP streams connected up. The best advice for one-way audio and using sipsorcery is to try and set up your router to NOT do port translations on the range your phone uses for RTP.

    The next mechanism a SIP Server can use to cope with NAT is to forget about packet mangling and reflect back RTP to whichever socket it receives on.

    NAT - RTP Reflection

    In the above diagram the SIP Server will start off sending RTP to whatever socket is specified in the call request’s SDP irrespective of whether it’s a private address or not. Then as soon as it receives an RTP packet from the other end it will assume that is the socket it should be sending its own RTP to and switch to that. This is the mechanism Asterisk uses when you set nat=yes on a SIP account. It does pose a potential security hole in that an attacker could monitor the SIP traffic and then try and get an RTP packet to the SIP Server before the genuine device and thus hijack the RTP stream. In practice there are easier ways to break into SIP systems so it’s unlikely an attacker would bother with that approach under normal circumstances.
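
    The reflection behaviour can be sketched as a small piece of state on the server side. This Python class is a hypothetical illustration of the idea only, not Asterisk's implementation, and it assumes the simplest policy of latching onto the first RTP packet received and ignoring later changes of source.

```python
class ReflectingRtpTarget:
    """Tracks where a server should send RTP for one call leg."""

    def __init__(self, sdp_socket):
        # Start with whatever the SDP said, private address or not.
        self.target = sdp_socket
        self.latched = False

    def on_rtp_received(self, source_socket):
        # The first RTP packet to arrive decides where our RTP goes from now on.
        if not self.latched:
            self.target = source_socket
            self.latched = True


leg = ReflectingRtpTarget(("192.168.1.20", 10000))  # unroutable SDP socket
print(leg.target)                                   # ('192.168.1.20', 10000)

leg.on_rtp_received(("198.51.100.7", 40000))        # phone's RTP as seen after NAT and PAT
print(leg.target)                                   # ('198.51.100.7', 40000)
```

    The sketch also makes the security concern visible: whoever gets the first packet in wins, so a packet from an attacker arriving before the genuine phone's would capture the stream.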

    This reflection mechanism is better than mangling because it gets around any port translation the NAT in front of the end user’s phone may have done. As mentioned above the sipsorcery server cannot use this mechanism since it never sees any RTP. However, when the SIP Server at the destination end of the call is using this mechanism the sipsorcery server doesn’t need to do any NAT handling anyway. If you’re having one-way audio problems it’s not a bad idea to try and find a SIP Provider that has their servers configured to use the RTP reflection mechanism. It saves you having to fiddle with your router and in practical terms is going to cope with most NATs.

    Generally speaking where one end of the call is a SIP Server on a public IP address one-way audio problems should be resolvable. The cases where they are not usually involve either a faulty NAT (there are more around than you would think) or a phone that is behind multiple NATs. The latter can occur when ISPs run transparent NATs on their network because they are short of IP addresses or for some other reason. In theory multiple NATs should also be coped with by the RTP Reflection handling but in practice as the number of NATs on the audio path increases above one the risk of audio problems seems to rise exponentially!

    The other common situation with sipsorcery users is where there is no SIP Server in the call and instead the call is between two end user devices. Most people think that this will be a simpler situation and there should be less chance of audio problems, but that’s not the case and in fact it’s the opposite. Now instead of having one device on a private IP address there will generally be two.

    NAT - User Agent to User Agent

    The above diagram illustrates a call between two sipsorcery users where each user’s phone is on a private network. In this case both users have NATs that are doing port translation and neither of the RTP streams gets through, so neither user hears anything. More common is that only one of the users’ NATs does port translation, so one of them will get audio and the other won’t. In this situation the success of the call depends both on the sipsorcery server being able to mangle the SDP so it contains the public IP address and on the NATs involved NOT doing port translation. If either of those conditions is not met then one or both of the RTP streams, and therefore audio streams, will fail.
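
    The two conditions can be written down as a tiny truth function. This Python sketch models only the two conditions named in the paragraph above; the function and parameter names are invented for illustration and it ignores faulty or stacked NATs.

```python
def receives_audio(own_sdp_mangled_to_public: bool, own_nat_translates_ports: bool) -> bool:
    """A user hears the other party only if the far end is given a socket that works."""
    return own_sdp_mangled_to_public and not own_nat_translates_ports


def call_outcome(a_mangled: bool, a_pat: bool, b_mangled: bool, b_pat: bool) -> dict:
    return {
        "A hears B": receives_audio(a_mangled, a_pat),
        "B hears A": receives_audio(b_mangled, b_pat),
    }


# Both SDPs mangled correctly, but both NATs translate ports: silence both ways.
print(call_outcome(True, True, True, True))
# Only B's NAT translates ports: A hears B, but B hears nothing.
print(call_outcome(True, False, True, True))
```

    With a SIP Server on a public address only one side of this table applies, which is why calls between two end user devices fail more often, not less.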

    I do have more diagrams and explanations for NAT scenarios around locally installed versions of a sipsorcery server which are even more complicated since now not even the server is on a public IP address. However I’ll leave them for the next post.

    The best advice I can give to anyone having consistent audio issues on VoIP calls is to google their router model and see if there are any other people having the same issue and if they were able to fix it. If that doesn’t yield anything try and borrow a different router from a friend and see if the audio is better with that. If it is I’d personally replace the router as the long term frustration of audio problems on calls far outweighs $100 or less on a new router.

