August 2009


As some people have noticed, there have been a few “improvements” going on with the sipsorcery service in the last few days. The main thrust of the improvements has been to move closer to a solution to the “long running dialplan/memory leak” issue that has plagued the mysipswitch and, more recently, sipsorcery services. The cause (and consequences) of the problem are discussed widely on the Forums site and also briefly in this Blog Post.

At this point I’m hopeful that the latest changes will finally solve the issue, but I’ll wait for at least a week of stable behaviour before jumping to any substantial conclusions. Very briefly, the new approach has been to give up on isolating the cause of the memory leak, which lies somewhere in the interaction between sipsorcery, the DLR and IronRuby, and instead to accept the leak, isolate it in a separate process and recycle that process once it hits a certain memory utilisation.

In theory the idea doesn’t sound overly complex but it meant another round of pulling apart functions that were used to relying on being all within the same process. I’m getting fairly used to it at this stage though. The original mysipswitch service was all wrapped up nice and tightly in a single process. It was when the memory leak first cropped up that the extraction of different mysipswitch functions into different processes started and the single process application has now evolved into the system shown below.

SIPSorcery Deployment Diagram

Larger deployment diagram available here.

The trickiest thing ended up being processing calls forwarded to another sipsorcery user that need to use the called user’s dialplan, i.e. one call generating two or more dialplan instances. To be honest, coding this up seriously hurt my brain. It starts off ok: one call arrives and drops into the dialplan, that dialplan calls a second sipsorcery user who has specified that incoming calls go via one of their dialplans, and the second dialplan calls out to a 3rd party SIP Provider. So far so good. But in order to generate the correct call detail records (CDRs), so that both the caller and called users have an accurate record, the call between the two dialplans has to generate an extra two CDRs, and that means creating two additional SIP transactions. So that’s now 2 dialplan instances, 4 SIP calls and 4 SIP transactions. But then, instead of calling an external provider, the second dialplan could call another sipsorcery user who also uses an incoming dialplan. AHHHH -> Brain Pain. The flowchart of the whole thing gives a bit of an idea of the complexity.

SIPSorcery State Diagram - Dialplan-to-Dialplan Call Processing

Larger state diagram available here.

So the upshot of all that is that the new isolated process mechanism is now in place and, if all goes according to plan, there will be no more outages caused by the dialplan processing memory leak (that’s a “hopefully” no more outages, not a guarantee, and only for the memory leak issue). At the moment the dialplan worker process is set to recycle once it hits a working set of 150MB, at which point the secondary worker process will take over until the primary one has completed the recycle. So far it’s working very well. There were a couple of minor hiccups today when the update went in. One caused the “Dial plan script engine was overloaded” error message but that was quickly resolved.
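To make the recycle-and-failover idea concrete, here is a minimal Python sketch. It is not the sipsorcery implementation (which is .NET based); the `Worker` and `WorkerPair` names are hypothetical, the "processes" are plain objects, and the leak is simply modelled as a number that grows with every dialplan run. The only figure taken from the description above is the 150MB threshold.

```python
from dataclasses import dataclass

RECYCLE_THRESHOLD_BYTES = 150 * 1024 * 1024  # the 150MB limit quoted above


@dataclass
class Worker:
    """Stand-in for a dialplan worker process (hypothetical, for illustration)."""
    name: str
    working_set: int = 0  # bytes currently held by the worker
    restarts: int = 0

    def recycle(self) -> None:
        # A real system would stop and restart an OS process here.
        self.working_set = 0
        self.restarts += 1


@dataclass
class WorkerPair:
    """Routes dialplans to the active worker and fails over when it gets too big."""
    active: Worker
    standby: Worker
    threshold: int = RECYCLE_THRESHOLD_BYTES

    def run_dialplan(self, leaked_bytes: int) -> str:
        """Run one dialplan on the active worker and return that worker's name."""
        handler = self.active
        handler.working_set += leaked_bytes  # model the leak as monotonic growth
        if handler.working_set >= self.threshold:
            # Swap first so new calls go to the standby, then recycle the old worker.
            self.active, self.standby = self.standby, self.active
            handler.recycle()
        return handler.name
```

The point of the design is that the leak is never fixed, only contained: whichever worker is active is allowed to grow up to the limit, and the swap happens before the recycle so there is always a worker available to take new calls.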

Apart from that a few other minor changes and miscellaneous points that have sprung to mind are:

  • Duplicate call destinations are now being ignored in dial strings. For some reason some users were blasting a single provider with the same call with 10 different usernames. That’s not very friendly behaviour and is now being prevented.
  • I’m working on a blog post and some diagrams to explain how the audio (RTP) streams get set up when using the sipsorcery service. There are a few people mentioning one way audio issues on the forums, which is not a new thing, and there have been some changes from mysipswitch to sipsorcery that were intended to improve the RTP set up that warrant mentioning. Suffice to say the whole audio set up process relies on two IP addresses that are contained in the body of the INVITE request and response. The only thing the sipsorcery software ever attempts to do is mangle the IP address in the body if it is private. I’ll explain more about this in a subsequent post.
  • There have been a few requests for access to CDRs from the sipsorcery system. There is a web service interface for .NET WCF clients available at https://www.sipsorcery.com/provisioning.svc. I believe other SOAP clients should also be able to access it using the WSDL from https://www.sipsorcery.com/provisioning.svc?wsdl but I haven’t verified that. You need to be able to program against web services in order to use it. Down the track I would hope to expose a more user friendly way of getting at this sort of data but I don’t know when that will be. The provisioning service actually exposes all the sipsorcery functions and is what the Silverlight client uses to communicate with the backend. For those people who didn’t like the Silverlight interface, it’s feasible that a different AJAX/HTML based UI could be built which calls that service.
  • On a less technical note, the first month or so of the sipsorcery service has been a pretty hectic one. Notwithstanding the prominent issues such as the database crash and the memory leak, it’s been a bit of a running battle keeping up with sipsorcery users to keep the service functioning smoothly. A big difference between mysipswitch and sipsorcery is that I tried to anticipate and avoid more of the minor issues that impact the service performance. As just one example, people put invalid entries in as a registration contact for a provider. That can result in the sipsorcery registration agent wasting time doing DNS lookups or trying to contact a non-existent IP address. I stopped all the obvious things: a user registering a contact back to sipsorcery, which is completely pointless; making sure a contact host is an IP address or hostname; making sure that the contact has a user portion and a host portion, etc. Despite all that, registration contacts like sip:me@U.S.A still appeared, so more rules need to get added and bad registration contacts disabled. It does take a bit of admin time keeping the sipsorcery service running over and above trying to sort the software out.
  • Phew! That will do for now.
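As a rough illustration of the registration contact checks listed above, here is a Python sketch. It is not the actual sipsorcery code: the function names are made up, `OWN_DOMAINS` is an assumed value, IPv6 contacts are not handled, and the final rule (the last hostname label must be at least two letters) is only my guess at the kind of extra rule that would catch sip:me@U.S.A.

```python
import ipaddress
import re

# Assumption: the service's own domain, used to reject contacts that point back at it.
OWN_DOMAINS = {"sipsorcery.com"}

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def _is_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def _is_hostname(host: str) -> bool:
    labels = host.split(".")
    if len(labels) < 2 or not all(_LABEL.match(label) for label in labels):
        return False
    # Hypothetical extra rule: the final label must be alphabetic and 2+ letters long.
    return labels[-1].isalpha() and len(labels[-1]) >= 2


def is_valid_contact(uri: str) -> bool:
    """Return True if a registration contact passes the basic sanity checks."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() not in ("sip", "sips"):
        return False
    user, at, hostport = rest.partition("@")
    if not (user and at and hostport):
        return False  # the contact needs both a user portion and a host portion
    hostport = hostport.split(";", 1)[0]  # drop any URI parameters
    host, sep, port = hostport.partition(":")
    if sep and not port.isdigit():
        return False
    host = host.lower()
    if any(host == d or host.endswith("." + d) for d in OWN_DOMAINS):
        return False  # registering back to the service itself is pointless
    return _is_ip(host) or _is_hostname(host)
```

With these rules a contact such as sip:me@sip.example.com or sip:me@10.0.0.1:5060 is accepted, while sip:me@U.S.A and anything pointing back at sipsorcery.com is rejected.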

    Aaron

    Approximately an hour ago the Amazon instance hosting the sipsorcery service froze/crashed. It happened as I was attempting to clean up the CDR database table by deleting the older CDR records. There were nearly half a million call records in the table, which was hampering the ability to scroll through them in the Silverlight interface. It was a fairly innocuous operation but unfortunately it’s had some significant consequences.

    The bad news is that at this point it looks like all the data from the sipsorcery service has been lost. That includes all user accounts, SIP providers, dial plans etc. The reason it has been lost is that I made a mistake when setting up the MySQL database and the data directory was sitting on the C: drive and not the F: drive. The F: drive in this case was an Amazon Elastic Block Store (EBS) volume, which survives across instance reboots. Somehow when I installed MySQL I configured it so that only the directory for the MySQL binaries was on the F: drive but the data directory was on the C: drive.

    The good news is the service is back up and running (now with the MySQL database on the F: drive). It’s a relatively quick operation to recover in the event of a crash like this since the IP address and volume can be re-attached to a new instance very easily.

    Unfortunately anyone wanting to use the sipsorcery service will need to re-create their account and dialplans etc. I realise it’s a real pain and I lost all my settings as well so am also suffering.

    Sorry,

    Aaron
