As some people have noticed, there have been a few “improvements” going on with the sipsorcery service in the last few days. The main thrust of the improvements has been to move closer to a solution to the “long running dialplan/memory leak” issue that has plagued the mysipswitch and, more recently, sipsorcery services. The cause (and consequences) of the problem are discussed widely on the Forums site and also briefly in this Blog Post.
At this point I’m hopeful that the latest changes will finally solve the issue; however, I’ll wait for at least a week of stable behaviour before jumping to any substantial conclusions. Very briefly, the new approach has been to give up on attempting to isolate the cause of the memory leak somewhere in the interaction between sipsorcery, the DLR and IronRuby. Instead the leak is accepted but isolated into a new process, and that process is recycled once it hits a certain memory utilisation.
In theory the idea doesn’t sound overly complex but it meant another round of pulling apart functions that were used to relying on being all within the same process. I’m getting fairly used to it at this stage though. The original mysipswitch service was all wrapped up nice and tightly in a single process. It was when the memory leak first cropped up that the extraction of different mysipswitch functions into different processes started and the single process application has now evolved into the system shown below.
Larger deployment diagram available here.
The trickiest thing ended up being processing calls forwarded to another sipsorcery user that need to use the called user’s dialplan, i.e. one call generating two or more dialplan instances. To be honest, coding this up seriously hurt my brain. It starts off ok: one call arrives and drops into the dialplan, that dialplan calls a second sipsorcery user who has specified that incoming calls go via one of their dialplans, and the second dialplan calls out to a 3rd party SIP Provider. So far so good. But in order to generate the correct call detail records (CDRs), so that both the caller and called users have an accurate record, the call between the two dialplans has to generate an extra two CDRs, and that means creating two additional SIP transactions. So that’s now 2 dialplan instances, 4 SIP calls and 4 SIP transactions. But then instead of calling an external provider the second dialplan could call another sipsorcery user who also uses an incoming dialplan. AHHHH -> Brain Pain. The flowchart of the whole thing gives a bit of an idea of the complexity.
Larger state diagram available here.
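To make the counting above a bit more concrete, here is a rough Python sketch of how the totals might grow as more sipsorcery users with incoming dialplans get chained together. It is not taken from the sipsorcery code. The function name `chain_totals` is made up for illustration, and the baseline of 2 SIP calls and 2 SIP transactions for a single dialplan is inferred from the "extra two" wording above rather than stated outright. The only case confirmed by the description is the two dialplan one; anything longer is an extrapolation.

```python
def chain_totals(dialplan_count):
    """Estimate totals for a call passing through a chain of dialplans.

    Assumption (not from the sipsorcery source): each hop between two
    dialplans adds 2 CDRs, 2 SIP calls and 2 SIP transactions on top of
    a baseline of 2 SIP calls and 2 SIP transactions for one dialplan.
    """
    if dialplan_count < 1:
        raise ValueError("a call needs at least one dialplan instance")

    hops = dialplan_count - 1  # dialplan-to-dialplan forwards in the chain
    return {
        "dialplan_instances": dialplan_count,
        "sip_calls": 2 + 2 * hops,
        "sip_transactions": 2 + 2 * hops,
        "extra_cdrs": 2 * hops,
    }


# The two dialplan case described above: 2 instances, 4 calls, 4 transactions.
two_dialplans = chain_totals(2)
```

If the extrapolation holds, every additional user with an incoming dialplan adds another pair of calls, transactions and CDRs, which is where the brain pain comes from.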
So the upshot of all that is that the new isolated process mechanism is now in place and, if all goes according to plan, there will be no more outages caused by the dialplan processing memory leak (that’s a “hopefully” no more outages, not a guarantee, and only for the memory leak issue). At the moment the dialplan worker process is set to recycle once it hits a memory working set of 150MB, at which point the secondary worker process will take over until the primary one has completed the recycle. So far it’s working very well. There were a couple of minor hiccups today when the update went in. One caused the “Dial plan script engine was overloaded” error message but that was quickly resolved.
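For anyone curious about the general shape of the recycle logic, here is a minimal Python simulation of the idea. It is only a sketch and not the actual sipsorcery implementation, which is not shown in this post. The class names `Worker` and `DialplanDispatcher`, the per-call leak figure and the way memory is tracked are all assumptions for illustration; the only number taken from above is the 150MB threshold.

```python
from dataclasses import dataclass

RECYCLE_THRESHOLD_MB = 150  # threshold quoted in the post


@dataclass
class Worker:
    """Stand-in for a dialplan worker process (hypothetical model)."""
    name: str
    memory_mb: float = 0.0
    recycling: bool = False
    recycle_count: int = 0

    def run_dialplan(self, leak_mb):
        # Each dialplan execution is assumed to leak some memory.
        self.memory_mb += leak_mb

    def finish_recycle(self):
        # A fresh process starts again with the leaked memory released.
        self.memory_mb = 0.0
        self.recycling = False
        self.recycle_count += 1


class DialplanDispatcher:
    """Routes dialplans to the primary worker, or the secondary during a recycle."""

    def __init__(self, threshold_mb=RECYCLE_THRESHOLD_MB):
        self.threshold_mb = threshold_mb
        self.primary = Worker("primary")
        self.secondary = Worker("secondary")

    def pick_worker(self):
        for worker in (self.primary, self.secondary):
            if not worker.recycling:
                return worker
        raise RuntimeError("no dialplan worker available")

    def dispatch(self, leak_mb):
        worker = self.pick_worker()
        worker.run_dialplan(leak_mb)
        # Flag the worker for recycling once it reaches the threshold.
        if worker.memory_mb >= self.threshold_mb:
            worker.recycling = True
        return worker.name


dispatcher = DialplanDispatcher()
# Assume each dialplan leaks 50MB: the third call pushes the primary to 150MB.
handled_by = [dispatcher.dispatch(50) for _ in range(4)]
```

In this toy run the first three dialplans land on the primary worker, which is then flagged for recycling, and the fourth is picked up by the secondary. Calling `dispatcher.primary.finish_recycle()` hands new work back to the primary.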
Apart from that a few other minor changes and miscellaneous points that have sprung to mind are:
Phew! That will do for now.