Dial Plan Script Processing Problem (aka as: Long running dialplan script was terminated)

For over a year now the mysipswitch service has suffered from instability due to an issue isolated to the processing of Ruby dialplan scripts.

At the time the problem first came to my attention it took the best part of a month with lots of memory profiling to nail the issue down and then put in improvements to contain it. The symptoms of the problem were large memory consumption followed ultimately by the script processing failing to initialise and therefore going nowhere before being cancelled by the monitoring thread with the dreaded “Long running dialplan script was terminated” message.

The issue was mitigated against by separating out the mysipswitch functions into different processes and scheduling automatic restarts of the Application Server process that was responsible for the dialplans. This bought another 6 or so months of reliable use of mysipswitch but eventually as usage increased a daily auto-restart was not enough and more frequent restarts were required. Unfortunately that did not work well either as sometimes the Windows Service would fail to start properly and the restarts had to be left at every 24 hours.

The hope was that the problem in the IronRuby library would be fixed in the interim and would solve the issue on mysipswitch when it evolved to sipsorcery.

When the sipsorcery service went live the latest IronRuby source was downloaded from svn, built and deployed. It was a sinking feeling when the same memory leak was observed on the sipsorcery Application Server Process.

At that point I decided I’d have to bite the bullet and dive into the IronRuby code and see if I could find the bug responsbile for the leak. I’d avoided this in the past as I prefer to focus on adding new features or sorting out bugs in the sipsorcery code base rather than getting distracted on length forays into the 3rd party libraries.

The dive into IronRuby was launched and after 3 days of testing and scouring the code it ended up looking like the problem was not a memory leak being caused by a bug but instead was being caused by the use of Thread.Abort to cancel the dialplan scripts once a call was answered. This was a good discovery to make and at the time I thought the solution must be just around the corner, I was wrong.

Since Thread.Abort was causing the problem I had to look for an alternative way to ending the dialplan scripts. IronRuby runs on top of the Microsoft Dynamic Language Runtime so I started looking over the documentation for that and found a reference to “Interrupt Execution” which is exactly what was needed to halt the answered or long running scripts. This was another false dawn as that feature is not yet implemented in the DLR and is not on the current roadmap.

Being optimistic I thought I’d now delve into the DLR code and see if I could implement Interrupt Execution myself. Unsuprisingly it ended up being a lot more difficult than I thought but eventually I came up with a very crude mechanism where I would keep track of which script interpreters belonged to which dialplan and would then halt their execution once a call was answered. Initially that seemed to work well but then I noticed the interpreters seemed to be getting shared between multiple dialplans and halting the execution of one dialplan could end up halting another one that was running at the same time. This would also explain some behaviour that people were posting on the mysipswitch forums.

So that approach had to go and I have removed the code that was shutting down the interpreters. And…there is no And… that’s where things are right now. Currently there is no nice tidy mechanism to halt a dialplan after it has been answered. I have put a check into the sys.Dial command so that it will not run after the call has been answered but subsequent script commands will continue to be run until the end of the script.

That’s not the end of the World but it’s not ideal and also doesn’t cover the case where a user accidentally puts their script into an infinite loop. So it’s back to the drawing board and in the meantime you should check that your dialplan is ok to execute the commands after a call gets answered, most should be.

Sigh…

Aaron

  1. emoci’s avatar

    Hey Aaron,

    Hope you’re doing well….could the last part of the issue above be acting up in my case

    My Dialplan (relevant part):

    sys.Log(“Accepting Call”)

    if sys.IsAvailable() then
    sys.Log(“The ATA is online.”)
    sys.Dial(“local”)
    else
    sys.Log(“The ATA is off-line.”)
    sys.Dial(“613813@voxalot.com”)
    end

    SipSorcery is showing this on incoming calls to the account handled by that dialplan though:

    DialPlan 13:03:33:148: ***Starting New Call Event***
    DialPlan 13:03:33:352: ***Incoming Call Starting***
    DialPlan 13:03:33:352: Incoming Call From mocierion at mocierion@proxy01.sipphone.com
    DialPlan 13:03:33:352: Call is for emoci
    DialPlan 13:03:33:352: *** CID: mocierion ***
    DialPlan 13:03:33:352: *** Trunk: emoci ***
    DialPlan 13:03:33:352: Accepting Call
    DialPlan 13:03:36:570: The ATA is off-line.
    DialPlan 13:03:36:570: The call has already been answered the Dial command was not processed.

    Reply

  2. sipsorcery’s avatar

    The log message “The call has already been answered the Dial command was not processed.” means the originating call has been sent a final response. That can happen due to one of three things:

    1. A forwarded call has been answered and bridged to the originating call,
    2. The call was cancelled by the originator,
    3. The call timed out.

    In your case I’m not sure what happened as the Dial command that generated the message isn’t shown on the dialplan snippet.

    A couple of points that will help you:

    sys.IsAvailable() is the equivalent of sys.IsAvailable(“owner”, “sipsorcery.com”)

    Where “owner” is the username you login to the website with. If your SIP account is not the same as your web site name you should use:

    sys.IsAvailable(“sipaccountusername”, “sipsorcery.com”)

    Similarly with sys.Dial(“local”) if the SIP account you are intending to forward to isn’t sys.Dial(“incominguser@sipsorcery.com”) where the originating INVITE request was “INVITE sip:incominguser@sipsorcery.com” then it won’t work.

    There should have been some more messages in your trace and there does seem to be a gap of 3s. Did you cut some of them out?

    Reply

  3. emoci’s avatar

    Well,

    Tried to recreate the issue by calling in…and today everything is working fine with the exact same setup…

    Not quite sure what happened…

    If you did anything on your end…thanks very much!

    As a sidenote:
    I don’t think I cut a part of the log…but the rest of the DialPlan I was using looked like this:

    #Ruby
    sys.log(“***Starting New Call Event***”)

    #********Incoming Calls********

    if sys.In then
    sys.log(“***Incoming Call Starting***”)
    sys.log(“Incoming Call From #{req.Header.From.FromName} at #{req.Header.From.FromURI.User}@#{req.Header.From.FromURI.Host}”)
    sys.log(“Call is for #{req.URI.User}”)
    cid = req.Header.From.FromURI.User
    trunk = req.URI.User
    sys.log(“*** CID: #{cid} ***”)
    sys.log(“*** Trunk: #{trunk} ***”)

    #****Have Calls Ring ATA or Forward if Offline****

    sys.Log(“Accepting Call”)

    if sys.IsAvailable() then
    sys.Log(“The ATA is online.”)
    sys.Dial(“local”)
    else
    sys.Log(“The ATA is off-line.”)
    sys.Dial(“613813@voxalot.com”)
    end
    else

    #********Outgoing Calls********

    sys.log(“***Outgoing Call Starting***”)

    if req.URI.User.to_s =~ /^(d)(d{9})/
    sys.Dial($1 + “*” + $2 + “@MJ”)
    end

    end

    Reply

Reply

Your email address will not be published. Required fields are marked *