I’ve been working this week on another measure to improve the reliability of the sipsorcery dialplan processing. Specifically, the measure is designed to cope with a call worker process becoming unresponsive and refusing to process any more dialplan executions. This issue typically shows up as the “Long running dialplan” log message.
I had thought the call worker process “stalls” were the result of a memory leak in the DLR and that the problem had been solved by recycling the processes once they reached a memory allocation of 150MB. However, as soon as I put up the last post on this blog about stability, a call worker process stalled with a memory utilisation of less than 150MB.
I suspect that, as with a lot of tricky software bugs, there’s more than one issue here. Stopping the memory leak has definitely improved reliability, but there is still something else that can cause a call worker stall. My suspicion is that some kind of Ruby dialplan script is able to tie the DLR up in knots and render it incapable of processing any further script executions. Unfortunately I’ve never been able to produce such a script, but then I don’t push the limits of Ruby with classes, recursive functions etc. in my own dialplans.
The new measure implemented today is designed to cope with a call worker process stall irrespective of its memory utilisation. The hope is that sipsorcery call processing can now cope with anything thrown at it and is 100% reliable.
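To give a rough idea of what a stall check that ignores memory utilisation could look like, here is a minimal sketch in Python. It is not the sipsorcery implementation; the class and method names, the worker names and the 10 second limit are all made up for illustration. The idea is simply to record when each worker starts a dialplan and to recycle any worker that has been busy for longer than the limit.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class WorkerState:
    """Hypothetical bookkeeping for a single call worker process."""
    name: str
    busy_since: Optional[float] = None  # start time of the current dialplan, None when idle


class StallWatchdog:
    """Flags workers whose current dialplan has run longer than a time limit.

    The limit is an assumed, illustrative value and the check deliberately
    takes no account of how much memory the worker is using.
    """

    def __init__(self, max_dialplan_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_dialplan_seconds = max_dialplan_seconds
        self.clock = clock
        self.workers: Dict[str, WorkerState] = {}

    def register(self, name: str) -> None:
        self.workers[name] = WorkerState(name)

    def dialplan_started(self, name: str) -> None:
        self.workers[name].busy_since = self.clock()

    def dialplan_finished(self, name: str) -> None:
        self.workers[name].busy_since = None

    def stalled(self) -> List[str]:
        """Return the names of workers that have been busy for too long."""
        now = self.clock()
        return [
            w.name for w in self.workers.values()
            if w.busy_since is not None
            and now - w.busy_since > self.max_dialplan_seconds
        ]

    def recycle_stalled(self, recycle: Callable[[str], None]) -> List[str]:
        """Call recycle(name) for each stalled worker and mark it idle again."""
        names = self.stalled()
        for name in names:
            recycle(name)  # e.g. kill the process and start a replacement
            self.workers[name].busy_since = None
        return names
```

A supervisor would call recycle_stalled on a timer and pass in whatever function actually kills and restarts a worker process. The clock is injectable only so that the check can be exercised without waiting in real time.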
That’s not to say there aren’t other things that can go wrong. I’m still none the wiser about the two incidents a few weeks ago where the Amazon EC2 instance the sipsorcery server was running on seemed to drop off the network and stopped responding on remote desktop or any other protocol. Thankfully, apart from those two cases, it hasn’t occurred again. The next reliability measure, currently in progress, is to have two instances running side by side so that if there is a problem with one the other will take over.
On a related note, I have had to cut off two users in the past two weeks for inappropriate use. The first was for trying to brute force a provider by running a dialplan script that cycled through usernames to see if any had a password of 1234. The second was for blasting the sipsorcery server with a constant volume of calls. The calls weren’t forwarded to a provider since the user’s providers weren’t set up properly, but they resulted in a 10% rise in the base CPU utilisation, which has a small but noticeable impact on other users’ calls. It also means that when I’m watching the sipsorcery call activity there’s a continuous stream of log messages, which is painful.
In cases where an obvious hack attempt is being made, the account will be terminated immediately. If you’re planning on scripting up some elaborate dialplan to hack SIP Providers with sipsorcery, you will be wasting a lot of time on your dialplan development as your account will get removed. In the second case and similar ones, I will attempt to notify the user via email and give them a chance to fix their account, and will re-activate the account if they later respond.