October 2009

I was literally in the final stages of development to use Amazon’s SimpleDB as the storage for sipsorcery when an email dropped into my inbox announcing a new Amazon Relational Database Service (RDS). RDS is basically a hosted MySQL database, which is exactly what sipsorcery is using right now.

The problem the SimpleDB implementation was going to overcome is that the sipsorcery MySQL database is a single point of failure, and making it fault tolerant would require a minimum of 6 EC2 instances. With the new RDS service the MySQL single point of failure problem has been solved in one fell swoop. The same swoop has rendered nearly two months of work integrating SimpleDB into sipsorcery redundant. As Ned Kelly would say, “such is life”.

Update: After looking into the RDS offering and playing around with it some more, it looks like I was a bit hasty in assuming it would be a better option than SimpleDB. I was able to get a MySQL instance up and running and working with sipsorcery in no time at all. The issue, though, is that the MySQL instance provided by RDS looks to be simply a MySQL server hosted on a dedicated EC2 instance. That deployment model does not overcome the single point of failure limitation that currently exists. In addition, the RDS documentation states that the MySQL instances will require a 4 hour window each week for maintenance downtime.

I was naively hoping that the RDS service provided database instances hosted on a MySQL cluster with five nines uptime and no single point of failure. It looks like that option may be coming but I suspect the pricing will be high. SimpleDB already provides a storage option with high uptime and no single point of failure, and it is cheap. Its capabilities are less than those of a relational database, but with a bit of wizardry that can be overcome in the sipsorcery server software. So it’s back to the SimpleDB approach.
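To put the 4 hour weekly maintenance window next to the five nines I was hoping for, here is a rough back-of-the-envelope calculation. It is only a sketch and assumes the worst case, where the whole window is actually spent offline every week. The function and variable names are made up for the illustration.

```python
# Rough availability arithmetic. Assumes the full weekly maintenance
# window is real downtime, which is a worst case, not a measured figure.
HOURS_PER_WEEK = 7 * 24


def availability(downtime_hours_per_week):
    """Fraction of the week a service is up, given weekly downtime in hours."""
    return 1 - downtime_hours_per_week / HOURS_PER_WEEK


# Worst case for a 4 hour weekly maintenance window.
rds_worst_case = availability(4)

# Downtime that a five nines (99.999%) target allows per week, in seconds.
five_nines_seconds_per_week = (1 - 0.99999) * HOURS_PER_WEEK * 3600

print(f"4 hour weekly window, worst case availability: {rds_worst_case:.2%}")
print(f"five nines allows about {five_nines_seconds_per_week:.0f} seconds of downtime a week")
```

On those assumptions the window alone caps availability at roughly 97.6%, whereas five nines leaves room for only about 6 seconds of downtime a week.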

The sipsorcery server had an outage yesterday. Based on the logs the outage was just over 5 hours long, from 1553 UTC until 2056 UTC.

The cause of the outage was the Amazon EC2 instance that the sipsorcery servers run on seemingly losing network connectivity. This is the 3rd (possibly 4th) time this has happened and I’ll be putting a ticket in to Amazon support to see if there is any more information about it since the last ticket.

While it’s annoying, and the frequency of the incidents is way too high, things do go wrong and servers do crash. For the sipsorcery service to become more reliable it will need to be able to cope with losing a server instance. The work I have been doing on incorporating Amazon SimpleDB as sipsorcery’s data repository is with precisely that goal in mind. It will provide a scalable, reliable (hopefully more so than the EC2 instance) and shared data layer that two independent sipsorcery instances can utilise. If one instance drops off for whatever reason the other one would still be available. With the EC2 cloud having expanded into Europe, one sipsorcery instance could run in Amazon’s European data centre and the other in the US one, which would hopefully make it very unlikely that both instances would have an operating system or hardware issue simultaneously.
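The two-instance idea can be sketched in a few lines. This is only an illustration of the selection logic, not how sipsorcery does or will do it: the host names are hypothetical placeholders, and the health check is passed in as a function so the sketch makes no claims about how liveness would really be probed.

```python
from typing import Callable, Optional, Sequence


def pick_instance(instances: Sequence[str],
                  is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first instance that passes the health check, or None."""
    for host in instances:
        if is_alive(host):
            return host
    return None


# Hypothetical host names, one per region, both sharing the same data layer.
INSTANCES = ["us.sipsorcery.example", "eu.sipsorcery.example"]

# Simulate the US instance losing network connectivity.
unreachable = {"us.sipsorcery.example"}
chosen = pick_instance(INSTANCES, lambda host: host not in unreachable)
print(chosen)
```

The point of the shared data layer is that whichever instance is chosen sees the same accounts and dialplans, so falling over to the second one does not lose anything.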

There was a bit of Murphy’s Law with this outage as well. I do have monitoring set up for the sipsorcery.com server and get sent an SMS if it stops responding. Last night just as I was going to bed my phone was giving those annoying beeps to indicate the battery was low, and since I couldn’t be bothered to go and find the recharger I turned it off until morning. Of course 3 hours later the sipsorcery instance lost its network connectivity and an SMS was sent to an off phone. Apart from that, it’s debatable whether I would, one, hear and, two, get up and check an SMS that arrived at 0300, but going on recent history I probably would. I was up briefly at 0500 to give my daughter back her dummy so I would have spotted it then as well. But as it happened I didn’t become aware of sipsorcery being down until around 0745 when I checked my office phones and saw they weren’t registered. Ten minutes and a reboot later (thankfully EC2 instances can be rebooted through a web browser; there was no other way to communicate with the sipsorcery one) all was back to normal.

The above paragraph is not what you want to read when considering the support arrangements of your VoIP service but that’s why it’s free :).

A quick update on works in progress.

SimpleDB
Prototype is up and running. The SimpleDB transaction times are definitely longer, in the range of 100 to 200ms rather than the <10ms of a relational database, but with a bit of tuning and making some dialplan mechanisms more efficient it should be possible to make any call delays imperceptible.
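A rough feel for what those transaction times mean for call set up is sketched below. The number of queries per call is a made-up figure for illustration (the real count depends on the dialplan), and the sketch assumes the queries run one after another rather than in parallel.

```python
def setup_delay_ms(queries_per_call, latency_ms):
    """Total database time for one call if the queries run sequentially."""
    return queries_per_call * latency_ms


# Hypothetical figure, not measured from sipsorcery.
QUERIES_PER_CALL = 5

scenarios = [
    ("relational database at 10ms", 10),
    ("SimpleDB at 100ms", 100),
    ("SimpleDB at 200ms", 200),
]

for label, latency in scenarios:
    delay = setup_delay_ms(QUERIES_PER_CALL, latency)
    print(f"{label}: {delay}ms of database time per call")
```

With numbers like these the delay scales directly with the query count, which is why trimming the number of queries per call matters as much as the latency of each one.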

SSH
The NSsh code has been successfully incorporated into the sipsorcery monitoring server and is up and running. The filter console works just as well over ssh as it does with telnet. To date I’ve only tested with Cygwin (OpenSSH) and PuTTY but there is no reason to think it wouldn’t work with any SSH client.

Adhearsion
With the sipsorcery service being relatively stable for long periods my mind has been able to wander over other ideas that pop into it from time-to-time. I got interested in playing around with the integration between sipsorcery and an Asterisk server again, something I’ve done in the past from time-to-time. It’s very straightforward to do the integration but the tricky bit is making it manageable for multiple users. I’ve done a lot of work in the past with an Asterisk interfacing technology called FastAGI and was dwelling over ways to wire that up for sipsorcery users. I then recalled a project I’d stumbled across in the past called Adhearsion that uses FastAGI (in combination with the Asterisk Manager Interface) to present a way to drive Asterisk dialplans using Ruby. That would tie in very nicely with sipsorcery dialplans, which are also Ruby based, so I’ve started feeling around how difficult that would be to integrate. Signs are promising so far although only a few hours have been spent on the whole investigation.

REST Interface
The imminent closure of the mysipswitch service has caused consternation for a few people, especially those that don’t use Windows and therefore can’t use the sipsorcery Silverlight GUI. I’ve already published a SOAP interface for the sipsorcery provisioning service and am now hoping to get a REST interface published in the next few days. The idea is to provide a quick and easy way to create an alternative AJAX based interface for sipsorcery and to encourage someone to undertake the effort.

September was a good month with regard to the stability of the sipsorcery service and a drastic improvement on August. There have been no incidents of the application servers (they are the ones that process calls) stalling with the dreaded “Long running dialplan” message. That’s not to say that the service is five 9’s reliable. There were still two incidents where administrative intervention was required to disable some user accounts that were doing very unsociable things, effectively causing a denial of service, but it’s getting there.

It’s been 2 and a half weeks since the last significant change was made to the sipsorcery service and I’ve been spending the time since then working away on two new improvements for reliability and security.

SimpleDB

The first improvement in the pipeline is to switch from using a MySQL relational database for storing all the sipsorcery data to using Amazon’s SimpleDB. The big advantage of SimpleDB is that it already takes care of the hard database things like scalability, fault tolerance, load balancing, and replication. MySQL does have a Cluster solution that would be suitable for the sipsorcery service but, one, it’s a lot of effort to set up and, two, and more significantly, a redundant deployment scenario involves a minimum of 6 servers, which is prohibitive for a free service like sipsorcery.

There are some trade-offs between an RDBMS like MySQL and Amazon’s SimpleDB. For instance, the querying capabilities of SimpleDB are much less sophisticated; in fact they are limited to string comparisons, and case sensitive ones at that. Also, SimpleDB only exposes services over a REST or SOAP endpoint, which involves the overheads of HTTP and SSL, whereas an RDBMS can use a direct TCP connection with much lower overheads. The sipsorcery service does rely heavily on its database and there are multiple queries performed for each call that gets processed. If the processing time of the queries is high then the result will be big call set up delays. So far it doesn’t look like that should be an issue, as the response times of SimpleDB queries have been low. I will be doing further testing but ultimately, if things work out, the plan is to migrate from MySQL to SimpleDB.
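A common way of living with string-only, case sensitive comparisons is to encode values before they are stored so that plain string ordering gives the right answer. The sketch below shows the general idea only. It is not taken from the sipsorcery code, it makes no SimpleDB calls, and the helper names and the padding width are assumptions made for the illustration.

```python
def encode_int(value, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("this simple encoding only handles non-negative integers")
    return str(value).zfill(width)


def encode_for_search(text):
    """Lowercase copy of a value, stored alongside the original for lookups."""
    return text.lower()


# Compared as raw strings, "9" sorts after "10", which is wrong numerically.
print("9" > "10")

# After zero-padding, string comparison agrees with numeric comparison.
print(encode_int(9) < encode_int(10))

# A case sensitive comparison on the lowercased copy behaves like a
# case insensitive comparison on the original value.
print(encode_for_search("Aaron") == encode_for_search("AARON"))
```

The cost is that the encoding has to be applied consistently on both writes and queries, which is the kind of extra work that ends up in the server software rather than in the database.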

SSH

The other improvement is to move from using a telnet connection for the monitoring console to an ssh one. That means the sipsorcery server will have to incorporate an ssh server and the Silverlight GUI an ssh client. Thankfully there’s an open source C# ssh server library called NSsh that is a good starting point. I’ve been doing some work with NSsh and have managed to get it working with the PuTTY and Cygwin ssh clients and am now attempting to build a client from the same library to incorporate into Silverlight.

These two improvements are plumbing and won’t offer any new features but what they will do is make the service more reliable and secure. After they are done the hope is to get back to the interesting stuff and to spend some time on SIP transfers and the switchboard GUI.

Aaron