sipsorcery's blog

Occasional posts about VoIP, SIP, WebRTC and Bitcoin.

Outage 22nd Oct 2009

The sipsorcery server had an outage yesterday. Based on the logs, the outage lasted about 5 hours, from 1553 UTC until 2056 UTC.

The cause of the outage was the Amazon EC2 instance that the sipsorcery servers run on seemingly losing network connectivity. This is the 3rd (possibly 4th) time this has happened and I’ll be putting a ticket in to Amazon support to see if there is any more information about it since the last ticket.

While it’s annoying, and the frequency of the incidents is way too high, things do go wrong and servers do crash. For the sipsorcery service to become more reliable it will need to be able to cope with losing a server instance. The work I have been doing on incorporating Amazon SimpleDB as sipsorcery’s data repository is with precisely that goal in mind. It will provide a scalable, reliable (hopefully more so than the EC2 instance) and shared data layer that two independent sipsorcery instances can use. If one instance drops off for whatever reason the other one would still be available. With the EC2 cloud having expanded into Europe, one sipsorcery instance could run in Amazon’s European data centre and the other in the US one, which would hopefully make it very unlikely that both instances would have an operating system or hardware issue simultaneously.
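To make the two-instance idea a bit more concrete, here is a minimal Python sketch of how a client or monitor might pick whichever instance is still reachable. It is only an illustration under assumptions: the hostnames are made up, the `first_available` helper is hypothetical, and the health check is passed in as a function rather than being a real SIP probe. It is not how sipsorcery itself does (or will do) failover.

```python
from typing import Callable, Iterable, Optional


def first_available(instances: Iterable[str],
                    is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first instance that passes the health check, or None."""
    for host in instances:
        try:
            if is_up(host):
                return host
        except OSError:
            # Treat a network error during the probe as "instance down".
            continue
    return None


# Hypothetical hostnames, one per region; not the real sipsorcery servers.
INSTANCES = ["us.sip.example.com", "eu.sip.example.com"]

if __name__ == "__main__":
    # Simulate the outage: the US instance has lost network connectivity.
    down = {"us.sip.example.com"}
    print(first_available(INSTANCES, lambda host: host not in down))
```

In practice the `is_up` function would be something like a SIP OPTIONS request or a TCP connect with a short timeout. Both instances would also need to read and write the same shared data layer for the switch to be invisible to users.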

There was a bit of Murphy’s Law with this outage as well. I do have monitoring set up for the server and get sent an SMS if it stops responding. Last night, just as I was going to bed, my phone was giving those annoying beeps to indicate the battery was low, and since I couldn’t be bothered to go and find the recharger I turned it off until morning. Of course, 3 hours later the sipsorcery instance lost its network connectivity and an SMS was sent to a phone that was switched off. Apart from that, it’s debatable whether I would, one, hear and, two, get up and check an SMS that arrived at 0300, but going on recent history I probably would. I was up briefly at 0500 to give my daughter back her dummy so I would have spotted it then as well. But as it happened I didn’t become aware of sipsorcery being down until around 0745 when I checked my office phones and saw they weren’t registered. Ten minutes and a reboot later all was back to normal (thankfully EC2 instances can be rebooted through a web browser, as there was no other way to communicate with the sipsorcery one).

The above paragraph is not what you want to read when considering the support arrangements of your VoIP service, but that’s why it’s free :).