I’m not having much luck getting the counterpath softphones, either Bria 3 or XLite 4, to work with the Google Voice media server.
The problem is the counterpath softphones don’t seem to be implementing ICE properly or perhaps are implementing an earlier version. The specific issue is they don’t send a STUN binding request on the RTP socket and instead launch straight into sending their own RTP stream. The Google Voice server isn’t interested in RTP until it’s done the STUN binding request exchanges.
Update: it was my own malformed SDP packet that stopped the XLite from working properly. Once that was fixed the XLite did send a STUN request as part of the media initialisation. The problem then became the Google Voice server not recognising the STUN request because it implements an earlier version of STUN/ICE and it doesn’t recognise the STUN attributes set by the XLite.
I tried out the free version of the Zoiper softphone as well but it doesn’t look like it supports any version of ICE so I didn’t get anywhere with it either.
If anyone is aware of a SIP softphone that supports ICE please let me know.
Update: Thanks to Avi Marcus I tried out the Blink softphone which, like the XLite, also supports ICE. It also sends the STUN binding request to initialise the media and, as with the XLite, its requests were rejected as malformed by the Google Talk/Voice XMPP server due to the newer STUN attributes.
Unless something else crops up I think that’s pretty much the end of any attempt to integrate Google Voice XMPP calls with sipsorcery for a while. To work around the mismatch in STUN/ICE versions between softphones that do support ICE and Google Talk/Voice would mean proxying the media and that’s not something the sipsorcery service is geared for. The Google Talk developer document does state they intend to implement the latest versions of Jingle and that would mean also supporting the latest version of ICE. When that happens there will be more options to translate between SIP and XMPP with Google Talk/Voice.
The STUN requests and responses needed to get the Google Voice server to send RTP ended up being very basic, essentially just reflecting the STUN username attribute back is all that was required.
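As a rough illustration of "reflecting the username back", here is a minimal Python sketch that builds an RFC 3489 style binding response from an incoming binding request. It is not the sipsorcery code; the function names are made up for this example, and it only echoes the USERNAME attribute and transaction ID. A fully compliant RFC 3489 response would normally carry a MAPPED-ADDRESS attribute as well.

```python
import struct

BINDING_REQUEST = 0x0001   # RFC 3489 message type
BINDING_RESPONSE = 0x0101  # RFC 3489 message type
ATTR_USERNAME = 0x0006     # RFC 3489 attribute type
HEADER_LEN = 20            # type (2) + length (2) + transaction ID (16)


def parse_attributes(packet):
    """Return a dict of attribute type -> raw value for a STUN packet."""
    _, msg_len = struct.unpack("!HH", packet[:4])
    attrs = {}
    offset = HEADER_LEN
    end = HEADER_LEN + msg_len
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        attrs[attr_type] = packet[offset + 4:offset + 4 + attr_len]
        # Skip the value plus any padding up to a 4 byte boundary.
        offset += 4 + attr_len + (-attr_len % 4)
    return attrs


def build_binding_response(request):
    """Build a binding response that echoes the request's USERNAME."""
    msg_type = struct.unpack("!H", request[:2])[0]
    if msg_type != BINDING_REQUEST:
        raise ValueError("not a STUN binding request")
    transaction_id = request[4:20]
    username = parse_attributes(request).get(ATTR_USERNAME)
    if username is None:
        raise ValueError("binding request has no USERNAME attribute")
    padded = username + b"\x00" * (-len(username) % 4)
    body = struct.pack("!HH", ATTR_USERNAME, len(username)) + padded
    header = struct.pack("!HH", BINDING_RESPONSE, len(body)) + transaction_id
    return header + body
```

The response would be sent back from the RTP socket to the address the request arrived from.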
I’ve now been able to get RTP audio flowing.
The gotcha with this is that if sipsorcery is ever able to incorporate the XMPP approach to setting up Google Voice calls it’s only going to work for SIP user agents that support ICE. The version of the CounterPath Bria softphone I use does support a “version” of ICE but I’ve yet to get RTP flowing with it. I’ve got some more messing around to do with the SDP that I’m supplying to the Bria and hopefully there’s a way to get it to work.
I’m off to do manual labour in the country for the rest of this weekend so fingers crossed early next week I’ll successfully get a SIP call from a Bria softphone receiving audio from an XMPP initiated Google Voice call.
A week ago I got motivated to build a basic XMPP stack for sipsorcery in order to see how SIP-to-XMPP calls would go with Google Voice. In my last blog posts I’d been successful in getting my phone to ring using a quick and dirty application that sent a pseudo-XMPP request to Google Voice. The pseudo stems from the fact that the requests being sent were based on manual inspection of an Asterisk 1.8 to Google Voice communication rather than having any software that actually understood XMPP.
After a couple of days I had the XMPP stack pretty much complete, for client side operation at least, and was again able to get my phone to ring. I incorporated the stack into the sipsorcery application server and proceeded to place a call through it from one of my IP phones, eagerly anticipating the RTP stream arriving from Google Voice. Silence. Hmm, I must have wired things up incorrectly. After double checking it all looked good and I eventually determined there were some additional steps required to get the RTP started after the signalling had completed its work.
The XMPP from Google Voice supplies RTP endpoints as a series of candidates in an ICE type format. ICE is the replacement for STUN and is a standard that details a set of steps to try and deal with NAT. Confusingly ICE uses a new version of STUN which also goes by the name of STUN – seriously somebody should be tarred and feathered for that one; create a new standard that uses the same acronym as the one it replaces creating a never ending source of confusion about which STUN STUN stands for!
Back to XMPP. There is a standard that deals with setting up RTP via XMPP called Jingle ICE-UDP Transport Method. Unfortunately as with the other Jingle standards I’ve perused the Google Voice XMPP server does not support them or supports an earlier or custom version. I wasted some time trying to send a STUNv2 (that’s the STUN from RFC5389) request to the RTP candidate socket for a few days before going back to my Asterisk 1.8 server and looking at what it was doing. It was sending a STUNv1 (that’s the STUN from RFC3489) request with a username attribute. I’d been attempting a STUNv2 request with username, message integrity and fingerprint attributes. I’m now getting a STUNv1 response from the Google Voice end which means hopefully I’m not too far away now from getting the RTP flowing.
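To make the STUNv1/STUNv2 difference concrete, here is a minimal Python sketch of an RFC 3489 binding request carrying only a USERNAME attribute. The function names are invented for the example and the username value is whatever the candidate exchange supplies. The main wire-level difference checked here is that RFC 5389 fixes the first 4 bytes of the transaction ID to the magic cookie 0x2112A442, whereas RFC 3489 treats all 16 bytes as an opaque transaction ID.

```python
import os
import struct

BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006
RFC5389_MAGIC_COOKIE = 0x2112A442  # only meaningful for RFC 5389 packets


def stun_attribute(attr_type, value):
    """Encode one attribute; RFC 3489 wants values in multiples of 4 bytes."""
    if len(value) % 4:
        raise ValueError("attribute value must be a multiple of 4 bytes")
    return struct.pack("!HH", attr_type, len(value)) + value


def build_rfc3489_binding_request(username, transaction_id=None):
    """Build a STUNv1 binding request with a single USERNAME attribute."""
    if transaction_id is None:
        transaction_id = os.urandom(16)
    body = stun_attribute(ATTR_USERNAME, username.encode("ascii"))
    header = struct.pack("!HH", BINDING_REQUEST, len(body)) + transaction_id
    return header + body


def has_rfc5389_magic_cookie(packet):
    """True if bytes 4..8 hold the RFC 5389 magic cookie."""
    return struct.unpack("!I", packet[4:8])[0] == RFC5389_MAGIC_COOKIE
```

A STUNv2 request would additionally append MESSAGE-INTEGRITY and FINGERPRINT attributes, which is what I had been sending without success.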
I’m still very curious as to where the FreeSWITCH/Asterisk 1.8 developers got the info on the mechanisms to integrate with Google Voice over XMPP. Every time I get stuck and go and look at a standard I end up finding it’s not being used. I found that with Jingle and now with ICE even though the Google Voice XMPP is sort of using them both.
I changed a single setting in my Asterisk 1.8 gtalk.conf and bingo my Google Voice call worked. The setting was to specify a stunaddr rather than specifying an externip. It’s probably got something to do with the Amazon EC2 instance my test Asterisk server is running on being behind a NAT. With the call working I was finally able to see the raw XMPP packets being sent and copy them into my own C# app and have it initiate a Google Voice call over XMPP.
The code is pretty useless in its current form, all it can do is get a phone to ring. No audio is available. However in theory that’s the hardest part out of the way and it shouldn’t be difficult to translate an incoming SIP request into an outgoing XMPP request.
For anyone curious the XMPP packets that need to be sent to initiate a Google Voice call are shown below. Whatever is sending those packets will need to be authenticated prior to sending them but at least that part of it follows the XMPP standards.
<iq from='firstname.lastname@example.org/Talk-12312' id='1234' to='email@example.com/srvres' type='set'>
  <session xmlns='http://www.google.com/session' type='initiate' id='abcdef12344' initiator='firstname.lastname@example.org/Talk-12312'>
    <description xmlns='http://www.google.com/session/phone'>
      <payload-type id='0' name='PCMU' bitrate='64000' clockrate='8000' />
      <payload-type id='100' name='EG711U' bitrate='64000' clockrate='8000' />
      <payload-type id='101' name='telephone-event' clockrate='8000' />
    </description>
  </session>
</iq>

<iq from='email@example.com/Talk-12312' id='1234' to='firstname.lastname@example.org/srvres' type='set'>
  <session xmlns='http://www.google.com/session' type='candidates' id='abcdef12344' initiator='email@example.com/Talk-12312'>
    <candidate name='rtp' address='121.223.xxx.xxx' port='10202' username='asdasdoas' password='dshasjja84' preference='1.0' protocol='udp' type='stun' network='0' generation='0' />
    <transport xmlns='http://www.google.com/transport/p2p'/>
  </session>
</iq>
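For anyone who would rather generate the first of those stanzas than paste it, here is a rough Python sketch using the standard library's ElementTree. The function name and the JIDs passed in are placeholders, and the from, to and initiator attribute names are my assumption based on normal XMPP iq addressing and the Google Talk session format. Setting xmlns as a plain attribute is a shortcut that works for serialising but is not how ElementTree normally models namespaces.

```python
import xml.etree.ElementTree as ET

SESSION_NS = "http://www.google.com/session"
PHONE_NS = "http://www.google.com/session/phone"

# Payload types copied from the captured stanza.
PAYLOAD_TYPES = [
    {"id": "0", "name": "PCMU", "bitrate": "64000", "clockrate": "8000"},
    {"id": "100", "name": "EG711U", "bitrate": "64000", "clockrate": "8000"},
    {"id": "101", "name": "telephone-event", "clockrate": "8000"},
]


def build_session_initiate(from_jid, to_jid, iq_id, session_id):
    """Return the session initiate iq stanza as a string."""
    iq = ET.Element(
        "iq", {"from": from_jid, "id": iq_id, "to": to_jid, "type": "set"}
    )
    session = ET.SubElement(
        iq,
        "session",
        {
            "xmlns": SESSION_NS,
            "type": "initiate",
            "id": session_id,
            "initiator": from_jid,
        },
    )
    description = ET.SubElement(session, "description", {"xmlns": PHONE_NS})
    for payload in PAYLOAD_TYPES:
        ET.SubElement(description, "payload-type", payload)
    return ET.tostring(iq, encoding="unicode")
```

The resulting string would be written to the already authenticated XMPP stream, followed by the candidates stanza.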
The parameters caught by Firebug for a POST request that was generated when placing a call from Gmail:
count 3
ofs 41
req0__sc c
req0_jid c1712561387
req0_json ["jmistart","+firstname.lastname@example.org","c1712561387","Thu Nov 04 10:04:12 2010","a"]
req0_type j
req1__sc c
req1_jid c1712561387
req1_json ["jc","+email@example.com","c1712561387",[["10.1.1.2","57208","rtp","gNrtqsy8IWpSbE6e","ZBX31w0dgOdbjkfw","1","udp","0","local","0"]]]
req1_type j
req2__sc c
req2_jid c1712561387
req2_json ["jc","+firstname.lastname@example.org","c1712561387",[["121.223.xxx.xxx","57209","rtp","Yn/0kLMEYxKhZq9w","om4GTLNgDGXmfsJK","0.9","udp","0","stun","0"]]]
req2_type j
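The "jc" entries look a lot like the XMPP candidate elements flattened into arrays. Below is a small Python sketch that pulls them apart. The field names are purely my guess from lining the values up against the candidate attributes in the XMPP stanza; in particular I can't tell which of the two "0" values is network and which is generation, so treat that ordering as an assumption.

```python
import json

# Guessed field order, inferred by comparison with the XMPP candidate element.
CANDIDATE_FIELDS = (
    "address", "port", "name", "username", "password",
    "preference", "protocol", "network", "type", "generation",
)


def parse_jc(req_json):
    """Turn a 'jc' request body into a list of candidate dicts."""
    message = json.loads(req_json)
    if message[0] != "jc":
        raise ValueError("not a jc (candidate) message")
    _, _callee, _call_id, candidates = message
    return [dict(zip(CANDIDATE_FIELDS, candidate)) for candidate in candidates]


if __name__ == "__main__":
    sample = (
        '["jc","+email@example.com","c1712561387",'
        '[["10.1.1.2","57208","rtp","gNrtqsy8IWpSbE6e",'
        '"ZBX31w0dgOdbjkfw","1","udp","0","local","0"]]]'
    )
    for candidate in parse_jc(sample):
        print(candidate["type"], candidate["address"], candidate["port"])
```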
The Gmail dialler also loads a flash plugin to handle the audio and video.
I placed the call from Firefox 3.6.12 which doesn’t support the new HTML 5 web sockets. It’s possible that in browsers that support HTML5, such as Chrome and Firefox 4, the Gmail dialler does use XMPP. That would mean that Google Voice would have two different code bases for the dialler. For most companies that wouldn’t be feasible but Google has more programmers than most companies so it’s entirely feasible.
I also just had a quick trawl through the Asterisk mailing list and Google Voice support in 1.8 still seems to be a bit hazy. The source code comments indicate a definite intent to support calls over XMPP, with no mention of Jingle. But when someone posted a question about how to achieve Google Voice calling with Asterisk 1.8 most of the answers described various HTTP callback mechanisms, the same as what sipsorcery currently does. Some people do seem to be having success originating Google Voice calls over XMPP with Asterisk 1.8 but despite following the same steps it didn’t work for me.
After a couple of days of not getting past the error message from the Google XMPP server when attempting to place a PSTN call out through Google Voice I decided to fire up Asterisk 1.8 and capture the jingle messages it used so I could copy them.
Amazon’s EC2 is perfect for this sort of job (EC2 is great for Linux, crap for Windows if you haven’t read my previous post on that topic). So I fired up the stock standard Amazon Linux instance only to find it didn’t have the developer tools installed. Shut it down and fired up a different CentOS one which this time did have the developer tools. Pulled down the Asterisk 1.8 sources and kicked off the install. I was happy that I could remember all the steps to build Asterisk and install the dependencies that were invariably missing. I should be able to remember the steps as I’ve probably performed them over 100 times in the past 6 years but it’s probably been 1 or 2 years since I last did it.
Anyway I finally got Asterisk 1.8 up and running and followed the instructions on the wiki to get Google Voice set up. That all worked smoothly enough and I started getting the raw XMPP messages showing up in the Asterisk console. Perfect they are exactly the messages I want to copy. Configured my extensions.conf to dial out to one of my US DIDs, dialled in with my xlite softphone, got a ring tone (generated by Asterisk) and then right at crunch time I got exactly the same error message as my quick and dirty code. Bummer. I tweaked the format of the DID in extensions.conf and also tried a different one just to make sure it wasn’t a format or number specific issue but all attempts resulted in the same error.
I don’t really follow the Asterisk mailing list anymore but I wonder how many people get the same thing I got and whether anyone has got Google Voice calls to the PSTN working through Asterisk 1.8?
The exact error message I get is:
<pho:recipient-unavailable xmlns:pho="http://www.google.com/session/phone">Session timed out</pho:recipient-unavailable>
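If you want your own client to spot this failure rather than eyeballing the console, something like the following Python sketch would do. It simply searches whatever stanza it is given for the recipient-unavailable element in the Google phone namespace; the function name is made up and I'm not assuming anything about where in the stanza the element is nested.

```python
import xml.etree.ElementTree as ET

PHONE_NS = "http://www.google.com/session/phone"
UNAVAILABLE_TAG = "{%s}recipient-unavailable" % PHONE_NS


def recipient_unavailable_reason(stanza_xml):
    """Return the error text if the stanza reports recipient-unavailable."""
    root = ET.fromstring(stanza_xml)
    # iter() also visits the root element, so a bare error element matches.
    for element in root.iter(UNAVAILABLE_TAG):
        return (element.text or "").strip()
    return None
```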
Calling the same number through the gmail popup dialler does work and it apparently uses Jingle. So that would seemingly rule out my Google Voice account being the problem.
My choices now are to try the same exercise with FreeSWITCH, which at least one person has reported is working for them, or alternatively to try and figure out a way to intercept the traffic from the gmail dialler. I did try the latter quickly with Fiddler but it stopped the dialler working. It’s always tricky trying to transparently intercept SSL traffic. Whichever option I choose, it’s a job for tomorrow.
And I don’t mean an advert for radio.
On the 25th of August 2010 Google Voice announced calls could now be made from within the Gmail web browser interface. Fine, handy, I didn’t think much of it at the time and assumed it was just an HTTP request from Gmail to the Google Voice web service to initiate the call and that included a new trick to avoid the need for a callback.
I didn’t think anything more of it until I saw a post on the recent Asterisk 1.8 release. I’d already seen that Asterisk 1.8 supported making calls via Google Voice and again assumed it was done using HTTP. However this particular article stated that wasn’t the case and that Asterisk was using XMPP and the Jingle extension and an Asterisk wiki article on the topic seemingly confirms it.
Being able to initiate calls through Google Voice via XMPP was something I found interesting. Not because I find the existing HTTP callback mechanism clunky, although it is, I just don’t need to make any calls to the US. But it’s always fun integrating voice technologies and being able to place calls to the PSTN through a major service with XMPP was something new, at least in my experience.
There are XMPP libraries around and in fact sipsorcery already uses the AG Software XMPP library for the sys.GTalk dialplan method. For fun I decided to spend Saturday seeing how far I could get towards placing a call to Google Voice via XMPP from scratch. XMPP is a bit different from SIP and revolves around the concept of XML streams. Having worked a lot with XML over the years it wasn’t hard to grasp. By the end of the day I was able to successfully connect, authenticate and send instant messages to a GTalk client. However at the next step of initiating a call to Google Voice I ran into a bit of a dead end.
I obviously wasn’t the first programmer to try this as there was already the Asterisk 1.8 implementation that had been in existence for at least 3 weeks. And after a little bit more digging it appears the FreeSWITCH implementation was around for at least a month before that, I’m guessing the Asterisk implementation borrowed from the FreeSWITCH one. I therefore assume Google Voice must have documented or blogged about what was required to connect with them over XMPP. Although one thing I found strange was that there is no facility in Google’s own XMPP client, GTalk, to initiate a call to the PSTN, at least not for me. After a bit more searching I haven’t really made any headway and it looks like I’m going to have to fire up gdb and attach to Asterisk (I’m more capable of driving it than FreeSWITCH). The FreeSWITCH and/or Asterisk programmers got the knowledge from somewhere and I’m wondering if it was reverse engineering the GMail calls or not…
The exercise has left me wondering about Google Voice and XMPP. Given the level of activity around the various mechanisms to make Google Voice calls from SIP I would have thought that more people would have picked up on the new capability of originating calls via XMPP. To translate a SIP call to XMPP is actually very simple compared to the sort of mechanism sipsorcery uses with the HTTP request and the use of a callback provider. My main driver is curiosity but there are a lot of people out there that take it as a personal challenge to get free calls as easily and conveniently as possible and it’s curious that they haven’t picked up on this yet.
Oh and if I’ve missed a reference which specifies the format for the Jingle calls to Google Voice, which is entirely possible, I’d be grateful for any information :).
Update (couple of hours after original post): I started trying the different call request formats I stumbled across in a Google Talk API reference. After a few attempts I was still getting nowhere and then I got an error message back with a redirect destination. That looked promising so I parsed that out and resent the call request. This time I got back what looked like a pending response but then after about 10 seconds I got a “pho:recipient-unavailable session timed out” error message. I tried a few different numbers and number formats just to make sure but still couldn’t get past that message or get any of my numbers to ring. Still it’s a step closer.
Voxeo have just announced a new SDK called phono which looks pretty interesting. At its core it seems to be a Flash-based softphone, of which there are admittedly others around. However a fair bit of work seems to have gone into the programmability of the softphone and it’s free to download and use.
I haven’t had that much time to play with it yet but I have been able to place calls successfully with it. Entering sip:email@example.com in the call box on the Phono main page will call a demo number through sipbroker for the ever present monkeys. I’ve noticed on a couple of calls I didn’t get any audio but refreshing the browser, which would re-initialise the flash softphone, fixed that. Most likely an issue with my NAT timing out whatever sockets the softphone is using.
Each time I’m forced to shut down new sipsorcery account sign ups I get a bunch of email requests asking for a new account to be set up. In the past I’ve been able to say just wait a few days and signups will be re-enabled. However this time that’s not going to happen in a hurry as there’s no easy way to milk more capacity out of the existing sipsorcery deployment.
As such I’ve decided to see if an auction process for new accounts will gain any traction. The merits of the approach are that it provides a way to control the rate of new signups and if the accounts get auctioned for more than $0.30 Australian dollars (the eBay fee) then it will also be able to contribute towards the sipsorcery running costs and perhaps at some point even generate enough for an additional server.
To start with I’ve just put a single account up for auction but if it works out I plan to add another 10 or so accounts. After that I’ll make sure the server is handling the load for a small period of time and if so the auction will be repeated.
Unfortunately it had to happen at some point, the single dedicated server that sipsorcery is running on is overloaded and I have been forced to disable new accounts. In actual fact I waited too long and the service is really starting to creak a bit with call set up times sometimes taking longer than they should and registrations occasionally being dropped because the responses don’t get processed in time.
At this point there is no plan to increase the server capacity to allow new accounts to be created. If the server utilisation drops off because some existing users drop off then I will be able to allow some new accounts but apart from that at this point it’s a regretful sorry to any new users.