With WebRTC starting to gain momentum I have been doing a bit of work recently on integrating a custom .Net/C++ application with WebRTC capable browsers (at the time of writing Chrome and Firefox). It’s been challenging work and there are a lot of moving parts involved with WebRTC. The good thing is that once the integration challenges have been overcome WebRTC streaming works very well, and my experience so far is that WebRTC video streams are a lot better than those I’ve had with SIP video softphones.

The SIPSorcery code base now has all the components needed to develop prototype applications that can integrate with WebRTC browsers. An example of a WebRTC video test pattern can be found here. While it is a very rudimentary application that simply adds a timestamp to a static JPEG image it does demonstrate the network and encoding pipeline necessary to get media streams working between a .Net application and a WebRTC browser.

If you have a project that needs .Net and WebRTC you might be able to use the SIPSorcery code as your starting point and perhaps even contribute to it. To create your own WebRTC project you need to add the SIPSorcery and SIPSorceryMedia NuGet packages. The SIPSorceryMedia package uses some native and C++ DLLs and requires the Visual C++ Redistributable Packages for Visual Studio 2013 to be installed on any machine that executes it. It’s also built for x64 only so it won’t run on 32 bit systems (I can build it for other platforms if anyone ever has a need).

An example of how to use the WebRTC components is in the source code of the WebRTCVideoSample project referenced above. The main classes are WebRTCDaemon and WebRtcSession. The WebRtcSession class is where the ICE connection establishment is coordinated and the connection to the WebRTC browser is set up. The WebRTCDaemon does the web socket signalling, which is not part of WebRTC but does provide a handy way to get the session established, and also the VP8 encoding of the timestamped JPEG image. I’ve included the WebRtcSession code below to show the pretty small amount of code required.

using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using SIPSorceryMedia;
using SIPSorcery.Net;
using SIPSorcery.Sys;
using log4net;

namespace WebRTCVideoServer
{
    public class WebRtcSession
    {
        private const int RTP_MAX_PAYLOAD = 1400;
        private const int TIMESTAMP_SPACING = 3000;
        private const int PAYLOAD_TYPE_ID = 100;
        private const int SRTP_AUTH_KEY_LENGTH = 10;

        private static ILog logger = AppState.logger;

        public WebRtcPeer Peer;
        public DtlsManaged DtlsContext;
        public SRTPManaged SrtpContext;
        public SRTPManaged SrtpReceiveContext;  // Used to decrypt packets received from the remote peer.

        public string CallID
        {
            get { return Peer.CallID; }
        }

        public WebRtcSession(string callID)
        {
            Peer = new WebRtcPeer() { CallID = callID };
        }

        public void DtlsPacketReceived(IceCandidate iceCandidate, byte[] buffer, IPEndPoint remoteEndPoint)
        {
            logger.Debug("DTLS packet received " + buffer.Length + " bytes from " + remoteEndPoint.ToString() + ".");

            if (DtlsContext == null)
            {
                DtlsContext = new DtlsManaged();
                int res = DtlsContext.Init();
                logger.Debug("DtlsContext initialisation result=" + res);
            }

            int bytesWritten = DtlsContext.Write(buffer, buffer.Length);

            if (bytesWritten != buffer.Length)
            {
                logger.Warn("The required number of bytes were not successfully written to the DTLS context.");
            }
            else
            {
                byte[] dtlsOutBytes = new byte[2048];

                int bytesRead = DtlsContext.Read(dtlsOutBytes, dtlsOutBytes.Length);

                if (bytesRead == 0)
                {
                    logger.Debug("No bytes read from DTLS context :(.");
                }
                else
                {
                    logger.Debug(bytesRead + " bytes read from DTLS context sending to " + remoteEndPoint.ToString() + ".");
                    iceCandidate.LocalRtpSocket.SendTo(dtlsOutBytes, 0, bytesRead, SocketFlags.None, remoteEndPoint);

                    //if (client.DtlsContext.IsHandshakeComplete())
                    if (DtlsContext.GetState() == 3)
                    {
                        logger.Debug("DTLS negotiation complete for " + remoteEndPoint.ToString() + ".");
                        SrtpContext = new SRTPManaged(DtlsContext, false);
                        SrtpReceiveContext = new SRTPManaged(DtlsContext, true);
                        Peer.IsDtlsNegotiationComplete = true;
                        iceCandidate.RemoteRtpEndPoint = remoteEndPoint;
                    }
                }
            }
        }

        public void MediaPacketReceived(IceCandidate iceCandidate, byte[] buffer, IPEndPoint remoteEndPoint)
        {
            if ((buffer[0] >= 128) && (buffer[0] <= 191))
            {
                //logger.Debug("A non-STUN packet was received Receiver Client.");

                if (buffer[1] == 0xC8 /* RTCP SR */ || buffer[1] == 0xC9 /* RTCP RR */)
                {
                    // RTCP packet.
                    //webRtcClient.LastSTUNReceiveAt = DateTime.Now;
                }
                else
                {
                    // RTP packet.
                    //int res = peer.SrtpReceiveContext.UnprotectRTP(buffer, buffer.Length);

                    //if (res != 0)
                    //    logger.Warn("SRTP unprotect failed, result " + res + ".");
                }
            }
            else
            {
                logger.Debug("An unrecognised packet was received on the WebRTC media socket.");
            }
        }

        public void Send(byte[] buffer)
        {
            try
            {
                Peer.LastTimestamp = (Peer.LastTimestamp == 0) ? RTSPSession.DateTimeToNptTimestamp32(DateTime.Now) : Peer.LastTimestamp + TIMESTAMP_SPACING;

                for (int index = 0; index * RTP_MAX_PAYLOAD < buffer.Length; index++)
                {
                    int offset = (index == 0) ? 0 : (index * RTP_MAX_PAYLOAD);
                    int payloadLength = (offset + RTP_MAX_PAYLOAD < buffer.Length) ? RTP_MAX_PAYLOAD : buffer.Length - offset;

                    byte[] vp8HeaderBytes = (index == 0) ? new byte[] { 0x10 } : new byte[] { 0x00 };

                    RTPPacket rtpPacket = new RTPPacket(payloadLength + SRTP_AUTH_KEY_LENGTH + vp8HeaderBytes.Length);
                    rtpPacket.Header.SyncSource = Peer.SSRC;
                    rtpPacket.Header.SequenceNumber = Peer.SequenceNumber++;
                    rtpPacket.Header.Timestamp = Peer.LastTimestamp;
                    rtpPacket.Header.MarkerBit = ((offset + payloadLength) >= buffer.Length) ? 1 : 0; // Set marker bit for the last packet in the frame.
                    rtpPacket.Header.PayloadType = PAYLOAD_TYPE_ID;

                    Buffer.BlockCopy(vp8HeaderBytes, 0, rtpPacket.Payload, 0, vp8HeaderBytes.Length);
                    Buffer.BlockCopy(buffer, offset, rtpPacket.Payload, vp8HeaderBytes.Length, payloadLength);

                    var rtpBuffer = rtpPacket.GetBytes();

                    int rtperr = SrtpContext.ProtectRTP(rtpBuffer, rtpBuffer.Length - SRTP_AUTH_KEY_LENGTH);
                    if (rtperr != 0)
                    {
                        logger.Warn("SRTP packet protection failed, result " + rtperr + ".");
                    }
                    else
                    {
                        var connectedIceCandidate = Peer.LocalIceCandidates.Where(y => y.RemoteRtpEndPoint != null).First();
                        connectedIceCandidate.LocalRtpSocket.SendTo(rtpBuffer, connectedIceCandidate.RemoteRtpEndPoint);
                    }
                }
            }
            catch (Exception sendExcp)
            {
                // logger.Error("SendRTP exception sending to " + client.SocketAddress + ". " + sendExcp.Message);
            }
        }
    }
}
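The listing above has separate handlers for DTLS and media packets, but it does not show how incoming datagrams on the shared socket are routed to them. As a rough sketch of one way that routing could work, the C++ function below classifies a datagram by its leading bytes. The first-byte ranges follow RFC 7983, and the RTCP test copies the sender report and receiver report check from MediaPacketReceived. The names PacketKind and ClassifyPacket are hypothetical and are not part of SIPSorcery.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration only; not part of the SIPSorcery code base.
enum class PacketKind { Stun, Dtls, Rtp, Rtcp, Unknown };

// Classify a datagram received on a WebRTC media socket from its leading bytes.
// First-byte ranges are taken from RFC 7983; the RTCP test mirrors the SR/RR
// check in MediaPacketReceived and therefore ignores other RTCP packet types.
PacketKind ClassifyPacket(const std::uint8_t* buffer, std::size_t length)
{
    if (buffer == nullptr || length < 2)
    {
        return PacketKind::Unknown;
    }

    const std::uint8_t first = buffer[0];

    if (first <= 3)
    {
        return PacketKind::Stun;
    }
    if (first >= 20 && first <= 63)
    {
        return PacketKind::Dtls;
    }
    if (first >= 128 && first <= 191)
    {
        // 0xC8 = RTCP sender report, 0xC9 = RTCP receiver report.
        const std::uint8_t second = buffer[1];
        return (second == 0xC8 || second == 0xC9) ? PacketKind::Rtcp : PacketKind::Rtp;
    }

    return PacketKind::Unknown;
}
```

A dispatcher built on this would pass Dtls packets to something like DtlsPacketReceived and Rtp or Rtcp packets to something like MediaPacketReceived, with STUN handled by the ICE layer.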

The SIPSorceryMedia assembly utilises Cisco’s libsrtp (for secure RTP), OpenSSL (for DTLS) and libvpx (for VP8 encoding). It also makes use of Microsoft’s Media Foundation and the FFmpeg libraries for various bits and pieces.
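The fragmentation done by the Send method in the WebRtcSession listing can be looked at in isolation. The sketch below is a standalone C++ restatement of that loop, not the SIPSorcery implementation. Each fragment carries a one byte VP8 payload descriptor, 0x10 (the start of partition bit) on the first fragment and 0x00 on the rest, and the marker flag is set only on the last fragment of the frame. RtpFragment, PacketiseVp8Frame and TimestampStep are hypothetical names. The timestamp helper assumes the 90 kHz clock that the VP8 RTP payload format uses, under which the listing's TIMESTAMP_SPACING of 3000 corresponds to 30 frames per second.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical type for illustration: one RTP payload plus its marker flag.
struct RtpFragment
{
    std::vector<std::uint8_t> payload; // VP8 payload descriptor byte followed by a slice of the frame.
    bool marker;                       // True only for the last fragment of the frame.
};

// Split an encoded VP8 frame into RTP sized fragments, mirroring the loop in Send.
std::vector<RtpFragment> PacketiseVp8Frame(const std::vector<std::uint8_t>& frame,
                                           std::size_t maxPayload = 1400)
{
    std::vector<RtpFragment> fragments;
    if (maxPayload == 0)
    {
        return fragments;
    }

    for (std::size_t offset = 0; offset < frame.size(); offset += maxPayload)
    {
        const std::size_t length = std::min(maxPayload, frame.size() - offset);
        const auto first = frame.begin() + static_cast<std::ptrdiff_t>(offset);

        RtpFragment fragment;
        fragment.payload.reserve(length + 1);
        // 0x10 sets the S (start of partition) bit; later fragments use 0x00.
        fragment.payload.push_back(static_cast<std::uint8_t>(offset == 0 ? 0x10 : 0x00));
        fragment.payload.insert(fragment.payload.end(), first,
                                first + static_cast<std::ptrdiff_t>(length));
        fragment.marker = (offset + length >= frame.size());
        fragments.push_back(std::move(fragment));
    }

    return fragments;
}

// RTP timestamp increment per frame for a given clock rate and frame rate.
constexpr std::uint32_t TimestampStep(std::uint32_t clockRate, std::uint32_t framesPerSecond)
{
    return clockRate / framesPerSecond;
}
```

Unlike the C# listing this sketch does not reserve the extra SRTP_AUTH_KEY_LENGTH bytes for the authentication tag; a caller that protects the packet in place would still need to leave room for it.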


After a brief interlude (nearly 3 years) I recently got motivated to look at using the SIPSorcery SIP stack to build a video softphone. Of course there are already a number of fully featured video softphones available so the project was for fun rather than to solve any particular problem.

Unlike my previous attempts this time I have been successful and the image below shows the video softphone prototype on a call with CounterPath’s Bria Softphone (that’s me chatting to Max).


I did end up using Microsoft’s Media Foundation (MF) for getting samples from my webcam but I gave up on trying to use the MF H264 codec and instead used the VP8 codec from the WebM project. A motivation to use the VP8 codec is that it was the initial codec proposed for WebRTC and at some point I’d like to experiment with placing calls from the softphone to a browser.

The video softphone is available in the sipsorcery codeplex repository under the sipsorcery-softphonev2 folder. All the MF and libvpx integration is contained in the sipsorcery-media folder.

The new softphone is purely experimental and video calls do not even work with the only other softphone I tested with, Jitsi, due to a VP8 framing problem. But it is a working implementation of a Windows video softphone so it may prove useful for anyone who wants to do some work in those areas.


The SIPSorcery SIP server will be moving to a new virtual machine instance tomorrow, Thursday 8 Jan 2015, at 0200 PST. There will be a brief disruption to the SIP services of approximately 5 to 10 minutes while the IP address is switched over. The reason for the migration is to move to a more up to date operating system version. No user action is required. The existing IP address will continue to be used on the new virtual machine.

The SSL certificate for the sipsorcery web server (www.sipsorcery.com) has been updated. Unfortunately GoDaddy signed the new certificate with a different intermediate certificate. This doesn’t impact browsers but anyone using the web services from Ruby may need to reference a different certificate bundle file. The new bundle can be downloaded from here.

The SIPSorcery web site and web services will be moving to a new server within the next few days. As a few people have noted the current web site hosting arrangements are not ideal with regular interruptions due to the site getting “recycled” by the host due to “excessive resource utilisation”. Apparently the issue relates to the connections from the Silverlight client Console page which repeatedly poll the SIPSorcery SIP servers for new messages.

To overcome the issue the web sites will be moved onto a new virtual machine that is physically next to the SIPSorcery SIP servers.

There should be minimal disruption as a result of the web site move since the change will be automatically propagated by DNS, hence no action by SIPSorcery users is required. For anyone that requires it the IP address of the new web server will be . No changes are being made to the SIP services and they will be unaffected by the move.

The prices for new SIP Sorcery plans have today been increased to $69/year for a Premium plan and $199/year for a Professional plan.

Part of the increase is due to the extra features now included in the plans. Both now include the use of the online Switchboard which previously was only available as a separate add on. The Professional plan also has a new Real-time Call Control and Billing feature that allows sub-accounts to be created for managing calls on behalf of a small to medium VoIP business.

Existing customers with a PayPal subscription set up for renewing their account will be entitled to remain on the price that they signed up with.

The SIPSorcery REST provisioning service is now publicly available. More information can be found on the Provisioning Help page.

The service allows the management of SIP Account and SIP Provider resources from your favourite programming language.

The SIPSorcery server will be moving to a new host in one week on Sunday the 17th of June 2012 at 0200 PST. The move is to take advantage of some synergies with another SIP service; there will be more information on that further down the track. The consequence of the server move will be that the IP address of the service will change from to . Ideally all users should be using sipsorcery.com and it’s recommended that anyone that may have configured their device with the SIPSorcery server’s IP address now switch to the host name.

For users that are using sipsorcery.com NO ACTION IS REQUIRED. Users that want to use the IP address will need to update their devices to use the new IP address AFTER the 17th of June. Both the old and new servers will work up until the 21st of June 2012, after which the old IP address and server will be de-commissioned.

During the migration there will be a short outage of between 15 and 20 minutes while the database is migrated to the new server. The SIPSorcery Twitter account will be updated prior to and subsequent to the migration.

If anyone has any concerns regarding the migration please email admin@sipsorcery.com.


I’ve been able to sort of accomplish my goal of recording audio and video streams to the same MP4 file. The attached code sample does the job, however there is something wrong with the way I’m doing the sampling as the audio and video are slightly out of sync and the audio is also truncated at the end. I thought it was worth posting the sample though as it’s taken me a few days to finally get it working. The key was getting the audio’s media attributes correctly set for the writer.

// Configure the audio stream.
// See http://msdn.microsoft.com/en-us/library/windows/desktop/dd742785(v=vs.85).aspx for AAC encoder settings.
// http://msdn.microsoft.com/en-us/library/ff819476%28VS.85%29.aspx
CHECK_HR( MFCreateMediaType( &pAudioOutType ), L"Configure encoder failed to create media type for audio output sink." );
CHECK_HR( pAudioOutType->SetGUID( MF_MT_MAJOR_TYPE, MFMediaType_Audio ), L"Failed to set audio writer attribute, media type." );  
CHECK_HR( pAudioOutType->SetGUID( MF_MT_SUBTYPE, MFAudioFormat_AAC ), L"Failed to set audio writer attribute, audio format (AAC).");
CHECK_HR( pAudioOutType->SetUINT32( MF_MT_AUDIO_NUM_CHANNELS, 2 ), L"Failed to set audio writer attribute, number of channels." );
CHECK_HR( pAudioOutType->SetUINT32( MF_MT_AUDIO_BITS_PER_SAMPLE, 16 ), L"Failed to set audio writer attribute, bits per sample." );
CHECK_HR( pAudioOutType->SetUINT32( MF_MT_AUDIO_SAMPLES_PER_SECOND, 44100 ), L"Failed to set audio writer attribute, samples per second.");
CHECK_HR( pAudioOutType->SetUINT32( MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 16000 ), L"Failed to set audio writer attribute, average bytes per second.");
//CHECK_HR( pAudioOutType->SetUINT32( MF_MT_AAC_AUDIO_PROFILE_LEVEL_INDICATION, 0x29 ), L"Failed to set audio writer attribute, level indication.");
CHECK_HR( pWriter->AddStream( pAudioOutType, &audioStreamIndex ), L"Failed to add the audio stream to the sink writer.");

RecordMP4 code sample
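The post does not pin down the cause of the audio and video drift, so the following is only one thing worth checking rather than a known fix. Media Foundation expresses sample times and durations in 100-nanosecond units, and deriving both streams' timestamps from counts of samples written, rather than from wall-clock time, keeps them on a common timeline. The helper names below are hypothetical and the code is plain C++ arithmetic with no Media Foundation calls.

```cpp
#include <cstdint>

// Media Foundation sample times and durations are in 100-nanosecond units.
constexpr std::int64_t kTicksPerSecond = 10000000;

// Presentation time of an audio buffer, given how many audio frames
// (one sample per channel) have already been written before it.
// Hypothetical helper, named for illustration.
constexpr std::int64_t AudioTimeTicks(std::int64_t framesWritten, std::int64_t sampleRate)
{
    return framesWritten * kTicksPerSecond / sampleRate;
}

// Presentation time of a video frame for a frame rate expressed as a
// fraction, for example 30/1 or 30000/1001. Hypothetical helper.
constexpr std::int64_t VideoTimeTicks(std::int64_t frameIndex,
                                      std::int64_t fpsNumerator,
                                      std::int64_t fpsDenominator)
{
    return frameIndex * kTicksPerSecond * fpsDenominator / fpsNumerator;
}
```

With the 44100 Hz rate configured above, one second of audio (44100 frames) and 30 video frames at 30 fps both land on the same tick value, which is the property to preserve when stamping samples for the writer.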

I haven’t made much progress since the last post except to determine that I was barking up the wrong tree by attempting to combine the audio and video streams with the Media Foundation. I decided to check the RTP RFCs related to H.263 and H.264 to determine how the audio and video combination should be transmitted and it turns out they are pretty much independent. That means to start with I can use the existing softphone code for the audio side of things and use the Media Foundation to do the video encoding and decoding. I’m thinking of switching to H.263 for the first video encoding mechanism as it’s simpler than H.264 and will be easier to package up into RTP.

For the moment I will keep going with my attempt to get the Media Foundation to save a video and audio stream into a single .mp4 file as I think that will be a very useful piece of code to have around. The problem I’m having at the moment is getting the audio encoding working. The audio device I’m using returns a WAVE_FORMAT_IEEE_FLOAT stream but from what I can determine I need to convert it to something like MFAudioFormat_Dolby_AC3_SPDIF MFAudioFormat_AAC for MPEG4. I need to investigate that some more.
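As a rough illustration of the conversion problem described above, the sketch below turns IEEE float samples (nominally in the range -1.0 to 1.0) into 16-bit PCM, which as far as I know is the input format Microsoft's AAC encoder expects. It is an assumption that a manual conversion is wanted at all, since Media Foundation may be able to insert its own converter, and FloatToPcm16 is a hypothetical name rather than anything from the attached sample.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Convert IEEE float audio samples to signed 16-bit PCM.
// Out of range input is clamped to [-1.0, 1.0] before scaling so that
// loud samples saturate instead of wrapping around.
std::vector<std::int16_t> FloatToPcm16(const std::vector<float>& samples)
{
    std::vector<std::int16_t> pcm;
    pcm.reserve(samples.size());

    for (float sample : samples)
    {
        const float clamped = std::max(-1.0f, std::min(1.0f, sample));
        pcm.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
    }

    return pcm;
}
```

Interleaved stereo input can be passed through unchanged because each sample is converted independently; channel count and sample rate are left as they were captured.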

One added benefit of looking into how RTP transmits the audio and video streams is that I finally got around to getting a video call working between my desktop Bria softphone and the iPhone Bria version. I’ve never had much luck with video calls between Bria softphones in the past, the video would get through sporadically but not very reliably. So I was pleasantly surprised when with a few tweaks of the NAT settings on the iPhone version I was able to get a call working reliably. Although it’s still not quite perfect as the video streams will only work if the desktop softphone calls the iPhone softphone and not the other way around. Still it’s nice to see video calls supported through the SIPSorcery server with absolutely no server configuration required. That’s the advantage of a SIP server deployment with no media proxying or transcoding.
