Capturing and converting the video stream from my webcam to an H.264 file didn’t prove to be as bad as I thought. It helped a lot that the Media Foundation SDK has a sample called MFCaptureToFile that does exactly the same thing, and the vast majority of the code I’ve used below has been copied and pasted directly from that sample.

Here’s the 5 second .mp4 video that represents the fruits of my labour.

The important code bits are shown below (just to reiterate, I barely know what I’m doing with this stuff and I generally remove the error checking and memory management code to help me get the gist of the bits I’m interested in).


// Initialize the Media Foundation platform.
hr = MFStartup(MF_VERSION);
if (SUCCEEDED(hr))
{
    const WCHAR *pwszFileName = L"sample.mp4";
    IMFSinkWriter *pWriter = NULL;

    hr = MFCreateSinkWriterFromURL(
        pwszFileName,
        NULL,
        NULL,
        &pWriter);

    // Create the source reader.
    IMFSourceReader *pReader = NULL;

    hr = MFCreateSourceReaderFromMediaSource(*ppSource, pConfig, &pReader);

    //GetCurrentMediaType(pReader);
    //ListModes(pReader);

    IMFMediaType *pType = NULL;
    pReader->GetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pType);

    printf("Configuring H.264 sink.\n");

    // Set up the H.264 sink.
    hr = ConfigureEncoder(pType, pWriter);
    if (FAILED(hr))
    {
        printf("Configuring the H.264 sink failed.\n");
    }

    // Register the color converter DSP for this process, in the video
    // processor category. This will enable the sink writer to enumerate
    // the color converter when the sink writer attempts to match the
    // media types.
    hr = MFTRegisterLocalByCLSID(
        __uuidof(CColorConvertDMO),
        MFT_CATEGORY_VIDEO_PROCESSOR,
        L"",
        MFT_ENUM_FLAG_SYNCMFT,
        0,
        NULL,
        0,
        NULL);

    hr = pWriter->SetInputMediaType(0, pType, NULL);
    if (FAILED(hr))
    {
        printf("Failure setting the input media type on the H.264 sink.\n");
    }

    hr = pWriter->BeginWriting();
    if (FAILED(hr))
    {
        printf("Failed to begin writing on the H.264 sink.\n");
    }

    DWORD streamIndex, flags;
    LONGLONG llTimeStamp;
    IMFSample *pSample = NULL;
    CRITICAL_SECTION critsec;
    BOOL bFirstSample = TRUE;
    LONGLONG llBaseTime = 0;
    int sampleCount = 0;

    InitializeCriticalSection(&critsec);

    printf("Recording...\n");

    while (sampleCount < 100)
    {
        hr = pReader->ReadSample(
            MF_SOURCE_READER_ANY_STREAM,    // Stream index.
            0,                              // Flags.
            &streamIndex,                   // Receives the actual stream index.
            &flags,                         // Receives status flags.
            &llTimeStamp,                   // Receives the time stamp.
            &pSample                        // Receives the sample or NULL.
            );

        wprintf(L"Stream %d (%I64d)\n", streamIndex, llTimeStamp);

        if (pSample)
        {
            if (bFirstSample)
            {
                llBaseTime = llTimeStamp;
                bFirstSample = FALSE;
            }

            // Rebase the time stamp.
            llTimeStamp -= llBaseTime;

            hr = pSample->SetSampleTime(llTimeStamp);
            if (FAILED(hr))
            {
                printf("Setting the sample time failed.\n");
            }

            hr = pWriter->WriteSample(0, pSample);
            if (FAILED(hr))
            {
                printf("Write sample failed.\n");
            }
        }

        sampleCount++;
    }

    printf("Finalising the capture.\n");

    if (pWriter)
    {
        hr = pWriter->Finalize();
    }

    //WriteSampleToBitmap(pSample);

    // Shut down Media Foundation.
    MFShutdown();
}

HRESULT ConfigureEncoder(IMFMediaType *pType, IMFSinkWriter *pWriter)
{
    HRESULT hr = S_OK;

    IMFMediaType *pType2 = NULL;

    hr = MFCreateMediaType(&pType2);

    if (SUCCEEDED(hr))
    {
        hr = pType2->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    }

    if (SUCCEEDED(hr))
    {
        hr = pType2->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    }

    if (SUCCEEDED(hr))
    {
        hr = pType2->SetUINT32(MF_MT_AVG_BITRATE, 240 * 1000);
    }

    if (SUCCEEDED(hr))
    {
        hr = CopyAttribute(pType, pType2, MF_MT_FRAME_SIZE);
    }

    if (SUCCEEDED(hr))
    {
        hr = CopyAttribute(pType, pType2, MF_MT_FRAME_RATE);
    }

    if (SUCCEEDED(hr))
    {
        hr = CopyAttribute(pType, pType2, MF_MT_PIXEL_ASPECT_RATIO);
    }

    if (SUCCEEDED(hr))
    {
        hr = CopyAttribute(pType, pType2, MF_MT_INTERLACE_MODE);
    }

    if (SUCCEEDED(hr))
    {
        DWORD dwStreamIndex = 0;
        hr = pWriter->AddStream(pType2, &dwStreamIndex);
    }

    pType2->Release();

    return hr;
}

HRESULT CopyAttribute(IMFAttributes *pSrc, IMFAttributes *pDest, const GUID key)
{
    PROPVARIANT var;
    PropVariantInit(&var);

    HRESULT hr = pSrc->GetItem(key, &var);
    if (SUCCEEDED(hr))
    {
        hr = pDest->SetItem(key, var);
    }

    PropVariantClear(&var);
    return hr;
}

One thing that’s missing is audio. I’ve got the video into the .mp4 file but I need an audio stream in there as well.

The next step is to get audio in and then check that the media file will be understood by a different video softphone, probably Counterpath’s Bria since I already have that installed.

It feels like I’ve made a lot of progress in the last few days although reflecting on what I’ve achieved I actually haven’t got much closer to the softphone goal. My two accomplishments, which seemed exciting at the time, were:

  • Successfully listing all the video modes that my webcam supports,
  • Getting a video stream from my webcam and saving a single frame as a bitmap.

The first step was to get an IMFSourceReader from the IMFMediaSource (my webcam) I created in part II. My understanding of the way these two interfaces work is that IMFMediaSource is implemented by a class that wraps a device, file, network stream etc. that is capable of providing some audio or video, and IMFSourceReader by the class that knows how to read samples from the media source.

The code I used to list my webcam’s video modes is shown below.

// Initialize the Media Foundation platform.
hr = MFStartup(MF_VERSION);
if (SUCCEEDED(hr))
{
    // Create the source reader.
    IMFSourceReader *pReader = NULL;

    hr = MFCreateSourceReaderFromMediaSource(*ppSource, pConfig, &pReader);

    if (SUCCEEDED(hr))
    {
        DWORD dwMediaTypeIndex = 0;

        while (SUCCEEDED(hr))
        {
            IMFMediaType *pType = NULL;
            hr = pReader->GetNativeMediaType(0, dwMediaTypeIndex, &pType);
            if (hr == MF_E_NO_MORE_TYPES)
            {
                hr = S_OK;
                break;
            }
            else if (SUCCEEDED(hr))
            {
                // Examine the media type.
                CMediaTypeTrace *nativeTypeMediaTrace = new CMediaTypeTrace(pType);
                printf("Native media type: %s.\n", nativeTypeMediaTrace->GetString());
                delete nativeTypeMediaTrace;
                pType->Release();
            }

            ++dwMediaTypeIndex;
        }
    }
}

The code in the snippet is just using the standard Media Foundation calls except for the CMediaTypeTrace class. That’s actually the useful class since it takes the IMFMediaType, which is mostly a bunch of GUIDs that map to constants describing one of the webcam’s modes, and spits out some plain English to represent the resolution, format etc. of the mode. The CMediaTypeTrace class is not actually in the Media Foundation library and instead is provided in mediatypetrace.h, which is in one of the samples in the MediaFoundation directory that comes with the Windows SDK (on my system it’s in Windows\v7.1\Samples\multimedia\mediafoundation\topoedit\tedutil). As it happens the two video modes that my camera supports, RGB24 and I420, were not included in the list of GUIDs in mediatypetrace.h so I had to search around to find what they were and then add them in.

LPCSTR STRING_FROM_GUID( GUID Attr )
{
    ...
    INTERNAL_GUID_TO_STRING( MFVideoFormat_RGB24, 14 );	  // RGB24
    INTERNAL_GUID_TO_STRING( WMMEDIASUBTYPE_I420, 15 );   // I420
}

The full list of the modes my webcam supports is shown below.

Device Name: Logitech QuickCam Pro 9000.
Current media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 640, H: 480.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 640, H: 480.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 160, H: 90.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 160, H: 100.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 160, H: 120.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 176, H: 144.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 320, H: 180.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 320, H: 200.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 320, H: 240.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 352, H: 288.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 640, H: 360.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 640, H: 400.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 864, H: 480.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 768, H: 480.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 800, H: 450.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 800, H: 500.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 800, H: 600.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 960, H: 720.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 1280, H: 720.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 1280, H: 800.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 1280, H: 1024.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 1600, H: 900.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 1600, H: 1000.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=RGB24, FRAME_SIZE=W 1600, H: 1200.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 640, H: 480.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 160, H: 90.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 160, H: 100.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 160, H: 120.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 176, H: 144.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 320, H: 180.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 320, H: 200.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 320, H: 240.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 352, H: 288.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 640, H: 360.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 640, H: 400.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 864, H: 480.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 768, H: 480.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 800, H: 450.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 800, H: 500.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 800, H: 600.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 960, H: 720.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 1280, H: 720.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 1280, H: 800.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 1280, H: 1024.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 1600, H: 900.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 1600, H: 1000.
Native media type: Video: MAJOR_TYPE=Video, SUBTYPE=I420, FRAME_SIZE=W 1600, H: 1200.

The second thing I was able to do was to take a sample from my webcam and save it as a bitmap. To do this I took some short-cuts, namely hard coding the size of the sample, which I know from my webcam’s default mode (640 x 480), and relying on the fact that that mode does not result in any padding (I’m not 100% on that and have taken an educated guess). I found someone else’s sample that created a bitmap file and blatantly copied it. Below is the code I used to extract the sample and save the bitmap.

// Initialize the Media Foundation platform.
hr = MFStartup(MF_VERSION);
if (SUCCEEDED(hr))
{
    // Create the source reader.
    IMFSourceReader *pReader = NULL;

    hr = MFCreateSourceReaderFromMediaSource(
        *ppSource,
        pConfig,
        &pReader);

    //GetCurrentMediaType(pReader);
    //ListModes(pReader);

    DWORD streamIndex, flags;
    LONGLONG llTimeStamp;
    IMFSample *pSample = NULL;

    while (!pSample)
    {
        // Initial read results in a null pSample??
        hr = pReader->ReadSample(
            MF_SOURCE_READER_ANY_STREAM,    // Stream index.
            0,                              // Flags.
            &streamIndex,                   // Receives the actual stream index.
            &flags,                         // Receives status flags.
            &llTimeStamp,                   // Receives the time stamp.
            &pSample                        // Receives the sample or NULL.
            );

        wprintf(L"Stream %d (%I64d)\n", streamIndex, llTimeStamp);
    }

    // Use non-2D version of sample.
    IMFMediaBuffer *mediaBuffer = NULL;
    BYTE *pData = NULL;
    DWORD writePosn = 0;

    pSample->ConvertToContiguousBuffer(&mediaBuffer);

    hr = mediaBuffer->Lock(&pData, NULL, NULL);

    HANDLE file = CreateBitmapFile(&writePosn);

    WriteFile(file, pData, 640 * 480 * (24 / 8), &writePosn, NULL);

    CloseHandle(file);

    mediaBuffer->Unlock();

    // Shut down Media Foundation.
    MFShutdown();
}

HANDLE CreateBitmapFile(DWORD *writePosn)
{
    HANDLE file;
    BITMAPFILEHEADER fileHeader;
    BITMAPINFOHEADER fileInfo;

    // Sets up the new bmp to be written to.
    file = CreateFile(L"sample.bmp", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

    fileHeader.bfType = 0x4D42;                             // "BM", the bitmap file signature.
    fileHeader.bfSize = sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER) + 640 * 480 * (24 / 8); // Total file size: both headers plus the pixel data.
    fileHeader.bfReserved1 = 0;                             // Sets the reserved fields to 0.
    fileHeader.bfReserved2 = 0;
    fileHeader.bfOffBits = sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER); // Offset from the start of the file to the pixel data.

    fileInfo.biSize = sizeof(BITMAPINFOHEADER);
    fileInfo.biWidth = 640;
    fileInfo.biHeight = 480;
    fileInfo.biPlanes = 1;
    fileInfo.biBitCount = 24;
    fileInfo.biCompression = BI_RGB;
    fileInfo.biSizeImage = 640 * 480 * (24 / 8);
    fileInfo.biXPelsPerMeter = 2400;
    fileInfo.biYPelsPerMeter = 2400;
    fileInfo.biClrImportant = 0;
    fileInfo.biClrUsed = 0;

    WriteFile(file, &fileHeader, sizeof(fileHeader), writePosn, NULL);
    WriteFile(file, &fileInfo, sizeof(fileInfo), writePosn, NULL);

    return file;
}

So that was all fun but it hasn’t gotten me much closer to having an H.264 stream ready for bundling into my RTP packets. Getting the H.264 stream will be my next focus. I think I’ll try capturing it to an .mp4 file as a first step. Actually I wonder if there’s a way I can test an .mp4 file with a softphone and VLC? That would be a handy way to test whether the H.264 stream I get is actually going to work when I use it in a VoIP call.

I also ordered Developing Microsoft Media Foundation Applications from Amazon thinking it might help me on this journey, only to find it available for free online a couple of days later :(.

Got the video device enumeration code working, at least well enough for it to tell me my webcam is a Logitech QuickCam Pro 9000. The working code is below.

#include "stdafx.h"
#include <mfapi.h>
#include <mfplay.h>
#include "common.h"

HRESULT CreateVideoDeviceSource(IMFMediaSource **ppSource);

int _tmain(int argc, _TCHAR* argv[])
{
    printf("Get webcam properties test console.\n");

    CoInitializeEx(NULL, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE);

    IMFMediaSource *pSource = NULL;

    CreateVideoDeviceSource(&pSource);

    getchar();

    return 0;
}

HRESULT CreateVideoDeviceSource(IMFMediaSource **ppSource)
{
    *ppSource = NULL;

    UINT32 count = 0;

    IMFAttributes *pConfig = NULL;
    IMFActivate **ppDevices = NULL;

    // Create an attribute store to hold the search criteria.
    HRESULT hr = MFCreateAttributes(&pConfig, 1);

    // Request video capture devices.
    if (SUCCEEDED(hr))
    {
        hr = pConfig->SetGUID(
            MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
            MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID
            );
    }

    // Enumerate the devices.
    if (SUCCEEDED(hr))
    {
        hr = MFEnumDeviceSources(pConfig, &ppDevices, &count);
    }

    printf("Device Count: %i.\n", count);

    // Create a media source for the first device in the list.
    if (SUCCEEDED(hr))
    {
        if (count > 0)
        {
            hr = ppDevices[0]->ActivateObject(IID_PPV_ARGS(ppSource));

            if (SUCCEEDED(hr))
            {
                WCHAR *szFriendlyName = NULL;

                // Try to get the display name.
                UINT32 cchName;
                hr = ppDevices[0]->GetAllocatedString(
                    MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME,
                    &szFriendlyName, &cchName);

                if (SUCCEEDED(hr))
                {
                    wprintf(L"Device Name: %s.\n", szFriendlyName);
                }
                else
                {
                    printf("Error getting device attribute.\n");
                }

                CoTaskMemFree(szFriendlyName);
            }
        }
        else
        {
            hr = MF_E_NOT_FOUND;
        }
    }

    for (DWORD i = 0; i < count; i++)
    {
        ppDevices[i]->Release();
    }
    CoTaskMemFree(ppDevices);
    return hr;
}

The problem I had previously was a missing call to CoInitializeEx. It seems it’s needed to initialise COM before any COM libraries can be used.

The next step is now to work out how to get a sample from the webcam.

This is the first post in what will hopefully be a successful series of posts detailing how I manage to build a video capable softphone using Windows Media Foundation.

I’d never heard of Media Foundation until last week so not only do I not know how to use it, I also don’t know whether it will be suitable for the task. I do know it is the successor to the Windows DirectShow API but does not yet provide the same coverage so I may have to delve into the DirectShow API as well. On top of that neither API has a comprehensive managed .Net interface so the job needs to be done in C++. My C++ skills are severely undernourished so I’m expecting it to take a while to get up to speed before I can really start diving into the APIs.

What I do have is a basic working softphone that I can build on which means I can focus on the video side of things. My goal is to be able to place a SIP H.264 video call with my webcam to another video softphone, such as Counterpath’s Bria. Given other things going on at the moment, such as a 7 week old baby and a 3 year old, I’m estimating the project could take 2 to 3 months. As to why I’m interested in this it’s because it’s something different from both .Net and SIP. I’ve been working with both those for a long time so taking a break and playing with something different but still related is appealing.

Enough chit chat, getting started…

1. The first thing I’ve done is to install the Windows SDK for Windows 7 and take a look at the Media Foundation sample projects. The first sample I tried was the SimpleCapture project and it ran fine out of the box.

2. After looking through a few more of the samples I feel the need to get coding. Being able to get a video stream from my webcam is the obvious place to start. I’ve created a C++ Win32 console application and found an article which discusses enumerating the system’s video capture devices. I haven’t gotten very far as yet but I’m now wondering if my Logitech Webcam Pro 9000 driver supports H.264 meaning I wouldn’t need to use any of the Media Foundation H.264 codec capabilities? A quick look at the camera’s specification page and I’m pretty sure the answer is no.

3. I’ve now got the sample compiling and running but the count of my video devices is coming back as 0 🙁 so I’ve probably got some flags wrong somewhere.

The first couple of hours hasn’t got me very far yet. More tomorrow.

For some reason after being completely disinterested in doing anything with the RTP and audio side of VoIP calls for the last 5 or so years suddenly in the last month I decided to explore how well a .Net based softphone would work. Consequently I started tinkering around with a .Net library called NAudio that I’d seen mentioned around the traps. For my purposes NAudio provided a convenient way to get at the underlying Windows API calls for interacting with audio input and output devices. It took a little bit of time and effort to get things working but eventually I was able to successfully read audio samples from my microphone and write samples to my speakers through a test .Net application.

The softphone is open source and available in binary form here and the source is available here in the sipsorcery-softphone project. Before going any further it should be noted that the softphone is extremely rudimentary and geared towards developers or VoIP hobbyists wanting to tinker rather than end users looking for trouble free calling. The user interface is extremely lacking and there are also crucial components missing such as echo cancellation, a jitter buffer, codec support (G.711 u-law is the only codec supported) etc.

My original verdict on using .Net as a softphone platform was that it was not particularly good. This was because the microphone samples coming from NAudio were only capable of being delivered with a sample period of 200ms, which is useless since in practice the jitter buffer at the remote end will drop any packet over 50 or 100ms. However it turned out that a combination of some inefficient code in my RTP packet parsing and the fact that I was testing by running the softphone in Visual Studio debug mode was responsible for the high sampling latency. Once those issues were removed the microphone samples were delivered reliably with a sample period of 20ms, exactly as required. I was thinking that if I ever wanted to have a usable softphone I’d have to move the RTP and audio processing to a C++ library, but now I’m starting to believe that’s not necessary and .Net is capable of handling the 20ms sample period.

The other thing worth mentioning about the softphone is that it’s capable of placing calls directly to Google Voice’s XMPP gateway. I’m still surprised that none of the mainstream softphone developers have bothered to add the STUN bindings to their RTP stacks so that they could work with Google Voice. In the end I decided I’d prototype it myself just for kicks. For a softphone that already has RTP and STUN protocol support, adding the ability to work with Google Voice in conjunction with a SIP-to-XMPP gateway (which SIPSorcery could do) would literally be less than 20 lines of code.

Hopefully the softphone will be useful to someone. Judging on the number of queries I get about the SIPSorcery softphone project and the questions about .Net softphones on stackoverflow I imagine it will be.

 

I’ve created a short guide on how SIP manages audio streams and the sorts of things that go wrong when those streams traverse NATs. The full guide can be read at SIP and Audio Guide.

To complement the guide I’ve whipped together a diagnostics tool.

SIPSorcery RTP Diagnostics Tool

In an attempt to help people diagnose RTP audio issues I have created a new tool that provides some simple diagnostic messages about receiving and transmitting RTP packets from a SIP device. The purpose of the tool is twofold:

  1. On a SIP call indicate the socket the RTP packets were expected from and the actual socket they came from,
  2. On a SIP call indicate whether it was possible to transmit RTP packets to the same socket the SIP caller was sending from.

To use the tool take the following steps:

  1. Open http://diags.sipsorcery.com in a browser and click the Go button. Note that the web page uses web sockets, which are only supported in the latest web browsers; I’ve tested it in Chrome 16, Firefox 9.0.1 and Internet Explorer 9,
  2. A message will be displayed that contains a SIP address to call. Type that into your softphone or set up a SIPSorcery dialplan rule to call it,
  3. If the tool receives a call on the SIP address it will display information about how it received and sent RTP packets.

The tool is very rudimentary at this point but if it proves useful I will be likely to expend more effort to polish and enhance it. If you do have any feedback or feature requests please do send me an email at aaron@sipsorcery.com.

SIP uses a cryptographic algorithm called MD5 for authentication; however, MD5 was invented in 1991 and since that time a number of flaws have been exposed in it. The US Computer Emergency Readiness Team (US-CERT) issued a vulnerability notice in 2008 that included the quote below.

Do not use the MD5 algorithm
Software developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity. As previous research has demonstrated, it should be considered cryptographically broken and unsuitable for further use.

Does that mean SIP’s authentication mechanism is vulnerable? While not necessarily so, at least in relation to the MD5 flaws, the real answer is that it depends on how much your password is worth to an attacker. For example if your SIP password only uses alphabetic characters and is 7 characters or less in length it can be brute forced for less than $1!

Read the full article here.

Due to popular request, mainly from Voxalot refugees, a new web callback feature is now available for SIPSorcery Premium and Professional users. The feature is available on the AJAX portal. Unlike the original call manager approach (outlined at the bottom of this page), which initiated a Ruby dial plan execution and did not require authentication, the new mechanism DOES require authentication and sets up a call between two pre-configured dial strings rather than executing an existing dial plan.

The new mechanism is simpler to use but is not as powerful and flexible as the original approach. Hopefully the new mechanism is closer to what Voxalot refugees are used to and will allow any saved Voxalot callbacks to be used.

There is help available but the mechanism should be fairly intuitive to use. The way it works is that you enter two dial strings (dial strings are the same format as those that can be used in sys.Dial in Ruby dial plans and can include multiple call legs and other options) and a description. After that it’s just a matter of clicking “place call” and the SIPSorcery server will attempt to call the first leg and, if it gets an answer, will then call the second leg and finally bridge the calls together with a SIP re-INVITE.

Enjoy!

A new version of the Simple Wizard is now available. The SIPSorcery Silverlight portal with the new changes is version 4.1.1.554 (the version is displayed in the top left hand corner when the portal is first loaded).

The changes are:

  • Rules can now be disabled,
  • Incoming rules can now be applied for Any call, for calls to a specific SIP account, for calls from a specific SIP provider or by a regular expression match on the called SIP URI,
  • A new Highrise Lookup command is available for incoming calls. This command allows a caller to be looked up in a 37signals Highrise instance (Highrise is a contact management application).

Rule Disabling

The disabling of rules is self-explanatory. Disabling a rule will prevent it from being used when processing a call. Re-enabling the rule will cause it to be used again.

Incoming Rule Matching

The incoming rule matching is now more powerful and flexible. The Any option will match all incoming calls (although caller ID and time matching are subsequently applied and could result in the rule not matching a call). The ToSIPAccount option requires a specific SIP account to be selected and will cause the rule to only match incoming calls to that SIP account. The Regex option allows a regular expression to be applied to the incoming call’s SIP URI (Uniform Resource Identifier, the equivalent of an email address for SIP).

The final option is ToSIPProvider and can be used to match calls that have been received from a specific SIP provider. The way this option works is that the incoming SIP URI must be in a specific format of “provider name.username@sipsorcery.com” for example “blueface.aaron@sipsorcery.com” where blueface is the provider name and aaron is the username of the SIPSorcery account. In order for the SIP URI on received calls to be in the required format it will need to be set on the Register Contact for the provider. An additional update will be coming soon which will set that format as the default on new SIP provider entries but in the meantime it will need to be set manually.

Highrise Lookups

The HighriseLookup command is a new one that will be useful to people who already manage their contacts in a Highrise instance. By setting a Highrise URL and authentication token in the Simple Wizard command parameters the SIPSorcery application server will lookup the contact in the Highrise instance and if found it will set the display name on the SIP From header of subsequent forwarded calls. The command is primarily designed to be used with a new version of the Switchboard that’s coming soon but it may also be useful for anyone with an IP Phone or softphone that has enough screen space to show the display name on incoming calls.

The Simple Wizard is the new way to create SIP Sorcery dial plans. It’s designed for people with fairly straightforward call handling requirements of one or two steps per call.

This post is an overview of how to get started with the Simple Wizard and includes a guide to some common steps that are likely to be undertaken with it.

Step 1: Create a new Simple Wizard dial plan

The dial plan option is only available in the Silverlight portal. Once logged in select the dial plans menu option and then the Add button. A dialogue box will appear with the different dial plan options available. Select Simple Wizard, choose a name and then click Add.

Create a new Simple Wizard dial plan

Once the dial plan is created the Simple Wizard screen will appear and you are ready for step 2.

New Simple Wizard dial plan ready for rule creation

Step 2: Create speed dials for outgoing calls

A common requirement for outgoing call rules is to create some speed dials for frequently called destinations. The Simple Wizard allows any format desired for speed dials but a common way to create them is to use a * prefix, for example *100, *101 etc. For this example we will create 4 speed dials for calling family member’s mobile numbers.

  • *100 for Mum
  • *101 for Dad
  • *102 for Older Brother
  • *103 for Younger Sister

The screenshot below illustrates the creation of the first speed dial. The crucial point is to leave the Destination Match type as Exact. An exact match is, as the name suggests, one that matches the called number exactly without applying any substitutions, wild cards etc.

Create a new speed dial

Once all the speed dials have been entered they will be displayed in the outgoing rules grid and are immediately ready for use (remember to update the Outgoing Dial Plan on the SIP account you want to use with the dial plan).

Outgoing rules grid with speed dials

Step 3: Create outgoing call rules for international routing

Once your speed dials are set up the next thing is to create some rules for processing calls to the traditional telephone network or PSTN. For PSTN calls it’s common to use different providers for calls to different international destinations. For this example we will set up rules for 3 international prefixes.

  • Irish calls with a prefix of 011353 use provider blueface,
  • Australian calls with a prefix of 01161 use provider faktortel,
  • US calls with a prefix of 0 or a prefix that doesn’t start with 0 use provider callcentric.

The difference between setting up the international calling rules and the speed dial rules is that the Destination Match type used is now Prefix. Again, as the name suggests, a prefix match is only concerned with the start of the dialled number. Prefix matches can also use pattern substitution to represent commonly required number patterns.

  • X will match 0 to 9,
  • Z will match 1 to 9,
  • N will match 2 to 9.

New rule for international calls to Ireland

Once all the international rules are entered they too will appear in the outgoing rules grid and are available for immediate use.

International rules grid view

Step 4: More sophisticated outgoing call rules

The exact and prefix match rules and the default Dial command used in the above examples are just the start when it comes to creating outgoing call rules. More powerful matches can be created using the Regex destination match type, which allows full regular expressions to be utilised.

The DialAdvanced command also allows multiple stage forwards and different options to be set on the forward(s) used to process the outgoing call. The DialAdvanced command can use the same powerful dial string options that are used in the Ruby dial plans.

Step 5: Incoming rules

After the outgoing rules are successfully configured the next step is to take a look at the incoming rules. It’s not actually necessary to use a dial plan for incoming call processing with SIP Sorcery. By default all incoming calls will be forwarded to all registered bindings on the main SIP account. However if different behaviour is needed, such as forwarding an incoming call on one provider to a different provider or to multiple SIP devices, then an incoming dial plan is required.

For this example we’ll set up two incoming dial plan rules.

  • Incoming calls from provider1 should be forwarded to mobile number 0447623322 at someprovider,
  • Incoming calls from provider3 should be forwarded to SIP accounts mine1 and mine2.

Currently an extra step is required to be able to distinguish calls by provider.

  1. Create a new incoming only SIP account for each provider that needs to be distinguished in the dial plan,
  2. For each SIP account created in step 1 set the Incoming Dial Plan to the Simple Wizard dial plan being created,
  3. On the provider’s Register Contact field use the SIP account set up for it in step 1.

Once the providers are correctly configured then to distinguish them in a Simple Wizard dial plan is as simple as selecting the corresponding SIP account in the drop down menu.

Incoming rule for calls to provider1

Once the rules have been created they will be displayed in the incoming rules grid and are available for immediate use.

Incoming rules grid

Step 6: Forward unanswered incoming calls to voicemail

The final step is to forward any unanswered incoming calls to voicemail. Achieving this is as easy as creating a new incoming call rule that applies to Any SIP account rather than to a specific one as the rules in the last step did.

Since the SIP Sorcery service does not provide a voicemail service anyone wanting to use one will need to create an account with a 3rd party SIP provider. The instructions on how to set up a free voicemail account with Callcentric can be found in a SIP Sorcery Recipe.

Once you have a voicemail account set up the incoming call rule will be something like the one below.

Incoming voicemail rule

As with outgoing rules there are many more things that can be done with incoming call rules such as setting a time window that they should apply and filtering based on the caller ID.
