sipsorcery's blog

Occasional posts about VoIP, SIP, WebRTC and Bitcoin.

Building a video capable softphone with Windows Media Foundation

This is the first post in what will hopefully be a successful series of posts detailing how I manage to build a video capable softphone using Windows Media Foundation.

I’d never heard of Media Foundation until last week, so not only do I not know how to use it, I also don’t know whether it will be suitable for the task. I do know it is the successor to the Windows DirectShow API but does not yet provide the same coverage, so I may have to delve into the DirectShow API as well. On top of that, neither API has a comprehensive managed .NET interface, which means the job needs to be done in C++. My C++ skills are severely undernourished, so I’m expecting it to take a while to get up to speed before I can really start diving into the APIs.

What I do have is a basic working softphone that I can build on, which means I can focus on the video side of things. My goal is to be able to place a SIP H.264 video call with my webcam to another video softphone, such as CounterPath’s Bria. Given other things going on at the moment, such as a 7-week-old baby and a 3-year-old, I’m estimating the project could take 2 to 3 months. As to why I’m interested in this: it’s something different from both .NET and SIP. I’ve been working with both of those for a long time, so taking a break and playing with something different but still related is appealing.

Enough chit chat, getting started…

1. The first thing I’ve done is to install the Windows SDK for Windows 7 and take a look at the Media Foundation sample projects. The first sample I tried was the SimpleCapture project and it ran fine out of the box.

2. After looking through a few more of the samples I feel the need to get coding. Being able to get a video stream from my webcam is the obvious place to start. I’ve created a C++ Win32 console application and found an article which discusses enumerating the system’s video capture devices. I haven’t gotten very far yet, but I’m now wondering whether my Logitech Webcam Pro 9000 driver supports H.264, which would mean I wouldn’t need to use any of the Media Foundation H.264 codec capabilities. After a quick look at the camera’s specification page, I’m pretty sure the answer is no.

3. I’ve now got the sample compiling and running but the count of my video devices is coming back as 0 🙁 so I’ve probably got some flags wrong somewhere.

The first couple of hours haven’t got me very far yet. More tomorrow.