To be fair to myself, I was running XP at the time on a desktop PC with a USB webcam.

The program worked fine under these circumstances. It did what it said on the tin.

Then came Vista, Windows 7 and finally Windows 8.

This muddied the waters somewhat, as VFW was deprecated and performance was patchy
depending on your machine setup, especially on laptops.

Microsoft did their best to keep up with changing technology. They introduced DirectShow,
with graph builders, filter graphs etc. It was possible to make a webcam program with this technology
similar to the one I had made with VFW, but it was much harder to understand and get to grips with.

Then this technology was deprecated by Microsoft in favour of a new API called Media Foundation (which I will call WMF, or just MF).
This technology was much harder to understand than DirectShow. It dealt with streams in a very different way from DirectShow and talked about topologies and sink writers. There was a book produced, I must say a very hard to understand book, called "Developing Microsoft Media Foundation Applications" by Anton Polinger, which eased the pain of understanding the new API a little.

There was an easy to understand and implement side of WMF called MFPlay, with which I could again write a similar program that did the same things I had implemented under VFW.

Then Microsoft deprecated MFPlay... yes, even though it was a brand new API.

So I have had a problem: watching all this, considering all the options, and working out the best course of action to take in developing a new webcam tutorial. I have decided the best course is to take the newest route that is still not deprecated, which is Direct3D rendering from WMF technologies. This currently seems to be Microsoft's preferred option, and it does have a lot to give it merit: as you will see, you can do a lot more with WMF and Direct3D than you ever could with VFW.

However, this means this tutorial will be hard to understand; this once simple subject has now become very complicated.

This sets up some defines for the resource.rc file: the menu, the dialog box and finally the mystery image.

Before I go any further it would be wise to explain a little about WMF.

WMF uses COM extensively, but it is not a pure COM API; instead MF is a mix of COM and normal objects. Because it does use COM, you must initialize COM at the beginning of your program by calling CoInitializeEx(). You must also initialize WMF by calling MFStartup(). This means that on exiting your application you must shut both down again, with MFShutdown() and CoUninitialize().

Since the headers and libs I will be using are not available outwith the MS compilers, this build will not, as is usual for my tutorials, run on any MinGW compilers. I am truly sorry for this, but there was no way to make it so.

I will, however, be aiming for this to run on any Express version of the MS compiler with the Windows 7 SDK and .NET Framework 4.0 installed. Since we are using MS compilers only, I have decided to make this build UNICODE.

The Direct3D version that I will use is 9. This is for backwards compatibility. Those who wish to reprogram it for Direct3D 11 can do so; it is perfectly possible to translate this code.

Most of the common headers I have included in a file called, imaginatively, 'Common.h'. And here it is.

I define UNICODE, then activate the v6 common controls. This enables the visual styles available in ComCtl32.dll version 6 or later.

I introduce pragma comments to load all our libs.

I set up a structure which holds an array of IMFActivate objects. This is for choosing a webcam.

I set up some functions, windows message handlers and global variables.

I introduce a WinMain function. This is a departure from my usual window structure.

I set up a Windows callback procedure WindowProcedure.

I initialize the application.
I introduce a cleanup function.

I then introduce a window initializing function and several windows message handling functions.

Our OnCommand function handles selections from the menu.
This has two options: a device chooser, and a routine which sets a 'saveframe' variable. That variable activates a routine which grabs a still image from the video stream and saves it to disk in a file called 'Capture.jpg'.

The rest is made up of mainly Dialog functions associated with choosing the webcam.

The constructor and destructor create and destroy a critical section respectively. What this means is that any thread
wanting to touch the protected data waits until it is granted ownership of the critical section; in this case the protected data is the video stream from the webcam.

Initialize simply initializes the drawing surface through the DrawDevice class.
CloseDevice releases all the resources used by the preview player.

You don't have to worry about the IUnknown methods, apart from knowing that they cannot be changed.
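For anyone curious why those methods are so rigid, here is a minimal sketch of the reference counting half of IUnknown. It is only an illustration: RefCounted is a hypothetical class of my own, it uses no Windows headers, and it leaves out QueryInterface entirely.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a COM object: only the reference counting
// half of IUnknown is modelled (no QueryInterface, no Windows headers).
class RefCounted {
public:
    // A freshly created object starts with one reference, owned by the creator.
    explicit RefCounted(bool* destroyedFlag)
        : m_refs(1), m_destroyed(destroyedFlag) {}

    // A new owner takes a reference.
    unsigned long AddRef() { return ++m_refs; }

    // An owner gives its reference back; the last one out deletes the object.
    unsigned long Release() {
        const unsigned long remaining = --m_refs;
        if (remaining == 0) {
            delete this;
        }
        return remaining;
    }

private:
    // Private destructor: callers cannot delete directly, only Release can.
    ~RefCounted() { *m_destroyed = true; }

    std::atomic<unsigned long> m_refs;
    bool* m_destroyed;  // lets the caller observe destruction in this sketch
};
```

Every AddRef must be matched by exactly one Release, which is why the behaviour of these methods has to stay exactly as it is.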

The IMFSourceReaderCallback methods are where it all happens. OnReadSample gets a frame from the video buffer and passes it to the DrawDevice class for drawing and, on certain conditions, saving to a file.
TryMediaType works out which media type the camera hardware supports. This can be one of many formats: YUV, uncompressed RGB etc.

This function once it has negotiated a format sets the video type in the DrawDevice class.

SetDevice basically creates a source reader from a media source, in this case a webcam. It does this through the function MFCreateSourceReaderFromMediaSource. Once it has the reader, it tries to negotiate a suitable output type,
in this case probably RGB-32.
ResizeVideo simply calls ResetDevice from the DrawDevice class.
CheckDeviceLost checks to see if the webcam has been unplugged; if it has, it sets pbDeviceLost to TRUE.

At this point it would be good to introduce my image file text.png. This could literally be anything to overlay on the video; in fact it's a transparent image, of course, with the word Snoopy on it. But it could be a handlebar moustache, a cowboy hat or anything that you can imagine.

Yes, later on I will be using this image to overlay on the webcam video... I know, exciting, and something that you couldn't do with VFW.
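To give a feel for what overlaying a transparent image means per pixel, here is a small sketch of the usual "source over" alpha blend done on the CPU. This is an assumption for illustration only: BlendChannel and BlendPixel are hypothetical names, and in the tutorial itself the blending is left to Direct3D when the sprite is drawn.

```cpp
#include <cassert>
#include <cstdint>

// Blend one 8-bit channel: alpha 0 keeps the video, alpha 255 keeps the overlay.
inline std::uint8_t BlendChannel(std::uint8_t overlay, std::uint8_t video,
                                 std::uint8_t alpha) {
    // The +127 rounds to nearest instead of truncating.
    return static_cast<std::uint8_t>(
        (overlay * alpha + video * (255 - alpha) + 127) / 255);
}

// Blend a 32-bit ARGB overlay pixel over an opaque XRGB video pixel.
// The result is always fully opaque (alpha byte 0xFF).
inline std::uint32_t BlendPixel(std::uint32_t overlayArgb,
                                std::uint32_t videoXrgb) {
    const std::uint8_t a = static_cast<std::uint8_t>(overlayArgb >> 24);
    std::uint32_t out = 0xFF000000u;
    // Walk the red (16), green (8) and blue (0) channels.
    for (int shift = 16; shift >= 0; shift -= 8) {
        const std::uint8_t o = static_cast<std::uint8_t>(overlayArgb >> shift);
        const std::uint8_t v = static_cast<std::uint8_t>(videoXrgb >> shift);
        out |= static_cast<std::uint32_t>(BlendChannel(o, v, a)) << shift;
    }
    return out;
}
```

The transparent parts of text.png have alpha 0, so the webcam pixel shows through untouched; the letters have a high alpha, so they cover the video.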

MF is capable of much more, however. You can create audio/video streams that can be encoded to .wmv, .mp4 etc. It is capable of streaming audio and/or video files across the internet too.

However, I will just be grabbing a single frame and saving it to disk, as there is more information out there on how to create .mp4 files than on grabbing a single frame... I know, but it's the way it is.

First we set the number of back buffers, which is 2: a blue screen and a surface for the webcam video information.

Then we have a set of image transforms, from TransformImage_RGB to TransformImage_NV12.

These translate the color information from one format to another.
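As a rough idea of what such a transform does for each pixel, here is a sketch of a YUV to RGB conversion using the common BT.601 integer approximation. The names Rgb, Clip, YuvToRgb and Yuy2PairToRgb are mine, and I am assuming 8-bit studio range video; the tutorial's own transform functions may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

// Clamp an intermediate result into the 0..255 range of one byte.
inline std::uint8_t Clip(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Convert one studio-range YUV sample (Y 16..235, U/V centred on 128)
// to RGB with the usual BT.601 integer coefficients.
inline Rgb YuvToRgb(int y, int u, int v) {
    const int c = y - 16;
    const int d = u - 128;
    const int e = v - 128;
    return Rgb{Clip((298 * c + 409 * e + 128) / 256),
               Clip((298 * c - 100 * d - 208 * e + 128) / 256),
               Clip((298 * c + 516 * d + 128) / 256)};
}

// A YUY2 macropixel stores two pixels in four bytes as Y0 U Y1 V:
// each pixel has its own brightness but the pair shares one colour.
inline void Yuy2PairToRgb(const std::uint8_t yuy2[4], Rgb out[2]) {
    out[0] = YuvToRgb(yuy2[0], yuy2[1], yuy2[3]);
    out[1] = YuvToRgb(yuy2[2], yuy2[1], yuy2[3]);
}
```

A real transform runs this over every row of the frame and has to respect the stride of both the source and the destination surface.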

The first really interesting bit after that is GetFormat, which sets the pSubType GUID.
IsFormatSupported returns true or false depending on whether the format is supported.
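One common way to organise this is a small table that pairs each supported format with its conversion function, so that checking support and choosing a converter are the same lookup. The sketch below assumes that design; it uses plain strings where the real code uses subtype GUIDs, and FormatEntry, FindConversion, CopyRgb32 and ExpandRgb24 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical conversion signature: source pixels in, 32-bit pixels out.
using ConvertFn = void (*)(const unsigned char* src, unsigned char* dst,
                           std::size_t pixels);

// 32-bit input needs no conversion, only a copy.
inline void CopyRgb32(const unsigned char* src, unsigned char* dst,
                      std::size_t pixels) {
    std::memcpy(dst, src, pixels * 4);
}

// 24-bit input is widened to 32 bits by adding an opaque fourth byte.
inline void ExpandRgb24(const unsigned char* src, unsigned char* dst,
                        std::size_t pixels) {
    for (std::size_t i = 0; i < pixels; ++i) {
        dst[4 * i + 0] = src[3 * i + 0];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 2];
        dst[4 * i + 3] = 0xFF;
    }
}

// One row per supported format (names stand in for the subtype GUIDs).
struct FormatEntry {
    const char* name;
    ConvertFn convert;
};

const FormatEntry kFormats[] = {
    {"RGB32", CopyRgb32},
    {"RGB24", ExpandRgb24},
};

// Returns the converter for a format, or nullptr if it is not in the table.
inline ConvertFn FindConversion(const char* name) {
    for (const FormatEntry& entry : kFormats) {
        if (std::strcmp(entry.name, name) == 0) {
            return entry.convert;
        }
    }
    return nullptr;
}

inline bool IsFormatSupported(const char* name) {
    return FindConversion(name) != nullptr;
}
```

Adding support for another format then only means writing one more conversion function and adding one more row to the table.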

CreateDevice creates the Direct3D 9 device which we will use to draw stuff.
SetConversionFunction works out which conversion function to use to translate the video.
SetVideoType uses the conversion function to work out and set the video format.
UpdateDestinationRect letterboxes the destination rectangle to preserve the aspect ratio of the video image.
This ensures a good image even when the window is resized.
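The letterboxing arithmetic can be sketched on its own, away from Direct3D. This is a simplified version that assumes square pixels (it ignores the pixel aspect ratio of the video); Rect and LetterBox are hypothetical names for illustration.

```cpp
#include <cassert>

struct Rect {
    int left, top, right, bottom;
};

// Fit a srcW x srcH video into a dstW x dstH window without distortion.
// The video is scaled as large as possible and centred; the unused strips
// (top/bottom or left/right) are the letterbox bars.
inline Rect LetterBox(int srcW, int srcH, int dstW, int dstH) {
    if (srcW <= 0 || srcH <= 0 || dstW <= 0 || dstH <= 0) {
        return Rect{0, 0, 0, 0};  // nothing sensible to draw
    }
    int width = 0;
    int height = 0;
    // Compare aspect ratios by cross-multiplying to stay in integers.
    if (static_cast<long long>(srcW) * dstH >
        static_cast<long long>(dstW) * srcH) {
        // Video is relatively wider than the window: bars top and bottom.
        width = dstW;
        height = static_cast<int>(static_cast<long long>(dstW) * srcH / srcW);
    } else {
        // Video is relatively taller (or equal): bars left and right.
        height = dstH;
        width = static_cast<int>(static_cast<long long>(dstH) * srcW / srcH);
    }
    const int left = (dstW - width) / 2;
    const int top = (dstH - height) / 2;
    return Rect{left, top, left + width, top + height};
}
```

For example, a 640x480 webcam image in an 800x800 window is drawn 800x600 with a 100 pixel bar above and below it.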

CreateSwapChains creates the D3D swap chains. A little explanation is required, however, on exactly what a swap chain is and why it's so important. A swap chain is a collection of buffers that are used for displaying frames to the user. Each time an application presents a new frame for display, the first buffer in the swap chain takes the place of the displayed buffer. This process is called swapping or flipping.

A graphics adapter holds a pointer to a surface that represents the image being displayed on the monitor, called a front buffer. As the monitor is refreshed, the graphics card sends the contents of the front buffer to the monitor to be displayed. However, this leads to a problem when rendering real-time graphics. The heart of the problem is that monitor refresh rates are very slow in comparison to the rest of the computer. Common refresh rates range from 60 Hz (60 times per second) to 100 Hz. If your application is updating the front buffer while the monitor is in the middle of a refresh, the image that is displayed will be cut in half with the upper half of the display containing the old image and the lower half containing the new image. This problem is referred to as tearing.

Direct3D implements two options to avoid tearing:

1. An option to only allow updates of the monitor on the vertical retrace (or vertical sync) operation. A monitor typically refreshes its image by moving a light pin horizontally, zigzagging from the top left of the monitor and ending at the bottom right. When the light pin reaches the bottom, the monitor recalibrates the light pin by moving it back to the upper left so that the process can start again. This recalibration is called a vertical sync. During a vertical sync, the monitor is not drawing anything, so any update to the front buffer will not be seen until the monitor starts to draw again. The vertical sync is relatively slow; however, not slow enough to render a complex scene while waiting. What is needed to avoid tearing and be able to render complex scenes is a process called back buffering.

2. An option to use a technique called back buffering. Back buffering is the process of drawing a scene to an off-screen surface, called a back buffer. Note that any surface other than the front buffer is called an off-screen surface because it is never directly viewed by the monitor. By using a back buffer, an application has the freedom to render a scene whenever the system is idle (that is, no windows messages are waiting) without having to consider the monitor's refresh rate. Back buffering brings in an additional complication of how and when to move the back buffer to the front buffer.

It is option 2, back buffering, that we will be using to display our webcam image.
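Stripped of all the Direct3D detail, back buffering boils down to drawing into one buffer while another is on show, then swapping them. Here is a toy model of that idea; TwoBufferChain is a hypothetical class, and a real swap chain lives on the graphics card and is managed by the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model of a swap chain with one front and one back buffer.
class TwoBufferChain {
public:
    explicit TwoBufferChain(std::size_t pixels)
        : m_front(pixels, 0), m_back(pixels, 0) {}

    // The application only ever draws here, out of sight of the viewer.
    std::vector<int>& BackBuffer() { return m_back; }

    // What the monitor would currently be showing.
    const std::vector<int>& FrontBuffer() const { return m_front; }

    // Make the finished frame visible. The old front buffer becomes the
    // new back buffer, ready to be drawn over for the next frame.
    void Present() { std::swap(m_front, m_back); }

private:
    std::vector<int> m_front;
    std::vector<int> m_back;
};
```

Because the viewer only ever sees a buffer that is completely drawn, a half-finished frame never reaches the screen.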

DrawFrame is next and this is where most of the fun things are done.

We set up some IDirect3DSurface9's and lock the video buffer.
We get the swapchain surface and lock it.
We convert the frame. This also copies it to the Direct3D surface.
This is a frame from the webcam video stream.
We color the back buffer 'Blue'.

We set up another D3D9 surface; this will hold our text.png.
We set up a D3D9 texture; this is for drawing our textSprite.
We use our D3D9 device to begin a new scene.
We create a sprite, create a blank texture, then load our text.png from a resource into our blank texture.
We call begin on textSprite and set the x, y, z position of our sprite.
We draw the textSprite onto our texture.
We then call end on our textSprite.
We set the render target to pSurf and set up a POINT object called p.
We call endScene on our device.

We then update the surface.

If saveframe is true we save the file using D3DXSaveSurfaceToFile.
We then Present the Frame.

The rest of the functions are pretty much self-explanatory, up until the image transforms.
These functions should not be altered. They cannot be guessed at and have to be learnt; they are the type of function you will use over and over in your WMF apps.

The libs are linked in by pragma comments, so I will not go over the libs required to run this program, as it is unnecessary.

What the program does: it allows you to connect to a webcam through a dialog, displays that webcam image in a window, overlays a graphic on it and allows you to save the composite image to a file called Capture.jpg.

The code base comes from Anton Polinger's book "Developing Microsoft Media Foundation Applications". However, that code and book do not show you how to save an image to disk or overlay a graphic; they do, however, show you how to display a webcam image in a window.
Also, the code on MSDN and in Polinger's book is liberally sprinkled with gotos, which I have removed, redesigning the
class structures somewhat to accommodate this.

The saving of the image and the graphic overlay come from my own understanding, and I believe the tutorial to be unique in the world of C++ tutorials.

Just my two cents' worth. It's snoopy11's choice and his tutorial... But I wouldn't.

One aspect of a tutorial is to make the reader DO - as part of learning. I've got more than a couple tutorials on here as well as my blog on Xamarin app development and something I state everywhere is that making the reader work through the tutorial, making them have to read the code and retype it, making them have to find the menu choices I'm showing... Is all part of the learning. I personally find that giving someone a big ready-to-go solution just means they don't learn a thing. On my site I go so far as to provide screen captures of the code so the reader can't copy/paste from the browser.

I can't speak for snoopy11, but I think most of us write these to help the next generation learn, not to help them copy/paste their way into a working program they can try to monetize with little effort of their own.

Exactly. I will not be zipping any of my projects up, apart from required graphic files.

It is essential that you read the tutorial and work your way through it, even though this one is pretty heavy going!

It wouldn't have helped the poster anyway as you still have to uninstall the 2010 runtime and reinstall the 2010 runtime service pack 1.

I recently rebuilt this on my new Windows 10 machine and can report that if you follow the instructions it still works perfectly fine. You also have to download the 2010 DirectX SDK, as it does use some legacy headers and libs. This was a deliberate decision to allow for backwards compatibility, so it would run on as many OS versions as possible.