How To Get Data from a Microsoft DirectShow Filter Graph

Eric Rudolph
Microsoft Windows Digital Media Division

Updated October 2003

Summary: This document describes how to retrieve data from a media stream in Microsoft DirectShow, using your own custom sample-grabber filter. It uses the GrabberSample Filter Sample from the Microsoft DirectX 8.1 SDK as a starting point; this code works equally well with later versions of DirectX. (23 printed pages)

Introduction

A common question that developers have about Microsoft® DirectShow® is: "How do I get data out of DirectShow and into my application?" Because DirectShow uses a plug-in architecture, there are several ways you can solve this problem. In order of increasing complexity, you can:

Use the Multimedia Streaming APIs. These are simple and synchronous, and they do work.

Write a simple DirectShow filter that captures the data. Process the data from within your application.

Write a DirectShow filter that does all of the processing. In this scenario, the filter encapsulates most of the complexity, and the application does very little work.

This paper describes the second approach. The third approach is the most general of the three because it enables other applications to use your filter; however, it is also the most difficult.

Note: For more information about the Multimedia Streaming APIs, see Multimedia Streaming in the DirectShow documentation.

The DirectShow Basics You Need to Know

Let's start with a quick summary of DirectShow and how it works. DirectShow is a streaming architecture that enables applications to stream data through a series of connected objects, called filters. A collection of DirectShow filters is called a filter graph.

DirectShow filters fall into three broad categories: source filters, transform filters, and renderers. Source filters create data and push it to the next filter. Transform filters receive data and transmit data, sometimes on more than one thread. Renderers only receive data.

Every DirectShow filter has at least one connection point, called a pin. Filters connect to other filters at their pins. Media data moves between filters across pin connections.

Graph States

A filter graph has four possible states: stopped, paused, running, and transitional. In the transitional state, the graph is changing from one state to another, but has not yet completed the change due to the multithreaded nature of DirectShow.

For most filters, the paused and running states are identical: source filters produce new data, and transform filters accept new data for processing. The exceptions to this rule are live capture filters and renderer filters. Live capture filters only send data while running, and do not send data while paused. Renderer filters stop rendering data while paused, and do not accept any new data.

When a filter stops, it no longer processes data or accepts new data. It shuts down worker threads and releases any other resources in use.

Filters must obey a defined protocol when the filter graph changes from one state to another. For more information, see the topic Data Flow for Filter Developers in the DirectShow SDK documentation.

Multithreading

To use DirectShow, you must know something about multithreaded programming. For a simple DirectShow application, it is enough to know that data travels through the graph on threads that are separate from the application thread. But if you plan to write any filters, be prepared to work with threads, critical sections, events, and other concepts. You may be tempted to ignore these issues, but if you do, your filter is very likely to perform incorrectly or, worse, cause deadlocks in your application. After you understand the issues, writing a filter will become much easier.

Note: The Multimedia Streaming APIs largely shield you from multithreading issues, which is one advantage of using those APIs.

The following are general guidelines for threading in DirectShow filters:

Source filters: Most source filters create a separate thread for each output pin on the filter. The thread enters a loop in which it fills buffers with data and delivers them to the next filter.

Transform filters: Most transform filters do not create any threads. They process data on the same thread that the upstream filter is using to deliver the data. Some transform filters create separate threads for each output pin. This is not recommended unless it is necessary. For example, a filter that splits interleaved data into separate streams will usually create separate threads, so that one stream is not blocked while waiting for the other.

Renderer filters: Generally, renderer filters do not create threads.

When it pauses or runs, a filter creates any threads that it needs, and closes them when it stops.

Pin Connection Negotiations

When two filters connect, the pins negotiate what type of connection to establish. The exact details depend on the filters involved, but typically the pins must decide the following:

The type of data that will be delivered (such as audio or video), and the format of the data.

The size of the buffers that will be used, the number of buffers to create, and the required memory alignment.

Which of the filters will allocate the buffers.

This paper describes some of these issues. For more details, consult the DirectShow documentation.

Writing a Sample Grabber Filter

An easy way to get data out of a DirectShow filter graph is to write a custom "sample grabber" filter. Connect the filter to the stream of data you want to monitor, and then run the filter graph. As the data passes through the filter, the application can manipulate the data however it needs to.

Possible uses for sample grabber filters include the following:

Decoding an entire file into a memory buffer.

Getting a poster frame from a video file.

Capturing still images from a live video stream.

Decoding a video file into a Microsoft DirectDraw® buffer.

The DirectShow 8.0 SDK included a sample grabber filter, but did not provide the source code. The DirectShow 8.1 SDK includes the source code for a revised version of the sample grabber filter, as an SDK sample, under the name GrabberSample Filter Sample.

Where To Begin?

Your first choice is whether to write a transform filter or a renderer. A transform filter can connect to another filter downstream, which enables you to render the data, write it to a file, and perform other operations. However, because a transform filter requires an extra connection downstream, it can be more complex to implement correctly. A renderer filter requires just one input connection.

This paper describes how to write a transform filter, but many of the same ideas would apply to a renderer filter.

The filter presented in this paper is a "trans-in-place" filter, which means that it modifies the data directly in the buffers that it receives, rather than making a copy to a new buffer. It uses the DirectShow base class library.

To write a trans-in-place filter, perform the following steps:

Define a new class that derives from the CTransInPlaceFilter class.

Optionally, you can make your filter a real COM object that performs self-registration. To do so, you will need an IDL file or header file with the CLSID definition, a DEF file that exports your DLL functions, and a static class method to create your filter. For more information, see the topics How to Create a DLL and How to Register DirectShow Filters in the DirectShow SDK documentation.

Override two pure virtual methods in CTransInPlaceFilter: the Transform method and the CheckInputType method.

The CTransInPlaceFilter class handles a lot of other tasks automatically, such as negotiating pin connections and buffers, reconnecting pins when necessary, moving data from the input pin to the output pin, and supporting multiple threads. Reading the C++ code for the base classes is a good way to learn more about DirectShow filters. If you want to do something more complex, you may need to override additional methods of CTransInPlaceFilter.

Override the CheckInputType Method

The CheckInputType method in your filter determines which media types to accept and which to reject. During the pin connection process, the upstream pin will propose a variety of media types. Your filter can accept or reject any media type. When DirectShow builds a filter graph, it automatically tries to find filters listed in the registry, to make the connection work. For example, if your filter accepts only uncompressed video, and the application tries to connect it to an AVI file source, DirectShow will insert the appropriate video decompressor.

Format types

If your filter accepts only MEDIATYPE_Video with subtype MEDIASUBTYPE_RGB24, it will not necessarily connect with a format type of FORMAT_VideoInfo. Several other video format types exist, including FORMAT_VideoInfo2 and FORMAT_DvInfo. You must decide which formats your filter will handle, and accept or reject the various format types accordingly.

Format blocks and inverted DIBs

For uncompressed video types, the upstream filter might deliver inverted device-independent bitmaps (DIBs). It will specify this at connection time in the biHeight member of the BITMAPINFOHEADER structure of the format block. Therefore, if your filter requires a particular DIB orientation (inverted or non-inverted), be sure to check the biHeight member and reject any type that your filter does not handle.

Many decompressors can decode using either orientation, and will propose both types. If you accept the media type without checking the orientation, the pins will connect using whichever orientation the decompressor proposes first.
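The orientation check described above can be sketched as follows. This is a minimal, platform-independent illustration rather than the filter's actual code: BitmapHeaderSketch is a stand-in that carries only two fields of the real BITMAPINFOHEADER, and ClassifyOrientation and AcceptsOrientation are hypothetical helper names. It relies on the convention for uncompressed RGB bitmaps that a positive biHeight describes a bottom-up image and a negative biHeight describes a top-down image.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Windows BITMAPINFOHEADER structure. Only the fields used
// here are included, so this is NOT the real structure layout.
struct BitmapHeaderSketch {
    std::int32_t biWidth;
    std::int32_t biHeight;  // > 0: bottom-up rows, < 0: top-down rows
};

enum class DibOrientation { BottomUp, TopDown, Invalid };

// Derives the row order from the sign of biHeight.
DibOrientation ClassifyOrientation(const BitmapHeaderSketch& header) {
    if (header.biHeight > 0) return DibOrientation::BottomUp;
    if (header.biHeight < 0) return DibOrientation::TopDown;
    return DibOrientation::Invalid;  // a zero height is not a usable format
}

// Hypothetical helper that a CheckInputType override could call after it has
// located the format block. Returns false for any orientation other than the
// one the filter was written to handle.
bool AcceptsOrientation(const BitmapHeaderSketch& header,
                        DibOrientation required) {
    assert(required != DibOrientation::Invalid);
    return ClassifyOrientation(header) == required;
}
```

In a real filter, a rejection here would translate into a failure code returned from CheckInputType, which prompts the upstream pin to propose its next media type.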

Setting the media type from the application

In a sample grabber filter, it makes sense for the application to control which media types the filter will accept. Using this approach, the application performs the following steps:

1. The application calls a custom method on the filter to specify the type of data that it wants. This could be an exact format, or a general description that allows a range of possible formats (for example, 24-bit RGB video of any size).

2. The application connects the sample grabber to the other filters in the graph. During pin negotiation, the CheckInputType method attempts to match the proposed media types with the type that was specified by the application in step 1.

3. The application calls another custom method to retrieve the actual media type used for the connection.

For example, in step 1 the application might specify 24-bit RGB. In step 2, the pins will connect using a specific video size, such as 320 × 240 pixels. In step 3, the application retrieves the media type to determine the video size. Without this information, the application cannot interpret the data it receives.

You must define a custom COM interface on your filter that contains these two methods. The DirectShow Sample Grabber filter uses the ISampleGrabber interface; you can use this as a guide when you create your own filter.
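The three-step exchange can be modeled with a short sketch. Everything in it is an assumption made for illustration: strings stand in for the GUIDs of a real AM_MEDIA_TYPE, and GrabberSketch, SetAcceptedType, TryConnect, and GetConnectedType are hypothetical names, not the methods of ISampleGrabber. The point is only the matching rule, in which a field the application leaves unset matches any proposed value.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified media type. Real DirectShow code uses AM_MEDIA_TYPE with GUIDs
// and a format block; plain strings and integers stand in for them here.
struct MediaTypeSketch {
    std::string majorType;
    std::string subType;
    int width;
    int height;
};

// What the application asks for. An empty optional means "any value".
struct RequestedType {
    std::optional<std::string> majorType;
    std::optional<std::string> subType;
};

class GrabberSketch {
public:
    // Step 1: the application describes the data it wants.
    void SetAcceptedType(const RequestedType& requested) {
        requested_ = requested;
    }

    // Step 2: models CheckInputType during pin negotiation. A proposal is
    // accepted when every field the application specified matches.
    bool TryConnect(const MediaTypeSketch& proposed) {
        assert(!proposed.majorType.empty());
        if (requested_.majorType && *requested_.majorType != proposed.majorType)
            return false;
        if (requested_.subType && *requested_.subType != proposed.subType)
            return false;
        connected_ = proposed;  // remember the full type for step 3
        return true;
    }

    // Step 3: the application reads back the type actually negotiated.
    std::optional<MediaTypeSketch> GetConnectedType() const {
        return connected_;
    }

private:
    RequestedType requested_;
    std::optional<MediaTypeSketch> connected_;
};
```

An application that requested only "Video" and "RGB24" would call GetConnectedType after the connection succeeds to learn the width and height it needs in order to interpret each buffer.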

Override the Transform Method

One of the parameters to the CTransInPlaceFilter constructor method is a flag that specifies whether your filter modifies the data it receives. If you pass the value false, you are obliged not to change the data in any way. Otherwise, you are free to modify the data inside the Transform method.

The Transform method receives a pointer to the IMediaSample interface of a media sample. This method is called by the CTransInPlaceFilter::Receive method. After the Transform method returns, the Receive method calls CBaseOutputPin::Deliver on the output pin to deliver the sample.

If the Transform method returns S_FALSE, the base class signals a quality-control change. However, in this case, the Receive method returns S_OK (not S_FALSE) and the upstream filter keeps delivering. If the Transform method returns an error code, the base class signals a streaming error to the filter graph, and the filter graph stops. You should not return an error code unless there is a genuine streaming error. If you simply want to halt the stream, override the Receive method and return S_FALSE from Receive.
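The return-value rules above can be summarized as a small decision function. This is a model written for this paper's description, not the base-class source: HResultSketch and the constants mirror the numeric values of HRESULT, S_OK, S_FALSE, and E_FAIL, and ReceiveOutcome is a hypothetical structure. The sketch also assumes that a sample for which Transform returns S_FALSE is not delivered downstream, and that downstream delivery succeeds when it happens.

```cpp
#include <cassert>
#include <cstdint>

using HResultSketch = std::int32_t;  // stand-in for HRESULT
constexpr HResultSketch kOk = 0;     // numeric value of S_OK
constexpr HResultSketch kFalse = 1;  // numeric value of S_FALSE
constexpr HResultSketch kFail =
    static_cast<HResultSketch>(0x80004005u);  // numeric value of E_FAIL

// What the Receive method does in response to one Transform result.
struct ReceiveOutcome {
    bool deliverDownstream;    // sample is passed to the output pin
    bool signalQualityChange;  // quality-control notification is raised
    bool streamingError;       // error is reported and the graph stops
    HResultSketch returned;    // value handed back to the upstream filter
};

ReceiveOutcome OutcomeForTransformResult(HResultSketch transformResult) {
    // This sketch models only S_OK, S_FALSE, and failure codes.
    assert(transformResult < 0 || transformResult == kOk ||
           transformResult == kFalse);
    if (transformResult < 0) {
        // Failure codes are negative: report a genuine streaming error.
        return {false, false, true, transformResult};
    }
    if (transformResult == kFalse) {
        // The sample is skipped, but upstream still sees success and
        // keeps delivering. (The real base class may raise the quality
        // notification only once rather than on every skipped sample.)
        return {false, true, false, kOk};
    }
    // Assumes the downstream Deliver call succeeds.
    return {true, false, false, kOk};
}
```

The table makes the asymmetry visible: neither S_OK nor S_FALSE from Transform stops the stream, which is why halting it requires overriding Receive itself.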

Working with Multithreading

Your application will always run on a separate thread from the one that delivers data to your filter. If you want to retrieve data synchronously in your application, you must take this multithreading into account. The following are suggestions for handling some common scenarios:

Decode an entire file

If you want to decode an entire compressed file and get each block of uncompressed data in order, you probably do not need to worry much about threading. Create a global buffer in your application, and write the Transform method so that it writes into that buffer. As an alternative, have Transform call a callback method whenever it receives a sample. Write to your global buffer in the callback method. In your application, set up the callback, run the graph until it stops, and you are done.

Decode a section of a file

This scenario is similar to decoding an entire file, but the application needs to set the start and stop positions using the IMediaSeeking::SetPositions method. An alternative is to return S_FALSE from the Receive method, to signal the source filter to stop delivering data.

Decode random sections of a file

If you want to decode a portion of a file and then seek to another location and decode again, the process becomes more complicated. When you seek the filter graph or change from one graph state to another, the application must wait for the graph state to become stable.

When you seek the graph (using IMediaSeeking or IMediaPosition), the call starts from the renderer filter and travels synchronously upstream until it reaches the source filter. The source filter asynchronously stops pushing data, sends a flush downstream, seeks to the new position, and starts sending data again.

To get a single frame of data, override the Receive method to return S_FALSE. In your application, pause the graph and seek to the desired time. The source will respond by seeking, and then it will send one sample downstream.

If you want your application to process samples synchronously, instead of asynchronously, use events. Set the event in the Transform method, and wait for it in your application. For example, you might use a loop like the following:

while (not done)
    Seek the filter graph.
    Wait for the event to be signaled.

This example assumes that you process the data completely inside the Transform method or inside your callback method. If you want to process the data inside of the application loop, you will need a second event.

Without the second event, the Transform method would return immediately, because it runs on a different thread. Then other filters could write new data into the sample while your application was still processing the old data.
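The two-event handshake can be demonstrated without DirectShow. The sketch below is a simulation under stated assumptions: AutoResetEvent is a small helper built from standard C++ primitives (a real filter would more likely use Win32 event objects), the lambda plays the role of the streaming thread calling Transform, and a single integer stands in for the sample buffer. The first event tells the application that a sample is ready; the second keeps the streaming thread blocked until the application has finished with the buffer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal auto-reset event: Wait blocks until Set is called, then clears
// the signal so that the next Wait blocks again.
class AutoResetEvent {
public:
    void Set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        condition_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        condition_.wait(lock, [this] { return signaled_; });
        signaled_ = false;
    }

private:
    std::mutex mutex_;
    std::condition_variable condition_;
    bool signaled_ = false;
};

// Simulates a streaming thread that pushes sampleCount samples through one
// reused buffer while the application processes each sample in its own loop.
std::vector<int> ProcessSamplesSynchronously(int sampleCount) {
    assert(sampleCount >= 0);
    AutoResetEvent sampleReady;     // set by "Transform", awaited by the app
    AutoResetEvent sampleConsumed;  // set by the app, awaited by "Transform"
    int sharedBuffer = 0;           // stands in for the media sample buffer
    std::vector<int> processed;

    std::thread streamingThread([&] {
        for (int i = 1; i <= sampleCount; ++i) {
            sharedBuffer = i * 10;  // upstream overwrites the buffer
            sampleReady.Set();      // "Transform" announces the sample
            sampleConsumed.Wait();  // and blocks until the app is done
        }
    });

    for (int i = 0; i < sampleCount; ++i) {
        sampleReady.Wait();                 // first event
        processed.push_back(sharedBuffer);  // safe: the producer is blocked
        sampleConsumed.Set();               // second event
    }
    streamingThread.join();
    return processed;
}
```

If the sampleConsumed wait were removed, the streaming thread could overwrite sharedBuffer while the application was still reading it, which is exactly the hazard the second event prevents.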

Note: Another option is to call AddRef on the sample inside the Transform method, and then call Release on the sample from your application. By keeping a reference count on the sample, you prevent it from returning to the "free" list. However, this does not prevent downstream filters from modifying the sample. For more information about reference counting and the IMediaSample interface, see Samples and Allocators in the SDK documentation.

Sample Application Code

The following code is a console application that uses the sample grabber filter:

Speeding Up the Connection Time

If you set up the sample grabber to accept audio types and then connect a file source to the input pin, the connection process will work (assuming the file has an audio stream), but it takes a long time. This is because the DirectShow Intelligent Connect process cannot guess what media types the filter accepts, so it resorts to trying them all. It loads all the video and audio decoders on your system, and tries to place each one in the graph between the file source and your filter. It tries the video decoders first, so it takes a while to reach the audio decoders.

You can dramatically reduce this problem by indicating the preferred media type of the filter inside the CBasePin::GetMediaType method. This will give the DirectShow connection logic a hint about which codecs to try.

Override the Filter Constructor Method

The CTransInPlaceFilter class automatically creates the input and output pins of the filter, using the CTransInPlaceInputPin and CTransInPlaceOutputPin classes, respectively. In order to override the GetMediaType method, you must modify the filter. First, define a new class named CSampleGrabberInPin that derives from CTransInPlaceInputPin. Then, in the CSampleGrabber constructor method, create a new instance of CSampleGrabberInPin and assign it to the m_pInput member variable of the filter.

Override the EnumMediaTypes Method

When the upstream filter connects to the sample grabber, it calls IPin::EnumMediaTypes on the input pin of the sample grabber. At this point, usually the output pin is still disconnected. If so, the CTransInPlaceInputPin class (which overrides EnumMediaTypes from the CBasePin class) returns the error code VFW_E_NOT_CONNECTED. As a result, GetMediaType is never called. To get around this, override EnumMediaTypes. If the output pin is disconnected, create an enumerator object, as is done in the CBasePin method. Otherwise, call the CTransInPlaceInputPin version of the method.

Override the GetMediaType Method

In the GetMediaType method, fill in just the major type of the media type parameter. If you fill in anything more, it will crash some third-party codecs.

Forcing the Filter To Deliver to Your Buffer

In some cases, you can force the sample grabber to deliver samples to a buffer chosen by the application. To understand how this works, you must understand the allocator mechanism used in DirectShow, and be somewhat familiar with how Intelligent Connect works.

The following is a summary of what you must do:

Define a new class named CSampleGrabberAllocator that derives from the CMemAllocator class.

Override the GetAllocatorRequirements, Alloc, and ReallyFree methods, forcing them to provide a memory allocator that points to the memory buffer of your application.

Override the NotifyAllocator method on your input pin, refusing any allocators other than your custom allocator.

Override the GetAllocator method to return your custom allocator.

Provide a protected method in your filter to determine whether the application has specified read-only mode.

Provide a public method in your filter for the application to use for specifying the delivery buffer.

Allocators

When two pins connect, they must agree on a memory buffer transport, in which to pass samples downstream; this is called an allocator. Each pair of connected pins uses one allocator. When a transform filter copies a sample from its input pin to its output pin, it is copying between two different allocators. When a filter performs an in-place transform, on the other hand, it is using the same allocator on both pins. It is possible—though unlikely—for a sample to travel from the source filter down to the renderer with no memory copies, if every pin connection along the way uses the same allocator.

An allocator has the following properties:

Prefix: The number of spare bytes that must be allocated in front of the buffer.

Alignment: What modulus the buffer must be aligned upon.

Buffer count: The number of separate buffers that the allocator will create. This enables the upstream pin to deliver to multiple memory buffers on its thread, and enables downstream pins to hold on to buffers without blocking the input pin. In the custom allocator presented here, the buffer count must be one.

Size: The maximum size of each buffer.
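To see how the four properties combine, the following sketch computes the memory an allocator needs and checks whether an application-supplied buffer could satisfy a request. AllocatorPropsSketch mirrors the field names of the ALLOCATOR_PROPERTIES structure but is a stand-in, and AppBufferIsUsable is a hypothetical helper. The rounding rule (size plus prefix, rounded up to the alignment) is an assumption based on how a typical memory allocator lays out its buffers; check the base-class source before relying on it.

```cpp
#include <cassert>

// Stand-in that mirrors the four fields of ALLOCATOR_PROPERTIES.
struct AllocatorPropsSketch {
    long cBuffers;  // buffer count
    long cbBuffer;  // maximum size of each buffer, in bytes
    long cbAlign;   // alignment modulus, in bytes
    long cbPrefix;  // spare bytes in front of each buffer
};

// Bytes occupied by one buffer, including its prefix, rounded up so that the
// next buffer also starts on an alignment boundary.
long AlignedBufferSize(const AllocatorPropsSketch& props) {
    assert(props.cbAlign >= 1);
    long size = props.cbBuffer + props.cbPrefix;
    long remainder = size % props.cbAlign;
    if (remainder != 0) {
        size += props.cbAlign - remainder;
    }
    return size;
}

// Total bytes needed for all buffers.
long TotalBytesRequired(const AllocatorPropsSketch& props) {
    return props.cBuffers * AlignedBufferSize(props);
}

// Hypothetical check for the custom allocator described in this paper, which
// wraps exactly one application-supplied buffer.
bool AppBufferIsUsable(const AllocatorPropsSketch& props, long appBufferBytes) {
    return props.cBuffers == 1 && TotalBytesRequired(props) <= appBufferBytes;
}
```

For a 320 x 240 frame of 24-bit RGB, one buffer of 320 * 240 * 3 = 230,400 bytes with no prefix and an alignment of 1 fits an application buffer of the same size; a request for two buffers would be refused by this check regardless of size.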

In a trans-in-place filter, the output pin is almost guaranteed to use the same allocator as the input pin, so it is not necessary to provide any additional code for the output pin.

Handling Format Changes

In DirectShow, the format of a stream can change while the graph is running, without any pin reconnections. An upstream filter might request a format change because the source media has switched formats, or a downstream filter might request a format change for greater efficiency. The Video Renderer filter, for example, always connects with an RGB type, compatible with GDI. When streaming begins, it tries to switch to a YUV type for DirectDraw.

To request a format change, a filter does the following:

Calls IPinConnection::DynamicQueryAccept or IPin::QueryAccept on its upstream or downstream neighbor, specifying the new media type.

If the other pin returns S_OK, the filter attaches the new media type to the next sample by calling IMediaSample::SetMediaType.

In the CTransInPlaceFilter class, the call to QueryAccept eventually results in a call to CheckInputType. (This happens even if the request comes from the downstream filter; refer to the source code for details.) For the sample grabber, this implementation can cause unexpected behavior. Suppose that you configure the sample grabber to accept a broad range of media types, for example, any video type. If the downstream filter requests a format change, such as from an RGB type to a YUV type, the sample grabber will accept the new type, and the next sample that you receive will have a format you did not expect.

Possible ways to handle format changes include:

The filter rejects the new format.

The filter checks for new formats inside the Transform method and informs the application that the format has changed.

The application checks for new formats inside the callback method.

To check for a format change, call IMediaSample::GetMediaType on each sample. Normally the media type is NULL. The first sample after a format change has the new media type; subsequent samples have NULL types again.
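The per-sample check can be sketched as a small tracker. The types are stand-ins: SampleSketch models a media sample whose attached media type is usually absent, a string stands in for the full AM_MEDIA_TYPE, and FormatTracker is a hypothetical class. In real code the media type returned by IMediaSample::GetMediaType is allocated for the caller, who must free it (for example with the DeleteMediaType helper from the base classes).

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Stand-in for a media sample. The media type is empty on most samples and
// present only on the first sample after a format change.
struct SampleSketch {
    std::optional<std::string> mediaType;
};

// Remembers the format currently in effect and counts the changes seen.
class FormatTracker {
public:
    explicit FormatTracker(std::string initialFormat)
        : current_(std::move(initialFormat)) {
        assert(!current_.empty());
    }

    // Call once per sample, from Transform or from the callback.
    // Returns true when this sample introduces a new format.
    bool OnSample(const SampleSketch& sample) {
        if (!sample.mediaType) {
            return false;  // the usual case: no type attached
        }
        current_ = *sample.mediaType;
        ++changeCount_;
        return true;
    }

    const std::string& CurrentFormat() const { return current_; }
    int ChangeCount() const { return changeCount_; }

private:
    std::string current_;
    int changeCount_ = 0;
};
```

Because later samples carry no type again, the tracker has to store the new format itself; code that inspects only the current sample would lose track of the change after one frame.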

Performance considerations

Rejecting a format change can affect performance. For example, one developer found that the sample grabber was degrading performance even though his callback function did nothing. The problem was that he configured the Sample Grabber to accept only RGB types, which prevented the Video Renderer from switching to a YUV type. When he removed the Sample Grabber from the graph, the Video Renderer connected directly to a decoder, which did accept the YUV type. Rendering YUV types is faster on many video cards, so he saw a performance boost without the Sample Grabber.

On the other hand, if your application does not render the samples that it receives, these considerations might not apply.

Limitations of the DirectShow Sample Grabber

The Sample Grabber that ships with DirectShow has several limitations. If you understand them, you can modify the GrabberSample Filter source code to suit the needs of your application.

One-shot mode

In one-shot mode, the Sample Grabber returns S_FALSE when it receives a sample, as described earlier in this paper. The S_FALSE return value informs the upstream filter to stop sending data. This mechanism has some drawbacks:

Although the upstream filter stops sending data, it does not send an EC_COMPLETE event to the Filter Graph Manager. Therefore, the application is not notified when the filter receives the sample. (If the application set a callback, however, it can wait for that to be called.)

If the upstream filter uses a worker thread to deliver samples to an output queue, it will continue to do so.

A better approach is for the application to return S_FALSE in the callback method, and for the filter to use that value as the return value in the Receive method. Designing the filter that way would remove the need for a separate one-shot mode.
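A sketch of that design follows. It is a model, not DirectShow code: GrabberReceiveSketch and DeliverUntilStopped are hypothetical names, an integer stands in for the media sample, and the delivery loop is a simplification of an upstream filter that stops pushing as soon as Receive returns anything other than S_OK.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

using HResultSketch = std::int32_t;  // stand-in for HRESULT
constexpr HResultSketch kOk = 0;     // numeric value of S_OK
constexpr HResultSketch kFalse = 1;  // numeric value of S_FALSE

// Models a grabber whose Receive method returns whatever the application's
// callback returns, so the application decides when the stream should halt.
class GrabberReceiveSketch {
public:
    using Callback = std::function<HResultSketch(int sampleNumber)>;

    explicit GrabberReceiveSketch(Callback callback)
        : callback_(std::move(callback)) {}

    HResultSketch Receive(int sampleNumber) {
        assert(callback_);  // the application must have set a callback
        return callback_(sampleNumber);
    }

private:
    Callback callback_;
};

// Models an upstream delivery loop. Returns how many samples were delivered
// before the grabber asked for the stream to stop.
int DeliverUntilStopped(GrabberReceiveSketch& grabber, int availableSamples) {
    int delivered = 0;
    for (int i = 0; i < availableSamples; ++i) {
        ++delivered;
        if (grabber.Receive(i) != kOk) {
            break;  // S_FALSE (or an error) ends delivery
        }
    }
    return delivered;
}
```

With this arrangement, a callback that always returns S_FALSE reproduces one-shot behavior, while a callback that returns S_FALSE after the third sample grabs exactly three frames, with no dedicated mode in the filter.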

Video formats

For video types, the Sample Grabber requires a VIDEOINFOHEADER format. It cannot connect to filters that require other format types, such as VIDEOINFOHEADER2 or DVINFO. Therefore, it is not compatible with MPEG-2 or DV video, or with field-based (interlaced) video.

Buffered mode

The buffered mode of the Sample Grabber is not particularly useful. If the application needs to copy samples to a buffer, it can do so in the callback.

Format changes

The application can specify a partial media type for the Sample Grabber, or a complete media type, but it cannot specify two different media types, such as "MEDIASUBTYPE_RGB24 or MEDIASUBTYPE_UYVY." This implementation limits how the Sample Grabber can respond to format changes.