Part IV of this series introduces the final set of extensions to the basic unit testing application. These extensions are:

Fixture setup and tear down attributes

Processing time measurements

Test repetition

Memory utilization measurements

I've worked on a lot of applications that interface with hardware, and on others that require optimizing analysis algorithms, such as network tracing and real time image processing. I think this has given me a different perspective with regard to unit testing that you won't find in the mainstream discussions. There's certainly an argument to be made as to whether functionality like this should even be part of a unit test attribute suite rather than implemented as assertions in the test code. My argument for including this functionality as part of unit test attributes is the following:

This means that what's really being measured includes not only the function under test, but also the wrapper that calls that function. Obviously, this has unintended side effects, especially when the unit test itself executes memory-consuming and/or time-consuming code that is not part of the actual code under test. The best way to handle this situation is to implement a two-stage test sequence--the first stage does the setup and the second stage invokes the method(s) under test.

Code that activates the JIT compiler (for example, with generics, from what I've read, the JIT compiler will replace the generic IL with the specific type and compile to native code the first time the generic type is constructed) results in a first time performance hit. This is true for all Microsoft intermediate language (MSIL) code--it gets translated to native processor instructions by the JIT compiler the first time the code is loaded. You can see this performance difference using MUTE--the first time you run the tests, the performance is notably slower than the subsequent runs.

Obviously, issues such as other worker threads, other applications and services, network performance, server performance, etc., all affect execution time. Most of the execution time issues are addressed by the ability to specify a test repetition count. The test runner throws out the best and worst time samples and reports the average time.
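The "throw out the best and worst, then average" step described above can be sketched as follows. This is a minimal illustration, not MUTE's actual runner code: the TimingStats class and its TrimmedAverage method are hypothetical names, and the fallback to a plain average when there are fewer than three samples is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the name and the fallback rule are assumptions.
public static class TimingStats
{
    // Drops one best (smallest) and one worst (largest) sample and
    // averages what is left. With fewer than three samples nothing
    // can be dropped, so a plain average is returned instead.
    public static double TrimmedAverage(IReadOnlyList<double> samples)
    {
        if (samples == null || samples.Count == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samples));

        if (samples.Count < 3)
            return samples.Average();

        double sum = samples.Sum() - samples.Min() - samples.Max();
        return sum / (samples.Count - 2);
    }
}
```

With a repetition count of one or two there is nothing meaningful to discard, which is why a reasonably large repeat count matters for timing tests.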

Environments that implement garbage collection (GC) make it nearly impossible to accurately track the memory allocated by a function within a thread. Calling the GC.Collect() method or other functions also does not guarantee a correct value because the garbage collection runs on an internal CLR thread. You can see this happening in MUTE. If you change the code to call GC.Collect() before the delegate call:
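One way such a change might look is sketched here. It is not MUTE's runner code: GcProbe and MeasureManagedDelta are hypothetical names, and an Action stands in for the unit test delegate.

```csharp
using System;

// Hypothetical probe; an Action stands in for the unit test delegate.
public static class GcProbe
{
    // Returns the change in GC-reported managed bytes across the call.
    // The result is only an approximation: a collection may run at any
    // point during the call, so the value can even be negative.
    public static long MeasureManagedDelta(Action unitUnderTest)
    {
        GC.Collect();                    // request a collection before measuring
        GC.WaitForPendingFinalizers();   // give pending finalizers a chance to run

        long before = GC.GetTotalMemory(false);
        unitUnderTest();
        long after = GC.GetTotalMemory(false);

        return after - before;
    }
}
```

Running the same delegate several times through a probe like this is a quick way to see how much the reported figure moves from run to run.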

As a collection grows, the space used to hold its elements (its capacity, in other words) is increased. When the Clear() function is used, the objects contained by the list are de-referenced (let's assume for the sake of argument that nothing else is referencing the objects in the list) and the GC can reclaim them. However, the internal buffers used by the collection are not reclaimed. In the case of the ArrayList collection, you can manually reclaim the buffer space by setting the Capacity property to zero, which resets the internal buffer to the default array size, which is 16. The Hashtable collection (and any collection implementing IDictionary) does not have a corresponding function.
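The buffer behavior described above can be observed with a short experiment. This is a sketch with hypothetical names (CapacityDemo, Run); the default capacity that the list falls back to differs between runtime versions, so the code reports it rather than assuming a value.

```csharp
using System;
using System.Collections;

// Hypothetical demo class; only ArrayList itself comes from the framework.
public static class CapacityDemo
{
    // Fills a list, clears it, then resets Capacity.
    // Returns the capacity observed after each of the three steps.
    public static int[] Run(int itemCount)
    {
        ArrayList parts = new ArrayList();
        for (int i = 0; i < itemCount; i++)
            parts.Add(i);
        int grown = parts.Capacity;        // at least itemCount

        parts.Clear();                     // drops the element references...
        int afterClear = parts.Capacity;   // ...but keeps the internal buffer

        parts.Capacity = 0;                // falls back to the default-sized buffer
        int afterReset = parts.Capacity;   // small default; value depends on the runtime

        return new[] { grown, afterClear, afterReset };
    }
}
```

Printing the three values for a list of 100,000 items shows the capacity surviving Clear() and shrinking only after the explicit reset.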

The above code allocates about 10MB of memory. When vendor.Clear is called, there still remains about 2.7MB of allocated memory! The Vendor class maintains both an ArrayList and a Hashtable (sort of like the way a SortedList works). When the ArrayList Capacity property is reset (parts.Capacity=0), the allocated memory is further reduced to 2.2MB. Unfortunately, there is no way to reclaim the buffers used by the Hashtable.

Personally, I think this points to a problem with the way collections are implemented in the .NET framework. It should be possible to reclaim the buffers. Let's say that the next list of parts that the vendor object manages contains 10 parts (perhaps because the part list has been filtered). If the first list contained 100,000 parts, there's 2MB being wasted on maintaining a collection of 10 parts. Now, you all say "woohoo" because you've got 1G of RAM on your system. Well, I come from the days when memory was expensive, both in physical dollars and in usage. One of the reasons we have so much bloat in our applications is because of sloppy implementations like Hashtable. Time to write my own collection classes, I say.

In both cases, the above example assumes that only one instance of the attribute is associated with any given class or method. This is obviously the case for attributes that don't have any parameters. If your attribute takes parameters, then you may want to set AllowMultiple to true, as is the case with the Requires attribute. This attribute also demonstrates managing attribute parameters:
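As an illustration of AllowMultiple together with a parameterized attribute, here is a minimal sketch. The RequiresAttribute shown is a stand-in written for this example; its constructor parameter and property name are assumptions, as are the SampleFixture and RequiresReader classes, and MUTE's real attribute may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in attribute: the parameter and property names are assumptions.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = true)]
public sealed class RequiresAttribute : Attribute
{
    public string PriorTestMethod { get; }

    public RequiresAttribute(string priorTestMethod)
    {
        PriorTestMethod = priorTestMethod;
    }
}

// Hypothetical fixture with two instances of the attribute on one method.
public class SampleFixture
{
    [Requires("CreateVendor")]
    [Requires("AddParts")]
    public void CountParts() { }
}

public static class RequiresReader
{
    // Collects the parameter of every Requires attribute on the method.
    // Reflection does not guarantee the order of the returned attributes.
    public static List<string> GetPrerequisites(Type fixture, string methodName)
    {
        MethodInfo method = fixture.GetMethod(methodName);
        if (method == null)
            throw new ArgumentException("No such public method.", nameof(methodName));

        var names = new List<string>();
        foreach (object attr in method.GetCustomAttributes(typeof(RequiresAttribute), false))
            names.Add(((RequiresAttribute)attr).PriorTestMethod);
        return names;
    }
}
```

Without AllowMultiple = true, the compiler rejects the second [Requires] on CountParts, so the reader would never see more than one instance.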

Define what the attribute does in the UTCore assembly, TestUnitAttribute.cs file. All attributes are derived from the TestUnitAttribute class. (Yes, this should be refactored, splitting the implementation into a base class and some interfaces, I think). For example, the "CodeProject" class attribute would be created as:

The SelfRegister method provides the attribute with the means to set the state in one or more of:

the test fixture

the class

the method

Obviously, if the attribute is associated with a class, then the method item is not valid. The following two rules apply (and also indicate where some refactoring would make things a bit easier to use):

Since there's a one-to-one correlation between a test fixture and a class, I usually put class attribute options in the TestFixture object

Since there's a many-to-one correlation between method attributes and the method, I put method attribute options in the MethodItem object.

This is a cheap and dirty way of handling new attributes, and should really be refactored so that there's more of a messaging mechanism used. The class and method attributes could then be independently managed, and the messaging could be used to provide custom extensions without changing the core fixture and method classes. Any takers?

The TestUnitAttribute class already has the attribute object initialized before the framework calls SelfRegister. To access the attribute, cast it to the appropriate UnitTest assembly attribute and extract the desired information. For example:

This step is only necessary if you want to change the way in which the tests inside the fixture are run. For example, in the previous article, I discussed running tests in order as part of a test process. Typically, though, tests are run in an unpredictable, although consistent, order. A runner extension might truly randomize the test order. Other extensions might support multithreaded testing, in which several test fixtures are run simultaneously in order to test semaphores, mutexes, etc. Anyways...

In TestFixture.cs, there is a call to the test runner factory, which creates the appropriate test runner depending on the fixture (class) attributes:

Modify the CreateTestFixtureRunner factory if necessary. The current implementation supports running a process (a sequence of tests) and running tests independently of each other. This is a bare-bones implementation:

The TestFixtureRunner class implements the RunTest method, which should always be used to run the actual unit test. It requires an instance of the class containing the unit test, constructed by calling:

object instance=tf.SetUpClass();

and the TestAttribute of the method under test. This is the [Test] attribute associated with the method, regardless of any other attributes that may also be associated with the method.

Iterating through all the tests in the test fixture is straightforward. At a minimum:

Currently, these simply instantiate the class and invoke the fixture set up and tear down methods, if defined. Again, this code should be refactored to use a messaging or event mechanism to allow for easy extension of the fixture attributes.

Additional functionality specified by method attributes is handled either in the MethodItem.cs file or as part of a new test runner. If you're extending the method invocation directly, this would be done in the Invoke method. Note, however, that attributes that test for a certain condition, such as memory usage, processing time, handles used, etc., are actually implemented as part of the RunTest method found in the TestFixtureRunner.cs file. Tests should set the method's TestAttribute state and result message so that the GUI can properly display the results:

The TestSetUp and TestTearDown attributes specify functions that are run before and after each test in the fixture. Conversely, the FixtureSetUp and FixtureTearDown attributes specify functions that are run before and after all tests have run in the fixture.
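The difference in scope between the two pairs of attributes can be sketched with plain method calls. This is only an illustration of the calling order, not MUTE's runner: SimulatorFixture and FixtureOrderDemo are hypothetical, and the methods are invoked directly rather than discovered through attributes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixture that records the order in which it is called.
public class SimulatorFixture
{
    public readonly List<string> Log = new List<string>();

    public void FixtureSetUp()    { Log.Add("fixture setup"); }     // once, before all tests
    public void FixtureTearDown() { Log.Add("fixture teardown"); }  // once, after all tests
    public void TestSetUp()       { Log.Add("test setup"); }        // before each test
    public void TestTearDown()    { Log.Add("test teardown"); }     // after each test
}

public static class FixtureOrderDemo
{
    // Runs the tests with fixture-level calls on the outside and
    // test-level calls wrapped around every individual test.
    public static List<string> Run(SimulatorFixture fixture, params Action[] tests)
    {
        fixture.FixtureSetUp();
        try
        {
            foreach (Action test in tests)
            {
                fixture.TestSetUp();
                try { test(); }
                finally { fixture.TestTearDown(); }
            }
        }
        finally
        {
            fixture.FixtureTearDown();
        }
        return fixture.Log;
    }
}
```

The try/finally blocks are a design choice in this sketch so that tear down still runs when a test throws; whether a given runner does the same is worth checking.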

An application that I've developed for one of my clients involves interfacing to different hardware modules using TCP/IP. There are usually 30 to 60 of these modules sitting on the network, each configured to do different things--handle bill acceptors, unlock turnstiles and doors, report alarms, provide punch-clock services, report on system status, etc. Instead of having all this hardware lying around at home, I have a simulator that I wrote that runs as a separate application, either locally or on a separate computer. The unit tests that verify the packet I/O between the application and the modules require starting up the simulator and shutting it down when the tests are complete. This is easily handled in the test fixture setup and teardown functions, and saves a lot of time as compared to doing this for each test in the fixture.

Measuring the processing time of a function is not straightforward. First off, you can't use the DateTime.Now.Ticks property because it doesn't have the necessary resolution. While TimeSpan.TicksPerSecond reports an interval of 100ns, this is not the resolution of the DateTime.Now.Ticks property. A simple test illustrates this fact. Using the class:
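A test of this kind might be sketched as follows. ClockResolution is a hypothetical class, and System.Diagnostics.Stopwatch is used here as a convenient high-resolution comparison, which is a substitution made for this sketch rather than the timer the article relies on. The step size observed depends on the operating system and runtime, so no particular value is assumed.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for comparing clock granularity.
public static class ClockResolution
{
    // Spins until DateTime.Now.Ticks changes and returns the size of
    // that change in ticks (one tick is 100 ns). A large step means the
    // clock is updated far less often than the tick unit suggests.
    public static long SmallestDateTimeStep()
    {
        long start = DateTime.Now.Ticks;
        long next;
        do
        {
            next = DateTime.Now.Ticks;
        } while (next == start);
        return next - start;
    }

    // Nanoseconds represented by one Stopwatch tick on this machine.
    public static double StopwatchNanosecondsPerTick()
    {
        return 1e9 / Stopwatch.Frequency;
    }
}
```

Multiplying the returned step by 100 gives the effective DateTime.Now resolution in nanoseconds, which can then be compared with the Stopwatch figure.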

Validating the processing time is dubious because processing time varies so much depending on the machine, what it's doing and other technologies with which the unit tests are interfacing. However, this does not mean that testing the processing time of a function is without merit when used appropriately. Several appropriate applications come to mind, such as:

Detecting a highly inefficient algorithm;

Assisting in detecting Quality of Service (QoS) problems;

Meeting some baseline performance in a controlled environment.

I have dealt with unit testing in each of these cases--a network analysis application for satellite switch rings, bit rate degradation resulting from rain fade in an Internet over satellite simulator, and real time updating of status information to a database. While the performance of an algorithm varies from machine to machine, having a minimum "operations per second" criterion is very useful, especially when tweaking some low-level code that ends up having major repercussions in the performance of an algorithm.

The MinOperationsPerSecond attribute can be applied to any unit test to validate performance. For example:
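A sketch of how such a check could work is given here. The attribute class is a stand-in whose constructor parameter is an assumption, ThroughputCheck is hypothetical, and treating one test invocation as one "operation" is also an assumption made for illustration.

```csharp
using System;

// Stand-in attribute; the constructor parameter is an assumption.
[AttributeUsage(AttributeTargets.Method)]
public sealed class MinOperationsPerSecondAttribute : Attribute
{
    public int MinimumOps { get; }

    public MinOperationsPerSecondAttribute(int minimumOps)
    {
        MinimumOps = minimumOps;
    }
}

public static class ThroughputCheck
{
    // Assumes one test invocation counts as one operation, so the
    // measured rate is 1 / (average seconds per call).
    public static bool Passes(double averageSecondsPerCall, int minimumOps)
    {
        if (averageSecondsPerCall <= 0)
            return true;   // too fast to measure; treat as passing

        double operationsPerSecond = 1.0 / averageSecondsPerCall;
        return operationsPerSecond >= minimumOps;
    }
}
```

For example, a test averaging 10 microseconds per call runs at roughly 100,000 operations per second and would pass a minimum of 50,000, while one averaging 1 millisecond would fail it.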

Performance testing can definitely benefit from repeat testing to average out the vagaries of measuring time in a multi-tasking operating system. The Repeat attribute informs the test runner that a test should be repeated by the specified count, optionally with a delay between each repetition. For example:
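A repetition loop with an optional delay might be sketched like this. RepeatRunner is a hypothetical helper rather than MUTE's implementation, and Stopwatch is used for timing as an assumption of this sketch.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper; not MUTE's implementation of the Repeat attribute.
public static class RepeatRunner
{
    // Runs the test 'count' times, optionally sleeping between
    // repetitions, and returns the elapsed seconds of each run.
    public static double[] Run(Action test, int count, int delayMilliseconds = 0)
    {
        if (count < 1)
            throw new ArgumentOutOfRangeException(nameof(count));

        var samples = new double[count];
        for (int i = 0; i < count; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            test();
            watch.Stop();
            samples[i] = watch.Elapsed.TotalSeconds;

            // The delay is not part of the measurement, and there is
            // no need to wait after the final repetition.
            if (delayMilliseconds > 0 && i < count - 1)
                Thread.Sleep(delayMilliseconds);
        }
        return samples;
    }
}
```

The per-run samples can then be averaged, or inspected individually when the goal is to catch the one run in a thousand that misbehaves.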

it's pretty clear that the implementation has a severe problem (in this case, I implemented a really dumb function that walks through each element in the collection of parts until a match is found).

As I mentioned in the introduction, I do a lot of work with hardware, and there's simply no other way to test hardware than to repeat something over and over again. More times than I'd like to remember, I've had problems in my code because one out of every thousand times, there would be a hardware glitch that reported erroneous values. Other uses abound--there's nothing like physically unplugging the network cable or pulling the power plug on the server to see how your software on the client side handles the fault. Monitoring network loading is another application which requires repetition. The uses abound if one stops thinking in terms of rigid test-once analysis.

As I discussed in the introduction, memory allocation is pretty much impossible to track in a garbage collecting environment. A GC environment also creates a dilemma when monitoring memory, and a little analysis of the problem is helpful at this point so we can select the appropriate solution.

In a classical memory management scheme, where the programmer is required to free allocations, memory has only two states:

allocated

unallocated

In systems that use garbage collection, the programmer doesn't need to free allocations. Memory still has two states:

referenced

unreferenced

but these states are not the same as the allocated/unallocated states. In terms of physical memory, a GC system has three states:

allocated (referenced)

allocated (unreferenced)

unallocated (unreferenced)

It is the allocated but unreferenced state that causes so much confusion when determining how much memory is "in use" at any given time. This memory is allocated but waiting to be reclaimed by the GC. Does this memory count toward the unallocated total or does it count toward the allocated total? Depending on what the intent of monitoring memory is, the answer is different. Is a memory test supposed to check that:

after a function runs, no unexpected memory remains referenced?

a function uses a reasonable amount of memory?

a function properly cleans up so that memory is most efficiently utilized (this involves issues such as I mentioned regarding collection buffers)?

a function properly disposes of unmanaged objects?

The problem with attempting to get a true count of the allocated memory in a GC system is that the test, by its very nature, interferes with the very thing we're trying to test! Like Schroedinger's cat, neither alive nor dead until we open the box and look, allocated but unreferenced memory is in this quasi-state of being neither allocated nor unallocated. Once we call GC.GetTotalMemory(true), any unreferenced memory is (ideally) reclaimed and we have a true (again ideally) count of the available memory (so, I guess the cat is always dead after we open the box). Therefore, in an ideal world, this code:

would measure how much memory still remains referenced after the utd() delegate call. However, this doesn't tell us anything about the memory utilization while the function was running, in terms of the amount of memory that it allocated, referenced, and subsequently de-referenced. Also, the world is not ideal. Rather than reclaiming all unreferenced memory, the GC starts a separate thread. The function GetTotalMemory(true) merely waits a short interval. So:

we are not guaranteed of getting an accurate count

the timing of the unit under test is thrown off because the GC is now running in the background

Another problem is that the GC reports only on the memory that it manages. The GC is oblivious to unmanaged memory such as bitmaps, COM objects, etc. In my article on IDisposable I demonstrate this using a 3MB JPG image. The GC reports zero memory utilization while the object is referenced! And worse, without properly disposing the object, physical memory will continue to be utilized until none is left and the GC finally starts reclaiming it. Bitmaps and the like are an interesting problem in themselves though. They're sort of a "quasi-managed" resource since the wrapping class implements the IDisposable interface and therefore the unmanaged resources are cleaned up when the managed resource is reclaimed. This binding between managed/unmanaged resources makes the issue of resource management yet again more confusing.

It becomes clear that using the GC to test memory allocations is pointless. It is inaccurate and incomplete, and it affects other performance measurements.

Remember that part of the purpose of a unit test is to guide the programmer to properly implement the functionality under test. With regard to memory utilization, the unit test needs to consider the nature of the GC and the nature of the object under test. What really needs to be determined is whether the implementation:

needs to support a manual cleanup in cases where the resources are allocated completely externally from the .NET framework, such as in a COM object;

needs to support "directed" cleanup in cases where manually cleaning up managed resources improves overall performance;

can rely entirely on the GC to eventually get around to performing cleanup.

These criteria give us a clearer picture of what the purpose of memory testing is within the concept of a unit test.

Manual cleanup is needed for resources that are allocated completely outside of the domain of the .NET framework. This typically means COM objects or other third party programs which allocate resources and require the application to specifically free these resources. Since the GC functions are useless in tracking this kind of memory, we have to rely on system diagnostics to tell us how much memory is being used by these functions. Because these resources are completely unmanaged by the GC, there is no binding managed resource which implements IDisposable, and therefore the programmer must wrap the resource in a class that either implements IDisposable or provides some other mechanism to free up the resources. The unit test should include whatever code is necessary to ensure that the application interfaces with the third party functionality so that resources are reclaimed.
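Wrapping an unmanaged allocation in a disposable class, as described above, can be sketched like this. NativeBuffer is a hypothetical class, and a block from Marshal.AllocHGlobal stands in for whatever a COM object or third party library would actually allocate.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper; AllocHGlobal stands in for a third party allocation.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    public NativeBuffer(int sizeInBytes)
    {
        if (sizeInBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(sizeInBytes));
        handle = Marshal.AllocHGlobal(sizeInBytes);   // memory the GC knows nothing about
    }

    // Frees the unmanaged block exactly once; later calls are harmless.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
            handle = IntPtr.Zero;
        }
    }
}
```

A unit test for a class like this would allocate, dispose, and then verify that the resource was released, instead of waiting for the finalizer to run at some unknown time.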

Directed cleanup handles cases where unmanaged resources are already wrapped by classes in the .NET framework (or by the application), thus becoming "managed". A bitmap or other GDI resource is an example of this. It is often necessary to manually direct the reclamation of the unmanaged portion of the managed resource so that memory and/or handles do not continue to be allocated without limit. Waiting for all physical memory plus all virtual memory to be consumed before the GC starts reclaiming resources results in very poor performance of not only your application, but the entire system. The unit test needs to be written in such a manner as to "document" the need for this implementation.

It is important to recognize that for directed cleanup unit tests, we do not want the GC to run. If the GC were to start reclaiming memory, then the unmanaged resources, being wrapped in a managed object, would be freed. Rather, the unit test should ensure that the directed cleanup implementation is correct.

In this case, the application is going to rely on the GC to perform all cleanup whenever it decides to start collection. The unit test does not need to measure memory or resource utilization. This "don't test" approach should only be taken when the resources are fully managed by the GC--there are no objects that interface to or wrap unmanaged resources. The only exception to this that I can think of has to do with managing large collections. In this case, directed cleanup of the collection would improve memory utilization. However, because the .NET collection classes don't provide for a complete reclamation of memory in a manual way, this is sort of pointless for now. Hopefully, when generics are implemented and we can migrate to an STL approach for containers, the .NET collection classes can be thrown away.

If you buy into the three cases (manual, directed, and automatic) that I described above, then it should be clear that the memory functions the GC provides are not appropriate, since the only thing we're really interested in is tracking unmanaged resources, whether wrapped by a managed object or not. To do this, we simply need to watch the process memory using a simple helper class:

which returns our process' physically allocated memory (we're going to ignore virtual memory allocations). The MaxKMemory attribute is used to specify the maximum amount of memory that a function is allowed to allocate on the process heap (not the GC pool) without failing. For example:
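A process-memory helper of the kind described above, together with a limit check, could look like the following sketch. ProcessMemory and its methods are hypothetical names, and using the process working set as the measure of physical memory is an assumption of this sketch.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; the working set is assumed to be the right measure.
public static class ProcessMemory
{
    // Physical memory currently mapped to this process, in bytes.
    public static long WorkingSetBytes()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            current.Refresh();            // make sure the counters are current
            return current.WorkingSet64;
        }
    }

    // Runs the test and reports whether the working set grew by no
    // more than the given number of kilobytes.
    public static bool StaysWithin(Action test, long maxKilobytes)
    {
        long before = WorkingSetBytes();
        test();
        long after = WorkingSetBytes();
        return (after - before) <= maxKilobytes * 1024;
    }
}
```

Because the operating system adjusts the working set for its own reasons, a limit checked this way is best given some headroom rather than set to the exact expected allocation.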

I completely agree that the usefulness of some of these tests is dubious for most applications. In my little corner of the world, however, I find them very helpful. And the real point here is that the intent of unit testing should be to provide the programmer with a suite of tools to choose from that try to automate, as best as possible, different testing requirements. I believe MUTE does this, and provides a good framework (albeit in need of some refactoring!) for programmers to continue extending it for their own needs.
