I am looking for alternative ways of developing regression scripts. In our case, we have an application that does a lot of data processing and presents results to the user in the form of reports and charts. Our current approach for a regression script is this: after a certain set of user actions, when the system generates reports, we save the HTML of those reports and compare it with baseline HTML generated on a version that went through manual testing. We compare the baseline and target HTML in WinMerge and validate the differences. The advantage of this approach over the assertion method is that we hardly ever miss a data validation in the report. The problem is the time lost validating the HTML: there are many code changes which are valid but still show up as differences. Can anyone suggest alternative ways of developing regression tests? Comparing bitmaps is also not a good option because it does not cover the entire scrollable data on the page.
Also, we are going to develop a script that will cover the application in different browsers, capture reports, and compare them with later versions of the software in the respective browsers. What approach would you suggest for creating the baseline for this script?

2 Answers

There is a big difference between validating the data and testing that the web page displaying the data is functioning correctly. The data validation can be done even at the database level, assuming that your UI has been tested thoroughly enough. I found a good article on data validation here: http://msdn.microsoft.com/en-us/library/gg261774.aspx that outlines a number of approaches.
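As a rough illustration of database-level validation, a test can recompute the values a report should show directly from the database and compare them against what was scraped from the report. This is only a sketch; the `orders` table, its columns, and the region/total shape are hypothetical stand-ins for your own schema.

```python
import sqlite3


def fetch_report_totals(conn):
    """Recompute the totals the report should display, straight from the
    database. The 'orders' table and its columns are hypothetical."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


def validate_report(report_values, conn):
    """Compare (field, value) pairs scraped from the report against values
    computed independently from the database. Returns the mismatches as
    {field: (expected, shown)}; an empty dict means the report is correct."""
    expected = fetch_report_totals(conn)
    return {
        region: (expected.get(region), shown)
        for region, shown in report_values.items()
        if expected.get(region) != shown
    }
```

The point of the design is that the expected values never come from a saved HTML baseline, so cosmetic markup changes cannot produce false differences.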

Are the charts and graphs custom web pages that your developers have created, or are they using existing chart/graph plug-ins? If they are using existing plug-ins then you shouldn't need to do too much testing if you assume that the creator has already tested their product. If they are custom, then you can do very targeted testing to ensure that different types of data actually render the correct charts and graphs on your site.

Even with some automated testing to ensure that the HTML that is generated is what you are expecting and the data is correct, there can still be visual differences between browsers that can really only be observed through some manual comparison. One approach that I have seen used in the past is a script that takes screenshots of pages in different browsers so that someone can go through those screenshots and compare them. Some tools will attempt to compare images automatically, although it is difficult to get the thresholds right and it often involves some manual intervention anyway. To take those screenshots, pretty much any web UI automation framework like Selenium can navigate you to the page, and then in .NET you can create a function to take a screenshot of the browser window. If you have pages with scrollbars, you can even get fancy and try to scroll the window and stitch the images together, but depending on your web site's design you may not need that.
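The scroll-and-stitch idea boils down to computing the vertical scroll positions at which to capture each viewport-sized slice. A minimal sketch of that arithmetic (the Selenium call in the comment is illustrative; how you capture each slice depends on your framework):

```python
def scroll_offsets(page_height, viewport_height):
    """Compute the vertical scroll positions needed to capture a full page
    in viewport-sized slices. In a real script each offset would be passed
    to something like
        driver.execute_script("window.scrollTo(0, arguments[0])", offset)
    before taking a screenshot of the visible window.
    The final offset is pinned to the bottom of the page so the last slice
    is captured flush, even when the page height is not an exact multiple
    of the viewport height (that slice will overlap the previous one)."""
    if page_height <= viewport_height:
        return [0]  # everything fits in one screenshot
    offsets = list(range(0, page_height - viewport_height, viewport_height))
    offsets.append(page_height - viewport_height)
    return offsets
```

Stitching is then just pasting each captured slice at its offset, cropping the overlap out of the final one.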

There are many approaches you can use here. If you have the ability to transform the generated report into any format that works as (field, value) pairs (such as CSV, XML, etc.), you can use that as the basis for your comparison.
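One way to get from the HTML report to such a format, assuming the report renders its data as an ordinary HTML table, is to flatten the table cells into CSV with the standard-library parser. A sketch (your report's actual markup may need a more targeted parser):

```python
import csv
import io
from html.parser import HTMLParser


class ReportTableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row.
    Assumes the report's data is rendered as a plain HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())


def report_to_csv(html):
    """Flatten an HTML report table into CSV text for baseline comparison."""
    parser = ReportTableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```

Because only cell text survives the transformation, valid markup-only changes (styling, wrapper divs, attribute churn) no longer show up as differences.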

Another approach is to use automation to capture each HTML field containing data, and validate that the data matches your baseline. This would require some date-checking if you're outputting date information - I'll often use a regex to first validate the date, then strip it so that I don't get invalid differences reported.
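A minimal sketch of that validate-then-strip step (the `YYYY-MM-DD` format and the `<DATE>` placeholder are assumptions; use whatever format your reports actually emit):

```python
import re

# Hypothetical date format for the report's timestamp fields.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def strip_valid_dates(line):
    """First validate every date found in the line, then mask it, so that
    report-generation timestamps don't get reported as baseline differences
    while a malformed date still fails the test."""
    for match in DATE_RE.findall(line):
        year, month, day = (int(part) for part in match.split("-"))
        if not (1 <= month <= 12 and 1 <= day <= 31):
            raise ValueError(f"Malformed date in report: {match}")
    return DATE_RE.sub("<DATE>", line)
```

Running the masked lines through the comparison means a regenerated report only differs from baseline where the data actually changed.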

The key thing you need to look for, in my opinion, is the ability to compare each data field and report on each one.

For instance, our CSV baseline comparisons will report that differences were found, and for each file that differs from baseline output a formatted list something like this:

Result record 5 differs from baseline record 5 in field Foo. Baseline 20, Result 999.

Result record 5 differs from baseline record 5 in field Bar. Baseline -1, Result !@#$???

And so forth. If there are no differences, you don't even need to do the manual check - and if you're familiar enough with the data, you can isolate the issue with just this information. If not, that's what WinMerge is for - as well as for baseline updates where needed.

This kind of result reporting, particularly if you don't include expected differences like date of report, can help you zero in on any problems very quickly. This also translates very well across multiple formats since you'd code your baseline check to look for the (field, value) pairs that you were interested in, regardless of which format they were in.
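A minimal sketch of the kind of field-by-field CSV comparison described above (the message wording mirrors the sample output; the header-row assumption and field names are illustrative):

```python
import csv


def compare_csv(baseline_text, result_text):
    """Compare two CSV reports field by field and report each difference.
    Assumes both reports share a header row naming the fields."""
    baseline = list(csv.reader(baseline_text.splitlines()))
    result = list(csv.reader(result_text.splitlines()))
    header = baseline[0]
    messages = []
    for i, (b_row, r_row) in enumerate(zip(baseline[1:], result[1:]), start=1):
        for field, b_val, r_val in zip(header, b_row, r_row):
            if b_val != r_val:
                messages.append(
                    f"Result record {i} differs from baseline record {i} "
                    f"in field {field}. Baseline {b_val}, Result {r_val}."
                )
    return messages  # empty list means no manual check is needed
```

Because the check only looks at (field, value) pairs, the same comparison code works whether the baseline was exported as CSV, extracted from XML, or scraped out of the HTML report.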