This article concludes a four-part series in which I present my venture into the barely explored field of automated website usability evaluation. As I have stated in my previous articles, the best way to test for usability is with real human users. This form of usability testing, and the commercial software that allows such tests to be carried out even remotely, has been researched quite extensively. My quest to research and build a solution that automates usability evaluation therefore started out of sheer curiosity: I wanted to try a different approach which, to my knowledge, had never been tried before.

Part 3 – The Solution: Based on the reasoning presented in Part 2, I developed a prototype called USEFul (USability Evaluation Framework) which, when given the URL of a website, retrieves and analyzes it in order to determine whether it breaks any usability guidelines. This is the same method that usability experts use when evaluating a site through usability inspection. More information about USEFul can be found in the article USEFul – A Framework To Automate Website Usability Evaluation (Part 1).

The Experiments using the USEFul Framework

This article assesses the effectiveness of the USEFul framework in its implementation of web site usability evaluation. Since web site usability professionals are scarce, this assessment was conducted by comparing the results generated by the USEFul framework with published evaluations of the same web sites conducted by web site usability professionals.

A number of tests were conducted. However, the main one evaluated the set of web sites featured in the book “Homepage Usability – 50 Websites Deconstructed” by Nielsen and Tahir (Nielsen & Tahir, 2002). This choice was made for the following reasons:

Nielsen is considered to be a web usability guru and has been hailed as “one of the world’s foremost experts in web usability” (Hamilton, 2000). In this evaluation, the results reported by Nielsen will be used as the results generated by a usability expert.

Nielsen evaluates the web sites featured in this book by referencing web site usability guidelines. This is the same usability evaluation technique used by the USEFul framework, which eliminates the possibility of differences in the list of identified violations arising from the use of different techniques.

The book illustrates clearly both usability violations and good usability traits that have been identified by Nielsen and Tahir and has received numerous positive reviews.

The set of guidelines used by Nielsen and Tahir for this evaluation is a subset of the HHS Research-Based Web Design & Usability Guidelines (Leavitt & Shneiderman, 2006). The majority of the guidelines implemented in the USEFul framework are from the HHS guidelines. Using the same set of guidelines as the human usability experts therefore rules out discrepancies in the results caused by different sets of guidelines.

To ensure that the versions and contents of the web sites evaluated were identical to the ones evaluated by Nielsen and Tahir, the Internet Archive’s Wayback Machine was used. Using this tool, the exact web sites were retrieved by utilising the dates present on most of the screenshots in the book.

The Results of the USEFul Framework

Due to the large dataset and the detail with which the evaluations were carried out, a total of 8 results were drawn:

Result 1 – Usability Violations identified by USEFul vs. Implementable Guidelines: The USEFul framework was able to correctly identify 91.48% of the guideline-related usability violations when compared to Nielsen and Tahir’s manual evaluation. It is also worth noting that a code inspection showed that the 8.52% discrepancy was mainly due to the poor coding of the tested web sites.

Result 2 – Usability Violations identified by USEFul vs. Total Usability Violations: Only 47.7% of the usability violations identified by Nielsen and Tahir were directly related to guidelines that can be automated. As a result, the violations reported by the USEFul framework amounted to just 43.64% of the total usability violations identified by Nielsen and Tahir.
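
The link between Results 1 and 2 can be checked with a little arithmetic. The snippet below is only a sketch: it assumes that the 43.64% figure is the product of the two percentages quoted above, and the variable names are illustrative.

```python
# Share of Nielsen and Tahir's violations tied to automatable guidelines (Result 2).
automatable_share = 0.477
# Share of those guideline-related violations that USEFul detected (Result 1).
detection_rate = 0.9148

# Assumption: overall coverage is the product of the two shares.
overall_coverage = automatable_share * detection_rate
print(f"{overall_coverage:.2%}")
```

Under that assumption, the product rounds to the 43.64% reported in Result 2.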

Result 3 – Total Usability Violations identified by USEFul: Since the USEFul framework checks each web site against every guideline in the database, it performs a consistent evaluation and was therefore able to identify additional usability violations in each of the tested web sites. In fact, when these additional violations are taken into account, the shortfall reported in Result 2 turns into 25.88% more violations identified.

Result 4 – Relationship between Total Usability Violations and Page Size: As can be seen in the scatter graphs below, there is a strong positive correlation when the number of usability violations identified by USEFul is plotted against the number of lines of code in the respective web sites. This implies that the larger the web site being evaluated, the more violations the USEFul framework will find.

Violations identified by Nielsen & Tahir plotted against lines of code

Violations identified by USEFul plotted against lines of code
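
For readers who want to quantify a relationship like this on their own data, the sketch below computes Pearson's correlation coefficient. It assumes Pearson's r as the measure, which is not necessarily the one used in the experiment, and the numbers are placeholders, not the experimental data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Placeholder values for illustration only, NOT data from the experiment.
lines_of_code = [200, 450, 800, 1200, 1500]
violations = [5, 9, 14, 22, 25]

r = pearson_r(lines_of_code, violations)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship, while a value close to 0 indicates none.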

Result 5 – Time Required to Identify Usability Violations: As can be expected from any computerised system, the USEFul framework requires less time to identify usability violations than the usability experts. By conducting several tests, it was estimated that a usability expert would take approximately 2.129 seconds to identify and locate a usability violation, as opposed to USEFul, which takes only 0.097 seconds. This relationship is shown in the line graph below:

Time taken to identify usability violations

Result 6 – Number of Usability Violations Identified with and without USEFul: The bar graphs below show the average number of usability violations per web site identified by Nielsen and Tahir as opposed to the USEFul framework alone. The color convention should be interpreted as follows: Green and Amber are usability violations whose patterns can be implemented in the framework; Red & Site Specific are violations whose patterns cannot be automated and which, hence, only human usability experts can identify; Additional Violations are those violations that have been identified by USEFul because of the structured way in which it analyzes every single line of code.

Average number of violations identified per web site

Result 7 – Effort Required to Identify the Same Number of Usability Violations: Since USEFul is able to identify a considerable number of the violations that the usability expert identifies, it was calculated that a usability expert assisted by USEFul needs to expend only 56.35% of the effort, as opposed to 100%, to identify the same number of usability violations in a site.

Result 8 – Performance of a Non-Usability Expert using USEFul: By analyzing the data obtained in this experiment, it was calculated that by relying on the results obtained by the USEFul framework alone, a non-usability expert would be able to identify 69.52% of the usability violations identified by the usability experts.
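
The 69.52% figure appears to line up with two earlier results. The check below is a sketch that assumes the figure is the sum of the coverage in Result 2 and the additional violations in Result 3; the variable names are illustrative.

```python
# Coverage of the experts' findings achieved by USEFul (Result 2), in percent.
coverage_of_expert_findings = 43.64
# Additional violations found through exhaustive checking (Result 3), in percent.
additional_violations = 25.88

# Assumption: the Result 8 figure is the sum of the two.
combined = coverage_of_expert_findings + additional_violations
print(f"{combined:.2f}%")
```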

Conclusion

Through its implementation, the USEFul framework has demonstrated that usability evaluation can indeed be automated. Automation is in the form of the retrieval of the web site, its parsing to distinguish the web site code from the content, the checking of whether the site violates any of the stored guidelines and the reporting of which guidelines were violated. Thus from the automation aspect, it can be concluded that the framework automates the usability evaluation process in its entirety.
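
The four steps described above (retrieve, parse, check, report) can be sketched in a few lines of Python. This is not USEFul's actual code: the class and function names are hypothetical, and the two checks (a missing page title and images without an alt attribute) are merely illustrative examples of guideline-style rules.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageFacts(HTMLParser):
    """Collects the few facts about a page that the checks below need."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "img" and "alt" not in dict(attrs):
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def retrieve(url):
    """Step 1: fetch the page source (hypothetical helper, needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def evaluate(html):
    """Steps 2 to 4: parse the markup, check the rules, report violations."""
    facts = PageFacts()
    facts.feed(html)
    violations = []
    if not facts.title.strip():
        violations.append("Page has no descriptive title.")
    if facts.images_missing_alt:
        violations.append(f"{facts.images_missing_alt} image(s) lack alternative text.")
    return violations


# A tiny inline page is used here so the example runs without a network call.
sample = "<html><head><title></title></head><body><img src='logo.png'></body></html>"
for message in evaluate(sample):
    print(message)
```

In practice, `evaluate(retrieve(url))` would chain all four steps for a live page.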

From the results obtained in the experiments, it can also be concluded that the USEFul framework is very effective in identifying usability aspects that violate usability guidelines.

Since the USEFul framework fully automates the evaluation process and explains the violations in layman’s terms, it can be operated, and its results interpreted, by a non-usability expert. Additionally, the separation of the evaluation logic from the guidelines that are referenced makes the framework easier to maintain. This is because guidelines can be added to, modified in or deleted from the current set in the database, so maintaining them requires no programming skills.
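
To illustrate why keeping guidelines in a database means no code changes, here is a minimal sketch of a data-driven checker. The schema, the rule format ("tag X must carry attribute Y") and the sample rules are all assumptions made for this example and do not reflect USEFul's real database.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical schema: each guideline says "tag X must carry attribute Y".
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE guidelines (tag TEXT, required_attr TEXT, message TEXT)")
db.executemany(
    "INSERT INTO guidelines VALUES (?, ?, ?)",
    [
        ("img", "alt", "Images should have alternative text."),
        ("html", "lang", "The page should declare its language."),
    ],
)


class TagCollector(HTMLParser):
    """Records every start tag together with its attribute names."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, {name for name, _ in attrs}))


def evaluate(html, connection):
    """Generic engine: it knows nothing about any specific guideline."""
    collector = TagCollector()
    collector.feed(html)
    rules = connection.execute(
        "SELECT tag, required_attr, message FROM guidelines"
    ).fetchall()
    violations = []
    for tag, attr_names in collector.tags:
        for rule_tag, required_attr, message in rules:
            if tag == rule_tag and required_attr not in attr_names:
                violations.append(message)
    return violations


page = "<html><body><img src='a.png'><a>Home</a></body></html>"
print(evaluate(page, db))

# Adding a guideline is a data change only; evaluate() is untouched.
db.execute(
    "INSERT INTO guidelines VALUES (?, ?, ?)",
    ("a", "href", "Links should point somewhere."),
)
print(evaluate(page, db))
```

The second call reports one more violation than the first even though the checking code did not change, which is the maintenance benefit described above.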

All of these facilities make the USEFul framework appealing to a wider audience and hence contribute towards the mainstreaming of usability.

About Justin Mifsud

I am a user interface designer and user experience consultant by day and blogger by night. I own and run this blog, Usability Geek, where I evangelize about the importance of making the web a usable place and, more importantly, how to do it.