Abstract

A recent study suggested the use of the screen layout elements of balance, unity, and sequence as part of a computational model of interface aesthetics. It argued that these three elements are the terms that contribute most to the model. In the current study, a controlled experiment was designed and conducted to systematically investigate the effects of these three elements (balance, unity, and sequence) on perceived interface aesthetics. Results showed that the three elements have significant effects on perceived interface aesthetics. Significant interactions were also found among the three elements. A regression model relating perceived visual aesthetics to the three elements was constructed. When the model was validated using standard questionnaire scores of real web pages, high correlations were found between the values computed by the model and the scores of questionnaire items related to the visual layout of the web pages, indicating that layout-based measures are good at assessing the classical dimension of website aesthetics.

1. Introduction

Traditionally, the field of Human Computer Interaction (HCI) has focused on the functionality and usability aspects of interface design. Recently, however, a new wave in the HCI field has emphasized the importance of aesthetic aspects in HCI and interface design, and in engineering research and design in general [1–3]. This shift is motivated by the increased awareness of the importance of users’ likability and aesthetic aspects in system acceptability [1].

Attention to the importance of aesthetics in interface design and its effect on users’ impressions of the usability of a system began with the findings of the study conducted by Kurosu and Kashimura [4]. Using different designs of an automated teller machine interface, they found a high correlation between users’ prior perception of usability (which they called apparent usability) and users’ perception of the visual aesthetics of the interface. Participants perceived the visually appealing interface designs as easier to use. Later, Tractinsky [5] repeated their experiment in a different context using more rigorous approaches; the same high correlation was found in all the tested cases. Furthermore, this strong relationship between user perception of interface aesthetics and perceived usability remains intact even after actual use of the system [6]. Lindgaard et al. [7] showed that first impressions of the visual appeal of websites are formed very quickly, within 50 milliseconds. These impressions remain stable even after considerably longer exposures [8]. Phillips and Chaparro [9] found that users’ impression of the usability of websites is mostly influenced by the visual appeal of the site. Users rated sites with high visual appeal and low usability as easier to use and gave lower ratings to sites with low visual appeal and high usability.

Besides positive effects of aesthetics on perceived usability, some even argue that visually appealing interfaces might also have positive effects on performance. For example, Moshagen et al. [10] found a significant effect of highly aesthetic websites on completion time in a low usability condition when participants completed search tasks. Sonderegger and Sauer [11] showed that visual appearance of cell phones had a positive effect on performance, leading to reduced completion time and number of errors for the visually appealing design.

One line of research in interface aesthetics is concerned with determining which features of the interface design trigger users’ perception of the aesthetics of the interface. It also explores the possibility of expressing changes in such features as numerical values and using these values to assess users’ perception of interface aesthetics. One approach argues that the physical layout of visual objects on the screen may play a role in users’ perception of aesthetics. This approach builds on earlier quantitative measures of aesthetics (e.g., Birkhoff’s aesthetic measure [12]) and the principles of the Gestalt laws for visual design [13]. The procedure involves expressing visual design features (such as symmetry, balance, and unity) using mathematical formulas and combining the calculated values of all features into an overall measure that reflects the aesthetic level of the interface design.

The current study follows a similar approach. The purpose of the study is to systematically investigate the effects of the three screen design elements of balance, unity, and sequence on users’ perception of interface aesthetics. To accomplish this goal, the experimental procedure first used simple abstract black and white screens to systematically assess the effects of these three elements on perceived aesthetics. Abstract screens were used because the related elements can be easily manipulated and studied in a controlled environment, which helps ensure statistically valid results. This procedure was also used in similar previous studies [14, 15].

Next, the results obtained from testing with the abstract screens were used to build a regression model relating the three elements to users’ perceived aesthetics. Finally, the model was validated using standard aesthetics questionnaire scores of 42 web pages obtained from a previous study [16].

Before presenting experimental work of this study, a brief summary of the different approaches to measure interface aesthetics and related work concerned with developing quantitative measures for interface aesthetics is given in Section 2.

2. Measures of Interface Aesthetics

In general, two approaches to measure interface aesthetics can be distinguished in the literature. The first is an objective approach relating screen design layout elements to users’ perception of visual aesthetics (e.g., [14, 17]). The second approach is a subjective approach, utilizing questionnaire-based instruments to measure users' perception of visual aesthetics [18].

2.1. Screen Layout-Based Measures

Methods in this approach are motivated by earlier aesthetic measures developed by Birkhoff [12], Tullis’ quantitative techniques for evaluating screen design [19], and Gestalt theory for visual design [13, 22].

Supporters of this approach [14, 22] argue that quantitative measures that provide numerical values for different designs based on interface and screen design characteristics can be very helpful in many design situations, particularly in the early stages of design. They can assist in preparing design alternatives and can reduce the number of prototypes that undergo tests with human users in later stages of design. However, these tools are not meant to replace human designers; they are intended to serve as numerical aids that help designers and researchers evaluate design alternatives without the need for human participants and understand the extent to which their designs would affect usability. Moreover, these measures can provide researchers with quantitative tools that support the systematic study of different design aspects and give a numerical basis for direct comparison between design proposals. These measures can also be useful in cases where on-the-fly designs are needed for nonprofessional designers, as in online tools for designing websites [15].

Several attempts have been conducted in the past few years to develop such measures; Ngo et al. [17] developed a mathematical model to measure screen aesthetics. The model consists of fourteen proposed measures of screen aesthetics: balance, symmetry, equilibrium, unity, sequence, density, proportions, cohesion, simplicity, regularity, economy, homogeneity, rhythm, and order. The value of each measure can be calculated using formulas based on the layout of visual objects on the screen. The average of all these measures represents the overall aesthetic value of the screen. When testing these measures using real computer screens, high correlation was found between the model’s computed aesthetic value and users’ perceived aesthetics of the interface.

In one study in which the model was applied to data entry screens [20], a total of 57 screens with different aesthetic values were tested, and multiple regression was used to fit subjective ratings of the screens (obtained from seven participants) to the measures calculated by the model. Although the regression analysis was not based on a controlled experiment, it was sufficient to allow a t-test of the significance of each term (measure) in the model. Results of these tests showed that the regression model was statistically significant and that the measures of balance, unity, and sequence are the terms that contribute most to the model. However, interactions among the different terms (measures) of the model, and how they might affect users’ aesthetic perception, were not addressed in the study.

Another study that used controlled experiments is the work of Bauerly and Liu [21]. They used a factorial design to test the effects of symmetry and the number of compositional elements on interface aesthetics. Their findings were broadly similar to those of Ngo et al. [22]. However, it was difficult to compare their findings directly with those of Ngo et al. [22], because they used a different approach and different formulas to calculate the values of the two measures tested in their experiments.

The model developed by Ngo et al. [22] can be considered one of the most successful attempts to develop aesthetic interface measures based on interface layout. However, one difficulty in applying the model in practice is the relatively large number of measures (14) and the associated formulas needed to calculate each of them. In a practical application of the model, Zain et al. [23] designed a computer application that incorporates five of the fourteen measures proposed by Ngo et al. [22]. The five selected measures were balance, equilibrium, symmetry, sequence, and rhythm. They applied the software to language learning web pages. Their findings showed some agreement with users’ ratings, but no statistical test was used to obtain conclusive results. These inconclusive results could be due to the fact that not all the significant measures detected in the study of Ngo and Byrne [20] were included in their software and that the possibility of interactions among the measures was not considered.

2.2. Questionnaire-Based Measures

Supporters of this approach argue that the complexity of, and interrelationships among, the screen design elements make it difficult to use them to measure aesthetics quantitatively [18]. In their view, it is more convenient to use questionnaire-based instruments to measure users’ subjective perception of aesthetics. Two widely accepted instruments of this kind are the classical and expressive aesthetics instrument developed by Lavie and Tractinsky [18] and the Visual Aesthetics of Website Inventory (VisAWI) developed by Moshagen and Thielsch [16]. Both were designed to measure the perceived visual aesthetics of websites.

Lavie and Tractinsky [18] found two dimensions of perceived website aesthetics, termed “classical aesthetics” and “expressive aesthetics”. The classical aesthetics dimension emphasizes orderly and clear design and is closely related to many of the usability and interface design rules and guidelines. The expressive aesthetics dimension is linked to the designers’ creativity and originality and to the ability to break design conventions. These two dimensions were the basis for developing a quantitative questionnaire-based instrument to measure website interface aesthetics. The classical dimension includes the items “aesthetic”, “pleasant”, “symmetric”, “clear”, and “clean”, while the expressive dimension includes the items “creative”, “fascinating”, “original”, “sophisticated”, and “uses special effects”.

VisAWI was constructed to serve as a new tool to measure perceived website aesthetics. It was designed to cover broader aspects of perceived website aesthetics that were not adequately represented in earlier instruments. The instrument is based on four interrelated facets of the perceived visual aesthetics of websites: simplicity, diversity, colorfulness, and craftsmanship. Simplicity comprises visual aesthetic aspects such as balance, unity, and clarity. It is closely related to the classical aesthetics dimension. The diversity facet comprises visual complexity, dynamics, novelty, and creativity. It is closely related to the expressive aesthetics dimension. The colorfulness facet represents aesthetic impressions perceived from the selection, placement, and combination of colors. Craftsmanship comprises the skillful and coherent integration of all relevant design dimensions. Each of the first two facets is represented by five items in the questionnaire, while each of the last two facets has four items.

3. Study Objectives

The objectives of the current study are, first, to design and conduct a controlled experiment to test the effects of the layout elements of balance, unity, and sequence on interface aesthetics. These measures were chosen based on the findings of [20]. The possibility of interactions among these measures is also tested. The second objective is to use these elements to build and validate a regression model representing users' perceived visual aesthetics.

The balance element in screen design can be achieved by maintaining equal weights of visual objects on the screen: top and bottom, left and right [22]. Unity is the extent to which visual objects on the screen seem to belong together as one object [22]. Sequence corresponds to the arrangement of visual objects on a screen in a way that facilitates eye movement. Eye movements usually follow the pattern associated with reading. In cultures that read from left to right, the eyes start from the upper left and move back and forth across the screen to the lower right [22]. Moreover, bigger objects on the screen have more visual weight, and the eyes move from the bigger to the smaller objects on the screen.

Ngo et al. [22] have developed formulas to calculate numerical values for each of these elements. The formulas were developed so that each element (measure) takes a value that ranges from zero (for the lowest screen aesthetics) to one (for the highest screen aesthetics). These formulas are used in the experimental part of the current study to calculate the required values for the three elements. The formulas for the three elements, with hypothetical examples showing their use, are given in the appendix.

4. Method

4.1. Design of the Experiment

An experiment was designed and conducted to test effects of the three screen layout elements of balance, unity, and sequence on participants’ perceived aesthetic value of interface design.

A factorial design was utilized with the three screen elements as the main factors. Each of the three factors was tested at two levels (high and low) that are supposed to cover the whole range of each factor. The design used is a 2^3 within-subject factorial design with repeated measures. This design produces eight experimental conditions representing the factorial combinations of the three factors, each at two levels (2^3 = 8 conditions).
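The eight conditions of a two-level, three-factor design can be enumerated mechanically. The following is a minimal sketch; the function name and the "+"/"-" level labels are illustrative choices, and the enumeration order is not claimed to match the screen numbering of Table 1.

```python
from itertools import product

FACTORS = ("balance", "unity", "sequence")
LEVELS = ("+", "-")  # "+" = high level, "-" = low level


def factorial_conditions(factors=FACTORS, levels=LEVELS):
    """Return every combination of factor levels as a list of dicts."""
    return [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]


conditions = factorial_conditions()
for number, condition in enumerate(conditions, start=1):
    # The numbering here is only the enumeration order, not the paper's screen IDs.
    print(number, condition)
```

With two levels and three factors the list has 2^3 = 8 entries, one per experimental condition.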

The three factors: balance, unity, and sequence represent the independent variables and the dependent variable is participants’ ratings of interface aesthetics.

This type of factorial design was used because it is relatively easy to apply and because it can give reliable results with a relatively small number of participants.

4.2. Screen Designs

Eight black and white screen models representing the eight experimental combinations (3 factors, each at 2 levels) were prepared. Each screen has an on-screen size of 1024 by 1024 pixels. Four squares were used as the screen objects to be manipulated to produce the required experimental conditions. Square shapes were used to eliminate the effect of aesthetic proportions that may appear when irregular shapes are used. A relatively small number of objects (four) was used in each screen to simplify the object manipulation required to produce the experimental conditions.

The required numerical value of each factor was calculated using the formulas developed by Ngo et al. [22] (examples of how the calculations were carried out are given in the appendix). Although the two levels of each factor are theoretically supposed to represent the extreme values (0 for low and 1 for high), this was difficult to achieve in practice. To overcome this difficulty, a range was used to represent each level, with the low level below 0.25 and the high level above 0.75.

Table 1 shows the different factor levels (+ for high and − for low) and the values associated with the eight screen designs. It also shows the overall aesthetic measure value of each screen, obtained by calculating the average of the values of the three factors. Figure 1 represents the eight screen models associated with the eight experimental conditions. They are presented in the same order as in Table 1; for example, screen 1 represents the condition of all factors at the “high” level (+ + +) and screen 2 represents the condition of all factors at the “low” level (− − −). The remaining screens represent the different combinations of “high” and “low” levels for the three factors (as explained in Table 1).
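The level ranges and the overall aesthetic measure described above can be expressed in a few lines. This is a sketch only: the function names are hypothetical, and the factor values in the usage lines are made up for illustration rather than taken from Table 1.

```python
def level_of(value, low_cutoff=0.25, high_cutoff=0.75):
    """Map a computed factor value to the experimental level it represents."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("factor values are expected to lie in [0, 1]")
    if value < low_cutoff:
        return "-"
    if value > high_cutoff:
        return "+"
    return None  # between the two ranges, so not usable as either level


def overall_aesthetic(balance, unity, sequence):
    """Overall aesthetic measure: the plain average of the three factor values."""
    return (balance + unity + sequence) / 3.0


# Hypothetical factor values for one screen (not from Table 1).
screen = {"balance": 0.90, "unity": 0.10, "sequence": 0.80}
levels = {name: level_of(value) for name, value in screen.items()}
print(levels, round(overall_aesthetic(**screen), 3))
```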

Table 1: The eight experimental conditions and the associated factors levels and values.

4.3. Participants

Thirteen graduate students of engineering (10 males and 3 females) volunteered to participate in the experiment, with a mean age of 29.3 years and standard deviation of 6.1 years.

4.4. Apparatus

An IBM compatible PC with a 17′′ LCD display with 1280 × 1024 pixels screen size and depth of colors of 32 bit true colors was used in the experiment. The operating system was Microsoft Windows XP. Microsoft Office PowerPoint 2003 was used as a display screen.

4.5. Procedure

The eight screens were presented in random order on a computer display to each participant using a PowerPoint presentation, with the participant controlling the progress of the presentation. The participants were instructed to rate each screen based on their personal preferences using a 10-point scale, with 10 representing “most beautiful” and 1 representing “least beautiful”. Each experimental trial started with the experimenter explaining the purpose of the experiment and reading short written instructions explaining the nature of the experiment and the task to be performed. Next, all eight screens were quickly presented to the participant. After that, each screen was presented separately and the participant had to view the screen and write his/her rating on a paper form. Participants were encouraged to rate each screen as fast as possible based on their intuitions and first impressions.

5. Results and Discussion

5.1. Participants Ratings

Participants’ average aesthetic ratings of each screen are presented in Table 2 next to the corresponding calculated aesthetic values. Participants' ratings were divided by 10 to make them compatible with the computed values of aesthetic measure. Comparing these ratings to the calculated aesthetic measures, some accordance between both can be noticed, except for screen 2; a relatively high average rating was given to this screen, which was a bit surprising, since this screen is supposed to represent the lowest level of interface aesthetics.

A relatively high correlation coefficient of 0.84 was found between participants’ ratings and the measured values of aesthetics. This agrees with the findings of previous studies.
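The comparison described in this section (rescale the 1 to 10 ratings to the 0 to 1 range, then correlate them with the computed aesthetic values) can be sketched as follows. The helper names are hypothetical and the numbers in the usage lines are invented placeholders; the study's actual values are those of Table 2.

```python
from math import sqrt


def rescale_ratings(ratings, scale_max=10):
    """Divide ratings by the scale maximum so they are comparable to 0..1 measures."""
    return [rating / scale_max for rating in ratings]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Placeholder data for illustration only (not the values of Table 2).
mean_ratings = [7.8, 5.5, 4.3, 6.1]
computed_values = [0.85, 0.20, 0.45, 0.60]
print(pearson_r(rescale_ratings(mean_ratings), computed_values))
```

Because the Pearson coefficient is unchanged by rescaling, dividing the ratings by 10 affects only how the two columns read side by side, not the correlation itself.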

5.2. Analysis of Variance

Analysis of variance results are shown in Table 3. All three elements (balance, unity, and sequence) have significant effects on perceived interface aesthetics (see Table 3 for the p values). Of the two-way interactions, only those involving the unity element were found to be significant. No significant interaction between balance and sequence was found (p = .215). The three-way interaction was not significant (p = .933). A test power of 0.994 was calculated using an average estimated effect value of 1.224, indicating that the sample size of 13 participants was sufficient for obtaining statistically valid results.

Table 3: Analysis of variance results.

Implication of the significant effects of the three elements can be better explained by interpreting main factors effects and interactions plots presented in Figure 2. Average effects of the main factors are plotted in Figure 2(a); with all three factors, participants’ average ratings of interface aesthetics increase with the increase of the value of the factor from the low level to the high level. Balance has the largest effect, closely followed by unity and lastly sequence with a relatively smaller effect.

Figure 2: Average effects and interactions plots.

Plots of the two-way interaction effects among the factors are shown in Figures 2(b) and 2(c). These plots indicate that, for each pair of factors, the effect of one factor is larger at the high level of the other factor; at the low level the effect is very small. For example, in Figure 2(b), at the high level of balance, the average rating increases from 5 at the low level of unity to 7.81 at the high level of unity. At the low level of balance, the plot shows only a very small change with unity (from 4.3 to 4.5).
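The size of the balance by unity interaction can be worked out from the four cell means quoted above. The sketch below uses the usual two-level contrast (half the difference between the two simple effects); the function names are illustrative, and the cell means are the approximate values read from Figure 2(b) as reported in the text.

```python
# Average ratings keyed by (balance level, unity level), as quoted in the text.
cell_means = {
    ("+", "-"): 5.0,
    ("+", "+"): 7.81,
    ("-", "-"): 4.3,
    ("-", "+"): 4.5,
}


def simple_effect_of_unity(means, balance_level):
    """Change in average rating when unity goes from low to high at one balance level."""
    return means[(balance_level, "+")] - means[(balance_level, "-")]


def main_effect_of_unity(means):
    """Average of the two simple effects of unity."""
    return (simple_effect_of_unity(means, "+")
            + simple_effect_of_unity(means, "-")) / 2.0


def interaction_effect(means):
    """Half the difference between the simple effects at high and low balance."""
    return (simple_effect_of_unity(means, "+")
            - simple_effect_of_unity(means, "-")) / 2.0


print(round(simple_effect_of_unity(cell_means, "+"), 3))  # effect at high balance
print(round(simple_effect_of_unity(cell_means, "-"), 3))  # effect at low balance
print(round(interaction_effect(cell_means), 3))
```

The simple effect of unity is 2.81 at high balance and 0.2 at low balance, so the interaction contrast is (2.81 − 0.2) / 2 = 1.305, which is the numerical counterpart of the non-parallel lines in the plot.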

A matter that needed further investigation is the surprisingly high average rating given to screen 2, the screen that was supposed to represent the low levels of the three factors and consequently the lowest value of screen aesthetics. One possibility is that the high ratings of screen 2 are due to hidden effects of other layout elements. Two elements that might cause this effect are symmetry and density. To investigate this possibility, the values of these two elements were calculated (using Ngo’s formulas), and the analysis of variance was repeated with both of them as covariates. Results showed no significant effect of either. Based on this, the high ratings given to screen 2 could be attributed to random experimental error. Nevertheless, other possibilities, including the effects of additional screen elements, should be investigated in case any of them has an effect that was overlooked in this experiment.

5.3. Constructing the Regression Model and Validating It Using Real Websites

Based on results of analysis of variance, a regression model relating the significant elements and interactions to the perceived aesthetic values was constructed. The model is shown below (1):
where B: Balance, U: Unity, S: Sequence.
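The structure of such a model can be sketched in code. The sketch rests on an assumption: given the analysis of variance results, the five terms are taken to be the three main effects plus the two interactions involving unity (B·U and U·S). The coefficients below are placeholders chosen only so that the example runs; they are not the fitted coefficients of equation (1), and whether the fitted model includes an intercept is not assumed either way.

```python
def model_terms(b, u, s):
    """Predictor terms: three main effects plus the two unity interactions (assumed)."""
    return {"B": b, "U": u, "S": s, "BU": b * u, "US": u * s}


def predict(b, u, s, coefficients, intercept=0.0):
    """Linear combination of the model terms with the given coefficients."""
    terms = model_terms(b, u, s)
    return intercept + sum(coefficients[name] * value
                           for name, value in terms.items())


# Placeholder coefficients for illustration only; NOT the values of equation (1).
placeholder = {"B": 0.1, "U": 0.1, "S": 0.1, "BU": 0.3, "US": 0.2}

print(predict(0.9, 0.8, 0.95, placeholder))
```

With interaction terms in the model, the contribution of unity depends on the values of balance and sequence, which is the behaviour seen in the interaction plots.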

The model has only five terms and only values of the three elements need to be substituted in the model to get the equivalent value of perceived aesthetics. The model was used to calculate values of the eight screens of the study and compare the results with actual values of participants’ ratings. The comparison is shown in Table 4. One can see that the predicted values calculated by the model and the actual values of participants’ ratings are very close. High correlation (r = 0.981) was found between actual and predicted values.

Table 4: Actual and predicted aesthetic values of the eight screens.

To validate the results, the regression model was used to calculate the visual aesthetics of forty-two web pages already used in a previous study [16] to develop the VisAWI questionnaire-based measure of the visual aesthetics of websites. These 42 web pages were used in [16] to validate the VisAWI questionnaire and compare it with the classical and expressive aesthetics questionnaire. The aesthetic values calculated for the 42 web pages by the regression model were compared to the scores of the VisAWI and classical/expressive questionnaires already available in [16]. Correlation analysis was conducted to see how the objective layout-based measures proposed in this study relate to standard questionnaire-based measures of visual aesthetics. The procedure used to compute the values of the three elements (balance, unity, and sequence) is the same as the one used to calculate their values for the eight abstract screens. The visual information on each page was divided into hypothetical visual objects, and the layout data obtained from these objects (area, distance from the central axis, etc.) were input to the computational formulas for the three elements (see the appendix for the formulas and examples of calculations). Figure 3 shows an example of how a web page was divided into visual objects.

Figure 3: An Example of how a web page is divided into visual objects ((a) shows the original web page; (b) shows the page divided into visual objects).

These 42 web pages are utilized in this study because they cover a wide variety of websites with different levels of visual aesthetics. In addition, questionnaire scores for a large sample are already available for these pages; scores of a total of 512 participants were used to validate the questionnaire. Of the participants, 347 (67.8%) were female. Age ranged from 15 to 82 years.

Table 5 summarizes descriptive statistics for the calculated values of the three measures, their average, and values calculated by the model for the 42 web pages.

Table 5: Descriptive statistics for the measures and the model for the 42 web pages.

Table 6 shows the correlation coefficients between the measures and the questionnaire scores for the 42 web pages of the study in [16]. From the table, one can see that all significant correlations are with the questionnaire items related to screen layout. The unity measure and the model are significantly correlated with the classical and the simplicity measures, both of which include items related to visual layout and clarity of the design.

Table 6: Correlations between the measures and questionnaire scores.

Of the three layout measures (balance, unity, and sequence), only unity has high correlations with the questionnaire measures. No significant correlations were found between balance or sequence and the questionnaire measures. This might be explained by looking at the interaction plots in Figure 2 and the descriptive statistics in Table 5. High values of both balance and sequence were calculated for the 42 web pages; values of balance range from 0.516 to 0.950 with an average of 0.792, and values of sequence are all above 0.75 with an average of 0.970. On the other hand, unity has lower values: from 0.163 to 0.684 with an average of 0.417. Interpretation of the interaction plots (Section 5.2) suggests that the effect of one factor is larger at the high levels of the other factors. For the 42 web pages, both balance and sequence have higher values than unity. Hence, unity will have a larger impact on perceived aesthetics. This is reflected in the high correlations between unity and the related questionnaire measures.

6. Conclusions and Extensions

This study was designed and conducted to investigate the effects of three elements of screen layout (balance, unity, and sequence) on perceived interface aesthetics. Results showed that the three elements have significant effects on perceived interface aesthetics. Significant effects of interactions among the three elements were also found. A regression model relating perceived visual aesthetics to the three elements was constructed. When the model was validated using standard questionnaire scores of real web pages, high correlations were found between the values computed by the model and the scores of questionnaire items related to the visual layout of the web pages. This indicates that although the formulas used in this study were originally developed for data entry screens, they can also be applied to websites. It also indicates that the layout-based measures tested in this study can adequately predict aesthetic aspects related to the classical and the simplicity dimensions of website aesthetics. However, it is still not clear how much weight classical aesthetic aspects carry in the overall user perception of visual aesthetics. Findings of recent studies point to a possible effect of context of use [24, 25]. They indicated that classical aesthetics will have a dominant effect in the case of more traditionally designed and information-oriented websites. Therefore, it is recommended to limit the use of layout-based measures (such as the ones tested in this study) to assessing visual aesthetic aspects related to the classical aesthetic dimension. Nevertheless, this should not prevent investigating other interface design features that could relate to the expressive dimension of visual aesthetics.

Several issues still need to be considered when interpreting the findings of this study. First, the formulas used to calculate the three elements do not include the effect of color. Ngo et al. [17] suggested adding the effect of colors as part of the balance element, but it is still not clear how to express the effect of colors using numerical values. Second, in validating the results, only simple correlation coefficients were used to investigate the relationship between the layout measures and the questionnaire scores. For the significant correlations to be confirmed, further testing using more rigorous procedures is needed. Third, the procedure used to divide the web pages into visual objects was somewhat arbitrary, being based on the authors’ perception of the pages. General criteria and systematic methods should be established to make it easy to apply the formulas to any web page. Finally, since web pages were used in this study, the findings are only applicable to the visual aesthetics of websites.

Appendix

A. The Used Formulas with Examples of Calculations

This section lists the formulas developed by Ngo et al. [17] to calculate screen balance, unity, and sequence. A hypothetical abstract screen, similar to the screens used in the study, is used to give examples of how the formulas were used to calculate values of each of the three elements.

A.1. Balance

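The balance is computed as the difference between the total weighting of components on each side of the horizontal and vertical axes and is given by

$$BM = 1 - \frac{|BM_{vertical}| + |BM_{horizontal}|}{2},$$

where BM stands for Balance Measure, and $BM_{vertical}$ and $BM_{horizontal}$ are the vertical and horizontal balances with

$$BM_{vertical} = \frac{w_L - w_R}{\max(|w_L|, |w_R|)}, \qquad BM_{horizontal} = \frac{w_T - w_B}{\max(|w_T|, |w_B|)},$$

$$w_j = \sum_{i=1}^{n_j} a_{ij} d_{ij}, \qquad j = L, R, T, B,$$

where L, R, T, and B stand for left, right, top, and bottom, respectively, $a_{ij}$ is the area of object i on side j, $d_{ij}$ is the distance between the central lines of the object and the frame, and $n_j$ is the total number of objects on the side.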
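A minimal sketch of the balance computation follows. It assumes the commonly cited form of the Ngo et al. measure: each side weight is the sum of area times distance to the frame's central line, the vertical and horizontal balances are the normalised differences of opposite side weights, and BM = 1 − (|BM_vertical| + |BM_horizontal|) / 2. Further assumptions of this sketch (not of the paper): objects are axis-aligned rectangles given as (x, y, width, height) with the origin at the top-left corner, each object is assigned to a side by its centre, and a pair of sides with zero total weight is treated as balanced.

```python
def balance_measure(objects, frame_width, frame_height):
    """Balance measure BM in [0, 1] for rectangles (x, y, width, height)."""
    centre_x, centre_y = frame_width / 2.0, frame_height / 2.0
    weights = {"L": 0.0, "R": 0.0, "T": 0.0, "B": 0.0}
    for x, y, width, height in objects:
        area = width * height
        object_x, object_y = x + width / 2.0, y + height / 2.0
        # Assumption: assign by object centre; straddling objects are not split.
        weights["L" if object_x < centre_x else "R"] += area * abs(object_x - centre_x)
        weights["T" if object_y < centre_y else "B"] += area * abs(object_y - centre_y)

    def normalised_difference(first, second):
        largest = max(abs(first), abs(second))
        # Assumption: two empty (zero-weight) sides count as balanced.
        return 0.0 if largest == 0 else (first - second) / largest

    bm_vertical = normalised_difference(weights["L"], weights["R"])
    bm_horizontal = normalised_difference(weights["T"], weights["B"])
    return 1.0 - (abs(bm_vertical) + abs(bm_horizontal)) / 2.0


# Hypothetical 1024 x 1024 screen with four 100 x 100 squares placed symmetrically.
symmetric = [(100, 100, 100, 100), (824, 100, 100, 100),
             (100, 824, 100, 100), (824, 824, 100, 100)]
print(balance_measure(symmetric, 1024, 1024))
```

A mirror-symmetric arrangement gives equal weights on opposite sides and therefore the maximum balance, while a single object in one corner leaves the opposite sides empty and gives the minimum.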

Example. This example shows how the balance of the hypothetical screen shown in Figure 4 is computed using the above formulas.

A.2. Unity

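The formula for unity is

$$UM = \frac{|UM_{form}| + |UM_{space}|}{2},$$

where UM stands for Unity Measure, $UM_{form}$ is the extent to which the objects are related in size with

$$UM_{form} = 1 - \frac{n_{size} - 1}{n},$$

and $UM_{space}$ is a relative measure of the space between groups and that of margins with

$$UM_{space} = 1 - \frac{a_{layout} - \sum_{i=1}^{n} a_i}{a_{frame} - \sum_{i=1}^{n} a_i},$$

where $a_i$, $a_{layout}$, and $a_{frame}$ are the areas of object i, the layout, and the frame, respectively, $n_{size}$ stands for the number of sizes used, and n is the number of objects on the frame.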
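The unity computation can be sketched in the same way. The sketch assumes the commonly cited form of the Ngo et al. measure: UM = (|UM_form| + |UM_space|) / 2, with UM_form = 1 − (n_size − 1) / n and UM_space = 1 − (a_layout − Σa_i) / (a_frame − Σa_i). Two further assumptions are specific to this sketch: the layout area is taken as the bounding box of all objects, and the number of sizes is counted as the number of distinct (width, height) pairs. Overlapping objects are not handled.

```python
def unity_measure(objects, frame_width, frame_height):
    """Unity measure UM for rectangles (x, y, width, height)."""
    objects = list(objects)
    n = len(objects)
    if n == 0:
        raise ValueError("at least one object is required")
    total_area = sum(width * height for _, _, width, height in objects)
    frame_area = frame_width * frame_height
    if frame_area <= total_area:
        raise ValueError("objects must not fill the whole frame")

    # Assumption: the layout is the bounding box enclosing all objects.
    left = min(x for x, _, _, _ in objects)
    top = min(y for _, y, _, _ in objects)
    right = max(x + width for x, _, width, _ in objects)
    bottom = max(y + height for _, y, _, height in objects)
    layout_area = (right - left) * (bottom - top)

    # Assumption: "sizes used" = distinct (width, height) pairs.
    n_size = len({(width, height) for _, _, width, height in objects})

    um_form = 1.0 - (n_size - 1) / n
    um_space = 1.0 - (layout_area - total_area) / (frame_area - total_area)
    return (abs(um_form) + abs(um_space)) / 2.0


# Hypothetical 1000 x 1000 screen: four equal squares packed together.
packed = [(0, 0, 100, 100), (100, 0, 100, 100),
          (0, 100, 100, 100), (100, 100, 100, 100)]
print(unity_measure(packed, 1000, 1000))
```

Tightly grouped objects of a single size leave no space inside the layout and score highest; spreading the same squares to the corners of the frame makes the layout as large as the frame and removes the spacing contribution.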

Example. This example (Figure 5) shows how the unity of the hypothetical screen is computed using the above formulas.

A.3. Sequence

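The formula for calculating sequence is

$$SQM = 1 - \frac{\sum_{j = UL, UR, LL, LR} |q_j - v_j|}{8},$$

with

$$\{q_{UL}, q_{UR}, q_{LL}, q_{LR}\} = \{4, 3, 2, 1\}, \qquad v_j = \begin{cases} 4 & \text{if } w_j \text{ is the biggest in } w \\ 3 & \text{if } w_j \text{ is the second biggest in } w \\ 2 & \text{if } w_j \text{ is the third biggest in } w \\ 1 & \text{if } w_j \text{ is the smallest in } w \end{cases}$$

with

$$w_j = q_j \sum_{i=1}^{n_j} a_{ij}, \qquad j = UL, UR, LL, LR,$$

where UL, UR, LL, and LR stand for upper-left, upper-right, lower-left, and lower-right, respectively, $a_{ij}$ is the area of object i on quadrant j, and $n_j$ is the number of objects on quadrant j. Each quadrant is given a weighting in q.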
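A sketch of the sequence computation is given below. It assumes the commonly cited form of the Ngo et al. measure: quadrant weightings q = {UL: 4, UR: 3, LL: 2, LR: 1}, weighted areas w_j = q_j Σ a_ij, ranks v_j from 4 (largest w_j) to 1 (smallest), and SQM = 1 − Σ|q_j − v_j| / 8. Assumptions specific to this sketch: objects are rectangles (x, y, width, height) with the origin at the top-left, each object is assigned to a quadrant by its centre, and ties between quadrants are broken in reading order.

```python
QUADRANT_WEIGHTS = {"UL": 4, "UR": 3, "LL": 2, "LR": 1}


def sequence_measure(objects, frame_width, frame_height):
    """Sequence measure SQM in [0, 1] for rectangles (x, y, width, height)."""
    centre_x, centre_y = frame_width / 2.0, frame_height / 2.0
    area_by_quadrant = dict.fromkeys(QUADRANT_WEIGHTS, 0.0)
    for x, y, width, height in objects:
        object_x, object_y = x + width / 2.0, y + height / 2.0
        # Assumption: assign by object centre; straddling objects are not split.
        quadrant = ("U" if object_y < centre_y else "L") + \
                   ("L" if object_x < centre_x else "R")
        area_by_quadrant[quadrant] += width * height

    weighted = {j: QUADRANT_WEIGHTS[j] * area_by_quadrant[j]
                for j in QUADRANT_WEIGHTS}
    # Rank quadrants by weighted area: 4 for the largest down to 1 for the smallest.
    # sorted() is stable, so ties keep reading order (an assumption of this sketch).
    ordered = sorted(QUADRANT_WEIGHTS, key=lambda j: weighted[j], reverse=True)
    ranks = {j: 4 - position for position, j in enumerate(ordered)}

    mismatch = sum(abs(QUADRANT_WEIGHTS[j] - ranks[j]) for j in QUADRANT_WEIGHTS)
    return 1.0 - mismatch / 8.0


# Hypothetical 1000 x 1000 screen: square sizes decrease along the reading path.
reading_order = [(100, 100, 40, 40), (700, 100, 30, 30),
                 (100, 700, 20, 20), (700, 700, 10, 10)]
print(sequence_measure(reading_order, 1000, 1000))
```

When the visual weight decreases from upper-left to lower-right the ranks match the quadrant weightings and the measure is at its maximum; reversing the arrangement produces the largest possible mismatch (a total of 8) and a value of zero.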

Example. This example (Figure 6) shows how the sequence of the hypothetical screen is computed using the above formulas.

Acknowledgments

The authors would like to thank M. Moshagen and M. T. Thielsch (authors of [16]) for providing the screenshots and questionnaire scores for the 42 web pages. Authors would also like to thank the three anonymous reviewers for their helpful comments on previous versions of this manuscript.

D. Chand, L. Dooley, and E. Tuovinen, “Gestalt theory in visual screen design—a new look at an old subject,” in Proceedings of the 7th World Conference on Computer in Education, Australian Computer Society, Copenhagen, Denmark, 2001.