PURPOSE: To describe the judgment of the concreteness of a set of 162 Brazilian Portuguese words, prior to the elaboration of a speech recognition test, and to verify the influence of the frequency of occurrence of the words and of the age and undergraduate program year of the participants on the concreteness ratings. METHODS: Fifty undergraduate Speech-Language Pathology and Audiology students from a public university rated the concreteness of a set of 162 words using a seven-point scale on which the lowest degree of concreteness was represented by number one and the highest by number seven. Participants were free to choose any number on the scale. RESULTS: The results showed a tri-modal distribution of values, suggesting a classification into three categories according to the concreteness rating. The low concreteness category ranged from 1.76 to 3.45; the medium concreteness category, from 3.46 to 4.95; and the high concreteness category, from 4.96 to 6.70. A strong negative correlation was found between the concreteness rating and the coefficient of variation, whereby the higher the rating attributed to a word, the smaller the variation in the responses. No significant correlation was found between concreteness ratings and the frequency of occurrence of words. The influence of age and undergraduate year was significant for some correlations, but these correlations were of poor or merely fair quality. CONCLUSION: Results showed three concreteness categories, and suggest that concreteness can be considered an independent attribute of words, since their frequency of occurrence, as well as participants' age and undergraduate program year, did not influence the ratings attributed. The words classified in the high concreteness category were subsequently used for the elaboration of a speech recognition test.

Defining the degree of a word's concreteness may appear to be a simple task. Classifying a word as concrete or abstract means placing in the former category those words to which physical properties such as form or matter are attributed and, in the latter, those for which such properties are lacking. Studies(1-3) have associated the concept of concreteness with sensory experience, whether imaginary or actually corresponding to material objects or persons that can be perceived by the senses.

In fact, the concept of concreteness is a complex one and that becomes evident when certain words that defy this notion of a rigid dichotomy between concrete and abstract are analyzed. A good example is the word governo (government) which can be held to be 'abstract' insofar as it is impossible to precisely indicate whom or what it is but, at the same time, 'concrete' insofar as it involves a set of concrete entities such as people, buildings and certain locations(4).

Some researchers(5,6) consider that the difference between concrete and abstract words, in the light of a variety of cognitive tasks, lies in the way they are processed by the nervous system; in their view, concrete words are processed faster and more precisely than abstract words.

Two competing sets of theories attempt to explain the differences in the processing of concrete and abstract words: one set involves a multiple coding approach, such as the dual coding theory, and the other uses single mode models, such as the theory of context availability.

The concreteness of the verbal material used has a notable influence on any task involving language and memory; consequently, it is a variable that must be taken into consideration not only as an independent factor in experiments but also as an experimental control factor(3). Various studies have shown the influence and importance of word concreteness in difficulties in learning to read(7,8), age of acquiring written word recognition skills(9), sentence comprehension(10), conditions such as dementia(11-13), and lexical decision(6,14) and memory(1,5,15,16) tests. Controlling for this variable when elaborating tests enhances the reliability of the results obtained. Accordingly, when elaborating tests involving spoken language, such as speech recognition tests, it is important that the concreteness of the words used in the procedure be taken into account alongside the other factors.

Several studies(1-4,17) have attempted to measure the degree of concreteness of a word using subjective judgments, whereby individuals assess the words and attribute to each a score on a numerical scale(2-4,17,18) or choose between two concepts, concrete and abstract(1), while others have included a third option for cases in which a judgment cannot be made(10). One such study of concreteness norms conducted in Brazil(3) made use of 909 Brazilian Portuguese words and found no detectable influence on the concreteness ratings of the gender or age of the individuals, of the normal frequency of occurrence of the words in written material, or of the size of the category of attribution in the ratings. Although those authors used a considerable number of words in their study, most of the words of interest to the present research were not to be found in their lists.

Thus the present paper sets out to describe an evaluation made of the degree of concreteness of a selected group of 162 Portuguese language words for the purpose of subsequently elaborating a speech recognition test. The aim was also to investigate the possible influence of the frequency of occurrence of the words analyzed, and the age and course year of the participants making the assessments.

METHODS

A survey study was conducted as part of the Hearing Disorders course of the Speech-Language Pathology and Audiology Department of the Universidade Federal de São Paulo (UNIFESP), in August 2009. The project that originated the study was analyzed and approved by the Research Ethics Committee, and the process was filed under the number 0948/09.

Fifty volunteers took part, only one of whom was male (2%), with ages ranging from 17 to 24 (mean 19.86). All participants signed a term of Free and Informed Consent and were undergraduate students in the Undergraduate Program in Speech-Language Pathology and Audiology at the abovementioned institution. Of the group, 24 (48%) were enrolled in the 1st year of the course, 20 (40%) in the 2nd year and six (12%) in the 4th year. The undergraduates undertook the assessment according to their availability and the time the researcher had available to dedicate to the survey, which explains the heterogeneity of the groups in regard to the course year. Ninety-four percent of the participants were from municipalities in the state of São Paulo and the other 6% were from the cities of Niteroi (RJ), Manaus (AM) and Recife (PE), Brazil.

For test purposes 162 words were selected from among a group of 347 obtained from the corpora studies(19) undertaken by a research group at the Universidade Federal de Minas Gerais (UFMG) as part of the project Avaliação Sonora do Português Atual - ASPA (Sound Assessment of Contemporary Portuguese). The university's ASPA project analyzed a text corpus originally developed by the post-graduate studies program in Applied Linguistics and Language Studies (LAEL) at the Pontifícia Universidade Católica de São Paulo (PUC-SP). All the words selected from that survey were two-syllable nouns with the stress on the first syllable and of frequent occurrence (over 50 occurrences per million; an arbitrarily defined threshold). To obtain the 162 words used in this research, the following were eliminated from the original group of 347 words taken from the survey database: a) proper nouns (43 eliminations), because subjects cannot evoke images on the basis of an isolated presentation of a proper noun without an accompanying context; b) words in the plural form with the same root as others in the singular (54 eliminations), so that only singular forms were used; c) words whose final consonant was [s] (27 eliminations), because the word group was destined for subsequent use in speech recognition tests to be applied to elderly people, who have well-known difficulty in identifying this frequently occurring phoneme, so that its presence could introduce a variable that might complicate the test results; and d) incomplete words or words belonging to a part of speech other than noun (3 eliminations). A further 58 words coincided with words that had already been classified as concrete or abstract in a national study whose findings have been validated and accepted, and accordingly they were not included in the concreteness assessment.
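The selection arithmetic described above can be checked with a short sketch. The counts are those reported in the text; the variable and key names are illustrative labels only and do not come from the original study.

```python
# Counts reported in the text; the names are illustrative labels only.
initial_pool = 347

exclusions = {
    "proper_nouns": 43,
    "plurals_with_singular_counterpart": 54,
    "final_consonant_s": 27,
    "incomplete_or_not_a_noun": 3,
    "already_classified_in_national_study": 58,
}

# Words left for the concreteness assessment after all exclusions.
remaining = initial_pool - sum(exclusions.values())
print(remaining)
```

The result matches the 162 words submitted to the participants.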

The words were set out in a single test instrument consisting of three ordinary sheets of white paper. At the head of the first sheet there was a space for participants to fill in their details (age, gender, place of birth, undergraduate program and course year), followed by a succinct set of instructions for the task and then the first 40 of the numbered words to be assessed. The second sheet continued the word list with 68 words and the third sheet held the remaining 54 words. To the right of each word there was a scale ranging from 1 to 7, with value one corresponding to "highly abstract" and value seven to "highly concrete".

The task was applied in single sessions to three groups formed according to the course years in which the students were enrolled. The instructions were similar to those used in the national study referred to(3), and were as follows: "Below you will find a list of words. Your task is to judge the concreteness of each word. Usually words that refer to material objects that can be perceived by the senses are more concrete than those associated to abstract concepts that cannot be experienced directly by the senses. Furthermore, it is easier for us to form a mental image when we think of more concrete words than when we think of more abstract ones. For example, it is possible to perceive with our senses and create an image of the word 'martelo' (hammer) but in contrast it is far more difficult to create an image or associate any sensory impressions to the word 'quantidade' (quantity). In this case the word 'martelo' is more concrete and the word 'quantidade' more abstract. In your assessment of the degree of concreteness you must use the scale provided from one to seven where the number seven is attributed to words that are highly concrete and the number one to those that are highly abstract. You are free to use any number on the scale. In assessing each word, draw a circle around the number that best expresses your perception of the word's degree of concreteness. Do you have any doubts?".

It was also made clear that each participant should work alone in silence and that no information should be exchanged with other participants. There was no time limit set for completing the task.

The Equality of Two Proportions test was used to compare the proportions of words in the classes identified according to the degree of concreteness. Pearson's Correlation Coefficient was used to analyze the correlation between the mean concreteness values and the Coefficient of Variation (CV), as well as the correlations of the concreteness assessment with the frequency of occurrence of the words and with the ages and course years of the participants.

It should be mentioned that the CV is a statistical measurement that indicates the degree of dispersion of the data around the mean value so that ideally it should be as low as possible. A low value means the results are homogeneous.
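As a minimal sketch of the measure just described, the CV of one word's ratings can be computed as the standard deviation divided by the mean. The function name and the ratings below are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(ratings):
    """Return the sample standard deviation divided by the mean.

    Assumption: the sample standard deviation is used; the study does
    not state which variant was applied.
    """
    return stdev(ratings) / mean(ratings)


# Hypothetical ratings on the 1-7 scale, not data from the study.
consistent_ratings = [7, 7, 6, 7, 6, 7]  # judges largely agree
scattered_ratings = [1, 4, 2, 6, 3, 2]   # judges disagree

cv_consistent = coefficient_of_variation(consistent_ratings)
cv_scattered = coefficient_of_variation(scattered_ratings)
```

In this illustration the word with consistent ratings yields the lower CV, which is the sense in which a low value indicates homogeneous results.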

The level of statistical significance adopted was 0.05 (5%), and all confidence intervals were constructed at the 95% confidence level.
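The correlation analysis described above can be sketched as follows. This is an illustrative implementation of the standard Pearson formula, not the software used in the study; the function name and the paired values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (dev_x * dev_y)


# Hypothetical per-word values, not data from the study:
# mean concreteness rating and the matching coefficient of variation.
mean_ratings = [2.0, 3.0, 4.5, 5.5, 6.5]
cvs = [0.60, 0.50, 0.35, 0.20, 0.10]

r = pearson_r(mean_ratings, cvs)
```

With these illustrative values the coefficient is strongly negative, the pattern expected when higher mean ratings go together with lower dispersion.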

RESULTS

It was observed that participants took 15 minutes, on average, to complete the task; similar to the time registered by another national study in which each participant assessed 151 or 152 words.

Thus each word in the test was judged 50 times with the exception of the words vista [view-sight] and tema [theme] which were only judged 49 times due to failure to mark a score for them by one of the participants.

The average values obtained for the concreteness of the words ranged from 1.76 to 6.70 (Table 1). The words that were awarded the lowest scores for concreteness were feita [occasion] (1.76), deixa [hint] (1.82), modo [manner] (1.90), tipo [type] (2.04) and ato [act] (2.18), and those attributed the highest average scores for concreteness were padre [Catholic priest] (6.50), boca [mouth] (6.56), ouro [gold] (6.60), bola [ball] (6.68) and foto [photo] (6.70).

Statistical analysis revealed a (tri-modal) distribution with three peaks for the concreteness values (Figure 1).

The words were grouped according to their average scores into three groups corresponding to low, medium and high concreteness. The division was based on a visual analysis considering the range from 1.76 to 3.45 as constituting the low concreteness group (white columns), 3.46 to 4.95 as medium concreteness group (gray columns) and 4.96 to 6.70 as the high concreteness group (black columns).
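The grouping rule above can be expressed as a small function. This is a sketch only: the function name is hypothetical, and treating any value above 3.45 as medium and any value above 4.95 as high is an assumption made to cover means that fall between the reported bounds.

```python
def concreteness_category(mean_rating):
    """Map a mean concreteness rating (1-7 scale) to low, medium or high.

    Cut-offs follow the ranges reported in the text: up to 3.45 is low,
    above 3.45 and up to 4.95 is medium, and above 4.95 is high.
    """
    if not 1.0 <= mean_rating <= 7.0:
        raise ValueError("mean rating must lie on the 1-7 scale")
    if mean_rating <= 3.45:
        return "low"
    if mean_rating <= 4.95:
        return "medium"
    return "high"


# Mean ratings reported in the text for two of the words.
category_feita = concreteness_category(1.76)
category_foto = concreteness_category(6.70)
```

Applied to the extremes reported in the text, feita (1.76) falls in the low category and foto (6.70) in the high category.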

The Equality of Two Proportions test was used to compare the proportions of words in the three categories (low, medium and high concreteness), which were found to be 37.7%, 30.2% and 32.1%, respectively; the differences were not statistically significant (Table 2).
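One common form of a test for the equality of two proportions is the pooled two-sample z-test, sketched below. Several assumptions apply: the text does not state which variant was used, the category counts (61, 49 and 52) are inferred here from the reported percentages of 162 words, and each proportion is treated as if it came from an independent sample of 162. The function name is hypothetical.

```python
from itertools import combinations
from math import erf, sqrt


def two_proportion_z_test(count_a, n_a, count_b, n_b):
    """Pooled two-sample z-test for proportions; returns (z, two-sided p)."""
    p_a = count_a / n_a
    p_b = count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Standard normal CDF evaluated at |z|, written with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


total_words = 162
# Assumption: counts inferred from 37.7%, 30.2% and 32.1% of 162 words.
counts = {"low": 61, "medium": 49, "high": 52}

p_values = {
    (a, b): two_proportion_z_test(counts[a], total_words, counts[b], total_words)[1]
    for a, b in combinations(counts, 2)
}
```

Under these assumptions every pairwise p-value exceeds the 0.05 level, which is consistent with the reported absence of significant differences among the three categories.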

The correlation between the average concreteness value obtained for each word and the Coefficient of Variation was -95.3% (p<0.001), that is, a strong negative correlation.

Figure 2 shows the dispersion and correlation of the average concreteness values plotted against the Coefficient of Variation.

Statistical analysis showed a non-significant correlation of -9.9% (p=0.211) between the frequency of occurrence of the words and their attributed concreteness.

The influence of the variables age and course year on the concreteness assessments was analyzed using Pearson's Correlation Coefficient, which yielded statistically significant results (p<0.005) for some correlations (Table 3). However, those results were disregarded because the quality of the correlation was poor in most cases and merely fair in one case (for the word passe [pass]).

DISCUSSION

The results showed that, for the set of words selected for this study, the concreteness assessments tended to fall into three large groups (Figure 1) of similar proportions (Table 2). This differs from the findings of other studies(3,4,17), which report a polarized pattern with two dominant categories (concrete versus abstract) even though they also offered participants a range of intermediate score points, so that other forms of distribution could have appeared.

This difference, whereby the concreteness scores tended to form three groups rather than two, can possibly be explained by the inherent linguistic characteristics of the selected material. From the extensive universe of Portuguese words, the selection was restricted to nouns only and, furthermore, to two-syllable words with the stress on the first syllable. In the other studies referred to, words belonging to different grammatical classes were present and there were also variations in the number of syllables and the position of the stressed syllable within the words.

Another aspect that deserves mention and that may have contributed towards the emergence of an intermediate category of concreteness was the attitude of the participants. After they had completed the test, several of them remarked that they had found some of the words hard to assess, insofar as, depending on the context they imagined for them (as for example a film (movie) seen at the cinema or a film for a camera), they considered them to be more endowed with concreteness or more with abstractness. Faced with that difficulty they adopted one of two postures: either they attributed a value associated to the word in the context in which it first sprang to mind, which would tend to be the more concrete option, or they decided to attribute an intermediate value (in the 3 to 4 range).

The correlation between the average concreteness values and the Coefficient of Variation (Figure 2) shows that the higher the score awarded to a word, that is, the more concrete it was judged to be, the less variation there was among the answers. Thus it could be said that the students tended to be surer of themselves when they considered a word to be very concrete, and so there was greater homogeneity among the answers. On the other hand, in the case of abstract words there was more uncertainty and doubt surrounding their assessment, which led to greater discrepancies of opinion and greater variations in the answers given.

The literature review only revealed one study(1) that presented an analysis of the extent of agreement on the part of participants in their judgments of word concreteness. That study showed that participants were in agreement for 95.6% of the words submitted to their judgment and there was a similar pattern tending to determine just two groupings, concrete and abstract, as found in other studies, differing therefore from the results of the present study, especially in the results associated to words with lesser degrees of concreteness (more abstract words).

It is possible that the scale used for scoring may be the explanation for such differences; the referred study(1) only offered two (opposing) points on the scale for assessing concreteness and not seven as was the case with the present study. Furthermore there were variations in the lengths (number of syllables) and the position of the stressed syllable in the words used in those studies.

The findings showed no significant influence of the frequency of occurrence of words on the assessment made of their concreteness. It is worth reiterating that in this work all the words submitted to assessment had a frequency of occurrence in the range of 50 to 2,218 times per million so that they can all be considered words of frequent occurrence. This corroborates the results of previous studies(1,3,18), and indicates that concreteness can best be considered to be an attribute of the words themselves and not correlated to their frequency of occurrence.

The relation between the degree of concreteness attributed and the participants' undergraduate course year and ages shows that, although it was significant for a few of the words, the correlations were not systematic and, when they did exist, were of poor or merely fair quality. Such low quality of correlation detracts from the importance of those results and suggests that the data in question should be disregarded.

The absence of correlation was expected because of the very slight variation in the ages of the participants and the difference in course years of only three years (1 to 4). In a separate study also using the Portuguese language(3), the influence of the participants' ages was analyzed on its own and found to exert no influence on the assessments made of the concreteness of the test words even when the age variations were considerably greater than in this study (ages 16 to 33 as opposed to 17 to 24).

CONCLUSION

The results revealed three distinct categories of degree of concreteness (low, medium and high), suggesting that concreteness can be considered an independent attribute of the words themselves, given that neither their frequency of occurrence nor the participants' age or undergraduate course year influenced the judgment of concreteness. Those words classified as having a high degree of concreteness were subsequently used in the elaboration of a speech recognition test.