Within the broader field of psycholinguistics, research on word processing is caught in the crossfire between morpheme-based (morphological) and word-based (lexical) models. This dichotomy originated in theoretical linguistics and has spread to younger, experimentally oriented fields. If morphemes are cognitively relevant, then the central questions are how and when words are decomposed into their constituent morphemes. For the word-based approach, by contrast, these questions are almost irrelevant, and questions about the cognitively relevant properties (or attributes) of words become central instead. Serbian has a rich morphology, which makes it well suited for contrasting the morphological and lexical positions and for assessing the importance of each. The research conducted in the Laboratory for Experimental Psychology is grounded in Information Theory and aims to find a unifying principle underlying the various manifestations of the morpho-lexical organization of language. By combining techniques from quantitative linguistics and experimental psychology, we estimate the probabilities of word forms and derive quantitative descriptions of various aspects of morphological organization, with the aim of understanding their cognitive implications. In other words, we propose formal, information-theoretic operationalizations of morpho-lexical properties. These operationalizations are then tested against behavioral (or neural) response data, for which morpheme-based and word-based models imply essentially different architectures and functioning of the mental lexicon.

Word frequencies follow a very specific type of distribution, usually referred to as a Large Number of Rare Events (LNRE) distribution. Standard statistical tools become problematic, or even inapplicable, for quantitative word analysis if this distributional property is not taken into account. Building on the work of Zipf, Mandelbrot and others, we can obtain proper lexical quantifications, which in turn let us learn more about stylistic variation, diachronic trajectories, synchronic differences and so on.
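As a rough illustration of what "a large number of rare events" means in practice, the Python sketch below counts how many distinct words occur exactly once, twice, and so on (the frequency spectrum) and estimates the slope of the rank-frequency relation on a log-log scale. The function names and the toy sentence are assumptions made for this example and are not taken from the Laboratory's own tools; real analyses would use a large corpus and dedicated LNRE models rather than a plain least-squares fit.

```python
from collections import Counter
import math


def frequency_spectrum(tokens):
    """Return (word_counts, spectrum).

    spectrum[m] is the number of distinct words that occur exactly m times.
    """
    word_counts = Counter(tokens)
    spectrum = Counter(word_counts.values())
    return word_counts, spectrum


def zipf_slope(word_counts):
    """Least-squares slope of log(frequency) regressed on log(rank).

    Needs at least two distinct words. Under Zipf's law the slope is
    expected to be negative.
    """
    freqs = sorted(word_counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy input, invented for illustration only.
tokens = "the cat saw the dog and the dog saw a bird".split()
word_counts, spectrum = frequency_spectrum(tokens)

hapax_share = spectrum[1] / len(word_counts)  # share of types seen once
print("types:", len(word_counts), "tokens:", len(tokens))
print("share of types occurring once:", round(hapax_share, 2))
print("log-log rank-frequency slope:", round(zipf_slope(word_counts), 2))
```

Even in this tiny sample, more than half of the distinct words occur only once, which is the kind of property that makes methods assuming well-estimated frequencies for every word unreliable.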

Milin, P., & Ilić, N. (2003). Text as Binary Sequence: A Case of Characteristics Constant of Text. Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-03), 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL-03), 47-52.

COMPUTATIONAL MODELING OF WORD PROCESSING

Today, the traditional psycholinguistic experiment and the computational model must be seen as complementary tools for understanding the complex non-linear dynamics of language processing. In an experiment, we carefully control the properties of the input stimuli (words) while monitoring the outcomes (behavioral response measures). We then draw inferences about the psychological structures and/or functions involved in transforming the input into the outcome. This inference is indirect, however, because it rests on input-output (cor)relations. Computational modeling, on the other hand, allows for a full specification of these structures and/or functions. Fed with identical input, the model should produce output that closely matches the outcome of the behavioral experiment. Points where the two diverge can, in principle, be used to pose new research questions.

Word ambiguity is one of the most pervasive language phenomena. It is almost impossible to find a word with a single meaning, because the meaning of most words depends, to a greater or lesser extent, on the context in which they occur. The question cognitive psychologists ask is to what extent, and in what way, this phenomenon is relevant for models of language processing. In the Laboratory for Experimental Psychology, word ambiguity is addressed from the standpoint of Information Theory and is viewed as uncertainty about word meaning. On that basis, we collect various estimates of the probabilities of word meanings. The aim is a quantitative description that captures the cognitive complexity of different forms of word ambiguity. This approach accords with the growing view of the cognitive system as one capable of detecting statistical patterns in the environment.
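One common way to quantify uncertainty over a set of alternatives in Information Theory is Shannon entropy. The Python sketch below applies it to the meanings of a single word, under the assumption that each meaning has an estimated count (for instance, how often it was produced or chosen by participants). The function name and the counts are hypothetical and serve only as an illustration; they are not the Laboratory's actual measures or data.

```python
import math


def meaning_entropy(sense_counts):
    """Shannon entropy (in bits) of a word's meaning distribution.

    sense_counts maps each meaning label to a non-negative count.
    Counts are normalized to probabilities before computing entropy.
    """
    total = sum(sense_counts.values())
    if total <= 0:
        raise ValueError("at least one meaning must have a positive count")
    entropy = 0.0
    for count in sense_counts.values():
        if count > 0:  # a zero-probability meaning contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts, invented for illustration only.
balanced_word = {"meaning A": 50, "meaning B": 50}
dominant_word = {"meaning A": 95, "meaning B": 5}

print(meaning_entropy(balanced_word))
print(meaning_entropy(dominant_word))
```

Under this measure, a word whose meanings are equally probable carries more uncertainty than a word with one dominant meaning, even though both have the same number of meanings.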

The meaning of a word is influenced by the context in which it occurs. One problem in research on the processing of ambiguous words is how to obtain valid and complete listings of their different meanings. At the same time, the treatment of word meanings as discrete categories has been questioned. One solution to these problems is to describe ambiguous words using vector-based semantic analysis. This quantitative technique provides a statistical description of a word's meaning based on the frequencies of the contexts in which the word appears, that is, on the frequencies of the surrounding words.
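The idea of describing a word by the frequencies of its surrounding words can be sketched in a few lines of Python. The example below builds a co-occurrence vector for a target word from a fixed window of neighbors and compares vectors with cosine similarity. The window size, the function names and the four-sentence corpus are assumptions made for this illustration; actual vector-based analyses rely on large corpora and usually add weighting and dimensionality reduction.

```python
from collections import Counter
import math


def context_vector(target, sentences, window=2):
    """Count the words that appear within `window` positions of `target`."""
    vector = Counter()
    for sentence in sentences:
        for i, word in enumerate(sentence):
            if word != target:
                continue
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vector[sentence[j]] += 1
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
sentences = [
    "she sat on the river bank".split(),
    "he fished from the river shore".split(),
    "the bank approved the loan".split(),
    "the lender approved the loan".split(),
]

bank = context_vector("bank", sentences)
shore = context_vector("shore", sentences)
lender = context_vector("lender", sentences)

print("bank ~ shore: ", round(cosine(bank, shore), 2))
print("bank ~ lender:", round(cosine(bank, lender), 2))
print("shore ~ lender:", round(cosine(shore, lender), 2))
```

In this toy corpus the vector for the ambiguous word "bank" blends the contexts of both of its uses, so it ends up closer to "shore" and to "lender" than those two words are to each other. No list of discrete meanings has to be specified in advance.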

Word concreteness is defined as the extent to which the object denoted by a given word can be experienced through the senses. It is one of the variables traditionally investigated in psycholinguistics and cognitive psychology. Results suggest that concrete words are processed faster than words whose referents cannot be experienced through the senses (i.e., abstract words). The standard procedure for collecting subjective concreteness judgments relies on a single general estimate that encompasses all of the senses (e.g., the word cat is concrete because we can see, touch, smell and hear a cat; the word truth is abstract because we cannot see, touch or smell truth). In the Laboratory for Experimental Psychology, we have been developing a new approach to measuring word concreteness, which involves collecting separate judgments for different sensory modalities. From the collected measures, we derive several quantitative descriptors of word concreteness. Demonstrating the cognitive relevance of these measures matters for theories of how concepts are represented in long-term memory, and the work draws in particular on the Embodied Cognition approach.
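To make the step from modality-specific judgments to summary descriptors more concrete, here is a minimal Python sketch. The text does not specify which descriptors are derived, so the three computed below (mean rating, highest rating, and the modality that received it) are assumptions chosen for illustration, as are the function name, the 1-7 rating scale and the ratings themselves.

```python
from statistics import mean

# Modalities mentioned in the example above; the set used in the actual
# studies may differ.
MODALITIES = ("sight", "touch", "smell", "hearing")


def concreteness_descriptors(ratings):
    """Summarize per-modality ratings for one word.

    ratings maps each modality in MODALITIES to a mean judgment
    (assumed here to be on a 1-7 scale).
    """
    values = [ratings[modality] for modality in MODALITIES]
    return {
        "mean": mean(values),  # overall sensory strength
        "max": max(values),    # strength in the strongest modality
        "dominant": max(MODALITIES, key=lambda m: ratings[m]),
    }


# Hypothetical ratings, invented for illustration only.
cat = {"sight": 7, "touch": 6, "smell": 4, "hearing": 5}
truth = {"sight": 1, "touch": 1, "smell": 1, "hearing": 2}

print(concreteness_descriptors(cat))
print(concreteness_descriptors(truth))
```

Keeping the modalities separate makes it possible to distinguish words that a single general rating would treat as equally concrete, for example a word rated high on sight only versus one rated moderately on every modality.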