
As suggested in the answer to this question, experimental results show that training is most effective when it follows an easy-to-difficult schedule.

What theories and specifically computational models account for that?

In the simple computational models commonly used to demonstrate learning of classification, for example, there is no such effect, as far as I know:

Binary Perceptron - Typical examples (closest to the cluster means, or far apart along the optimal separating vector) are the most informative, leading to a prediction that easy trials would result in faster learning.

Some learning algorithms only update the network when an error is made, leading to a prediction that more difficult trials would result in faster learning (a small sketch contrasting the two orderings appears after this list).

Support Vector Machines (SVM) - Only the few most difficult examples affect the final result, again leading to a prediction that more difficult trials would result in faster learning.

Naive Bayes classifier - Typical examples help estimate the parameters with smaller error, leading to a prediction that easy trials would result in faster learning.
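To make the contrast concrete, below is a minimal sketch of a single pass of the perceptron rule over a toy problem, comparing easy-first and hard-first presentation orders. The two-cluster data, and the use of distance from the class-mean direction as a proxy for "easiness", are illustrative assumptions only, not taken from any study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: two Gaussian clusters in 2-D, labels in {-1, +1}.
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)),
               rng.normal(+1.0, 1.0, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])

def perceptron_pass(X, y, order):
    """One pass of the perceptron rule in the given order; count error-driven updates."""
    w = np.zeros(X.shape[1])
    updates = 0
    for i in order:
        if y[i] * (X[i] @ w) <= 0:   # update only when the example is misclassified
            w += y[i] * X[i]
            updates += 1
    return w, updates

# "Easy" here = far from the boundary along the direction joining the cluster means.
margin = np.abs(X @ np.ones(2))
easy_first = np.argsort(-margin)     # largest margin first
hard_first = np.argsort(margin)      # smallest margin first

for name, order in [("easy-first", easy_first), ("hard-first", hard_first)]:
    w, n = perceptron_pass(X, y, order)
    print(name, "updates:", n, "accuracy:", np.mean(np.sign(X @ w) == y))
```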

I don't think any of those models is actually striving for validity with any biologically- or psychologically-based learning process. If anything, we are "small-margin" classifiers, as humans can detect nuances between pairs of items that something like a well-trained SVM could not.
– Chuck Sherrington, Feb 5 '12 at 6:18

4 Answers

Another way of thinking about this is that by progressing easy-to-hard, different intermediate knowledge structures are called into existence in the course of processing. These knowledge structures, built from an agent's encounter with easy problems, can prove useful in its encounter with subsequent and more difficult problems.

This idea has been around for a long time in a variety of cognitive traditions. It's intimately related to developmental theories of, for instance, Piaget, and Mandler (2006); and to the schema literature, which is vast, but of which Minsky (1986) and Schank & Abelson (1977) are exemplars.

This is all very vague, though. The best (or at least, the most precise) way to think about knowledge progression is through hierarchical reinforcement learning. The idea there is that most non-trivial tasks are inherently hierarchical; and so learning how to do a task requires one to learn how to do its constituent sub-tasks. The more tasks you learn how to do, the greater the 'toolbox' you will be able to bring to subsequent tasks.

With regard to the original question, easier examples of a task induce the acquisition of the controllers (knowledge structures) that will aid both the performance of later, more complicated tasks and the acquisition of later, more complicated knowledge structures. Oudeyer et al. (2007) and Barto (2009) describe this process in detail, the former using a situated robotic agent. (If we replace 'controllers' with 'rules', then the process becomes comparable to the rule search and chunking process used in ACT-R and Soar, as mentioned in Jeromy Anglim's answer.)
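To make the 'toolbox' idea concrete, here is a minimal tabular sketch, loosely in the spirit of the options framework used in hierarchical reinforcement learning. The corridor environment, rewards, and hyper-parameters are my own illustrative assumptions and are not taken from Oudeyer et al. (2007) or Barto (2009).

```python
import random

random.seed(0)

def q_learn(states, actions, step, episodes=500, alpha=0.5, gamma=0.95, eps=0.1):
    """Plain tabular Q-learning; `step(s, a)` returns (next_state, reward, done)."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            best = 0.0 if done else max(Q[(s2, act)] for act in actions)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = s2
    return Q

# Easy task: a short corridor, reach the "door" at state 4.
def easy_step(s, a):
    s2 = min(max(s + a, 0), 4)
    return s2, (1.0 if s2 == 4 else -0.01), s2 == 4

Q_easy = q_learn(range(5), [-1, +1], easy_step)
easy_policy = lambda s: max([-1, +1], key=lambda a: Q_easy[(s, a)])

# Hard task: a longer corridor with the goal at state 9.  The controller learned
# on the easy task is offered as a temporally extended "to_door" macro-action,
# alongside the primitive moves -- a crude stand-in for a reusable knowledge structure.
def hard_step(s, a):
    if a == 'to_door':
        r = -0.01                     # small time cost for invoking the macro
        for _ in range(10):           # cap the macro's length for safety
            if s >= 4:                # the sub-goal (the "door") has been reached
                break
            s = min(max(s + easy_policy(s), 0), 9)
            r -= 0.01
        return s, r, False
    s2 = min(max(s + a, 0), 9)
    return s2, (1.0 if s2 == 9 else -0.01), s2 == 9

Q_hard = q_learn(range(10), [-1, +1, 'to_door'], hard_step)
```

With the macro available, the agent facing the longer corridor can reuse the sub-controller acquired on the short one rather than relearning that portion of the task from scratch.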

The basic effect can be accounted for by connectionist models. See for example
Suret & McLaren (2002). To quote the abstract:

This paper details an associative model that is applied to human
learning on an artificial dimension. A variety of phenomena, including
peak-shift, transfer along a continuum and summation / generalization
are considered and simulation results are presented that give a close
fit to empirical data.

SVM training is typically done as batch processing, so the order of data presentation doesn't matter. You should consider online learning algorithms instead, for example, the perceptron learning rule. These algorithms are in general stochastic gradient descent optimization procedures, and presenting easy examples early on, while the learning step is larger, would be much more efficient (faster convergence towards the correct solution) than learning from difficult examples first.
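To illustrate, here is a minimal sketch of a single pass of online (stochastic gradient) logistic regression with a decaying step size, comparing easy-first and hard-first orderings. The toy data, the step-size schedule, and the use of distance from the true boundary as a proxy for difficulty are illustrative assumptions, not part of any cited result.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linearly separable data; "hard" points lie close to the true boundary.
X = rng.normal(0.0, 1.0, (200, 2))
w_true = np.array([2.0, -1.0])
y = (X @ w_true > 0).astype(float)

def sgd_logistic(X, y, order, eta0=1.0, decay=0.01):
    """One pass of SGD on the logistic loss with a decaying step size."""
    w = np.zeros(X.shape[1])
    for t, i in enumerate(order):
        eta = eta0 / (1.0 + decay * t)        # step size shrinks over the pass
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        w += eta * (y[i] - p) * X[i]          # gradient step on a single example
    return w

margin = np.abs(X @ w_true)                   # easy = far from the true boundary
easy_first = np.argsort(-margin)
hard_first = np.argsort(margin)

for name, order in [("easy-first", easy_first), ("hard-first", hard_first)]:
    w = sgd_logistic(X, y, order)
    print(name, "accuracy after one pass:", np.mean((X @ w > 0) == (y > 0.5)))
```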

The thing that makes most gradient descent algorithms learn faster at the beginning is simply the larger learning rate parameter, forced onto the algorithm, and not the difficulty of examples.
– Ofri Raviv, Dec 16 '12 at 21:23

@OfriRaviv Yes, the learning rate is often reduced over time, but it does interact with the order of presented data. In an extreme situation, you don't want to fall into a local minimum where you can only solve one difficult case.
– Memming, Dec 28 '12 at 14:05

Interesting question. I've written up a discussion of the model-based training literature and how it relates to structuring task difficulty with practice. That said, I feel it's only a start, and my apologies that it is pitched more at cognitive tasks than perceptual tasks.

A summary of model-based training systems

Fu et al (2006) have a paper on real-time model-based training systems, in which they review some of the earlier work in this area:

There has been a long history of applying cognitive theory of learning
and skill acquisition to model-based training systems (e.g., Anderson
et al., 1995; Graesser et al., 2004; Hill and Johnson, 1993; Sleeman
and Brown, 1982). The key idea of a model-based training system is
that instructions should be given based on a cognitive model of the
competence that the trainee is being asked to learn. In other words,
the cognitive model should incorporate the underlying skills that
allow the model to perform the task the trainee is expected to
perform. Based on the model, the system can monitor actions of the
trainee and infer the intentions of the trainee by mapping actions of
the trainee to components of the model. In other words, a model of
competence provides an explanation of actions as trainees interact
with the system. Immediate feedback or real-time instructions can then
be given to the trainee to facilitate learning.
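As a rough illustration of the model-tracing loop described in that passage, here is a schematic sketch: the tutor holds a production-rule model of competent performance, maps each trainee action onto a rule (or flags it as off-path), and returns immediate feedback. The rules, sub-goals, and feedback strings are invented for this example and are not taken from Fu et al. or the cited tutoring systems.

```python
from dataclasses import dataclass

@dataclass
class Production:
    name: str
    applies_to: str          # the sub-goal this rule addresses
    expected_action: str     # what a competent solver would do next

# A toy competence model (invented for illustration).
MODEL = [
    Production("isolate-variable", "solve 2x + 3 = 7", "subtract 3 from both sides"),
    Production("divide-out-coefficient", "solve 2x = 4", "divide both sides by 2"),
]

def trace(subgoal: str, trainee_action: str) -> str:
    """Map a trainee action onto the competence model and return immediate feedback."""
    for rule in MODEL:
        if rule.applies_to == subgoal:
            if trainee_action == rule.expected_action:
                return f"OK ({rule.name})"
            return f"Hint: try to {rule.expected_action} ({rule.name})"
    return "No model rule covers this sub-goal; deferring to the instructor."

print(trace("solve 2x + 3 = 7", "subtract 3 from both sides"))
print(trace("solve 2x + 3 = 7", "divide both sides by 2"))
```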

John R. Anderson and colleagues on cognitive tutors using the computational model ACT-R

Anderson et al (1995) summarise their work on cognitive tutors teaching LISP programming, geometry, and algebra. Their system incorporates eight instructional design principles, a few of which pertain to task difficulty.

First, let's look at the design principles:

1. Represent student competence as a production set
2. Communicate the goal structure underlying the problem solving
3. Provide instruction in the problem-solving context
4. Promote an abstract understanding of the problem-solving knowledge
5. Minimise the working memory load
6. Provide immediate feedback on errors
7. Adjust the grain size of instruction with learning
8. Facilitate successive approximations to the target skill

I think principles 5 and 8 directly relate to structuring task difficulty with practice, and probably others relate more indirectly.
Minimising working memory (i.e., principle 5) requires providing instruction in manageable components.
Facilitating successive approximations (i.e., principle 8) involves progressively providing less support to the learner and is an example of increasing task difficulty with practice.

Hill, R. W., Jr., & Johnson, W. L. (1993). Designing an intelligent tutoring system based on a reactive model of skill acquisition. Proceedings of the International Conference on AI and Education, Edinburgh.