Once the probes were fully advanced into the brain, we observed a decline in the compression force over time.

However, the compression force never decreased to zero.

This may indicate that chronically implanted probes experience a constant compression force when inserted in the brain, which may push the probe out of the brain over time if there is nothing to keep it in a fixed position.

Yet ... the Utah probe seems fine, up to many months in humans.

This may be a drawback for flexible probes [24], [25]. The approach to reduce tissue damage by reducing micromotion by not tethering the probe to the skull can also have this disadvantage [26]. Furthermore, the upward movement may lead to the inability of the contacts to record signals from the same neurons over long periods of time.

We did not observe a difference in initial insertion force, amount of dimpling, or the rest force after a 3-min rest period, but the force at the end of the insertion was significantly higher when inserting at 100 μm/s compared to 10 μm/s.

No significant difference in histological response was observed between the two speeds.

Tissue damage, evaluated as the size of the hole left by the needle after retraction, bleeding, and tissue fracturing, was found to increase for increasing insertion speeds and was higher within white matter regions.

A statistically significant difference in hole areas with respect to insertion speed was found.

While there are no previous needle insertion speed studies with which to directly compare, previous electrode insertion studies have noted greater brain surface dimpling and insertion forces with increasing insertion speed [43–45]. These higher deformation and force measures may indicate greater brain tissue damage which is in agreement with the present study.

There are also studies which have found that fast insertion of sharp tip electrodes produced less blood vessel rupture and bleeding [28,29].

These differences in rate dependent damage may be due to differences in tip geometry (diameter and tip) or tissue region, since these electrode studies focus mainly on the cortex [28,29].

In the present study, hole measurements were small in the cortex, and no substantial bleeding was observed in the cortex except when it was produced during dura mater removal.

Any hemorrhage was observed primarily in white matter regions of the external capsule and the CPu.

Rapid deformation results in greater pressurization of fluid filled spaces if fluid does not have time to redistribute, making the tissue effectively stiffer. This may occur in compacted tissues below or surrounding the needle and result in increasing needle forces with increasing needle speed.

In general, experience replay can reduce the amount of experience required to learn, and replace it with more computation and more memory – which are often cheaper resources than the RL agent’s interactions with its environment.

Transitions (between states) may be more or less surprising. (Does the system in question have a model of the environment? It does have a model of the expected reward for each state-action pair, since it's Q-learning.)

Poses a useful example where the task is to learn (effectively) a random series of bits -- 'Blind Cliffwalk'. By choosing the replayed experiences properly (via an oracle), you can get an exponential speedup in learning.

Prioritized replay introduces bias because it changes [the sampled state-action] distribution in an uncontrolled fashion, and therefore changes the solution that the estimates will converge to (even if the policy and state distribution are fixed). We can correct this bias by using importance-sampling (IS) weights.

These weights are the inverse of the priority weights, but matter less at the beginning of training, when things are more stochastic; the controlling exponent is annealed over the course of learning.

There are two ways of selecting (weighting) the priority weights:

Direct, proportional to the TD-error encountered when visiting a sequence.

Ranked, where errors and sequences are stored in a data structure ordered by error and sampled with probability proportional to 1/rank.
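The two prioritization schemes, plus the importance-sampling correction, can be sketched in a few lines. This is a toy illustration with invented constants (the usual alpha/beta exponents), not the paper's implementation, which uses efficient data structures for sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
td_errors = np.abs(rng.normal(size=8)) + 1e-6   # |TD error| per stored transition
alpha, beta = 0.6, 0.4                          # priority and IS exponents

# (a) Proportional: priority = |TD error|^alpha
p_prop = td_errors ** alpha
p_prop /= p_prop.sum()

# (b) Rank-based: priority = (1/rank)^alpha, rank 1 = largest error
ranks = np.empty_like(td_errors)
ranks[np.argsort(-td_errors)] = np.arange(1, len(td_errors) + 1)
p_rank = (1.0 / ranks) ** alpha
p_rank /= p_rank.sum()

# Importance-sampling weights undo the sampling bias; beta is annealed
# toward 1 over training so the correction is full by convergence.
N = len(td_errors)
w = (N * p_prop) ** (-beta)
w /= w.max()                                    # normalize for stability

idx = rng.choice(N, p=p_prop)                   # sample a transition to replay
# ...the Q-update for transition idx would then be scaled by w[idx]
```

With beta annealed to 1, the IS weights fully compensate the non-uniform sampling by the time the estimates converge.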

Somewhat illuminating is how deep TD or Q learning is unable to even scratch the surface of Tetris or Montezuma's Revenge.

Agent learns based on memory 'clips', which are combined using some pseudo-Bayesian method to trigger actions.

These clips are learned from experience / observation.

Quote: "..more complex behavior seems to arise when an agent is able to “think for a while” before it “decides what to do next.” This means the agent somehow evaluates a given situation in the light of previous experience, whereby the type of evaluation is different from the execution of a simple reflex circuit"

BG analogous to the anterior forebrain pathway (AFP), which is necessary for song learning in young birds. Requires lots of practice and feedback. Studies suggest e.g. that neural activity in the AFP is correlated with song variability, and that the AFP can adjust ongoing activity in effector motor pathways.

DA activity, through action on D1 and D2 receptors on the 2 different types of MSN, affects the temporal difference learning scheme in which DA represents the difference between expectation and reality.

These neurons have a tonic 5-10 Hz firing rate, which can be modulated up or down (Morris et al 2004).

"The model suggests that the chronic dopamine depletion in the striatum of PD patients is perceived as encoding a continuous state where reality is worse than predictions." Interesting theory.

Alternately, abnormal DA replacement leads to random organization of the cortico-striatal network, eventually leading to dyskinesia.

Recent human studies have found oscillatory neuronal correlation only in tremulous patients and raised the hypothesis that increased neuronal synchronization in parkinsonism is an epi-phenomenon of the tremor of independent oscillators with the same frequency (Levy et al 2000).

Hum. might be.

In rhesus and green monkey PD models, a major fraction of the primate pallidal cells develop both oscillatory and non-oscillatory pair-wise correlation

Their theory: current DBS methods overcome this, probably by imposing a null spatio-temporal firing pattern on the basal ganglia, enabling the thalamo-cortical circuits to ignore and compensate for the problematic BG.

Deep Brain stimulation improves mobility/dexterity and dyskinesia of patients in general, via an increase in rate and decrease in reaction time, but it does not let the patient match force output to the object being manipulated (that is, the force is too large).

The excessive levels of grip force present in the stimulation 'off' state, and present from the early stages of the disease, however, were even more marked with STN stimulation on.

STN DBS may worsen the ability to match force characteristics to task requirements. (position control is improved?).

quite fascinating.

See also PMID-19266149 [1], "Distal and proximal prehension is differentially affected by Parkinson's disease: The effect of conscious and subconscious load cues"

asked PD and control patients to lift heavy and light objects.

While controls were able to normalize lift velocity with the help of both conscious and subconscious load cues, the PD patients could use neither form of cue, and retained a pathological overshoot in lift velocity.

Hence force control is remarkably affected in PD, which is consistent with the Piper rhythm (usually present during isometric contraction) being absent.

One of the pioneering studies of electrophysiology in awake behaving animals: single-electrode recording, juice reward, head-posting; many followed.

{960} looked at conduction velocity, which we largely ignore now -- most highly myelinated axons are silent during motor quiescence and show phasic activity during movement.

Lower conduction velocity PTNs show + and - FR modulations. Again from [5]

[6] showed that PTN activity preceded EMG activity, implying that efferent output rather than afferent feedback was driving the firing rate, as expected.

task: wrist flexion & extension under load.

task in monkey's home cage for a period of three months; monkeys carried out 3000 trials or more of the task (must have had strong wrists!)

Head-fixated the monkeys for about 10 days prior to unit recordings; "The monkeys learned to be quite cooperative in reentering the chair in the morning, since entrance to the chair was rewarded by the fruit juice of their choice (grape, apple, or orange). Indeed, some monkeys continued to work even in the presence of free water!"

Maybe I should give mango some Hawaiian punch as well?

Measured antidromic responses with a permanent electrode in the ipsilateral medullary pyramid.

Used glass insulated platinum-iridium electrodes [11]

traces are clean, very clean. I wonder if good insulation (in this case, glass) has anything to do with it?

controlled for displacement by varying the direction of load; PTNs seem to directly control muscles.

Fire during acceleration and movement for no load

Fire during load and co-contraction when loaded.

FR also related to δF/δt : FR higher during a low but rising force than a high but falling force.

more than 100 PTNs were recorded from the precentral gyrus, but only 31 had a clear and consistent relation to performance on the task.

16 units responded to extension loads, 7 units to flexion loads.

It was only one joint, after all.

Cells responding to the same movement (flexion or extension) were often found on the same vertical electrode track.

Very little response to joint position.

Very clean modulations -- neurons are almost silent if there is no force production; FR goes up to 50-80 Hz.

Prior to the experiment, Evarts expected a position-tuning model, but saw clear evidence of force tuning.

Group 1 muscle afferents have now been shown to project to the motor cortex of both monkey [1] and cat [9]. Makes sense: if the cortex is to control force, it needs feedback regarding its production.

Caveats: many muscles were involved in the study, mainly due to postural effects, and having one or two controls poorly delineates what is going on in the motor ctx.

Plus, all the muscles controlling the fingers come into play -- the manipulandum must be gripped firmly, especially to resist extension loads.

the responses of a substantial fraction of neurons in the primary visual cortex evolve from those that relate solely to the physical attributes of the stimuli to those that accurately predict the timing of reward.. wow!

rats. They put goggles on the rats to deliver full-field retinal illumination for 400 ms (isn't this cheating? full field?)

recorded from deep layers of V1

sensory processing does not seem to be reliable, stable, and reproducible...

rewarded only half of the trials, to see if the plasticity was a result of reward delivery or association of stimuli and reward.

after 5-7 sessions of training, neurons began to respond to the poststimulus reward time.

this was actually independent of reward delivery - only dependent on the time.

reward-related activity was only driven by the dominant eye.

individual neurons predict reward time quite accurately. (wha?)

responses continued even if the animal was no longer doing the task.

is this an artifact? of something else? what's going on? They suggest that it could be caused by subthreshold activity due to recurrent connections amplified by dopamine.

the result: reinforcement learning can function effectively in large populations of neurons if there is a trace of the population activity in addition to the reinforcement signal. this trace must be per-synapse or perhaps per-neuron (as has been anticipated for some time). very important result; helps with the 'specificity' problem.

in human terms, the standard reinforcement learning approach is analogous to having a class of students write an exam and being informed by the teacher on the next day whether the majority of students passed or not.

this learning method is slow and achieves limited fidelity; in contrast, behavioral reinforcement learning can be reliable and fast. (perhaps this is a result of already-existing maps and or activity in the cortex?)

reinforcement learning is almost the opposite of backpropagation: in backprop, an error signal is computed per neuron, while in reinforcement learning the error is only computed for the entire system. They posit that there must be a middle ground (need something less than one neuron to compute the training/error signal per neuron, otherwise the system would not be very efficient...)

makes a good if obvious point: to learn from trial and error, different responses to a given stimulus must be explored, and randomness in the neural activities provides a convenient mechanism for this.

they use the running mean as an eligibility trace per synapse. then change in weight = eta * eligibility trace(t), evaluated at the ends of trials.

implemented an asymmetric rule that updates the synapses only slightly if the output is reliable and correct.

also needed a population signal or fed-back version of the previous neural behavior. Then individual reinforcement is a product of the reinforcement signal * the population signal * the eligibility trace (the last per synapse). Roughly, if the population signal differs from the eligibility trace and the behavior is wrong, then that synapse should be reinforced, and vice versa.
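A minimal sketch of this three-factor scheme -- per-synapse eligibility traces (running means of coactivity), multiplied at trial end by a global reinforcement signal and a fed-back population signal. The tanh nonlinearity, constants, and toy reward criterion are all my own assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 10, 4
W = 0.1 * rng.normal(size=(n_out, n_in))
eta, trace_decay = 0.05, 0.9

for trial in range(200):
    x = rng.random(n_in)                      # random input pattern
    trace = np.zeros_like(W)
    for t in range(20):                       # within-trial dynamics
        noise = 0.1 * rng.normal(size=n_out)  # exploration via noisy units
        y = np.tanh(W @ x + noise)            # bounded population output
        # per-synapse eligibility trace: running mean of pre*post coactivity
        trace = trace_decay * trace + (1 - trace_decay) * np.outer(y, x)
    R = 1.0 if y.sum() > 0 else 0.0           # global scalar reinforcement
    pop = y.mean()                            # fed-back population signal
    W += eta * R * pop * trace                # three-factor update
```

The trace carries the per-synapse specificity that the scalar reinforcement signal alone lacks.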

debugging your brain - how to discover what you don't understand. a very intelligent viewpoint, worth rereading + the comments. look at the data, stupid

quote: how to represent the problem is perhaps even more important in research since human brains are not as adept as computers at shifting and using representations. Significant initial thought on how to represent a research problem is helpful. And when it’s not going well, changing representations can make a problem radically simpler.

automated labeling - great way to use a human 'oracle' to bootstrap us into good performance, esp. if the predictor can output a certainty value and hence ask the oracle all the 'tricky questions'.

Quote: Machine learning is a victim of its common success. It's hard to develop a learning algorithm which is substantially better than others. This means that anyone wanting to implement spam filtering can do so. Patents are useless here -- you can't patent an entire field (and even if you could, it wouldn't work).

Problem is that online courses only imperfectly emulate the social environment of a college, which IMHO is useful for cultivating diligence.

The unrealized potential of the research lab. Quote: Muthu Muthukrishnan says "it's the incentives". In particular, people who invent something within a research lab have little personal incentive in seeing its potential realized, so they fail to pursue it as vigorously as they might in a startup setting.

in primates, includes the medial caudate, which has been shown in fMRI to respond to reward prediction error. Neural activity in the caudate is attenuated when a monkey reaches optimal performance.

dorsal parts of the striatum (according to web: caudate, putamen, globus pallidus in primates) connect to the dorsal prefrontal and motor cortices

(according to them:) this corresponds to the putamen in primates. Activity in the putamen reflects performance but not learning.

activity in the putamen is highest after successful learning & accurate performance.

used muscimol (GABAa agonist, silences neural activity) and AP-5 (blocks NMDA based plasticity), in each of the target areas.

dorsal striatum is involved in performance but not learning

Injection of muscimol during acquisition did not impair test performance

Injection of muscimol during test phase did impair performance

Injection of AP-5 during acquisition had no effect.

in acquisition sessions, muscimol blocked the instrumental response (performance); but muscimol had only a small effect when injected after the rats had perfected the task.

Idea: consistent behavior creates a stimulus-response association in extrastriatal brain areas, e.g. cerebral cortex. That is, the basal ganglia provide the reinforcement signal, and the cortex learns the association through feedback-driven behavior? Not part of the habit system, but makes an important contribution to goal-directed behavior.

This is consistent with the observation that behavior is initially goal driven but is later habitual.

Actually, other studies show that plasticity in the dorsal striatum may be detrimental to instrumental learning.

The number of neurons that fire just before the execution of a response is larger in the putamen than the caudate.

most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal (e.g. Parkinson's, Huntington's) -> implying that DA has two uses: the labeling of reward, and the tonic enabling of postsynaptic neurons.

I just anticipated this, which is good :)

It is still a mystery how the neurons in the midbrain determine when to fire - the pathways between reward and behavior must be very carefully segregated, otherwise we would be able to self-stimulate.

the pure expectation part of it is bound to play a part in this - if we know that a certain event will be rewarding, then the expectation will diminish DA release.

Internal Model Control is used in industry to predict future system states before they actually occur. For example, the fly-by-wire technique in aviation makes decisions to perform particular maneuvers based on predictable forthcoming states of the plane (like a human).

if you learn a reaction/reflex based on a conditioned stimulus, the presentation of that stimulus sets the internal state to that motivated to achieve the primary reward. there is a transfer back in time, which, generally, is what neural systems are for.

animals avoid foods that fail to influence important plasma/brain parameters, for example foods lacking essential amino acids like histidine, threonine, or methionine. In the case of food, the appearance/structure would be used to predict the slower plasma effects, and hence influence motivation to eat it. (of course!)

midbrain groups:

A8 = dorsal to lateral substantia nigra

A9 = pars compacta of substantia nigra, SNc

A10 = VTA, medial to substantia nigra.

The characteristic polyphasic, relatively long impulses discharged at low frequencies make dopamine neurons easily distinguishable from other midbrain neurons.

The probability of choosing an alternative in a long sequence of repeated choices is proportional to the total reward derived from that alternative, a phenomenon known as Herrnstein's matching law.

We hypothesize that there are forms of synaptic plasticity driven by the covariance between reward and neural activity, and prove mathematically that matching (of choice to reward) is a generic outcome of such plasticity.

models for learning that are based on the covariance between reward and choice are common in economics and are used phenomenologically to explain human behavior.

this model can be tested experimentally by making reward contingent not on the choices, but rather on neural activity.

Maximization is shown to be a generic outcome of synaptic plasticity driven by the sum of the covariances between reward and all past neural activities.
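The matching claim can be illustrated with a toy simulation: weight updates proportional to the covariance between reward and choice, on a two-alternative baited (concurrent-VI-like) schedule. The softmax readout and all parameters here are my own illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(2)
p_bait = np.array([0.10, 0.05])   # reward arming rates, alternative A vs B
w = np.zeros(2)                   # "synaptic" weights driving choice
eta, r_bar = 0.1, 0.0
baited = np.zeros(2, dtype=bool)
counts, income = np.zeros(2), np.zeros(2)

for t in range(20000):
    p_choice = np.exp(w) / np.exp(w).sum()      # softmax readout
    c = rng.choice(2, p=p_choice)
    baited |= rng.random(2) < p_bait            # reward is held until collected
    r = float(baited[c]); baited[c] = False
    # covariance rule: (reward - mean reward) x (choice - expected choice)
    w += eta * (r - r_bar) * (np.eye(2)[c] - p_choice)
    r_bar += 0.005 * (r - r_bar)                # running mean of reward
    counts[c] += 1; income[c] += r

choice_frac = counts / counts.sum()             # fraction of choices to A, B
income_frac = income / income.sum()             # fraction of reward from A, B
```

At the rule's fixed point the covariance is zero, which forces the fraction of choices to each alternative to track its fractional income -- the matching law.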

I just had dinner with Jesse, and we had a good/productive discussion/brainstorm about algorithms, learning, and neurobio. Two things worth repeating, one simpler than the other:

1. Gradient descent / Newton-Raphson like techniques should be tried with genetic algorithms. As of my current understanding, genetic algorithms perform a semi-directed search, randomly exploring the space of solutions with natural selection exerting a pressure to improve. What if you took the partial derivative of each of the organism's genes, and used that to direct mutation, rather than random selection of the mutated element? What if you looked before mating and crossover? Seems like this would speed up the algorithm greatly (though it might get it stuck in local minima, too). Not sure if this has been done before - if it has, edit this to indicate where!
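A quick sketch of idea 1, with a toy differentiable fitness function; everything here (fitness, population size, step sizes) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def fitness(g):
    return -np.sum((g - 1.0) ** 2)             # toy optimum at all-ones

def grad(g, eps=1e-4):
    # finite-difference partial derivative of fitness wrt each gene
    out = np.zeros_like(g)
    for i in range(len(g)):
        d = np.zeros_like(g); d[i] = eps
        out[i] = (fitness(g + d) - fitness(g - d)) / (2 * eps)
    return out

pop = rng.normal(size=(20, 5))                 # 20 genomes, 5 genes each
for gen in range(50):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-10:]]    # truncation selection
    children = parents[rng.integers(0, 10, size=10)].copy()
    for g in children:
        g += 0.05 * grad(g)                    # gradient-directed mutation
        g += 0.01 * rng.normal(size=g.shape)   # residual random exploration
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(g) for g in pop])]
```

The residual random term keeps some exploration, so the population is not purely hill-climbing; on a multimodal fitness surface this balance would matter much more.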

2. Most supervised machine learning algorithms seem to rely on one single, externally applied objective function which they then attempt to optimize. (Rather this is what convex programming is. Unsupervised learning of course exists, like PCA, ICA, and other means of learning correlative structure) There are a great many ways to do optimization, but all are exactly that - optimization, search through a space for some set of weights / set of rules / decision tree that maximizes or minimizes an objective function. What Jesse and I have arrived at is that there is no real utility function in the world, (Corollary #1: life is not an optimization problem (**)) -- we generate these utility functions, just as we generate our own behavior. What would happen if an algorithm iteratively estimated, checked, cross-validated its utility function based on the small rewards actually found in the world / its synthetic environment? Would we get generative behavior greater than the complexity of the inputs? (Jesse and I also had an in-depth talk about information generation / destruction in non-linear systems.)

Put another way, perhaps part of learning is to structure internal valuation / utility functions to set up reinforcement learning problems where the reinforcement signal comes according to satisfaction of sub-goals (= local utility functions), or where the gradient signal comes from evaluating partial derivatives of actions with respect to those sub-goals. Creating these goals is natural but not always easy, which is one reason (of very many!) sports are so great - the utility function is clean, external, and immutable. The recursive, introspective creation of valuation / utility functions is what drives a lot of my internal monologues, mixed with a hefty dose of taking partial derivatives (see {780}) based on models of the world. (Stated this way, they seem so similar that perhaps they are the same thing?)

To my limited knowledge, there has been some recent work on the creation of sub-goals in reinforcement learning. One paper I read used a system to look for states that had a high ratio of ultimately rewarded paths to unrewarded paths, and selected these as subgoals (e.g. rewarded the agent when this state was reached). I'm not talking about these sorts of sub-goals. In these systems, there is an ultimate goal that the researcher wants the agent to achieve, and it is the algorithm's task to make a policy for generating/selecting behavior. Rather, I'm interested in even more unstructured tasks - make a utility function, and a behavioral policy, based on small continuous (possibly irrelevant?) rewards in the environment.

Why would I want to do this? The pet project I have in mind is a 'cognitive' PCB part placement / layout / routing algorithm to add to my pet project, kicadocaml, to finally get some people to use it (the attention economy :-). In the course of thinking about how to do this, I've realized that a substantial problem is simply determining which board layouts are good and which are not. I have a rough aesthetic idea + some heuristics that I learned from my dad + some heuristics I've learned through practice of what is good layout and what is not - but how to code these up? And what if these aren't the best rules, anyway? If I just code up the rules I've internalized as utility functions, then the board layout will be pretty much as I do it - boring!

Well, I've stated my sub-goal in the form of a problem statement and some criteria to meet. Now, to go and search for a decent solution to it. (Have to keep this blog m8ta!) (Or, realistically, to go back and see if the problem statement is sensible).

(from abstract) The resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP.

This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker.

STDP is prevalent in the cortex ; however, it requires a second signal:

Their notes on the Fetz/Baker experiments: "Adjacent neurons tended to change their firing rate in the same direction, but also differential changes of directions of firing rates of pairs of neurons are reported in [17] (when these differential changes were rewarded). For example, it was shown in Figure 9 of [17] (see also Figure 1 in [19]) that pairs of neurons that were separated by no more than a few hundred microns could be independently trained to increase or decrease their firing rates."

Their result is actually really simple - there is no 'control' or biofeedback - there is no visual or sensory input, no real computation by the network (at least for this simulation). One neuron is simply reinforced, hence its firing rate increases.

Fetz & later Schmidt's work involved feedback and precise control of firing rate; this does not.

This also does not address the problem that their rule may allow other synapses to forget during reinforcement.

They do show that exact spike times can be rewarded, which is kinda interesting ... kinda.

Tried a pattern classification task where all of the information was in the relative spike timings.

Had to run the pattern through the network 1000 times. That's a bit unrealistic (?).

The problem with all these algorithms is that they require so many presentations for gradient descent (or similar) to work, whereas biological systems can and do learn after one or a few presentations.
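For reference, the core of a generic reward-modulated STDP rule of the kind discussed here can be sketched as follows: an STDP window writes candidate changes into a per-synapse eligibility trace, and a delayed scalar reward converts the trace into an actual weight change. Constants and spike trains are invented, and this is the textbook form, not the paper's exact derivation:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
tau_plus = tau_minus = 20.0        # STDP window time constants (ms)
tau_e = 100.0                      # eligibility trace time constant (ms)
A_plus, A_minus = 0.01, 0.012
w, e = 0.5, 0.0
x_pre = x_post = 0.0               # low-pass filtered spike trains

pre = rng.random(T) < 0.05         # Poisson-ish pre/post spike trains
post = rng.random(T) < 0.05

for t in range(T):                 # 1 ms steps
    x_pre += -x_pre / tau_plus + pre[t]
    x_post += -x_post / tau_minus + post[t]
    # STDP: pre-before-post potentiates, post-before-pre depresses
    stdp = A_plus * x_pre * post[t] - A_minus * x_post * pre[t]
    e += -e / tau_e + stdp         # candidate change stored in the trace

R = 1.0                            # delayed global reward at trial end
w += R * e                         # reward gates the stored STDP change
```

Because the trace decays with tau_e, only pairings within roughly 100 ms of the reward contribute much -- the usual distal-reward caveat.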

Next tried to train neurons to classify spoken input

Audio stimuli were processed through a cochlear model.

Maass previously has been able to train a network to perform speaker-independent classification.

Neuron model does, roughly, seem to discriminate between "one" and "two"... after 2000 trials (each with a presentation of 10 of the same digit utterance). I'm still not all that impressed. Feels like gradient descent / linear regression as per the original LSM.

A great many derivations in the Methods section... too much to follow.

learning to compensate for forces applied to the hand influenced how participants predicted target motion for interception.

subjects were trained on a robotic manipulandum that applied different force fields; they had to use the manipulandum to hit an accelerating target.

There were 3 force fields: rightward, leftward, and null. The target accelerated left to right. Subjects with the rightward force field hit more targets than with the null field, and these more than with the leftward force field. Hence motor knowledge of the environment (associated accelerations, as if there were wind or a water current...) influenced how motion was perceived and acted upon.

perhaps there is a simple explanation for this (rather than their evolutionary information-sharing hypothesis): there exists a network that serves to convert visual-spatial coordinates into motor plans, and later muscle activations. The presence of a force field initially only affects the motor/muscle control parts of the ctx, but as training continues, the changes are propagated earlier into the system - to the visual system (or at least the visual-planning system). But this is a complicated system, and it's hard to predict how and where adaptation occurs.

they say that the only way to deal with reinforcement or general-type learning in a high-dimensional policy space defined by parameterized motor primitives is policy gradient methods.

article is rather difficult to follow; they do not always provide enough details (for me) to understand exactly what their equations mean. Perhaps this is related to their criticism that others' papers are 'ad-hoc' and not 'statistically motivated'.

RL with good function-approximation methods for evaluating the value function or policy solves many problems, yet...

RL is bedeviled by the curse of dimensionality: the number of parameters grows exponentially with the size of a compact encoding of state.

Recent research has tackled the problem by exploiting temporal abstraction - decisions are not required at each step, but rather invoke the activity of temporally extended sub-policies. This is somewhat similar to a macro or subroutine in programming.
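The macro/subroutine analogy can be made concrete: an "option" is a sub-policy plus a termination condition, invoked as a single decision. A toy sketch (the names and one-dimensional environment are my own invention):

```python
def run_option(state, policy, terminated, env_step):
    """Execute a temporally extended sub-policy until it terminates."""
    total_reward, steps = 0.0, 0
    while not terminated(state):
        action = policy(state)
        state, r = env_step(state, action)
        total_reward += r
        steps += 1
    return state, total_reward, steps

# Toy use: "walk right until x == 3" on a 1-D line, -1 reward per step.
state, R, k = run_option(
    0,
    policy=lambda s: +1,
    terminated=lambda s: s == 3,
    env_step=lambda s, a: (s + a, -1.0),
)
# state == 3, R == -3.0, k == 3
```

The higher-level learner then sees one transition (start state, option, accumulated reward, end state) instead of three primitive steps.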

This is fundamentally similar to adding detailed domain-specific knowledge to the controller / policy.

Ron Parr seems to have made significant advances in this field with 'hierarchies of abstract machines'.

I'm still looking for a cognitive (predictive) extension to these RL methods ... these all are about extension through programmer knowledge.

They also talk about concurrent RL, where agents can pursue multiple actions (or options) at the same time, and assess value of each upon completion.

Next are partially observable Markov decision processes, where you have to estimate the present state (belief state) as well as a policy. It is known that an optimal solution to this task is intractable. They propose using hierarchical suffix memory as a solution; I can't really see what these are about.

It is also possible to attack the problem using hierarchical POMDPs, which break the task into higher- and lower-level 'tasks'. Little mention is given to the even harder problem of breaking sequences up into tasks.

this type of model is essentially TD(0); it does not involve 'eligibility traces', but still is capable of learning.
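For comparison, tabular TD(0) -- one-step updates, no eligibility trace -- on the standard 5-state random-walk toy problem (the environment here is my own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n_states, alpha, gamma = 5, 0.1, 1.0
V = np.zeros(n_states + 2)         # states 1..5; 0 and 6 are absorbing

for episode in range(500):
    s = (n_states + 1) // 2        # start in the middle (state 3)
    while 0 < s < n_states + 1:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s2 == n_states + 1 else 0.0   # reward only at right end
        delta = r + gamma * V[s2] - V[s]         # one-step TD error
        V[s] += alpha * delta      # no trace: only the current state updates
        s = s2
```

Without a trace, credit propagates backward only one state per visit; learning still converges, just more slowly than with TD(lambda).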

reminds us that these cells have been found, but there are many other different types of responses of dopamine cells.

storage of these predictions involves the basolateral nuclei of the amygdala and the orbitofrontal cortex. (but how do these structures learn their expectations ... ?)

dopamine release is associated with motor effects that are species specific, like approach behaviors, that can be irrelevant or detrimental to the delivery of reward.

bonuses, for the authors = fictitious quantities added to rewards or values to ensure appropriate exploration.

resolution of DA activity ~ 50ms.

Romo & Schultz have found that there are phasic increases in DA activity to both rewarded and non-rewarded events/stimuli - something that they explain as 'generalization'. But - maybe it is something else? like a startle / get ready to move response?

They suggest that it is a matter of intermediate states where the monkey is uncertain as to what to do / what will happen. hum, not sure about this.

Orbitofrontal neurons showed three principal forms of reward-related activity during the performance of delayed response tasks:

1. responses to reward-predicting instructions,

2. activations during the expectation period immediately preceding reward, and

3. responses following reward.

(Figure above) Reward-predicting stimulus in a dopamine neuron. Left: the animal received a small quantity of apple juice at irregular intervals without performing in any behavioral task. Right: the animal performed in an operant lever-pressing task in which it released a touch-sensitive resting key and touched a small lever in reaction to an auditory trigger signal. The dopamine neuron lost its response to the primary reward and responded to the reward-predicting sound.

learning-related changes occur significantly earlier in the striatum than the cortex in a cue-reversal task. she says that this is because the basal ganglia instruct the cortex. I rather think that they select output dimensions from that variance-generator, the cortex.

there is a strong hyperkinetic pathway that projects directly to the subthalamic nucleus from the motor cortex. this controls output of the inhibitor pathway (GPi)

GABA input from the GPi to the thalamus can induce rebound spikes with precise timing. (the outputs are therefore not only inhibitory).

striatal neurons have up and down states. recommended action: simultaneous on-line recording of dopamine release and spike activity.

interesting generalization: cerebellum = supervised learning, striatum = reinforcement learning. and yet! the cerebellum has a strong disynaptic projection to the putamen. of course, there is a continuous gradient between fully-supervised and fully-reinforcement models. the question is how to formulate both in a stable loop.
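The two ends of this continuum can be sketched as updates to the same linear unit y = w·x; learning rates, noise scale, and the toy task are illustrative assumptions, not taken from any of these papers:

```python
import numpy as np

rng = np.random.default_rng(0)
x, t = np.array([1.0, -0.5, 0.25]), 2.0   # input and target, illustrative

# "cerebellum-like" supervised rule: a vectoral, signed error drives the update
w = np.zeros(3)
for _ in range(200):
    e = t - w @ x           # signed error tells the direction to move
    w += 0.1 * e * x        # delta rule: step directly downhill

# "striatum-like" reinforcement rule: only a scalar evaluator is available
v = np.zeros(3)
baseline = -t**2            # running estimate of expected reward
for _ in range(2000):
    xi = 0.1 * rng.standard_normal(3)        # explore by perturbing weights
    r = -((t - (v + xi) @ x) ** 2)           # scalar reward, no gradient given
    v += 0.5 * (r - baseline) * xi           # weight perturbation / REINFORCE
    baseline += 0.1 * (r - baseline)
```

Both reach the target, but the scalar-evaluator rule needs far more trials because the reward alone does not say which weight to change - which is exactly why the vectoral-vs-scalar distinction in the abstract below matters experimentally.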

striosomal = striatum to the SNc

http://en.wikipedia.org/wiki/Substantia_nigra SNc is not a disorganized mass: the dopaminergic neurons of the pars compacta project to the cortex in a topographic map; dopaminergic neurons of the fringes (the lowest) go to the sensorimotor striatum and the highest to the associative striatum

this is concerned with memory cells, cells that 'remember' or remain permanently changed after learning the force-field.

In the above figure, the blue lines (or rather vertices of the blue lines) indicate the firing rate during the movement period (and 200ms before); angular position indicates the target of the movement. The force-field in this case was a curl field where force was proportional to velocity.
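The curl field described here can be written as F = Bv, with the force rotated 90 degrees from the hand velocity; the gain k and the sign convention (clockwise vs counterclockwise) below are illustrative, as they vary between studies:

```python
import numpy as np

k = 13.0                       # viscous gain in N·s/m, illustrative value
B = np.array([[0.0,  k],
              [-k,  0.0]])     # rotation-like curl matrix
v = np.array([0.0, 0.3])       # hand velocity (m/s), moving straight ahead
F = B @ v                      # force is perpendicular to the motion
```

Because F is always orthogonal to v, the field does no work on the hand but deflects every movement sideways in proportion to its speed, which is what makes it a clean probe of adaptation.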

Preferred direction of the motor cortical units changed when the preferred direction of the EMGs changed

evidence of encoding of an internal model in the changes in tuning properties of the cells.

"We demonstrate here that many of these cells show similar large continuously graded changes in discharge when the monkey compensates for inertial loads which pull the arm in 8 different directions"

the mean activity of the sample population under any condition of movement direction and load direction can be described reasonably well by a simple linear summation of the movement-related discharge without any loads, and the change in tonic activity of the population caused by the load, measured prior to movement
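That linear-summation description can be sketched as a cosine-tuned rate model, where the no-load movement tuning and the pre-movement tonic load shift simply add; cosine tuning and all parameter values are illustrative assumptions:

```python
import numpy as np

# baseline b0, movement gain/preferred direction, and load gain/preferred
# direction are hypothetical numbers chosen only to illustrate the summation.
def predicted_rate(move_dir, load_dir, b0=20.0, g_move=15.0, g_load=8.0,
                   pd_move=0.0, pd_load=0.0):
    movement_term = g_move * np.cos(move_dir - pd_move)  # no-load tuning curve
    load_term = g_load * np.cos(load_dir - pd_load)      # tonic shift, pre-movement
    return b0 + movement_term + load_term                # simple linear sum

r = predicted_rate(move_dir=0.0, load_dir=np.pi)  # load opposing preferred dir
```

Here a load pulling against the cell's load-preferred direction subtracts a fixed tonic amount from the same movement tuning curve, which is the essence of the claim.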

their data support the dual kinematics/dynamics encoding in the motor cortex.

cells are not tuned to the direction of the absolute force, but rather to the direction of both the visual cue and change in force (dF/dt) as measured using linear regressions in an isometric force task.

PMID-14610628[0] A critical evaluation of the force control hypothesis in motor control.

the target of this review is the inverse dynamics model of motor control, which is very successful in robots. however, it seems that the mammalian nervous system does things in a rather more complicated way than this.

they agree that motor learning is most likely the defining feature of the cortex (i think that the critical and essential element of the cortex is not what control solution it arrives at, but rather how it learns that solution given the anatomical connections development has endowed it with).

they also find issue with the failure to incorporate realistic spinal reflexes into inverse-dynamics models.

However, we find little empirical evidence that specifically supports the inverse dynamics or forward internal model proposals per se.

We further conclude that the central idea of the force control hypothesis--that control levels operate through the central specification of forces--is flawed.

Abstract: Numerous studies have found correlations between measures of neural activity, from single-unit recordings to aggregate measures such as EEG, and motor behavior. Two general themes have emerged from this research: neurons are generally broadly tuned and are often arrayed in spatial maps. It is hypothesized that these are two features of a larger hierarchical structure of spatial and temporal transforms that allows mappings to produce complex behaviors from abstract goals, or similarly, to produce simple percepts from complex sensory information. Much theoretical work has proved the suitability of this organization both to generate behavior and to extract relevant information from the world. It is generally agreed that most transforms enacted by the cortex and basal ganglia are learned rather than genetically encoded. Therefore, it is the characterization of the learning process that describes the computational nature of the brain; the descriptions of the basis functions themselves are more descriptive of the brain’s environment. Here we hypothesize that learning in the mammalian brain is a stochastic maximization of reward and transform predictability, and a minimization of transform complexity and latency. It is probable that the optimizations employed in learning include components of both gradient descent and competitive elimination, two large classes of algorithms explored extensively in the field of machine learning. The former method requires the existence of a vectoral error signal, while the latter is less restrictive and requires at least a scalar evaluator. We will look for the existence of candidate error or evaluator signals in the cortex and basal ganglia during force-field learning, where the motor error is task-relevant and explicitly provided to the subject.
By simultaneously recording large populations of neurons from multiple brain areas, we can probe the existence of error or evaluator signals by measuring the stochastic relationship between neural activity and the provided error signal, and the ability of the former to predict the latter. From these data we will also be able to track the dependence of the neural tuning trajectory on trial-by-trial success; if the cortex operates under minimization principles, then tuning changes will have a temporal relationship to reward. The overarching goal of this research is to look for one aspect of motor learning – the error signal – with the hope of using this data to better understand the normal function of the cortex and basal ganglia, and how this normal function is related to the symptoms caused by disease and lesions of the brain.
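A toy sketch of the proposed analysis: regress the provided error signal on simultaneously recorded firing rates and ask how much variance the population predicts. All data below are synthetic, and the shapes, names, and noise levels are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_units = 400, 30
rates = rng.poisson(10.0, size=(n_trials, n_units)).astype(float)  # spike counts
true_w = rng.standard_normal(n_units) * 0.1        # hidden encoding weights
error_signal = rates @ true_w + 0.5 * rng.standard_normal(n_trials)

X = np.column_stack([rates, np.ones(n_trials)])    # add an intercept column
coef, *_ = np.linalg.lstsq(X, error_signal, rcond=None)
pred = X @ coef
r2 = 1 - np.sum((error_signal - pred) ** 2) / np.sum(
    (error_signal - np.mean(error_signal)) ** 2)   # decoding quality (R^2)
```

A high R² on held-out trials (not shown here) would be the kind of evidence that the population carries an explicit error representation rather than merely co-varying with movement.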