Throughout our everyday routines we must act in the face of uncertainty. From a decision-theoretic standpoint, optimal actions are those that maximize the value associated with the task. However, acting optimally requires that the brain maintain an accurate representation of both the reward and the probability associated with each outcome. Previous research investigating how humans use value structure to guide reaching movements has focused exclusively on asymptotic performance, ignoring how this structure is learned. This project therefore investigates how value is learned by requiring subjects to reach toward targets that appear only after subjects have completed a portion of their movement toward the possible target locations. Because subjects have no information about the target at the start of the reach, their initial trajectories provide a way to quantify reach plans. Value is manipulated by varying either the probability or the reward associated with each target. Subjects earn points for correctly acquiring the target, receive no points for reaching to the incorrect target, and lose points for taking too much time. Subjects receive bonus money after the experiment based on their point total, ensuring that the value structure in this paradigm has real utility. Furthermore, we developed a model that learns from the subject's experience which initial biases yield maximal points. The model allows us to predict the biases people should adopt and which experience is important for forming value estimates. The results show that as the difference in value between the targets increases, subjects' biases increase at a rate that closely matches the maximum-point predictions. Moreover, changes in biases across trials are better predicted by recent experience than by global experience. Together, these findings suggest that people learn value structure through recent experience and use this knowledge to guide reach planning.