Abstract: Integration of goal-driven, top-down attention and image-driven, bottom-up attention is crucial for visual search. For instance, in robot navigation, it is important to detect goal-relevant targets like road signs and landmarks, and to simultaneously notice unexpected visual events like sudden obstacles and accidents. Yet, previous research has mostly focused on models that are purely top-down or purely bottom-up. Here, we propose a new model that combines both. The bottom-up component computes the visual salience of scene locations in different feature maps extracted at multiple spatial scales. The top-down component uses accumulated statistical knowledge of the visual features of the desired search target and of background clutter to optimally tune the bottom-up maps so as to maximize target detection speed. Testing on 600 artificial search arrays and 300 natural scenes shows that the model's predictions are consistent with a large body of available literature on the human psychophysics of visual search. These promising results suggest that our model may provide a good approximation to how humans combine bottom-up and top-down cues to optimize visual search behavior.
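The combination described above can be illustrated with a minimal sketch. All function names, the Gaussian-style SNR gain rule, and the toy feature maps below are illustrative assumptions, not the paper's actual implementation: each bottom-up feature map receives a top-down gain proportional to how well that feature discriminates the target from background clutter, and the weighted maps are summed into a single saliency map.

```python
import numpy as np

def topdown_gains(target_means, distractor_means, noise_var=1.0):
    """Hypothetical gain rule: weight each feature by a simple SNR proxy,
    the squared difference between mean target and mean distractor
    responses divided by response noise. Discriminative maps get
    higher weight; gains are normalized to sum to 1."""
    target_means = np.asarray(target_means, dtype=float)
    distractor_means = np.asarray(distractor_means, dtype=float)
    snr = (target_means - distractor_means) ** 2 / noise_var
    return snr / snr.sum()

def saliency(feature_maps, gains):
    """Combine bottom-up feature maps into one top-down-biased saliency map."""
    maps = np.asarray(feature_maps, dtype=float)
    return np.tensordot(gains, maps, axes=1)

# Toy example: two 4x4 feature maps. The target at location (2, 3) is
# strong in feature 0, which also discriminates target from distractors;
# feature 1 responds uniformly and is uninformative.
f0 = np.zeros((4, 4)); f0[2, 3] = 1.0   # discriminative map: target pops out
f1 = np.ones((4, 4)) * 0.5              # uninformative map: flat response
gains = topdown_gains(target_means=[1.0, 0.5], distractor_means=[0.0, 0.5])
smap = saliency([f0, f1], gains)
peak = np.unravel_index(np.argmax(smap), smap.shape)  # peak lands on (2, 3)
```

Here the uninformative map receives zero gain, so the saliency peak coincides with the target location; with uniform gains (a purely bottom-up combination), the target's advantage would be diluted by the flat map, slowing detection.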