Localization and tracking of humans are essential research topics in robotics; in particular, Sound Source Localization (SSL) has attracted great interest. Despite the numerous methods reported, SSL in real environments has faced three main issues: robustness against high-power noise, the lack of a framework for selectively listening to sound sources, and the tracking of inactive and/or noisy sound sources. To address the first issue, we extended MUltiple SIgnal Classification (MUSIC) by incorporating Generalized EigenValue Decomposition (GEVD-MUSIC), so that it can handle high-power noise and select target sound sources. To address the second, we proposed Sound Source Identification (SSI) based on hierarchical Gaussian mixture models and integrated it with GEVD-MUSIC, enabling the system to listen to a specific sound source according to its type. To address the third, we integrated auditory and visual human tracking using particle filtering. These three techniques are combined into an intelligent human tracking system. Experimental results showed that integrating SSL and SSI achieved human tracking using audition alone, and that audio-visual integration considerably improved tracking by compensating for the loss of either auditory or visual information.
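The core idea of GEVD-MUSIC can be illustrated with a minimal numerical sketch. It makes several assumptions: a narrowband, single-frequency model; a hypothetical 8-microphone uniform linear array with half-wavelength spacing; and a noise correlation matrix K that is known exactly (in practice it would have to be estimated). The sketch solves the generalized eigenproblem R e = λ K e by Cholesky whitening and evaluates the usual MUSIC pseudo-spectrum on the resulting noise-subspace eigenvectors. The function names (`steering_vector`, `gevd_music_spectrum`), array geometry, source angles, and powers are all illustrative. The code shows the textbook generalized-eigenproblem form of MUSIC, not the authors' implementation.

```python
import numpy as np

def steering_vector(theta_deg, n_mics, d_over_lambda=0.5):
    """Narrowband steering vector for a uniform linear array (assumed geometry)."""
    m = np.arange(n_mics)
    phase = -2j * np.pi * d_over_lambda * m * np.sin(np.deg2rad(theta_deg))
    return np.exp(phase)

def gevd_music_spectrum(R, K, n_sources, angles_deg, d_over_lambda=0.5):
    """MUSIC pseudo-spectrum from the generalized eigenproblem R e = lambda K e."""
    n_mics = R.shape[0]
    # Cholesky factor of the noise correlation matrix: K = L L^H
    L = np.linalg.cholesky(K)
    # Whitened correlation W = L^{-1} R L^{-H}; (L^{-1} R)^H = R L^{-H} since R is Hermitian
    W = np.linalg.solve(L, np.linalg.solve(L, R).conj().T)
    W = (W + W.conj().T) / 2  # enforce Hermitian symmetry against round-off
    _, V = np.linalg.eigh(W)  # eigenvalues in ascending order
    # Generalized eigenvectors e_i = L^{-H} v_i for the (n_mics - n_sources) smallest eigenvalues
    E_noise = np.linalg.solve(L.conj().T, V[:, : n_mics - n_sources])
    spectrum = []
    for theta in angles_deg:
        a = steering_vector(theta, n_mics, d_over_lambda)
        num = np.abs(np.vdot(a, a))                # |a^H a|
        den = np.sum(np.abs(a.conj() @ E_noise))   # sum_i |a^H e_i|
        spectrum.append(num / (den + 1e-12))       # small constant avoids division by zero
    return np.array(spectrum)

# Hypothetical scene: target at +30 deg, noise source 10 dB louder at -40 deg
n_mics = 8
a_t = steering_vector(30.0, n_mics)
a_n = steering_vector(-40.0, n_mics)
p_target, p_noise, sigma2 = 1.0, 10.0, 0.01
K = p_noise * np.outer(a_n, a_n.conj()) + sigma2 * np.eye(n_mics)  # noise-only correlation
R = p_target * np.outer(a_t, a_t.conj()) + K                        # observed correlation

angles = np.arange(-90, 91)
P = gevd_music_spectrum(R, K, n_sources=1, angles_deg=angles)
peak_deg = int(angles[np.argmax(P)])
print(peak_deg)  # 30: the target, not the louder -40 deg noise source
```

Because K absorbs the louder noise source, the pseudo-spectrum peaks at the target direction. Standard MUSIC applied to R alone would tend to be drawn toward the dominant noise source instead.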